A place in history: a guide to using GIS in historical research


CHAPTER 2: THE WORLD AS VIEWED THROUGH A GIS

 

Guide to Good Practice Navigation Bar

2.2 Attribute data

Attribute data are data in the form that most people understand by the term. Many GIS software packages include their own attribute databases but allow the user to link to external database management systems (DBMS) or spreadsheets. Examples of this include MapInfo, allowing the user to link to data in Microsoft Excel, ArcView to DBase, and ArcInfo to Oracle. Attribute data are frequently either statistical or textual. As the software improves and becomes more flexible, they can be in virtually any format that the DBMS used to store them can support. Increasingly this includes image formats, animations, hyperlinks, multimedia, and so on.

Most of the DBMSs used in GIS are relational database management systems. This means that two or more tables can be joined together based on a common field known as a key. With historical data this can often be either a place name or an ID number (note that place names are not considered to be spatial data: to be spatial the data must have a coordinate-based location), and it allows data from various sources to be integrated without requiring spatial data. For example, a user has a table of Poor Law data organised by Poor Law Unions (these were a type of administrative unit used in England and Wales in the 19th and early 20th centuries to administer relief of the poor), a table of voting statistics organised by parliamentary constituency, and some employment statistics based on towns. A relation join will join all three tables together and all of the data for 'Bristol', for example, will appear on a single row.

There are three main problems with doing this: firstly, the join has no knowledge that the entity referred to as 'Bristol' may be a different entity in each table. Secondly, problems will occur where names are not unique, such as 'Whitchurch' which appears in both Hampshire and Shropshire. Different software will handle this in different ways, the most common (and theoretically sound) being to duplicate rows of data. One way round this is to use more than one column as the key, for example, place name and county. The third problem with using place names is that their spellings must be identical to produce a match. Even minor differences in the use of hyphens or apostrophes will cause a non-match. This can be worked around using gazetteers that standardise all possible spellings and create a single spelling from an authority list, or through the use of ID numbers. Creating these can be time consuming.

Many attribute databases use Structured Query Language (SQL) to allow flexible querying and joining. This is often implemented though a Graphical User Interface but follows the basic structure:

select <column names>
from <table names>
where <condition>

So, for example, we have two tables: 'unemp' that contains data on unemployment rates, and 'inf_mort' that contains data on infant mortality. These have the following fields:

Unemp
 
Inf_mort
Field Name Contents Field Name Contents
Name The area's place name Name The area's place name
U_rate The area's unemployment rate Im_rate The area's infant mortality rate
Tot_pop The area's total population Births The number of births in the area

Table 2.1: Sample tables of attribute data

The SQL query:

select name, u_rate, tot_pop
from unemp
where u_rate>10.0

will select the names, unemployment rates, and total populations from the table unemp for places with an unemployment rate of over 10% as is shown in Table 2.2.

Name U_rate Tot_pop
Bolton
13.2
10,000
Oldham
12.1
7,500
Rochdale
10.9
8,000

Table 2.2: Sample data returned by the query above

Relational joins are also implemented in this way. For example the query:

select unemp.name, unemp.u_rate, inf_mort.im_rate
from unemp, inf_mort
where unemp.name=inf_mort.name

will select the unemployment rates from unemp and infant mortality rates from inf_mort where the values in the name fields in both tables are identical.

Unemp Unemp Inf_mort
Name U_rate Im_rate
Bolton
13.2
120
Oldham
12.1
115
Rochdale
10.9
106
Burnley
9.4
98
Colne
9.1
89

Table 2.3: Sample data returned by the query above

While relational databases and SQL are not fundamental to an understanding of GIS, they are a useful skill that can enhance an understanding of GIS data and GIS software. Many guides to the use of SQL and relational databases are available: see the bibliography for further information.

Guide to Good Practice Navigation Bar
Valid XHTML 1.0!
 

 


© Ian Gregory 2002

The right of Ian Gregory to be identified as the Author of this Work has been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.

All material supplied via the Arts and Humanities Data Service is protected by copyright, and duplication or sale of all or any part of it is not permitted, except that material may be duplicated by you for your personal research use or educational purposes in electronic or print form. Permission for any other use must be obtained from the Arts and Humanities Data Service.

Electronic or print copies may not be offered, whether for sale or otherwise, to any third party.


Next Bibliography Back Glossary Contents