A place in history: a guide to using GIS in historical research
CHAPTER 7: SPATIAL ANALYSES OF STATISTICAL DATA IN GIS
7.3 Spatial analysis techniques

Having introduced the advantages and disadvantages associated with the statistical analysis of spatially-referenced data, this section describes a few types of techniques that can be used. The intention is not to describe them in great detail, as they are well described elsewhere (see the bibliography), but to give a flavour of the types of approaches that can be used. The type of spatial data often determines which spatial analysis technique is most appropriate, as different techniques must be used to take advantage of the characteristics of point, line and polygon datasets. It is sometimes sensible to analyse polygon data as points based on their centroids, or points as polygons using Thiessen polygons.
Figure 7.1: Types of spatial autocorrelation in point patterns

Point pattern analysis is concerned with attempting to determine whether the distribution of points is random, or whether it either clusters (positive spatial autocorrelation) or is evenly distributed (negative spatial autocorrelation), as shown in Figure 7.1. The easiest way of doing this is termed quadrat analysis, whereby the study area is sub-divided into regular grid squares and the number of events in each square is counted. This is not particularly satisfactory as the results tend to be heavily dependent on the size and arrangement of the grid squares. Better techniques focus on the distance between each point and its nearest neighbour, or on kernel estimations which effectively produce a moving average of the density of points at each location on the map. Global summary statistics from these techniques will simply say that the data cluster, are evenly distributed, or are randomly distributed. It is better to use local techniques that allow summaries such as 'the data cluster in this location but are randomly distributed here'.

Pure point pattern analysis is concerned only with the spatial component of the data and ignores attributes. Such purely spatial analysis is not possible with polygon data, where the arrangement of the polygons will usually be arbitrary, such as with census data based on administrative units. For these it is necessary to devise proximity measures that allow us to quantify the influence that each polygon has on its neighbours. These can be based either on the distance between the centroids or on some measure of the relationship between polygon boundaries. They may also be binary, where polygon i either has an influence on polygon j or it does not, or they may quantify the strength of the relationship to some degree. Simple binary proximity measures include whether two polygons share a boundary or whether their centroids lie within a set distance of each other.
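The second of these binary measures, centroids lying within a set distance of each other, can be illustrated with a minimal sketch. The function name `binary_proximity`, the centroid coordinates and the threshold of 1.5 map units are all assumptions made for illustration and do not come from any particular GIS package.

```python
from math import hypot

def binary_proximity(centroids, threshold):
    """Return an n x n matrix w in which w[i][j] is 1 if the centroids of
    polygons i and j lie within `threshold` of each other, and 0 otherwise."""
    n = len(centroids)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        xi, yi = centroids[i]
        for j in range(i + 1, n):
            xj, yj = centroids[j]
            if hypot(xi - xj, yi - yj) <= threshold:
                # Influence is treated as symmetric; a polygon is not
                # counted as its own neighbour, so the diagonal stays 0.
                w[i][j] = w[j][i] = 1
    return w

# Hypothetical centroids for four polygons (arbitrary map units).
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
weights = binary_proximity(centroids, threshold=1.5)

for row in weights:
    print(row)
```

In this made-up example the first three polygons all influence one another, while the fourth lies too far away to have any neighbours. The choice of threshold is a judgement the researcher has to make, and the results of later analysis can be sensitive to it.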
More complex measures include ratios based on the length of shared boundary between two polygons compared to their total perimeters, or distance decay models whereby the influence one polygon has on another declines as the distance between the two centroids increases. Once a proximity measure has been calculated it may be used to analyse the polygon or point dataset. A simple form of analysis is to test for spatial autocorrelation using techniques such as Geary's coefficient or Moran's coefficient, which analyse proximity and attribute together. Traditionally these provide global summaries, but techniques such as Getis and Ord's Gi allow us to perform local analysis of these questions (Fotheringham et al. 2000).

More sophisticated forms of spatial analysis that combine the analysis of spatial and attribute data include kriging and geographically weighted regression (GWR). Kriging is a sophisticated interpolation technique that attempts to estimate a continuous trend surface from a set of known sample points (Isaaks and Srivastava 1989). GWR explicitly incorporates space into regression analysis. Rather than simply producing a single, global regression equation, GWR produces an equation for each point or polygon in the dataset based on the location's relationship with its neighbours. This can be used to produce maps that show how relationships vary over space (Brunsdon et al. 1996).

Analysis of line data is often different from analysis of other forms of data, as it tends to concentrate on flows. This is termed network analysis. It can involve problems such as calculating the shortest route that visits a set of points on the network, termed the travelling salesman problem. The analysis can be made more sophisticated by classing lines on the network in different ways: for example, different types of road or railway may have different journey times or journey costs associated with them.
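As a minimal sketch of how line classes can change the answer, the example below finds the quickest route between two places using Dijkstra's algorithm. This is the simpler two-point building block rather than the travelling salesman problem itself. The place names, line lengths and the speeds assigned to road and rail are hypothetical values chosen only for illustration.

```python
import heapq

# Assumed travel speeds (km per hour) for each class of line.
SPEED_KMH = {"rail": 60.0, "road": 10.0}

def build_graph(lines):
    """Turn (from, to, length_km, line_class) records into an undirected
    graph whose edge weights are journey times in hours."""
    graph = {}
    for a, b, length_km, line_class in lines:
        hours = length_km / SPEED_KMH[line_class]
        graph.setdefault(a, []).append((b, hours))
        graph.setdefault(b, []).append((a, hours))
    return graph

def quickest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_hours, [places on the route]),
    or (infinity, []) if the goal cannot be reached."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, hours in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + hours, neighbour, path + [neighbour]))
    return float("inf"), []

# Hypothetical network: a short, slow road and a longer, faster railway.
lines = [
    ("A", "B", 20.0, "road"),  # direct, 20 km by road
    ("A", "C", 30.0, "rail"),  # 30 km by rail
    ("C", "B", 30.0, "rail"),  # 30 km by rail
]
graph = build_graph(lines)
hours, route = quickest_route(graph, "A", "B")
print(route, hours)
```

Under these assumed speeds the 60 km rail journey via C takes one hour, while the direct 20 km road takes two, so the quickest route is not the shortest one. Replacing journey time with a monetary cost per kilometre only requires changing how the edge weight is computed.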
Networks can also be used in location-allocation models that attempt to find the most efficient location on a network. An example of this might be calculating the most efficient location for an industrial complex based on certain assumptions about the rail network. The modelled locations can then be compared with actual locations.
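One simple formulation of this idea, sometimes called the 1-median problem, picks the place on the network that minimises the total demand-weighted travel distance to all other places. The sketch below assumes a connected network; the station names, line lengths and demand figures are hypothetical and stand in for whatever assumptions a researcher makes about the rail network.

```python
def all_pairs_shortest(nodes, edges):
    """Floyd-Warshall: shortest network distance between every pair of nodes."""
    inf = float("inf")
    dist = {a: {b: (0.0 if a == b else inf) for b in nodes} for a in nodes}
    for a, b, length in edges:
        # Lines are treated as two-way; keep the shorter of any duplicates.
        dist[a][b] = dist[b][a] = min(dist[a][b], length)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

def best_location(nodes, edges, demand):
    """Return the node with the lowest demand-weighted total distance,
    together with the total for every candidate node."""
    dist = all_pairs_shortest(nodes, edges)
    costs = {
        site: sum(demand[n] * dist[site][n] for n in nodes)
        for site in nodes
    }
    return min(nodes, key=lambda site: costs[site]), costs

# Hypothetical railway: four stations in a line, 10 km apart.
nodes = ["W", "X", "Y", "Z"]
edges = [("W", "X", 10.0), ("X", "Y", 10.0), ("Y", "Z", 10.0)]
# Assumed demand at each station (for example, tonnes of raw material).
demand = {"W": 1, "X": 1, "Y": 5, "Z": 1}

site, costs = best_location(nodes, edges, demand)
print(site, costs)
```

Here the model places the complex at station Y, where most of the assumed demand originates. If the actual historical location was elsewhere, the difference between the two total costs gives a rough measure of how far the real choice departed from the modelled optimum.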
© Ian Gregory 2002 The right of Ian Gregory to be identified as the Author of this Work has been asserted by him in accordance with the Copyright, Designs and Patents Act 1988. All material supplied via the Arts and Humanities Data Service is protected by copyright, and duplication or sale of all or any part of it is not permitted, except that material may be duplicated by you for your personal research use or educational purposes in electronic or print form. Permission for any other use must be obtained from the Arts and Humanities Data Service. Electronic or print copies may not be offered, whether for sale or otherwise, to any third party.