spatial statistics

PD-31 - PySAL and Spatial Statistics Libraries

Because spatial statistics are essential to geographical inquiry, accessible and flexible software offering the relevant functionality is highly desirable. The Python Spatial Analysis Library (PySAL) represents an endeavor toward this end. It is an open-source Python library and ecosystem hosting a wide array of spatial statistical and visualization methods. Since its first public release in 2010, PySAL has been applied to a variety of research questions, used as teaching material in regular classes and conference workshops serving a wide audience, and integrated into general GIS software such as ArcGIS and QGIS. This entry first gives an overview of the history and recent development of PySAL. This is followed by a discussion of PySAL’s new hierarchical structure and of two different modes of accessing PySAL’s functionality to perform various spatial statistical tasks, including exploratory spatial data analysis, spatial regression, and geovisualization. Next, the entry discusses how to find and use materials for studying and applying PySAL’s spatial statistical functions, and how to get involved with the PySAL community as a user and prospective developer. The entry ends with a brief discussion of the future development of PySAL.

FC-37 - Spatial Autocorrelation

The scientific term spatial autocorrelation describes Tobler’s first law of geography: everything is related to everything else, but nearby things are more related than distant things. Spatial autocorrelation has:

  • a past characterized by scientists’ non-verbal awareness of it, followed by its formalization;
  • a present typified by its dissemination across numerous disciplines, its explication, its visualization, and its extension to non-normal data; and
  • an anticipated future in which it becomes a standard in data analytic computer software packages, as well as a routinely considered feature of space-time data and of spatial optimization practice.

Positive spatial autocorrelation constitutes the focal point of its past and present; one expectation is that negative spatial autocorrelation will become a focal point of its future.

AM-97 - An Introduction to Spatial Data Mining

The goal of spatial data mining is to discover potentially useful, interesting, and non-trivial patterns from spatial datasets (e.g., GPS trajectories of smartphones). Spatial data mining is societally important, with applications in public health, public safety, climate science, etc. For example, in epidemiology, spatial data mining helps to find areas with a high concentration of disease incidents in order to manage disease outbreaks. Computational methods are needed to discover spatial patterns since the volume and velocity of spatial data exceed the ability of human experts to analyze it. Spatial data has unique characteristics, such as spatial autocorrelation and spatial heterogeneity, which violate the i.i.d. (independent and identically distributed) assumption of traditional statistical and data mining methods. Therefore, traditional methods may miss patterns or yield spurious patterns, either of which is costly in societal applications. Further, there are additional challenges such as the MAUP (Modifiable Areal Unit Problem), as illustrated by a recent court case debating gerrymandering in elections. In this article, we discuss tools and computational methods of spatial data mining, focusing on the primary spatial pattern families: hotspot detection, collocation detection, spatial prediction, and spatial outlier detection. Hotspot detection methods use domain information to accurately model more active and high-density areas. Collocation detection methods find objects whose instances are in proximity to each other in a location. Spatial prediction approaches explicitly model the neighborhood relationship of locations to predict target variables from input features. Spatial outlier detection methods find data that differ from their neighbors. Finally, we describe future research and trends in spatial data mining.
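To make the idea of a spatial outlier concrete, the following is a minimal sketch of one common neighborhood-based approach, not the specific method of the entry. It compares each value with the mean of its neighbors and flags locations whose difference is unusually large. The function name `spatial_outliers`, the threshold of 2 standard deviations, the chain-shaped neighborhood, and the sample values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def spatial_outliers(values, neighbors, threshold=2.0):
    """Return indices whose value differs strongly from the neighbor mean.

    values    -- list of attribute values, one per location
    neighbors -- list of neighbor index lists, one per location
    threshold -- cutoff on the standardized difference (an assumed default)
    """
    # Difference between each value and the average of its neighbors.
    diffs = [
        values[i] - mean(values[j] for j in neighbors[i])
        for i in range(len(values))
    ]
    mu = mean(diffs)
    sigma = pstdev(diffs)
    if sigma == 0:
        return []  # every location matches its neighborhood equally well
    # Standardize the differences and keep the extreme ones.
    return [i for i, d in enumerate(diffs) if abs((d - mu) / sigma) > threshold]


# Hypothetical readings along a line; each location neighbors the adjacent ones.
values = [10, 11, 10, 12, 40, 11, 10, 12, 11, 10]
neighbors = [
    [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
    for i in range(len(values))
]
flagged = spatial_outliers(values, neighbors)
print(flagged)  # the location holding the value 40
```

Note that the comparison is local: a value is flagged because it disagrees with its surroundings, not because it is extreme in the dataset as a whole.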

AM-22 - Global Measures of Spatial Association

Spatial association broadly describes how the locations and values of samples or observations vary across space. Similarity in both the attribute values and locations of observations can be assessed using measures of spatial association based upon the first law of geography. In this entry, we focus on the measures of spatial autocorrelation that assess the degree of similarity between attribute values of nearby observations across the entire study region. These global measures assess spatial relationships with the combination of spatial proximity as captured in the spatial weights matrix and the attribute similarity as captured by variable covariance (i.e. Moran’s I) or squared difference (i.e. Geary’s C). For categorical data, the join count statistic provides a global measure of spatial association. Two visualization approaches for spatial autocorrelation measures include Moran scatterplots and variograms (also known as semi-variograms).
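The two global statistics named above can be sketched in a few lines of plain Python. This is an illustrative implementation of the standard textbook formulas, assuming a dense spatial weights matrix given as a list of lists; the helper `chain_weights` and the sample data are hypothetical. In practice a library such as PySAL would normally be used, and it would also supply significance tests, which this sketch omits.

```python
def chain_weights(n):
    """Binary contiguity weights for n units arranged on a line (hypothetical layout)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def morans_i(values, weights):
    """Global Moran's I: weighted cross-products of deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    total_ss = sum(d * d for d in dev)
    return (n / s0) * (cross / total_ss)


def gearys_c(values, weights):
    """Global Geary's C: weighted squared differences between paired values."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(sum(row) for row in weights)
    sq_diff = sum(
        weights[i][j] * (values[i] - values[j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    total_ss = sum((v - mean) ** 2 for v in values)
    return ((n - 1) / (2 * s0)) * (sq_diff / total_ss)


w = chain_weights(6)
trend = [1, 2, 3, 4, 5, 6]     # similar values sit next to each other
checker = [1, 6, 1, 6, 1, 6]   # dissimilar values sit next to each other

print(morans_i(trend, w), gearys_c(trend, w))      # about 0.6 and 0.14
print(morans_i(checker, w), gearys_c(checker, w))  # about -1.0 and 1.67
```

The two statistics read in opposite directions: positive spatial autocorrelation gives a Moran's I above its expectation (close to zero) and a Geary's C below 1, while negative spatial autocorrelation gives a negative Moran's I and a Geary's C above 1.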

AM-25 - Bayesian methods

  • Define “prior and posterior distributions” and “Markov-Chain Monte Carlo”
  • Explain how the Bayesian perspective is a unified framework from which to view uncertainty
  • Compare and contrast Bayesian methods and classical “frequentist” statistical methods

AM-23 - Local Measures of Spatial Association

Local measures of spatial association are statistics used to detect variations of a variable of interest across space when the spatial relationship of the variable is not constant across the study region, a condition known as spatial non-stationarity or spatial heterogeneity. Unlike global measures, which summarize the overall spatial autocorrelation of the study area in a single value, local measures of spatial association identify local clusters (nearby observations have similar attribute values) or spatial outliers (nearby observations have different attribute values). Like global measures, local indicators of spatial association (LISA), including local Moran’s I and local Geary’s C, incorporate both spatial proximity and attribute similarity. Getis-Ord Gi*, another popular local statistic, identifies spatial clusters at various significance levels, known as hot spots (unusually high values) and cold spots (unusually low values). This so-called “hot spot analysis” has been extended to examine spatiotemporal trends in data. Bivariate local Moran’s I describes the statistical relationship between one variable at a location and a spatially lagged second variable at neighboring locations, and geographically weighted regression (GWR) allows regression coefficients to vary at each observation location. Visualization of local measures of spatial association is critical, allowing researchers of various disciplines to easily identify local pockets of interest for future examination.
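As an illustration of how a local statistic separates clusters from spatial outliers, here is a minimal sketch of local Moran's I with row-standardized weights, written in plain Python. The function names, the chain-shaped neighborhood, and the sample values are assumptions for illustration. The sketch labels each location by the quadrant of the Moran scatterplot it falls in and does not assess significance, which is usually done with a permutation test.

```python
def row_standardize(weights):
    """Scale each row of the weights matrix so that it sums to 1."""
    out = []
    for row in weights:
        total = sum(row)
        out.append([w / total if total else 0.0 for w in row])
    return out


def local_morans_i(values, weights):
    """Return a (local I, quadrant label) pair for every location."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # second moment of the deviations
    w = row_standardize(weights)
    results = []
    for i in range(n):
        # Spatial lag: weighted average deviation of the neighbors.
        lag = sum(w[i][j] * z[j] for j in range(n))
        local_i = z[i] * lag / m2
        if z[i] >= 0:
            label = "high-high" if lag >= 0 else "high-low"
        else:
            label = "low-low" if lag < 0 else "low-high"
        results.append((local_i, label))
    return results


# Hypothetical values on a line: one low value surrounded by high values.
values = [9, 10, 9, 1, 9, 10, 9]
n = len(values)
weights = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

for index, (local_i, label) in enumerate(local_morans_i(values, weights)):
    print(index, round(local_i, 3), label)
```

A positive local I marks a location that resembles its neighbors (high-high or low-low), and a negative local I marks a location that differs from them (high-low or low-high). In this example the low value in the middle is labeled low-high, and its immediate neighbors are labeled high-low because the low value pulls their spatial lag below the mean.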

AM-19 - Exploratory data analysis (EDA)

  • Describe the statistical characteristics of a set of spatial data using a variety of graphs and plots (including scatterplots, histograms, boxplots, q–q plots)
  • Select the appropriate statistical methods for the analysis of given spatial datasets by first exploring them using graphic methods