MAUP

AM-97 - An Introduction to Spatial Data Mining

The goal of spatial data mining is to discover potentially useful, interesting, and non-trivial patterns from spatial datasets (e.g., GPS trajectories of smartphones). Spatial data mining is societally important, with applications in public health, public safety, climate science, and other fields. For example, in epidemiology, spatial data mining helps to find areas with a high concentration of disease incidents in order to manage disease outbreaks. Computational methods are needed to discover spatial patterns because the volume and velocity of spatial data exceed the ability of human experts to analyze them. Spatial data have unique characteristics, such as spatial autocorrelation and spatial heterogeneity, which violate the i.i.d. (independent and identically distributed) assumption of traditional statistical and data mining methods. Therefore, traditional methods may miss patterns or yield spurious patterns, both of which are costly in societal applications. Further, there are additional challenges such as the MAUP (Modifiable Areal Unit Problem), as illustrated by a recent court case debating gerrymandering in elections. In this article, we discuss tools and computational methods of spatial data mining, focusing on the primary spatial pattern families: hotspot detection, collocation detection, spatial prediction, and spatial outlier detection. Hotspot detection methods use domain information to accurately model more active and high-density areas. Collocation detection methods find objects whose instances are in proximity to each other in a location. Spatial prediction approaches explicitly model the neighborhood relationship of locations to predict target variables from input features. Spatial outlier detection methods find data that differ from their neighbors. Finally, we describe future research and trends in spatial data mining.
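
As a rough illustration of the last pattern family, the sketch below flags locations whose values differ strongly from the average of their neighbors. It is one simple neighborhood-difference approach under stated assumptions, not a method taken from the entry itself: the function name, the threshold of 2.0, and the nine-location data are all hypothetical.

```python
from statistics import mean, pstdev


def neighbor_difference_outliers(values, neighbors, threshold=2.0):
    """Return locations whose value deviates strongly from their neighbors' mean.

    values:    dict mapping location id -> attribute value
    neighbors: dict mapping location id -> list of neighboring location ids
    threshold: hypothetical cut-off on the standardized difference
    """
    # Difference between each location's value and the mean of its neighbors.
    diffs = {
        loc: values[loc] - mean([values[n] for n in neighbors[loc]])
        for loc in values
    }
    all_diffs = list(diffs.values())
    mu = mean(all_diffs)
    sigma = pstdev(all_diffs)
    if sigma == 0:
        return []  # every location matches its neighborhood equally well
    # Standardize the differences and keep those beyond the threshold.
    return sorted(
        loc for loc, d in diffs.items() if abs(d - mu) / sigma > threshold
    )


# Hypothetical data: nine locations along a line, one with an unusual value.
values = {i: 10 for i in range(9)}
values[4] = 50
neighbors = {i: [j for j in (i - 1, i + 1) if j in values] for i in values}

outliers = neighbor_difference_outliers(values, neighbors)
print(outliers)
```

Comparing each value with its own neighborhood, rather than with the global mean, is what makes the check spatial: a value can be unremarkable overall yet unusual for where it sits.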

GS-27 - GIS&T for Equity and Social Justice

A geographic information system (GIS) can be used effectively for activities, programs, and analyses focused on equity and social justice (ESJ). Many types of inequities exist in society, but race and space are key predictors of inequity. A key concept of social justice is that any person born into society, no matter where they were born or live, will have an equitable opportunity to achieve successful life outcomes and to thrive. Geographic information science and its technologies (GIS&T) provide powerful tools to analyze equity and social justice issues and help government agencies apply an equity lens to every aspect of their administration. Given the reliance on spatial data to represent and analyze matters of ESJ, the use of these tools is necessary, logical, and appropriate. Some types of analyses and mapping commonly used with ESJ programs require careful attention to how data are combined and represented; without it, they risk misleading or false conclusions. Such outcomes could build mistrust when trust is most needed. A GIS-supported lifecycle for ESJ is presented that includes stages of exploratory issue analysis, community feedback, pro-equity programs analysis, management monitoring and stakeholder awareness, program performance metrics, and effectiveness analysis.

GS-20 - Aggregation of Spatial Entities and Legislative Redistricting

The partitioning of space is an essential consideration for the efficient allocation of resources. In the United States and many other countries, this parcelization of sub-regions for political and legislative purposes results in what are referred to as districts. A district is an aggregation of smaller, spatially bound units, along with their statistical properties, into larger spatially bound units. When a district has the primary purpose of representation, individuals who reside within that district make up a constituency. Redistricting is often required as populations of constituents shift over time or as the resources that service areas change. Administrative challenges with creating districts have been greatly aided by increasing utilization of GIS. However, alongside these advances in geospatial methods, political disputes over the way in which districts are drawn increasingly snare the process in legal battles, often centered on the topic of gerrymandering. This chapter focuses on the redistricting process within the United States and how the aggregation of representative spatial entities presents a mix of political, technical, and legal challenges.
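
The aggregation step described above can be sketched in a few lines. The example below sums unit-level attributes into district totals under two alternative plans and counts how many districts a party carries under each. Everything here is hypothetical (the nine units, their vote counts, the plan names, and the helper functions), and the sketch ignores contiguity, compactness, and legal constraints that real redistricting must satisfy.

```python
def district_totals(units, plan):
    """Sum each unit's attributes into the district that the plan assigns it to."""
    totals = {}
    for unit_id, attrs in units.items():
        district = totals.setdefault(plan[unit_id], {})
        for name, value in attrs.items():
            district[name] = district.get(name, 0) + value
    return totals


def seats_for_a(totals):
    """Count districts in which hypothetical party A outpolls party B."""
    return sum(1 for t in totals.values() if t["votes_a"] > t["votes_b"])


# Hypothetical units: 100 residents each; party A holds 40% of all votes.
shares_a = [70, 70, 70, 25, 25, 25, 25, 25, 25]
units = {
    i: {"population": 100, "votes_a": a, "votes_b": 100 - a}
    for i, a in enumerate(shares_a)
}

# Two assignments of the same nine units to three equal-population districts.
plan_one = {0: "D1", 1: "D1", 2: "D1", 3: "D2", 4: "D2", 5: "D2",
            6: "D3", 7: "D3", 8: "D3"}
plan_two = {0: "D1", 3: "D1", 4: "D1", 1: "D2", 5: "D2", 6: "D2",
            2: "D3", 7: "D3", 8: "D3"}

totals_one = district_totals(units, plan_one)
totals_two = district_totals(units, plan_two)
print(seats_for_a(totals_one), seats_for_a(totals_two))
```

The unit-level data are identical under both plans; only the grouping changes, which is why the same electorate can produce different district-level outcomes.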

FC-26 - Problems of Scale and Zoning

Spatial data are often encoded within a set of spatial units that exhaustively partition a region, where individual-level data are aggregated, or continuous data are summarized, over a set of spatial units. Such is the case with census data aggregated to enumeration units for public dissemination. Partitioning schemes can vary by scale, where one partitioning scheme spatially nests within another, or by zoning, where two partitioning schemes have the same number of units but the unit shapes and boundaries differ. The Modifiable Areal Unit Problem (MAUP) refers to the fact that the nature of spatial partitioning can affect the interpretation and results of visualization and statistical analysis. Generally, coarser scales of data aggregation tend to have stronger observed statistical associations among variables. The ecological fallacy refers to the assumption that an individual has the same attributes as the aggregate group to which it belongs. Combining spatial data with different partitioning schemes to facilitate analysis is often problematic. Areal interpolation may be used to estimate data over small areas, or ecological inference may be used to infer individual behaviors from aggregate data. Researchers may also perform analyses at multiple scales as a point of comparison.
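
The scale effect can be illustrated with a small synthetic example. The sketch below builds two variables that share a weak spatial trend but carry large unit-level noise, then compares their Pearson correlation before and after averaging into blocks of four adjacent units. The data are constructed by hand (not drawn from any real dataset) so that the noise cancels within each block; the function names and block size are assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def aggregate(values, block):
    """Average consecutive units into coarser units of the given block size."""
    return [sum(values[i:i + block]) / block
            for i in range(0, len(values), block)]


# Sixteen hypothetical units along a transect with a shared linear trend.
trend = list(range(16))
# Deterministic "noise" patterns that are unrelated to each other and that
# cancel out within every block of four units.
noise_x = [6 if i % 2 == 0 else -6 for i in range(16)]
noise_y = [6 if i % 4 < 2 else -6 for i in range(16)]
x = [t + e for t, e in zip(trend, noise_x)]
y = [t + e for t, e in zip(trend, noise_y)]

r_fine = pearson(x, y)                                # unit-level association
r_coarse = pearson(aggregate(x, 4), aggregate(y, 4))  # after aggregation
print(round(r_fine, 3), round(r_coarse, 3))
```

With these inputs the unit-level correlation is weak, while the block-level correlation is essentially perfect, because averaging removes the local noise and leaves only the shared trend. Real data rarely cancel this cleanly, but the direction of the effect is the one described above.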

CV-05 - Statistical Mapping (Enumeration, Normalization, Classification)

Proper communication of spatial distributions, trends, and patterns in data is an important component of a cartographer's work. Geospatial data is often large and complex, and due to inherent limitations of size, scalability, and sensitivity, cartographers are often required to work with data that is abstracted, aggregated, or simplified from its original form. Working with data in this manner serves to clarify cartographic messages, expedite design decisions, and assist in developing narratives, but it also introduces a degree of abstraction and subjectivity in the map that can make it easy to infer false messages from the data and ultimately can mislead map readers. This entry introduces the core topics of statistical mapping in cartography. First, we define enumeration and the aggregation of data to units of enumeration. Next, we introduce the importance of data normalization (or standardization) to more truthfully communicate cartographically and, lastly, discuss common methods of data classification and how cartographers bin data into groups that simplify communication.
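
To make normalization and classification concrete, the sketch below converts raw counts to rates per 1,000 residents and then bins values with two commonly used schemes, equal interval and quantile. The function names, the four enumeration units, and their counts and populations are hypothetical, and the break conventions shown are only one of several reasonable choices.

```python
from bisect import bisect_left
from math import ceil


def rate_per(counts, populations, per=1000):
    """Normalize raw counts by population, expressed per `per` residents."""
    return [c * per / p for c, p in zip(counts, populations)]


def equal_interval_breaks(values, k):
    """Upper class limits that split the data range into k equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]


def quantile_breaks(values, k):
    """Upper class limits that place roughly equal numbers of units per class."""
    s = sorted(values)
    n = len(s)
    return [s[ceil(n * i / k) - 1] for i in range(1, k + 1)]


def classify(values, breaks):
    """Assign each value the index of the first class whose upper limit covers it."""
    top = len(breaks) - 1
    return [min(bisect_left(breaks, v), top) for v in values]


# Hypothetical enumeration units: the fourth has many incidents but is populous.
counts = [10, 20, 30, 400]
populations = [1000, 1000, 1000, 100000]

rates = rate_per(counts, populations)
count_classes = classify(counts, equal_interval_breaks(counts, 2))
rate_classes = classify(rates, equal_interval_breaks(rates, 2))
rate_quantile_classes = classify(rates, quantile_breaks(rates, 2))
print(rates, count_classes, rate_classes, rate_quantile_classes)
```

In this example the unit with the largest raw count falls in the top class when counts are mapped, but in the bottom class once the counts are normalized by population, which is the kind of reversal that normalization is meant to expose.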
