FC-26 - Problems of Scale and Zoning

Spatial data are often encoded within a set of spatial units that exhaustively partition a region, with individual-level data aggregated, or continuous data summarized, over those units. Such is the case with census data aggregated to enumeration units for public dissemination. Partitioning schemes can vary by scale, where one partitioning scheme nests spatially within another, or by zoning, where two partitioning schemes have the same number of units but the unit shapes and boundaries differ. The Modifiable Areal Unit Problem (MAUP) refers to the fact that the nature of the spatial partitioning can affect the results and interpretation of visualization and statistical analysis. Generally, statistical associations among variables tend to appear stronger at coarser scales of data aggregation. The ecological fallacy refers to the assumption that an individual has the same attributes as the aggregate group to which it belongs. Combining spatial data that use different partitioning schemes to facilitate analysis is often problematic. Areal interpolation may be used to estimate data over small areas, and ecological inference may be used to infer individual behaviors from aggregate data. Researchers may also perform analyses at multiple scales as a point of comparison.
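
The scale effect described above can be illustrated with a small simulation. The sketch below is not drawn from the topic itself: it assumes a hypothetical one-dimensional transect of 4,096 individual observations in which two variables share a smooth spatial trend but carry independent noise, and it treats runs of consecutive observations as contiguous zones. The helper names `pearson` and `aggregate`, the trend shape, and the noise level are all illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def aggregate(values, block):
    """Average each run of `block` consecutive values (one contiguous zone)."""
    return [sum(values[i:i + block]) / block
            for i in range(0, len(values), block)]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 4096                 # number of individual observations (assumed)

# Assumed data-generating process: a shared smooth spatial trend plus
# independent noise in each variable.
trend = [math.sin(2 * math.pi * i / n) for i in range(n)]
x = [t + rng.gauss(0, 2.0) for t in trend]
y = [t + rng.gauss(0, 2.0) for t in trend]

# Correlation at the individual level (block = 1) and at two coarser,
# nested zonings of 16 and 256 observations per zone.
correlations = {
    block: pearson(aggregate(x, block), aggregate(y, block))
    for block in (1, 16, 256)
}

for block, r in correlations.items():
    print(f"zone size {block:>3}: {n // block:>4} zones, r = {r:.3f}")
```

Under these assumptions, averaging within zones cancels much of the independent noise while the shared trend survives, so the observed correlation rises as the zones get larger. The result depends on the data being spatially structured: aggregating randomly grouped, spatially unstructured observations would not be expected to show the same pattern.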
DM-70 - Problems of Large Spatial Databases
Large spatial databases, often labeled geospatial big data, exceed the capacity of commonly used computing systems as a result of data volume, variety, velocity, and veracity. Additional problems, also labeled with V’s, are cited, but these four primary ones are the most problematic and are the focus of this chapter (Li et al., 2016; Panimalar et al., 2017). Sources include satellites, aircraft and drone platforms, vehicles, geosocial networking services, mobile devices, and cameras. The problems in processing these data to extract useful information include query, analysis, and visualization. Data mining techniques and machine learning algorithms, such as deep convolutional neural networks, are often used with geospatial big data. The most obvious problem is handling the large data volumes, particularly for input and output operations, which requires parallel reading and writing of the data as well as high-speed computers, disk services, and network transfer speeds. Additional problems of large spatial databases include the variety and heterogeneity of the data, which require advanced algorithms to handle different data types and characteristics, and integration with other data. The velocity at which the data are acquired is also a challenge, especially with today’s advanced sensors and the Internet of Things, which includes millions of devices creating data on short temporal scales of microseconds to minutes. Finally, the veracity, or truthfulness, of large spatial databases is difficult to establish and validate, particularly for all data elements in the database.
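
One common way to cope with the volume problem is to split the data into chunks, summarize each chunk independently, and merge the partial results. The sketch below is only an illustration of that pattern, not a method taken from the chapter: the point data are synthetic, the helper names `chunks`, `summarize`, and `merge` are hypothetical, and a thread pool stands in for the parallel workers or cluster nodes that a real system would use.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from itertools import islice


def chunks(points, size):
    """Yield successive lists of at most `size` points from an iterable."""
    iterator = iter(points)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def summarize(chunk):
    """Partial summary of one chunk: point count, coordinate sums, bounding box."""
    xs = [p[0] for p in chunk]
    ys = [p[1] for p in chunk]
    return {
        "count": len(chunk),
        "sum_x": sum(xs),
        "sum_y": sum(ys),
        "bbox": (min(xs), min(ys), max(xs), max(ys)),
    }


def merge(a, b):
    """Combine two partial summaries into one."""
    return {
        "count": a["count"] + b["count"],
        "sum_x": a["sum_x"] + b["sum_x"],
        "sum_y": a["sum_y"] + b["sum_y"],
        "bbox": (
            min(a["bbox"][0], b["bbox"][0]),
            min(a["bbox"][1], b["bbox"][1]),
            max(a["bbox"][2], b["bbox"][2]),
            max(a["bbox"][3], b["bbox"][3]),
        ),
    }


# Synthetic (longitude, latitude) points standing in for a large dataset.
rng = random.Random(0)
points = ((rng.uniform(-180, 180), rng.uniform(-90, 90)) for _ in range(100_000))

# Each chunk is summarized independently, then the partials are merged.
# Note: Executor.map submits every chunk up front, so this sketch does not
# bound memory, and CPython threads do not speed up CPU-bound work.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = pool.map(summarize, chunks(points, 10_000))
    total = reduce(merge, partials)

centroid = (total["sum_x"] / total["count"], total["sum_y"] / total["count"])
print(total["count"], total["bbox"], centroid)
```

The design choice that matters here is that `summarize` needs only its own chunk and `merge` is associative, so the same two functions could in principle be run across processes or machines without changing the result, apart from floating-point rounding in the sums.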