Data Management

Data management involves the theories and techniques for managing the entire data lifecycle, from data collection to data format conversion, from data storage to data sharing and retrieval, to data provenance, data quality control and data curation for long-term data archival and preservation.

Topics in this Knowledge Area are listed thematically below. Existing topics are in regular font and linked directly to their original entries (published in 2006; these contain only Learning Objectives). Entries that have been updated and expanded are in bold. Forthcoming topics are italicized.

 

Spatial Databases | Spatial Access Methods | Georeferencing Systems
Spatial Database Management Systems | Data Retrieval Strategies | Approximating the Earth's Shape with Geoids
Use of Relational DBMSs | Spatial Indexing | Geographic Coordinate Systems
Object-Oriented DBMSs | Space-driven Structures: Grid, linear quadtree, and z-ordering tree files | Planar Coordinate Systems
Relational DBMS (and extensions) | Data-driven structures: R-trees and cost models | Tessellated Referencing Systems
Topological Relationships | Modeling Unstructured Spatial Data | Linear Referencing
Database Administration | Modeling Semi-Structured Spatial Data | Vertical Datums
Conceptual Data Models | | Horizontal Datums
Logical Data Models | Query Processing | Georegistration
Physical Data Models | Optimal I/O Algorithms | Map Projections
NoSQL Databases | Spatial Joins |
Problems with Large Spatial Databases | Complex Queries |
Array Databases | Spatial Data Infrastructures |
Representations of Spatial Objects | Metadata |
Events and Processes | Content Standards |
Raster Data Models | Data Warehouses |
Vector Data Models | Spatial Data Infrastructures |
Topological Models | U.S. National Spatial Data Infrastructure |
Network Models | Ontology for Geospatial Semantic Interoperability |
Modeling 3D Entities | Hydrographic Geospatial Data Standards |
Fields in Space and Time | Marine Spatial Data Infrastructure |
Fuzzy Models | |
Triangulated Irregular Network Models | |
Genealogical Relationships, Linkage, and Inheritance | |
Geospatial Data Conflation | |
Standardization & Exchange Specifications | |

 

DM-48 - Plane coordinate systems
  • Explain why plane coordinates are sometimes preferable to geographic coordinates
  • Identify the map projection(s) upon which UTM coordinate systems are based, and explain the relationship between the projection(s) and the coordinate system grid
  • Discuss the magnitude and cause of error associated with UTM coordinates
  • Differentiate the characteristics and uses of the UTM coordinate system from the Military Grid Reference System (MGRS) and the World Geographic Reference System (GEOREF)
  • Explain what State Plane Coordinate System (SPC) eastings and northings represent
  • Associate SPC coordinates and zone specifications with corresponding positions on a U.S. map or globe
  • Identify the map projection(s) upon which SPC coordinate systems are based, and explain the relationship between the projection(s) and the coordinate system grids
  • Discuss the magnitude and cause of error associated with SPC coordinates
  • Recommend the most appropriate plane coordinate system for applications at different spatial extents and justify the recommendation
  • Critique the U.S. Geological Survey’s choice of UTM as the standard coordinate system for the U.S. National Map
  • Describe the characteristics of the “national grids” of countries other than the U.S.
  • Explain what Universal Transverse Mercator (UTM) eastings and northings represent
  • Associate UTM coordinates and zone specifications with corresponding positions on a world map or globe
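
The relationship between UTM zone specifications and positions on the globe can be illustrated with a short sketch. It assumes the standard layout of 60 zones, each 6 degrees of longitude wide and numbered eastward from 180° W, and it ignores the special zone boundaries used around Norway and Svalbard. The function names `utm_zone` and `central_meridian` are illustrative, not taken from any library.

```python
import math


def utm_zone(lon_deg: float) -> int:
    """Return the standard UTM zone number (1-60) for a longitude in degrees.

    Assumption: regular 6-degree zones only; the Norway/Svalbard
    exceptions are not handled.
    """
    # Normalize the longitude to the range [-180, 180).
    lon = ((lon_deg + 180.0) % 360.0) - 180.0
    return int(math.floor((lon + 180.0) / 6.0)) + 1


def central_meridian(zone: int) -> float:
    """Return the central meridian (degrees east) of a UTM zone."""
    if not 1 <= zone <= 60:
        raise ValueError("UTM zone must be between 1 and 60")
    return zone * 6.0 - 183.0


if __name__ == "__main__":
    zone = utm_zone(-77.0)            # a longitude near Washington, D.C.
    print(zone)                       # 18
    print(central_meridian(zone))     # -75.0
```

Eastings within a zone are measured relative to that zone's central meridian, which is why the zone number must always accompany a UTM coordinate pair.
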
DM-70 - Problems of Large Spatial Databases

Large spatial databases, often labeled geospatial big data, exceed the capacity of commonly used computing systems as a result of data volume, variety, velocity, and veracity. Additional problems, also labeled with V’s, are cited, but these four primary ones are the most problematic and are the focus of this chapter (Li et al., 2016; Panimalar et al., 2017). Sources include satellites, aircraft and drone platforms, vehicles, geosocial networking services, mobile devices, and cameras. The problems in processing these data to extract useful information include query, analysis, and visualization. Data mining techniques and machine learning algorithms, such as deep convolutional neural networks, are often used with geospatial big data. The obvious problem is handling the large data volumes, particularly for input and output operations, which requires parallel reads and writes of the data as well as high-speed computers, disk services, and network transfer speeds. Additional problems of large spatial databases include the variety and heterogeneity of the data, which require advanced algorithms to handle different data types and characteristics, and integration with other data. The velocity at which the data are acquired is a challenge, especially with today’s advanced sensors and the Internet of Things, which includes millions of devices creating data on short temporal scales of microseconds to minutes. Finally, the veracity, or truthfulness, of large spatial databases is difficult to establish and validate, particularly for all data elements in the database.

DM-03 - Relational DBMS
  • Explain the advantage of the relational model over earlier database structures including spreadsheets
  • Define the basic terms used in relational database management systems (e.g., tuple, relation, foreign key, SQL, relational join)
  • Discuss the efficiency and costs of normalization
  • Describe the entity-relationship diagram approach to data modeling
  • Explain how entity-relationship diagrams are translated into relational tables
  • Create an SQL query that extracts data from related tables
  • Describe the problems associated with failure to follow the first and second normal forms (including data confusion, redundancy, and retrieval difficulties)
  • Demonstrate how search and relational join operations provide results for a typical GIS query and other simple operations using the relational DBMS within a GIS software application
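
As an illustration of a relational join that extracts data from related tables, the following minimal sketch uses Python's built-in `sqlite3` module with an in-memory database. The `county` and `parcel` tables, their columns, and the sample rows are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database with two related tables (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE county (
    county_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE parcel (
    parcel_id INTEGER PRIMARY KEY,
    county_id INTEGER NOT NULL REFERENCES county(county_id),  -- foreign key
    land_use  TEXT NOT NULL,
    area_ha   REAL NOT NULL
);
INSERT INTO county VALUES (1, 'Adams'), (2, 'Baker');
INSERT INTO parcel VALUES
    (10, 1, 'residential',  0.5),
    (11, 1, 'agricultural', 12.0),
    (12, 2, 'agricultural', 30.0);
""")

# Relational join: match each parcel to its county through the foreign key,
# then keep only the agricultural parcels.
rows = conn.execute("""
    SELECT c.name, p.parcel_id, p.area_ha
    FROM parcel AS p
    JOIN county AS c ON c.county_id = p.county_id
    WHERE p.land_use = 'agricultural'
    ORDER BY p.parcel_id
""").fetchall()

print(rows)  # [('Adams', 11, 12.0), ('Baker', 12, 30.0)]
```

Storing the county name once in `county`, rather than repeating it on every parcel row, is the kind of redundancy reduction that normalization aims for. The cost is that the join is needed at query time.
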
DM-60 - Spatial Data Infrastructures

Spatial data infrastructure (SDI) is the infrastructure that facilitates the discovery, access, management, distribution, reuse, and preservation of digital geospatial resources. These resources may include maps, data, geospatial services, and tools. As cyberinfrastructures, SDIs are similar to other infrastructures, such as water supplies and transportation networks, since they play fundamental roles in many aspects of society. These roles have become even more significant in today’s big data age, when a large volume of geospatial data and Web services is available. From a technological perspective, SDIs mainly consist of data, hardware, and software. However, a truly functional SDI also needs the efforts of people, support from organizations, government policies, and data and software standards, among other things. In this chapter, we will present the concepts and values of SDIs, as well as a brief history of SDI development in the U.S. We will also discuss the components of a typical SDI, focusing on three key components: geoportals, metadata, and search functions. Examples of existing SDI implementations will also be discussed.

DM-66 - Spatial Indexing

A spatial index is a data structure that allows spatial objects to be accessed efficiently. It is a common technique used by spatial databases. Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. In a spatial index construction process, the minimum bounding rectangle serves as an object approximation. The types of spatial index offered by commercial and open-source databases differ measurably in performance. Spatial indexing techniques play a central role in time-critical applications and in the manipulation of spatial big data.
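
To make the role of the minimum bounding rectangle (MBR) concrete, here is a minimal sketch of a uniform grid index, one of the simplest space-driven structures. It is not the structure used by any particular database. The class `GridIndex`, the helper names, and the cell size are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Set, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def mbr(points: Sequence[Tuple[float, float]]) -> BBox:
    """Minimum bounding rectangle of a set of (x, y) vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a: BBox, b: BBox) -> bool:
    """True if two rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class GridIndex:
    """Uniform grid index that stores object ids by the cells their MBR covers."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
        self.boxes: Dict[str, BBox] = {}

    def _cells_for(self, box: BBox) -> Iterator[Tuple[int, int]]:
        c = self.cell_size
        for i in range(int(box[0] // c), int(box[2] // c) + 1):
            for j in range(int(box[1] // c), int(box[3] // c) + 1):
                yield (i, j)

    def insert(self, obj_id: str, box: BBox) -> None:
        self.boxes[obj_id] = box
        for cell in self._cells_for(box):
            self.cells[cell].add(obj_id)

    def query(self, window: BBox) -> List[str]:
        # Filter step: only objects registered in cells touched by the window.
        candidates: Set[str] = set()
        for cell in self._cells_for(window):
            candidates |= self.cells.get(cell, set())
        # Refinement on MBRs; an exact geometry test would follow in practice.
        return sorted(i for i in candidates if intersects(self.boxes[i], window))


index = GridIndex(cell_size=10.0)
index.insert("a", mbr([(1, 1), (4, 3)]))
index.insert("b", mbr([(25, 25), (28, 29)]))
index.insert("c", mbr([(8, 2), (12, 6)]))
print(index.query((0, 0, 9, 9)))  # ['a', 'c']
```

The query inspects only the cells that overlap the search window, so object "b" is never examined. A sequential scan would have tested every record.
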

DM-18 - Spatio-temporal GIS
  • Describe extensions to relational DBMS to represent temporal change in attributes
  • Evaluate the advantages and disadvantages of existing space-time models based on storage efficiency, query performance, ease of data entry, and ability to implement in existing software
  • Create a GIS database that models temporal information
  • Utilize two different space-time models to characterize a given scenario, such as a daily commute
  • Describe the architecture of data models (both field and object based) to represent spatio-temporal phenomena
  • Differentiate the two types of temporal information to be modeled in databases: database (or transaction) time and valid (or world) time
  • Identify whether it is important to represent temporal change in a particular GIS application
  • Describe SQL extensions for querying temporal change
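
The distinction between valid (world) time and database (transaction) time can be sketched with a small append-only record model. This is a simplified illustration rather than an implementation of any temporal SQL standard. The class `ParcelVersion`, the function `land_use_at`, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ParcelVersion:
    parcel_id: int
    land_use: str
    valid_from: date            # valid time: when the state began in the world
    valid_to: Optional[date]    # valid time: when it ended (None = open-ended)
    recorded_on: date           # transaction time: when the row was entered


def land_use_at(versions: List[ParcelVersion], parcel_id: int,
                world_date: date, known_as_of: date) -> Optional[str]:
    """Land use on world_date, according to what the database held on known_as_of."""
    matches = [
        v for v in versions
        if v.parcel_id == parcel_id
        and v.recorded_on <= known_as_of
        and v.valid_from <= world_date
        and (v.valid_to is None or world_date < v.valid_to)
    ]
    if not matches:
        return None
    # Assumption: the most recently recorded matching row supersedes older ones.
    return max(matches, key=lambda v: v.recorded_on).land_use


history = [
    # Original entry: forest, with no known end date.
    ParcelVersion(1, "forest", date(2000, 1, 1), None, date(2000, 2, 1)),
    # Entered in 2016: the forest period actually ended in mid-2015.
    ParcelVersion(1, "forest", date(2000, 1, 1), date(2015, 6, 1), date(2016, 1, 10)),
    ParcelVersion(1, "residential", date(2015, 6, 1), None, date(2016, 1, 10)),
]

# The same world date gives different answers depending on transaction time.
print(land_use_at(history, 1, date(2015, 9, 1), known_as_of=date(2015, 12, 31)))  # forest
print(land_use_at(history, 1, date(2015, 9, 1), known_as_of=date(2016, 6, 1)))    # residential
```

Keeping both time dimensions makes it possible to reproduce what the database reported at an earlier date, even after the record of the real-world change has been corrected.
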
DM-49 - Tessellated referencing systems
  • Explain the concept “quadtree”
  • Describe the octahedral quaternary triangular mesh georeferencing system proposed by Dutton
  • Discuss the advantages of hierarchical coordinates relative to geographic and plane coordinate systems
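
A hierarchical coordinate can be illustrated with a quadtree key for a point in the unit square. This is a minimal sketch of a planar quadtree, not of Dutton's triangular mesh. The function name `quadtree_key` and the digit convention (0 = lower-left, 1 = lower-right, 2 = upper-left, 3 = upper-right) are assumptions chosen for the example.

```python
def quadtree_key(x: float, y: float, depth: int) -> str:
    """Return the path of quadrant digits locating (x, y) in the unit square.

    Each digit selects one of the four children of the current cell:
    0 = lower-left, 1 = lower-right, 2 = upper-left, 3 = upper-right.
    """
    if not (0.0 <= x < 1.0 and 0.0 <= y < 1.0):
        raise ValueError("point must lie in the unit square [0, 1) x [0, 1)")
    digits = []
    x0, y0, size = 0.0, 0.0, 1.0   # lower-left corner and side of current cell
    for _ in range(depth):
        size /= 2.0
        right = x >= x0 + size
        upper = y >= y0 + size
        digits.append(str((2 if upper else 0) + (1 if right else 0)))
        if right:
            x0 += size
        if upper:
            y0 += size
    return "".join(digits)


print(quadtree_key(0.6, 0.3, 2))  # 12
print(quadtree_key(0.6, 0.3, 3))  # 120
```

A single key encodes both location and resolution: truncating it yields the enclosing coarser cell, and two points whose keys share a prefix fall inside the same cell at that level. Geographic and plane coordinate pairs do not carry this information directly.
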
DM-09 - The hexagonal model
  • Illustrate the hexagonal model
  • Explain the limitations of the grid model compared to the hexagonal model
  • Exemplify the uses (past and potential) of the hexagonal model
DM-15 - The network model
  • Define the following terms pertaining to a network: Loops, multiple edges, the degree of a vertex, walk, trail, path, cycle, fundamental cycle
  • List definitions of networks that apply to specific applications or industries
  • Create an adjacency table from a sample network
  • Explain how a graph can be written as an adjacency matrix and how this can be used to calculate topological shortest paths in the graph
  • Create an incidence matrix from a sample network
  • Explain how a graph (network) may be directed or undirected
  • Demonstrate how attributes of networks can be used to represent cost, time, distance, or many other measures
  • Demonstrate how the star (or forward star) data structure, which is often employed when digitally storing network information, violates relational normal form, but allows for much faster search and retrieval in network databases
  • Discuss some of the difficulties of applying the standard process-pattern concept to lines and networks
  • Demonstrate how a network is a connected set of edges and vertices
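
The following sketch shows how a small network can be written as an adjacency matrix and how that matrix supports a topological shortest path, counted in edges and found here by breadth-first search. The sample edge list and the function names are hypothetical. Vertices are assumed to be numbered from 0, with no loops or multiple edges.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple


def adjacency_matrix(n: int, edges: Sequence[Tuple[int, int]],
                     directed: bool = False) -> List[List[int]]:
    """Build an n x n adjacency matrix from a list of (from, to) edges."""
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[u][v] = 1
        if not directed:
            matrix[v][u] = 1   # an undirected edge can be traversed both ways
    return matrix


def hop_distance(matrix: List[List[int]], source: int, target: int) -> Optional[int]:
    """Topological shortest path length (number of edges) via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v, connected in enumerate(matrix[u]):
            if connected and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None  # target is not reachable from source


def degrees(matrix: List[List[int]]) -> List[int]:
    """Degree of each vertex in an undirected graph without loops."""
    return [sum(row) for row in matrix]


edges = [(0, 1), (1, 2), (2, 3), (0, 3), (3, 4)]
m = adjacency_matrix(5, edges)
print(m[0])                   # [0, 1, 0, 1, 0]
print(hop_distance(m, 1, 4))  # 3
print(degrees(m))             # [2, 2, 2, 3, 1]
```

Replacing the 1s with edge attributes such as travel time or distance would turn the same matrix into a weighted representation, for which a weighted shortest-path algorithm would be needed instead of breadth-first search.
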
DM-07 - The Raster Data Model

The raster data model is a widely used method of storing geographic data. The model most commonly takes the form of a grid-like structure that holds values at regularly spaced intervals over the extent of the raster. Rasters are especially well suited for storing continuous data such as temperature and elevation values, but can hold discrete and categorical data such as land use as well. The resolution of a raster is given in linear units (e.g., meters) or angular units (e.g., one arc second) and defines the length of one side of a grid cell. High (or fine) resolution rasters have comparatively closer spacing and more grid cells than low (or coarse) resolution rasters, and require relatively more memory to store. Active research in the domain is oriented toward improving compression schemes, implementing alternative cell shapes (such as hexagons), and better supporting multi-resolution raster storage and analysis functions.
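
The link between resolution, cell addressing, and storage can be sketched as follows. The example assumes square cells and an origin at the upper-left corner of the raster, with rows increasing southward. That convention is common, but it is an assumption here, and the function names are illustrative.

```python
import math
from typing import Tuple


def cell_index(x: float, y: float, origin_x: float, origin_y: float,
               cell_size: float) -> Tuple[int, int]:
    """Return (row, col) of the cell containing the point (x, y).

    Assumption: (origin_x, origin_y) is the upper-left corner of the raster,
    columns increase eastward and rows increase southward.
    """
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((origin_y - y) / cell_size)
    return row, col


def cell_count(width: float, height: float, cell_size: float) -> int:
    """Number of cells needed to cover an extent at a given resolution."""
    return math.ceil(width / cell_size) * math.ceil(height / cell_size)


print(cell_index(125.0, 171.0, 100.0, 200.0, 10.0))  # (2, 2)
print(cell_count(1000.0, 1000.0, 10.0))              # 10000
print(cell_count(1000.0, 1000.0, 5.0))               # 40000
```

Halving the cell size quadruples the number of cells, which is why finer resolution rasters require more memory to store.
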
