Data Management

Data management involves the theories and techniques for managing the entire data lifecycle, from data collection to data format conversion, from data storage to data sharing and retrieval, to data provenance, data quality control and data curation for long-term data archival and preservation.

Topics in this Knowledge Area are listed thematically below. Existing topics are in regular font and linked directly to their original entries (published in 2006; these contain only Learning Objectives). Entries that have been updated and expanded are in bold. Forthcoming topics are italicized.

Spatial Databases | Spatial Access Methods | Georeferencing Systems
Spatial Database Management Systems | Data Retrieval Strategies | Earth's Shape, Sea Level, and the Geoid
Use of Relational DBMSs | Spatial Indexing | Geographic Coordinate Systems
Object-Oriented DBMSs | Space-driven Structures: Grid, linear quadtree, and z-ordering tree files | Planar Coordinate Systems
Relational DBMS and their Spatial Extensions | Data-driven structures: R-trees and cost models | Tessellated Referencing Systems
Topological Relationships | Modeling Unstructured Spatial Data | Linear Referencing
Database Administration | Modeling Semi-Structured Spatial Data | Vertical (Geopotential) Datums
Conceptual Data Models | | Horizontal (Geometric) Datums
Logical Data Models | Query Processing | Georegistration
Physical Data Models | Optimal I/O Algorithms | Map Projections
NoSQL Databases | Spatial Joins |
Problems with Large Spatial Databases | Complex Queries | Data Manipulation
Array Databases | Spatial Data Infrastructures | Point, Line, and Area Generalization
Representations of Spatial Objects | Metadata | Vector-to-Raster and Raster-to-Vector Conversions
Events and Processes | Content Standards | Raster Resampling
Raster Data Models | Data Warehouses | Coordinate Transformations
Vector Data Models | Spatial Data Infrastructures | Transaction Management
Topological Models | U.S. National Spatial Data Infrastructure |
Network Models | Ontology for Geospatial Semantic Interoperability |
Entity-based Models | Marine Spatial Data Infrastructure |
Modeling 3D Entities | Hydrographic Geospatial Data Standards |
Fields in Space and Time | |
Fuzzy Models | |
Triangular Irregular Network (TIN) Models | |
Genealogical Relationships, Linkage, and Inheritance | |
Geospatial Data Conflation | |

DM-07 - The Raster Data Model

The raster data model is a widely used method of storing geographic data. The model most commonly takes the form of a grid-like structure that holds values at regularly spaced intervals over the extent of the raster. Rasters are especially well suited for storing continuous data such as temperature and elevation values, but can also hold discrete and categorical data such as land use. The resolution of a raster is given in linear units (e.g., meters) or angular units (e.g., one arc second) and defines the length of one side of a grid cell. High (or fine) resolution rasters have closer spacing and more grid cells than low (or coarse) resolution rasters covering the same extent, and so require more memory to store. Active research in the domain is oriented toward improving compression schemes, implementing alternative cell shapes (such as hexagons), and better supporting multi-resolution raster storage and analysis functions.
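
The link between resolution, cell count, and storage can be sketched in a few lines of Python. This is a minimal illustration, not part of the original entry: the function names, the square-cell assumption, and the default of 4 bytes per cell (e.g., a 32-bit value, uncompressed) are all assumptions made for the example.

```python
import math


def raster_shape(width, height, cell_size):
    """Return (rows, cols) needed to cover an extent with square cells.

    width, height and cell_size share the same linear unit (e.g., meters).
    Partial cells at the edge are rounded up to whole cells.
    """
    cols = math.ceil(width / cell_size)
    rows = math.ceil(height / cell_size)
    return rows, cols


def raster_bytes(width, height, cell_size, bytes_per_cell=4):
    """Uncompressed storage for a single-band raster (assumed 4 bytes per cell)."""
    rows, cols = raster_shape(width, height, cell_size)
    return rows * cols * bytes_per_cell


# Same 10 km x 10 km extent stored at a coarse and a fine resolution.
coarse = raster_bytes(10_000, 10_000, 30)  # 30 m cells
fine = raster_bytes(10_000, 10_000, 10)    # 10 m cells
print(coarse, fine)
```

Because the cell count grows with the square of the refinement, halving the cell size roughly quadruples the uncompressed storage for the same extent.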

DM-13 - The topological model
  • Define terms related to topology (e.g., adjacency, connectivity, overlap, intersect, logical consistency)
  • Describe the integrity constraints of integrated topological models (e.g., POLYVRT)
  • Discuss the historical roots of the Census Bureau’s creation of GBF/DIME as the foundation for the development of topological data structures
  • Explain why integrated topological models have lost favor in commercial GIS software
  • Evaluate the positive and negative impacts of the shift from integrated topological models
  • Discuss the role of graph theory in topological structures
  • Exemplify the concept of planar enforcement (e.g., TIN triangles)
  • Demonstrate how a topological structure can be represented in a relational database structure
  • Explain the advantages and disadvantages of topological data models
  • Illustrate a topological relation
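
As one hedged illustration of how a topological structure might be held in relational tables, the sketch below stores an arc-node structure in an in-memory SQLite database using Python's standard sqlite3 module. The table names, column names, the 'OUT' label for the exterior, and the two-square example data are all hypothetical choices for this example; they are not taken from POLYVRT, GBF/DIME, or any particular GIS package.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (node_id INTEGER PRIMARY KEY, x REAL, y REAL);
CREATE TABLE arc (
    arc_id     INTEGER PRIMARY KEY,
    from_node  INTEGER REFERENCES node(node_id),
    to_node    INTEGER REFERENCES node(node_id),
    left_poly  TEXT,   -- polygon on the left when travelling from_node -> to_node
    right_poly TEXT    -- polygon on the right; 'OUT' marks the exterior
);
""")

# Two unit squares, A (x from 0 to 1) and B (x from 1 to 2), sharing one edge.
conn.executemany("INSERT INTO node VALUES (?, ?, ?)", [
    (1, 0, 0), (2, 1, 0), (3, 2, 0), (4, 2, 1), (5, 1, 1), (6, 0, 1),
])
conn.executemany("INSERT INTO arc VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 2, "A", "OUT"),
    (2, 2, 3, "B", "OUT"),
    (3, 3, 4, "B", "OUT"),
    (4, 4, 5, "B", "OUT"),
    (5, 5, 6, "A", "OUT"),
    (6, 6, 1, "A", "OUT"),
    (7, 2, 5, "A", "B"),   # the shared edge
])


def adjacent_pairs(conn):
    """Adjacency: pairs of polygons that lie on either side of the same arc."""
    return conn.execute(
        "SELECT DISTINCT left_poly, right_poly FROM arc "
        "WHERE left_poly <> 'OUT' AND right_poly <> 'OUT' "
        "ORDER BY left_poly, right_poly"
    ).fetchall()


def boundary_arcs(conn, poly):
    """Arcs that bound a polygon (it appears on their left or right side)."""
    rows = conn.execute(
        "SELECT arc_id FROM arc WHERE left_poly = ? OR right_poly = ? "
        "ORDER BY arc_id", (poly, poly)
    ).fetchall()
    return [r[0] for r in rows]


def arcs_at_node(conn, node_id):
    """Connectivity: arcs that start or end at a given node."""
    rows = conn.execute(
        "SELECT arc_id FROM arc WHERE from_node = ? OR to_node = ? "
        "ORDER BY arc_id", (node_id, node_id)
    ).fetchall()
    return [r[0] for r in rows]


print(adjacent_pairs(conn))
print(boundary_arcs(conn, "A"))
print(arcs_at_node(conn, 2))
```

In this sketch adjacency and connectivity are answered by table lookups alone, without any coordinate geometry, which is the main appeal of storing topology explicitly.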
DM-28 - Topological relationships
  • Define various terms used to describe topological relationships, such as disjoint, overlap, within, and intersect
  • List the possible topological relationships between entities in space (e.g., 9-intersection) and time
  • Use methods that analyze topological relationships
  • Recognize the contributions of topology (the branch of mathematics) to the study of geographic relationships
  • Describe geographic phenomena in terms of their topological relationships in space and time to other phenomena
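
The terms above can be made concrete with a small, deliberately simplified classifier for two axis-aligned rectangles. It is a sketch under stated assumptions: rectangles are (xmin, ymin, xmax, ymax) tuples with non-zero area, the function name is hypothetical, and the six labels returned are a coarser set than the full 9-intersection model (for example, "within" here does not distinguish a shared boundary from a strict interior fit).

```python
def rectangle_relation(a, b):
    """Classify the topological relation of rectangle a to rectangle b.

    Each rectangle is (xmin, ymin, xmax, ymax) with xmin < xmax and ymin < ymax.
    Returns one of: 'equal', 'disjoint', 'touches', 'within', 'contains', 'overlaps'.
    """
    axmin, aymin, axmax, aymax = a
    bxmin, bymin, bxmax, bymax = b

    if a == b:
        return "equal"

    # No shared point at all, not even on the boundary.
    if axmax < bxmin or bxmax < axmin or aymax < bymin or bymax < aymin:
        return "disjoint"

    # Interiors intersect only if the open intervals overlap on both axes.
    interiors_meet = (axmin < bxmax and bxmin < axmax and
                      aymin < bymax and bymin < aymax)
    if not interiors_meet:
        return "touches"  # boundaries share an edge or a corner only

    if bxmin <= axmin and axmax <= bxmax and bymin <= aymin and aymax <= bymax:
        return "within"
    if axmin <= bxmin and bxmax <= axmax and aymin <= bymin and bymax <= aymax:
        return "contains"
    return "overlaps"


print(rectangle_relation((0, 0, 2, 2), (1, 1, 3, 3)))
```

Any result other than "disjoint" means the two rectangles intersect in the general sense of sharing at least one point.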
DM-10 - Triangular Irregular Network (TIN) Models

A Triangular Irregular Network (TIN) is a way of storing continuous surfaces. It is vector based, and works by connecting known data points with straight lines to create triangles, often called facets. Each facet is a plane, so slope and aspect are constant across it. Collectively, these lines form a network covering the whole surface. TINs are efficient when storing heterogeneous surfaces, since homogeneous areas are stored using few data points, while areas with more variability are stored in detail using a larger number of data points. In other words, a TIN can be more detailed where the surface is complex (high variation) by using smaller facets, and less detailed where the surface is more homogeneous by using larger facets. TINs also have a high modelling potential, e.g. in topography and hydrology. However, the unique way of storing data in a TIN often makes it difficult to combine with other spatial data formats. Instead, the TIN data would usually be converted to other suitable formats.
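
Because each facet is a plane through three known points, its slope and aspect follow directly from the plane's normal vector. The sketch below shows one way to compute them. The function name is hypothetical, x is assumed to point east and y north, and aspect is reported as a compass bearing of the downslope direction (0 = north, clockwise); these conventions are assumptions for the example rather than something the entry prescribes.

```python
import math


def facet_slope_aspect(p1, p2, p3):
    """Slope and aspect, in degrees, of the plane through three (x, y, z) points.

    Returns (slope, aspect). Aspect is None for a horizontal facet, where
    no downslope direction exists.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p1, p2, p3
    ux, uy, uz = x2 - x1, y2 - y1, z2 - z1
    vx, vy, vz = x3 - x1, y3 - y1, z3 - z1

    # Normal vector of the facet: cross product of two edge vectors.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz == 0:
        raise ValueError("points are collinear in plan view or the facet is vertical")

    # Plane written as z = a*x + b*y + c, so a and b are the surface gradient.
    dzdx = -nx / nz
    dzdy = -ny / nz

    slope = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    if dzdx == 0 and dzdy == 0:
        return slope, None

    # Downslope direction is the negative gradient; bearing is measured
    # clockwise from north, hence atan2(east component, north component).
    aspect = math.degrees(math.atan2(-dzdx, -dzdy)) % 360.0
    return slope, aspect


# A facet that rises toward the north, so it faces south.
print(facet_slope_aspect((0, 0, 0), (1, 0, 0), (0, 1, 1)))
```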

DM-79 - U.S. National Spatial Data Infrastructure

Spatial data infrastructures may be thought of as socio-technical frameworks for coordinating the development, management, sharing and use of geospatial data across multiple organizational jurisdictions and varying geographic extents. The United States was an early adopter of the SDI concept and the U.S. National Spatial Data Infrastructure (NSDI) is an example of a country-wide SDI implementation facilitated by coordination at the federal-government level. At the time of its establishment in the early 1990s, a unique characteristic of the NSDI was a mandate for federal agencies to establish partnerships with state- and local-level government. This entry summarizes the origins of the NSDI’s establishment, its original core components and how they’ve evolved over the last 25 years, the role of the Federal Geographic Data Committee (FGDC), and the anticipated impact of passage of the Geospatial Data Act of 2018. For broader technical information about SDIs, readers are referred to GIST BoK Entry DM-60: Spatial Data Infrastructures (Hu and Li 2017). For additional details on the history of the NSDI, readers are referred to Rhind (1999). For the latest information on recent and emerging NSDI initiatives, please visit the FGDC web site (www.fgdc.gov).  

DM-86 - Vector-to-raster and raster-to-vector conversions
  • Explain how the vector/raster/vector conversion process of graphic images and algorithms takes place and how the results are achieved
  • Create estimated tessellated data sets from point samples or isolines using interpolation operations that are appropriate to the specific situation
  • Illustrate the impact of vector/raster/vector conversions on the quality of a dataset
  • Convert vector data to raster format and back using GIS software
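
To make the quality impact of vector-to-raster conversion tangible, the following sketch rasterizes a polygon by testing each cell centre with an even-odd ray-casting test. It is a minimal illustration only: the function names, the cell-centre rule, and the example triangle are assumptions, production GIS software uses more refined rules, and the reverse (raster-to-vector) step is not shown.

```python
import math


def point_in_polygon(x, y, ring):
    """Even-odd ray-casting test; ring is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(ring, cell_size):
    """Return a grid (list of rows, row 0 at ymin) of 1/0 cell values.

    A cell is set to 1 when its centre falls inside the polygon.
    """
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    xmin, ymin = min(xs), min(ys)
    cols = math.ceil((max(xs) - xmin) / cell_size)
    rows = math.ceil((max(ys) - ymin) / cell_size)
    grid = []
    for r in range(rows):
        cy = ymin + (r + 0.5) * cell_size
        row = []
        for c in range(cols):
            cx = xmin + (c + 0.5) * cell_size
            row.append(1 if point_in_polygon(cx, cy, ring) else 0)
        grid.append(row)
    return grid


def raster_area(grid, cell_size):
    """Area represented by the cells set to 1."""
    return sum(map(sum, grid)) * cell_size ** 2


triangle = [(0, 0), (4, 0), (0, 3)]  # true area is 6 square units
for size in (1, 2):
    print(size, raster_area(rasterize(triangle, size), size))
```

With these inputs the finer grid happens to reproduce the triangle's area while the coarser grid does not, a small example of how cell size affects the quality of a converted dataset.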
DM-51 - Vertical (Geopotential) Datums

The elevation of a point requires a reference surface defining zero elevation. In geodesy, this zero-reference surface, a vertical datum, has historically been mean sea level (MSL). However, the geoid, which is a particular equipotential surface of Earth’s gravity field that would coincide with mean sea level were mean sea level altogether unperturbed and placid, is the ideal datum for physical heights, meaning heights associated with the flow of water, such as elevations. Tidal, gravimetric, and ellipsoidal datums are common vertical datums that use different approaches to define the reference surface. Tidal datums average water heights over a period of approximately 19 years, gravimetric datums record gravity across Earth’s surface, and ellipsoidal datums use specific reference ellipsoids to report ellipsoid heights. Increasingly, gravity measurements, positional data from GNSS (Global Navigation Satellite System), and other sophisticated measurement technologies such as GRACE-FO (Gravity Recovery and Climate Experiment Follow-On) are used to accurately model the geoid and its geopotential surface, advancing the idea of a geopotential datum. Stemming from these advancements, a new geopotential datum for the United States will be developed: North American-Pacific Geopotential Datum 2022 (NAPGD2022).
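
The connection between ellipsoid heights and geoid-based heights is commonly summarized by the approximation h ≈ H + N, where h is the ellipsoid height (as reported by GNSS), H is the orthometric height above the geoid, and N is the geoid height (the separation between geoid and ellipsoid). The sketch below applies that relation; the function name and the numeric values are hypothetical and chosen only for illustration.

```python
def orthometric_height(ellipsoid_height_m, geoid_height_m):
    """Approximate orthometric height H from h ≈ H + N, i.e. H ≈ h - N.

    ellipsoid_height_m: height above the reference ellipsoid (h), in meters
    geoid_height_m: geoid height relative to the same ellipsoid (N), in meters
    """
    return ellipsoid_height_m - geoid_height_m


# Hypothetical values: the geoid lies 30 m below the ellipsoid here (N = -30 m).
print(orthometric_height(100.0, -30.0))
```

In practice N would come from a geoid model evaluated at the point's position rather than from a constant.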
