Data Management

Data management involves the theories and techniques for managing the entire data lifecycle: data collection and format conversion; data storage, sharing, and retrieval; and data provenance, quality control, and curation for long-term archival and preservation.

Topics in this Knowledge Area are listed thematically below. Existing topics are in regular font and linked directly to their original entries (published in 2006; these contain only Learning Objectives). Entries that have been updated and expanded are in bold. Forthcoming topics are italicized.

 

Database Management Systems | Events and Processes | Plane Coordinate Systems
Data Retrieval Strategies | Fields in Space & Time | Tessellated Referencing Systems
Relational DBMS | Integrated Models | Linear Referencing
Extensions of the Relational Model | Mereology: Structural Relationships | Linear Referencing Systems
Object-oriented Spatial Databases | Genealogical Relationships: Lineage, Inheritance | Vertical Datums
Spatio-temporal GIS | Topological Relationships | Horizontal Datums
Database Change | Modeling Tools | Map Projection Properties
Modeling Database Change | Conceptual Data Models | Map Projection Classes
Managing Versioned Geospatial Databases | Logical Data Models | Map Projection Parameters
Reconciling Database Change | Physical Data Models |
Data Warehouses | Fuzzy Logic | Georegistration
Ongoing GIS Revision | Grid Compression Methods | Systematic Georeferencing Systems
Database Administration | Spatial Indexing | Unsystematic Georeferencing Systems
NoSQL Databases | |
Spatial Data Models | | Spatial Data Infrastructure
Basic Data Structures | Spatial Data Quality | Spatial Data Infrastructures
Grid Representations | Spatial Data Uncertainty | Content Standards
The Raster Model | Error-based Uncertainty | Metadata
The Hexagonal Model | Modeling Uncertainty | Adoption of Standards
The Triangulated Irregular Network (TIN) Model | Vagueness |
Hierarchical Data Models | Mathematical Models of Vagueness: Fuzzy Sets and Rough Sets |
Classical Vector Data Models | |
The Topological Model | Georeferencing Systems |
The Spaghetti Model | History of Understanding Earth's Shape |
The Network Model | Approximating the Geoid with Spheres & Ellipsoids |
Discrete Entities | Approximating the Earth's Shape with Geoids |
Modeling 3D Entities | The Geographic Coordinate System |

 

DM-65 - Spatial Data Uncertainty

Although spatial data users may not be aware of the inherent uncertainty in the datasets they use, it is critical to evaluate data quality in order to understand the validity and limitations of any conclusions based on spatial data. Spatial data uncertainty is inevitable because all representations of the real world are imperfect. This topic presents the importance of understanding spatial data uncertainty and discusses major methods and models to communicate, represent, and quantify positional and attribute uncertainty in spatial data, including both analytical and simulation approaches. Geo-semantic uncertainty, which involves vague geographic concepts and classes, is also addressed from the perspectives of fuzzy-set approaches and cognitive experiments. Potential methods for assessing the quality of large volumes of crowd-sourced geographic data are also discussed. The topic closes with directions for future research on spatial data quality and uncertainty.
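
As an illustration of the simulation approach mentioned above, the following minimal sketch propagates positional uncertainty to a derived quantity (polygon area) by Monte Carlo simulation. The parcel coordinates, the error model (independent Gaussian noise with standard deviation `sigma` on every vertex coordinate), and all function names are assumptions made for this example, not part of the original entry.

```python
import math
import random

def polygon_area(vertices):
    """Planar area via the shoelace formula; vertices are (x, y) tuples."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def simulate_area_uncertainty(vertices, sigma, runs=2000, seed=42):
    """Perturb every vertex with Gaussian noise and summarize the resulting areas.

    Assumption: errors are independent between vertices and between x and y.
    Real positional errors are often spatially correlated.
    """
    rng = random.Random(seed)
    areas = []
    for _ in range(runs):
        noisy = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                 for x, y in vertices]
        areas.append(polygon_area(noisy))
    mean = sum(areas) / runs
    std = math.sqrt(sum((a - mean) ** 2 for a in areas) / (runs - 1))
    return mean, std

# Hypothetical 100 m x 100 m parcel with 1 m positional standard deviation.
parcel = [(0, 0), (100, 0), (100, 100), (0, 100)]
mean_area, std_area = simulate_area_uncertainty(parcel, sigma=1.0)
print(f"mean area = {mean_area:.1f} m^2, std = {std_area:.1f} m^2")
```

The spread of the simulated areas gives a rough sense of how much confidence the nominal area deserves under the assumed error model.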

DM-66 - Spatial Indexing

A spatial index is a data structure that allows spatial objects to be accessed efficiently. It is a common technique used by spatial databases. Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. When a spatial index is built, each object is approximated by its minimum bounding rectangle. The various types of spatial indices offered by commercial and open-source databases yield measurable performance differences. Spatial indexing techniques play a central role in time-critical applications and the manipulation of spatial big data.
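
To make the contrast between a sequential scan and an indexed search concrete, here is a minimal sketch of a fixed-grid index built over minimum bounding rectangles (MBRs). A uniform grid is used only because it is short to write; it is not the R-tree or similar structure that a production spatial database would typically provide. The class, function names, and sample features are hypothetical.

```python
from collections import defaultdict

def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a point list."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def intersects(a, b):
    """True if two rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def sequential_scan(boxes, window):
    """Baseline: test every record against the query window."""
    return sorted(fid for fid, box in boxes.items() if intersects(box, window))

class GridIndex:
    """Uniform grid: each cell lists the features whose MBR overlaps it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)
        self.boxes = {}

    def _cells_for(self, box):
        c = self.cell_size
        for i in range(int(box[0] // c), int(box[2] // c) + 1):
            for j in range(int(box[1] // c), int(box[3] // c) + 1):
                yield (i, j)

    def insert(self, fid, box):
        self.boxes[fid] = box
        for cell in self._cells_for(box):
            self.cells[cell].add(fid)

    def query(self, window):
        # Filter step: gather candidates only from cells the window touches.
        candidates = set()
        for cell in self._cells_for(window):
            candidates |= self.cells.get(cell, set())
        # Refine step: exact MBR test on the (usually much smaller) candidate set.
        return sorted(f for f in candidates if intersects(self.boxes[f], window))

features = {
    "a": [(1, 1), (2, 3)],
    "b": [(8, 8), (9, 9)],
    "c": [(4, 4), (6, 5)],
    "d": [(20, 20), (22, 21)],
}
boxes = {fid: mbr(pts) for fid, pts in features.items()}
index = GridIndex(cell_size=5)
for fid, box in boxes.items():
    index.insert(fid, box)

window = (0, 0, 5, 5)
print(index.query(window), sequential_scan(boxes, window))
```

Both searches return the same features; the index simply avoids touching records in grid cells that the query window does not reach. A real system would follow the MBR test with an exact geometry test.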

DM-18 - Spatio-temporal GIS
  • Describe extensions to relational DBMS to represent temporal change in attributes
  • Evaluate the advantages and disadvantages of existing space-time models based on storage efficiency, query performance, ease of data entry, and ability to implement in existing software
  • Create a GIS database that models temporal information
  • Utilize two different space-time models to characterize a given scenario, such as a daily commute
  • Describe the architecture of data models (both field and object based) to represent spatio-temporal phenomena
  • Differentiate the two types of temporal information to be modeled in databases: database (or transaction) time and valid (or world) time
  • Identify whether it is important to represent temporal change in a particular GIS application
  • Describe SQL extensions for querying temporal change
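
The distinction between valid (world) time and database (transaction) time listed above can be sketched with two pairs of timestamp columns on an ordinary relational table. This is a minimal illustration using Python's built-in `sqlite3` module; the table name, column names, sentinel date `9999-12-31`, and sample parcel history are all assumptions made for the example, and it does not use any dedicated temporal SQL extension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE parcel_landuse (
        parcel_id  TEXT,
        landuse    TEXT,
        valid_from TEXT,  -- when the fact became true in the world
        valid_to   TEXT,  -- when it stopped being true (exclusive)
        tx_from    TEXT,  -- when the row was recorded in the database
        tx_to      TEXT   -- when the row was superseded (exclusive)
    )
""")
conn.executemany(
    "INSERT INTO parcel_landuse VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Recorded in 2001: P1 has been farmland since 1990, with no known end.
        ("P1", "farmland", "1990-01-01", "9999-12-31", "2001-03-01", "2012-06-15"),
        # Correction recorded on 2012-06-15: farmland ended in 2010 ...
        ("P1", "farmland", "1990-01-01", "2010-01-01", "2012-06-15", "9999-12-31"),
        # ... and the parcel has been residential since then.
        ("P1", "residential", "2010-01-01", "9999-12-31", "2012-06-15", "9999-12-31"),
    ],
)

def landuse_at(conn, parcel_id, valid_date, known_on):
    """Land use that held on valid_date, as the database believed on known_on."""
    row = conn.execute(
        """
        SELECT landuse FROM parcel_landuse
        WHERE parcel_id = ?
          AND valid_from <= ? AND ? < valid_to
          AND tx_from <= ? AND ? < tx_to
        """,
        (parcel_id, valid_date, valid_date, known_on, known_on),
    ).fetchone()
    return row[0] if row else None

# Same world date, two different database states.
print(landuse_at(conn, "P1", "2011-05-01", "2011-12-31"))
print(landuse_at(conn, "P1", "2011-05-01", "2020-01-01"))
```

ISO 8601 date strings are used so that plain string comparison orders them chronologically. Keeping superseded rows rather than overwriting them is what allows the database to answer what it believed at an earlier transaction time.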
DM-46 - Systematic methods
  • Describe the historical context of the United States Public Land Survey System (USPLS)
  • Discuss the consequences of the USPLS with regard to public administration (i.e., zoning)
  • Explain how townships, ranges, and their sections are delineated in terms of baselines and principal meridians
  • Illustrate how to quarter-off portions of a township and range section
  • Discuss advantages and disadvantages of systematic land partitioning methods in the context of GIS
  • Differentiate the USPLS from the geographic coordinate system
  • Describe the New England Town partitioning system
  • Compare and contrast the United States Public Land Survey System (USPLS) and the Spanish land grant and French long lot systems
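
The quartering of a section mentioned above is simple arithmetic: a nominal section is one square mile (640 acres), each quartering divides the area by four, and each halving divides it by two. The sketch below works through that arithmetic for an aliquot description. The input notation (`"NW1/4 SE1/4"`, read as "the northwest quarter of the southeast quarter") and the function name are assumptions for this example, and surveyed sections often deviate from the nominal 640 acres.

```python
def aliquot_part(description, section_acres=640.0):
    """Return (acres, (xmin, ymin, xmax, ymax)) for an aliquot description.

    The rectangle is in section-relative units: (0, 0) is the southwest
    corner of the section and (1, 1) is the northeast corner.
    """
    xmin, ymin, xmax, ymax = 0.0, 0.0, 1.0, 1.0
    acres = section_acres
    # Descriptions read from the smallest part to the largest,
    # so the subdivisions are applied from right to left.
    for part in reversed(description.split()):
        direction, fraction = part[:-3], part[-3:]
        xmid = (xmin + xmax) / 2
        ymid = (ymin + ymax) / 2
        if fraction == "1/4" and direction in ("NE", "NW", "SE", "SW"):
            if direction[0] == "N":
                ymin = ymid
            else:
                ymax = ymid
            if direction[1] == "E":
                xmin = xmid
            else:
                xmax = xmid
            acres /= 4
        elif fraction == "1/2" and direction in ("N", "S", "E", "W"):
            if direction == "N":
                ymin = ymid
            elif direction == "S":
                ymax = ymid
            elif direction == "E":
                xmin = xmid
            else:
                xmax = xmid
            acres /= 2
        else:
            raise ValueError(f"unrecognized aliquot part: {part!r}")
    return acres, (xmin, ymin, xmax, ymax)

print(aliquot_part("SE1/4"))        # a quarter section
print(aliquot_part("NW1/4 SE1/4"))  # a quarter-quarter section
print(aliquot_part("N1/2 NE1/4"))   # the north half of the northeast quarter
```

Under the nominal assumption, a quarter section is 160 acres and a quarter-quarter section is 40 acres.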
DM-49 - Tessellated referencing systems
  • Explain the concept of a “quadtree”
  • Describe the octahedral quaternary triangulated mesh georeferencing system proposed by Dutton
  • Discuss the advantages of hierarchical coordinates relative to geographic and plane coordinate systems
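
One way to see what a hierarchical coordinate offers is to encode a location as a quadtree key: a string of digits in which each digit selects one of four child cells. The sketch below uses a square quadtree over a planar extent, not Dutton's triangular mesh, and the quadrant numbering (0 = SW, 1 = SE, 2 = NW, 3 = NE) and function names are assumptions chosen for this example.

```python
def quadtree_key(x, y, extent, depth):
    """Encode (x, y) as a string of quadrant digits, one digit per level."""
    xmin, ymin, xmax, ymax = extent
    digits = []
    for _ in range(depth):
        xmid = (xmin + xmax) / 2
        ymid = (ymin + ymax) / 2
        quadrant = 0
        if x >= xmid:          # east half
            quadrant += 1
            xmin = xmid
        else:                  # west half
            xmax = xmid
        if y >= ymid:          # north half
            quadrant += 2
            ymin = ymid
        else:                  # south half
            ymax = ymid
        digits.append(str(quadrant))
    return "".join(digits)

def key_to_cell(key, extent):
    """Decode a key back to the rectangle (xmin, ymin, xmax, ymax) it names."""
    xmin, ymin, xmax, ymax = extent
    for digit in key:
        quadrant = int(digit)
        xmid = (xmin + xmax) / 2
        ymid = (ymin + ymax) / 2
        if quadrant & 1:
            xmin = xmid
        else:
            xmax = xmid
        if quadrant & 2:
            ymin = ymid
        else:
            ymax = ymid
    return (xmin, ymin, xmax, ymax)

extent = (0, 0, 16, 16)
key = quadtree_key(5, 11, extent, depth=3)
print(key, key_to_cell(key, extent))
```

A single key carries both location and resolution: its length gives the level of detail, and truncating it yields the enclosing coarser cell, so cells that share a prefix are nested. This is the kind of property that plain geographic or plane coordinate pairs do not express directly.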
DM-09 - The hexagonal model
  • Illustrate the hexagonal model
  • Explain the limitations of the grid model compared to the hexagonal model
  • Exemplify the uses (past and potential) of the hexagonal model
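
One frequently cited limitation of the square grid relative to the hexagonal model is neighbor distance: a square cell has eight neighbors at two different center-to-center distances, while a hexagonal cell has six neighbors that are all equally distant. The sketch below checks this numerically. The axial (q, r) coordinate convention and the pointy-top orientation are assumptions; other hexagon indexing schemes exist.

```python
import math

# Axial-coordinate offsets of the six neighbors of a hexagonal cell.
HEX_NEIGHBORS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

# Offsets of the eight neighbors (edge and corner) of a square cell.
SQUARE_NEIGHBORS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                    if (dx, dy) != (0, 0)]

def hex_center(q, r, size=1.0):
    """Center of a pointy-top hexagon; size is the center-to-corner distance."""
    x = size * math.sqrt(3) * (q + r / 2)
    y = size * 1.5 * r
    return x, y

def hex_neighbor_distances(size=1.0):
    """Distinct center-to-center distances from a hexagon to its neighbors."""
    cx, cy = hex_center(0, 0, size)
    distances = set()
    for q, r in HEX_NEIGHBORS:
        nx, ny = hex_center(q, r, size)
        distances.add(round(math.hypot(nx - cx, ny - cy), 9))
    return sorted(distances)

def square_neighbor_distances(cell=1.0):
    """Distinct center-to-center distances from a square cell to its neighbors."""
    return sorted({round(math.hypot(dx * cell, dy * cell), 9)
                   for dx, dy in SQUARE_NEIGHBORS})

print("hexagon:", hex_neighbor_distances())
print("square: ", square_neighbor_distances())
```

The hexagonal case yields a single distance, while the square case yields two (the cell width for edge neighbors and a longer diagonal for corner neighbors).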
DM-15 - The network model
  • Define the following terms pertaining to a network: loops, multiple edges, the degree of a vertex, walk, trail, path, cycle, fundamental cycle
  • List definitions of networks that apply to specific applications or industries
  • Create an adjacency table from a sample network
  • Explain how a graph can be written as an adjacency matrix and how this can be used to calculate topological shortest paths in the graph
  • Create an incidence matrix from a sample network
  • Explain how a graph (network) may be directed or undirected
  • Demonstrate how attributes of networks can be used to represent cost, time, distance, or many other measures
  • Demonstrate how the star (or forward star) data structure, which is often employed when digitally storing network information, violates relational normal form, but allows for much faster search and retrieval in network databases
  • Discuss some of the difficulties of applying the standard process-pattern concept to lines and networks
  • Demonstrate how a network is a connected set of edges and vertices
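
Several of the structures named in the list above can be sketched for a small undirected network: an adjacency matrix, an incidence matrix, a forward star representation, and topological (hop-count) shortest paths derived from powers of the adjacency matrix. The sample network and all function names are hypothetical, and the code is an illustration in plain Python rather than an efficient implementation.

```python
def build_index(vertices):
    return {v: i for i, v in enumerate(vertices)}

def adjacency_matrix(vertices, edges):
    idx = build_index(vertices)
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[idx[u]][idx[v]] = 1
        matrix[idx[v]][idx[u]] = 1  # undirected: mirror the entry
    return matrix

def incidence_matrix(vertices, edges):
    """Rows are vertices, columns are edges; 1 marks an endpoint."""
    idx = build_index(vertices)
    matrix = [[0] * len(edges) for _ in vertices]
    for e, (u, v) in enumerate(edges):
        matrix[idx[u]][e] = 1
        matrix[idx[v]][e] = 1
    return matrix

def forward_star(vertices, edges):
    """Arcs sorted by origin; pointer[i] is where vertex i's arcs start."""
    idx = build_index(vertices)
    arcs = sorted([(idx[u], idx[v]) for u, v in edges] +
                  [(idx[v], idx[u]) for u, v in edges])
    pointer = [0] * (len(vertices) + 1)
    for origin, _ in arcs:
        pointer[origin + 1] += 1
    for i in range(len(vertices)):
        pointer[i + 1] += pointer[i]
    to_nodes = [dest for _, dest in arcs]
    return pointer, to_nodes

def mat_mult(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def topological_distance(adj, i, j):
    """Smallest k for which entry (i, j) of adj**k is non-zero, else None."""
    if i == j:
        return 0
    power = adj
    for k in range(1, len(adj)):
        if power[i][j] > 0:
            return k
        power = mat_mult(power, adj)
    return None

vertices = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D"), ("D", "E")]

adj = adjacency_matrix(vertices, edges)
inc = incidence_matrix(vertices, edges)
pointer, to_nodes = forward_star(vertices, edges)

print("degree of D:", sum(adj[3]))
print("hops from B to E:", topological_distance(adj, 1, 4))
print("neighbors of D:", [vertices[k] for k in to_nodes[pointer[3]:pointer[4]]])
```

Entry (i, j) of the k-th power of the adjacency matrix counts the walks of length k from i to j, which is why the first non-zero power gives the number of edges on a shortest path. Replacing the 0/1 entries with cost, time, or distance values, and the matrix powers with a weighted shortest-path algorithm, would extend the sketch to attributed networks.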
DM-07 - The Raster Data Model

The raster data model is a widely used method of storing geographic data. The model most commonly takes the form of a grid-like structure that holds values at regularly spaced intervals over the extent of the raster. Rasters are especially well suited for storing continuous data such as temperature and elevation values, but can hold discrete and categorical data such as land use as well. The resolution of a raster is given in linear units (e.g., meters) or angular units (e.g., one arc second) and defines the length of one side of a grid cell. High (or fine) resolution rasters have comparatively closer spacing and more grid cells than low (or coarse) resolution rasters, and require relatively more memory to store. Active research in the domain is oriented toward improving compression schemes and implementations for alternative cell shapes (such as hexagons), and better supporting multi-resolution raster storage and analysis functions.
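
The relationship between resolution, cell count, and storage described above can be worked through with a few lines of arithmetic. The sketch below assumes a square-celled raster stored uncompressed with 4 bytes per cell and with row 0 at the top (north) edge; the extent, cell sizes, and function names are hypothetical.

```python
import math

def raster_shape(extent, cell_size):
    """Number of (rows, cols) needed to cover extent at the given cell size."""
    xmin, ymin, xmax, ymax = extent
    cols = math.ceil((xmax - xmin) / cell_size)
    rows = math.ceil((ymax - ymin) / cell_size)
    return rows, cols

def storage_bytes(rows, cols, bytes_per_cell=4):
    """Uncompressed size, assuming one fixed-width value per cell."""
    return rows * cols * bytes_per_cell

def world_to_cell(x, y, extent, cell_size):
    """Map a world coordinate to (row, col), with row 0 at the top edge."""
    xmin, ymin, xmax, ymax = extent
    col = int((x - xmin) // cell_size)
    row = int((ymax - y) // cell_size)
    return row, col

extent = (0, 0, 10000, 10000)  # a 10 km x 10 km area, in meters
for cell_size in (30, 10):
    rows, cols = raster_shape(extent, cell_size)
    print(cell_size, "m:", rows, "x", cols, "cells,",
          storage_bytes(rows, cols), "bytes")

print(world_to_cell(25, 9995, extent, 10))
```

Because the cell count grows with the square of the refinement factor, moving from 30 m to 10 m cells over the same extent raises the uncompressed storage roughly ninefold.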

DM-12 - The spaghetti model
  • Identify a widely used example of the spaghetti model (e.g., AutoCAD DWF, ESRI shapefile)
  • Write a program to read and write a vector data file using a common published format
  • Explain the conditions under which the spaghetti model is useful
  • Explain how the spaghetti data model embodies an object-based view of the world
  • Describe how geometric primitives are implemented in the spaghetti model as independent objects without topology
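
As a minimal sketch of reading and writing a vector file in a published format, the example below uses GeoJSON (RFC 7946) with only Python's standard library. GeoJSON is chosen here for brevity and is not one of the formats named in the list above. Like other spaghetti-style formats it stores each geometry as an independent coordinate list with no topology. The file name, feature names, and helper functions are hypothetical.

```python
import json
import os
import tempfile

def polygon_feature(name, ring):
    """Build a GeoJSON Feature holding a single-ring polygon."""
    return {
        "type": "Feature",
        "properties": {"name": name},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }

def write_geojson(path, features):
    collection = {"type": "FeatureCollection", "features": features}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(collection, f)

def read_geojson(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)["features"]

def shared_vertices(features):
    """Coordinates that are stored separately by more than one feature."""
    owners = {}
    for feature in features:
        ring = feature["geometry"]["coordinates"][0]
        for point in {tuple(pt) for pt in ring}:
            owners.setdefault(point, set()).add(feature["properties"]["name"])
    return sorted(pt for pt, names in owners.items() if len(names) > 1)

# Two adjacent unit squares; each stores its own copy of the common edge.
left = polygon_feature("left", [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]])
right = polygon_feature("right", [[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]])

path = os.path.join(tempfile.mkdtemp(), "parcels.geojson")
write_geojson(path, [left, right])
features = read_geojson(path)

print(len(features), "features read")
print("vertices duplicated across features:", shared_vertices(features))
```

The duplicated vertices show the defining trait of the spaghetti model: the file itself does not record that the two polygons share a boundary, so adjacency has to be computed from coordinates when it is needed.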
DM-13 - The topological model
  • Define terms related to topology (e.g., adjacency, connectivity, overlap, intersect, logical consistency)
  • Describe the integrity constraints of integrated topological models (e.g., POLYVRT)
  • Discuss the historical roots of the Census Bureau’s creation of GBF/DIME as the foundation for the development of topological data structures
  • Explain why integrated topological models have lost favor in commercial GIS software
  • Evaluate the positive and negative impacts of the shift from integrated topological models
  • Discuss the role of graph theory in topological structures
  • Exemplify the concept of planar enforcement (e.g., TIN triangles)
  • Demonstrate how a topological structure can be represented in a relational database structure
  • Explain the advantages and disadvantages of topological data models
  • Illustrate a topological relation
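
A topological structure can be represented in relational tables by storing each arc once, together with its end nodes and the polygons on its left and right. The sketch below does this with Python's built-in `sqlite3` module for two adjacent squares, A and B. The schema, table and column names, and the use of NULL for the area outside all polygons are assumptions for illustration, loosely in the spirit of arc-node structures that record left and right polygons; intermediate shape points along each arc are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (
        node_id INTEGER PRIMARY KEY,
        x REAL,
        y REAL
    );
    CREATE TABLE arcs (
        arc_id     INTEGER PRIMARY KEY,
        from_node  INTEGER REFERENCES nodes(node_id),
        to_node    INTEGER REFERENCES nodes(node_id),
        left_poly  TEXT,  -- polygon on the left when travelling from -> to
        right_poly TEXT   -- NULL means the area outside all polygons
    );
""")

# Square A covers x in [0, 1]; square B covers x in [1, 2]; both y in [0, 1].
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)",
                 [(1, 1.0, 0.0), (2, 1.0, 1.0)])
conn.executemany("INSERT INTO arcs VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 2, "A", "B"),   # shared edge, running north from node 1 to node 2
    (2, 2, 1, "A", None),  # rest of A's boundary, running counterclockwise
    (3, 1, 2, "B", None),  # rest of B's boundary, running counterclockwise
])

def adjacent_polygons(conn):
    """Polygon pairs that share an arc, found without any geometry."""
    rows = conn.execute("""
        SELECT DISTINCT left_poly, right_poly FROM arcs
        WHERE left_poly IS NOT NULL AND right_poly IS NOT NULL
    """).fetchall()
    return sorted(tuple(sorted(row)) for row in rows)

def boundary_arcs(conn, poly):
    """Arcs that bound a polygon on either side."""
    rows = conn.execute(
        "SELECT arc_id FROM arcs WHERE left_poly = ? OR right_poly = ? "
        "ORDER BY arc_id",
        (poly, poly),
    ).fetchall()
    return [row[0] for row in rows]

print(adjacent_polygons(conn))
print(boundary_arcs(conn, "A"), boundary_arcs(conn, "B"))
```

Adjacency is answered by a plain attribute query because the shared edge exists only once in the `arcs` table. The cost is that every edit must keep the node, arc, and polygon references consistent.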
