Data Management

Data management involves the theories and techniques for managing the entire data lifecycle, from data collection to data format conversion, from data storage to data sharing and retrieval, to data provenance, data quality control and data curation for long-term data archival and preservation.

Topics in this Knowledge Area are listed thematically below. Existing topics are linked directly to either their original (2006) or revised entries; forthcoming topics are italicized.


Database Management Systems | Events and Processes | Plane Coordinate Systems
Data Retrieval Strategies | Fields in Space & Time | Tessellated Referencing Systems
Relational DBMS | Integrated Models | Linear Referencing
Extensions of the Relational Model | Mereology: Structural Relationships | Linear Referencing Systems
Object-oriented Spatial Databases | Genealogical Relationships: Lineage, Inheritance | Vertical Datums
Spatio-temporal GIS | Topological Relationships | Horizontal Datums
Database Change | Modeling Tools | Map Projection Properties
Modeling Database Change | Conceptual Data Models | Map Projection Classes
Managing Versioned Geospatial Databases | Logical Data Models | Map Projection Parameters
Reconciling Database Change | Physical Data Models |
Data Warehouses | Fuzzy Logic | Georegistration
Ongoing GIS Revision | Grid Compression Methods | Systematic Georeferencing Systems
Database Administration |  | Unsystematic Georeferencing Systems
Spatial Data Models |  | Spatial Data Infrastructure
Basic Data Structures | Spatial Data Quality | Spatial Data Infrastructures
Grid Representations | Spatial Data Uncertainty | Content Standards
The Raster Model | Error-based Uncertainty | Metadata
The Hexagonal Model | Modeling Uncertainty | Adoption of Standards
The Triangulated Irregular Network (TIN) Model | Vagueness |
Hierarchical Data Models | Mathematical Models of Vagueness: Fuzzy Sets and Rough Sets |
Classical Vector Data Models |  |
The Topological Model | Georeferencing Systems |
The Spaghetti Model | History of Understanding Earth's Shape |
The Network Model | Approximating the Geoid with Spheres & Ellipsoids |
Discrete Entities | Approximating the Earth's Shape with Geoids |
Modeling 3D Entities | The Geographic Coordinate System |


DM-19 - Modeling uncertainty
  • Differentiate among modeling uncertainty for entire datasets, for features, and for individual data values
  • Describe SQL extensions for querying uncertainty information in databases
  • Describe extensions to relational DBMS to represent different types of uncertainty in attributes, including both vagueness/fuzziness and error-based uncertainty
  • Discuss the role of metadata in representing and communicating dataset-level uncertainty
  • Create a GIS database that models uncertain information
  • Identify whether it is important to represent uncertainty in a particular GIS application
  • Describe the architecture of data models (both field- and object-based) to represent feature-level and datum-level uncertainty
  • Evaluate the advantages and disadvantages of existing uncertainty models based on storage efficiency, query performance, ease of data entry, and ability to implement in existing software
DM-17 - Object-based spatial databases
  • Discuss the merits of storing geometric data in the same location as attribute data
  • Evaluate the advantages and disadvantages of the object-based data model compared to the layer-based vector data model (topological or spaghetti)
  • Describe the architectures of various object-relational spatial data models, including spatial extensions of DBMS, proprietary object-based data models from GIS vendors, and open-source and standards-based efforts
  • Differentiate between the topological vector data model and spaghetti object data with topological rulebases
  • Write a script (in a GIS, database, or Web environment) to read and write data in an object-based spatial database
  • Transfer geospatial data from an XML schema to a database
  • Discuss the degree to which various object-relational spatial data models approximate a true object-oriented paradigm, and whether they should
DM-04 - Object-oriented DBMS
  • Describe the basic elements of the object-oriented paradigm, such as inheritance, encapsulation, methods, and composition
  • Evaluate the degree to which the object-oriented paradigm does or does not approximate cognitive structures
  • Explain how the principle of inheritance can be implemented using an object-oriented programming approach
  • Defend or refute the notion that the Extensible Markup Language (XML) is a form of object-oriented database
  • Explain how the properties of object orientation allow for combining and generalizing objects
  • Evaluate the advantages and disadvantages of object-oriented databases compared to relational databases, focusing on representational power, data entry, storage efficiency, and query performance
  • Implement a GIS database design in an off-the-shelf, object-oriented database
  • Differentiate between object-oriented programming and object-oriented databases
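
The object-oriented elements named in this topic can be illustrated with a minimal Python sketch. The class names (Feature, Road, RoadNetwork) and their attributes are hypothetical and are not taken from any GIS or database product. The sketch shows inheritance, methods bundled with data, and composition in a programming language; it does not represent an object-oriented DBMS.

```python
from dataclasses import dataclass, field
from math import hypot
from typing import List, Tuple


@dataclass
class Feature:
    """Hypothetical base class: every feature has an id and a name."""
    feature_id: int
    name: str

    def describe(self) -> str:
        # The method travels with the data it operates on.
        return f"{type(self).__name__} {self.feature_id}: {self.name}"


@dataclass
class Road(Feature):
    """Inheritance: a Road is a Feature that also carries geometry."""
    vertices: List[Tuple[float, float]] = field(default_factory=list)

    def length(self) -> float:
        # Sum the straight-line lengths of consecutive segments.
        pairs = zip(self.vertices, self.vertices[1:])
        return sum(hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in pairs)


@dataclass
class RoadNetwork:
    """Composition: a network is built from Road objects."""
    roads: List[Road] = field(default_factory=list)

    def total_length(self) -> float:
        return sum(road.length() for road in self.roads)
```

Road reuses describe() from Feature without redefining it, and RoadNetwork generalizes over its member roads by delegating to their length() methods.
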
DM-61 - Ongoing GIS revision
  • Describe a method that allows users within an organization to access data, including methods of data sharing, version control, and maintenance
  • Describe how spatial data and GIS&T can be integrated into a workflow process
  • Develop a plan for user feedback and self-evaluation procedures
  • Evaluate how external spatial data sources can be incorporated into the business process
  • Evaluate internal spatial databases for continuing adequacy
  • Evaluate the efficiency and effectiveness of an existing enterprise GIS
  • Evaluate the needs for spatial data sources, including currency, accuracy, and access, specifically addressing issues related to financial costs, sharing arrangements, online/real-time access, and transactional processes across an organization
  • Illustrate how a business process analysis can be used to periodically review system requirements
  • List improvements that may be made to the design of an existing GIS
  • Describe how internal spatial data sources can be handled during an implementation process
DM-36 - Physical Data Models

Constructs within a particular implementation of database management software guide the development of a physical data model, which is the product of a physical database design process. A physical data model documents how data are to be stored on, and accessed from, the storage media of computer hardware. It depends on the specific data types and indexing mechanisms available in the database management system software. Data types such as integers, reals, and character strings, among many others, can lead to different storage structures. Indexing mechanisms such as region-trees and hash functions, among others, lead to differences in access performance. Physical data modeling choices about data types and indexing mechanisms refine the details of a physical database design. The data types associated with field, record, and file storage structures, together with the access mechanisms to those structures, either foster or constrain the performance of a database design. Because all software runs on an operating system, field, record, and file storage structures must be translated into operating system constructs to be implemented. As such, all storage structures are contingent on the operating system and the particular hardware that host the data management software.
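
To make the link between data types, storage structures, and access mechanisms concrete, here is a minimal sketch using Python's standard struct module. The record layout (a 4-byte integer id, an 8-byte real elevation, and a 20-byte character name) and all function names are assumptions chosen for illustration. A plain dictionary stands in for a hash index; it is not how any particular DBMS implements one.

```python
import struct

# Hypothetical fixed-length record: int id, real elevation, 20-char name.
# "<" requests little-endian layout with no padding, so size = 4 + 8 + 20.
RECORD = struct.Struct("<id20s")


def pack_records(rows):
    """Serialize rows into one contiguous byte string (a toy 'file')."""
    return b"".join(
        RECORD.pack(rec_id, elev, name.encode("ascii"))
        for rec_id, elev, name in rows
    )


def _decode(data, offset):
    """Read one record at a byte offset and strip the name's null padding."""
    rec_id, elev, raw = RECORD.unpack_from(data, offset)
    return rec_id, elev, raw.rstrip(b"\x00").decode("ascii")


def build_hash_index(data):
    """Map each record id to its byte offset (stand-in for a hash index)."""
    return {
        RECORD.unpack_from(data, offset)[0]: offset
        for offset in range(0, len(data), RECORD.size)
    }


def scan_lookup(data, rec_id):
    """Sequential access: inspect records one by one until the id matches."""
    for offset in range(0, len(data), RECORD.size):
        record = _decode(data, offset)
        if record[0] == rec_id:
            return record
    return None


def indexed_lookup(data, index, rec_id):
    """Indexed access: jump straight to the stored byte offset."""
    offset = index.get(rec_id)
    return None if offset is None else _decode(data, offset)
```

Both lookups return the same record. The scan touches every preceding record, while the indexed lookup reads a single offset, which is the kind of access-performance difference the paragraph above attributes to indexing choices.
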

DM-48 - Plane coordinate systems
  • Explain why plane coordinates are sometimes preferable to geographic coordinates
  • Identify the map projection(s) upon which UTM coordinate systems are based, and explain the relationship between the projection(s) and the coordinate system grid
  • Discuss the magnitude and cause of error associated with UTM coordinates
  • Differentiate the characteristics and uses of the UTM coordinate system from the Military Grid Reference System (MGRS) and the World Geographic Reference System (GEOREF)
  • Explain what State Plane Coordinate (SPC) system eastings and northings represent
  • Associate SPC coordinates and zone specifications with corresponding positions on a U.S. map or globe
  • Identify the map projection(s) upon which SPC coordinate systems are based, and explain the relationship between the projection(s) and the coordinate system grids
  • Discuss the magnitude and cause of error associated with SPC coordinates
  • Recommend the most appropriate plane coordinate system for applications at different spatial extents and justify the recommendation
  • Critique the U.S. Geological Survey’s choice of UTM as the standard coordinate system for the U.S. National Map
  • Describe the characteristics of the “national grids” of countries other than the U.S.
  • Explain what Universal Transverse Mercator (UTM) eastings and northings represent
  • Associate UTM coordinates and zone specifications with corresponding positions on a world map or globe
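
The relationship between UTM zones and positions on the globe can be sketched with a few lines of arithmetic. The function names below are hypothetical. The sketch assumes the standard 6-degree zones numbered 1 to 60 eastward from 180°W, and it ignores the widened zones around Norway and Svalbard. It returns only the zone, its central meridian, and the conventional false origin; it does not perform the Transverse Mercator projection itself.

```python
import math


def utm_zone(lon_deg):
    """Return the UTM zone number (1-60) for a longitude in degrees."""
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")
    # Zones are 6 degrees wide; longitude 180 is folded into zone 60.
    return min(int(math.floor((lon_deg + 180.0) / 6.0)) + 1, 60)


def central_meridian(zone):
    """Return the central meridian (degrees) of a UTM zone."""
    return zone * 6 - 183


def false_origin(lat_deg):
    """Return (false easting, false northing) in meters for a latitude."""
    # The central meridian is assigned 500,000 m so eastings stay positive;
    # southern-hemisphere northings are offset by 10,000,000 m.
    return 500000.0, (0.0 if lat_deg >= 0 else 10000000.0)
```

For example, a longitude of -122.33 falls in zone 10, whose central meridian is 123°W. An easting below 500,000 m in that zone therefore lies west of that meridian.
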
DM-39 - Reconciling database change
  • Design a test of reliability of change information (e.g., the logical consistency of updates to the TIGER database)
  • Implement a test of reliability of change information
DM-03 - Relational DBMS
  • Explain the advantage of the relational model over earlier database structures including spreadsheets
  • Define the basic terms used in relational database management systems (e.g., tuple, relation, foreign key, SQL, relational join)
  • Discuss the efficiency and costs of normalization
  • Describe the entity-relationship diagram approach to data modeling
  • Explain how entity-relationship diagrams are translated into relational tables
  • Create an SQL query that extracts data from related tables
  • Describe the problems associated with failure to follow the first and second normal forms (including data confusion, redundancy, and retrieval difficulties)
  • Demonstrate how search and relational join operations provide results for a typical GIS query and other simple operations using the relational DBMS within a GIS software application
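
As one possible illustration of a query that extracts data from related tables, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The county and parcel tables, their columns, and the sample rows are all invented for the example. The foreign key parcel.county_fips refers to county.fips, and the join summarizes parcels per county.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE county (
            fips TEXT PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE parcel (
            parcel_id   INTEGER PRIMARY KEY,
            county_fips TEXT NOT NULL REFERENCES county(fips),
            area_ha     REAL NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO county VALUES (?, ?)",
        [("001", "Adams"), ("003", "Baker")],
    )
    conn.executemany(
        "INSERT INTO parcel VALUES (?, ?, ?)",
        [(1, "001", 10.0), (2, "001", 5.5), (3, "003", 4.0)],
    )
    return conn


def parcels_by_county(conn):
    """Join parcel to county on the foreign key and summarize per county."""
    query = """
        SELECT c.name, COUNT(p.parcel_id), SUM(p.area_ha)
        FROM county AS c
        JOIN parcel AS p ON p.county_fips = c.fips
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()
```

Because the county name is stored once in the county table rather than repeated on every parcel row, renaming a county is a single update. This is the kind of redundancy that normalization is meant to remove.
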
DM-60 - Spatial Data Infrastructures

A spatial data infrastructure (SDI) is the infrastructure that facilitates the discovery, access, management, distribution, reuse, and preservation of digital geospatial resources. These resources may include maps, data, geospatial services, and tools. As cyberinfrastructures, SDIs are similar to other infrastructures, such as water supplies and transportation networks, in that they play fundamental roles in many aspects of society. These roles have become even more significant in today’s big data age, when large volumes of geospatial data and many Web services are available. From a technological perspective, SDIs mainly consist of data, hardware, and software. However, a truly functional SDI also needs the efforts of people, support from organizations, government policies, data and software standards, and more. In this chapter, we present the concepts and values of SDIs, as well as a brief history of SDI development in the U.S. We also discuss the components of a typical SDI, focusing on three key components: geoportals, metadata, and search functions. Examples of existing SDI implementations are also discussed.

DM-65 - Spatial Data Uncertainty

Although spatial data users may not be aware of the inherent uncertainty in all the datasets they use, it is critical to evaluate data quality in order to understand the validity and limitations of any conclusions based on spatial data. Spatial data uncertainty is inevitable because all representations of the real world are imperfect. This topic presents the importance of understanding spatial data uncertainty and discusses major methods and models to communicate, represent, and quantify positional and attribute uncertainty in spatial data, including both analytical and simulation approaches. Geo-semantic uncertainty that involves vague geographic concepts and classes is also addressed from the perspectives of fuzzy-set approaches and cognitive experiments. Potential methods for assessing the quality of large volumes of crowd-sourced geographic data are also discussed. Finally, this topic ends with directions for future research on spatial data quality and uncertainty.
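
A simulation approach to positional uncertainty can be sketched as a small Monte Carlo experiment. The sketch below assumes that each vertex coordinate carries independent, zero-mean Gaussian error with a single standard deviation sigma. That is a simplifying assumption made for illustration, not a model prescribed by this topic, and the function names are hypothetical. Each run perturbs the vertices and recomputes the polygon area, so the spread of the results estimates how positional error propagates to area.

```python
import random
import statistics


def polygon_area(vertices):
    """Area of a simple polygon from its vertices (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def simulate_area(vertices, sigma, runs=2000, seed=42):
    """Return (mean, standard deviation) of area over perturbed copies."""
    rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
    areas = []
    for _ in range(runs):
        # Add independent Gaussian noise to every x and y coordinate.
        noisy = [
            (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
            for x, y in vertices
        ]
        areas.append(polygon_area(noisy))
    return statistics.mean(areas), statistics.stdev(areas)
```

Called on a 100 m by 100 m square with sigma set to 1 m, the function reports a mean close to the error-free area, together with a standard deviation that summarizes the area uncertainty induced by the positional error.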