Data Management

Data management involves the theories and techniques for managing the entire data lifecycle, from data collection to data format conversion, from data storage to data sharing and retrieval, to data provenance, data quality control and data curation for long-term data archival and preservation.

Topics in this Knowledge Area are listed thematically below. Existing topics are in regular font and linked directly to their original entries (published in 2006; these contain only Learning Objectives). Entries that have been updated and expanded are in bold. Forthcoming topics are italicized.


Spatial Databases | Spatial Access Methods | Georeferencing Systems
Spatial Database Management Systems | Data Retrieval Strategies | Approximating the Earth's Shape with Geoids
Use of Relational DBMSs | Spatial Indexing | Geographic Coordinate Systems
Object-Oriented DBMSs | Space-driven Structures: Grid, linear quadtree, and z-ordering tree files | Planar Coordinate Systems
Extensions of the Relational DBMS | Data-driven structures: R-trees and cost models | Tessellated Referencing Systems
Topological Relationships | Modeling Unstructured Spatial Data | Linear Referencing
Database Administration | Modeling Semi-Structured Spatial Data | Vertical Datums
Conceptual Data Models | | Horizontal Datums
Logical Data Models | Query Processing | Georegistration
Physical Data Models | Optimal I/O Algorithms | Map Projections
NoSQL Databases | Spatial Joins |
Problems with Large Spatial Databases | Complex Queries |
Array Databases | Spatial Data Infrastructures |
Representations of Spatial Objects | Metadata |
Events and Processes | Content Standards |
Raster Data Models | Data Warehouses |
Vector Data Models | Spatial Data Infrastructures |
Topological Models | U.S. National Spatial Data Infrastructure |
Network Models | Ontology for Geospatial Semantic Interoperability |
Modeling 3D Entities | Hydrographic Geospatial Data Standards |
Fields in Space and Time | Marine Spatial Data Infrastructure |
Fuzzy Models | |
Triangulated Irregular Network Models | |
Genealogical Relationships, Linkage, and Inheritance | |
Geospatial Data Conflation | |
Standardization & Exchange Specifications | |


DM-16 - Linear Referencing

Linear referencing is a term that encompasses a family of concepts and techniques for associating features with a spatial location along a network, rather than referencing those locations to a traditional spherical or planar coordinate system. Linear referencing is used when the location on the network, and the relationships to other locations on the network, are more significant than the location in 2D or 3D space. Linear referencing is commonly used in transportation applications, including roads, railways, and pipelines, although any network structure can be used as the basis for linearly referenced features. Several data models for storing linearly referenced data are available, and well-defined sets of procedures can be used to implement linear referencing for a particular application. As network analysis and network-based statistical analysis become more prevalent across disciplines, linear referencing is likely to remain an important component of the data used for such analyses.
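
As a minimal sketch of the idea, assume a route is stored as a planar polyline and a feature is referenced by a measure, meaning its distance from the start of the route. The function name `locate_along` and the sample route are hypothetical and are not taken from any particular linear referencing data model.

```python
import math

def locate_along(route, measure):
    """Return the (x, y) point at distance `measure` along a polyline.

    `route` is a list of (x, y) vertices in a planar coordinate system.
    """
    if measure < 0:
        raise ValueError("measure must be non-negative")
    travelled = 0.0
    for (x0, y0), (x1, y1) in zip(route, route[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        # Stop at the first non-degenerate segment that contains the measure.
        if seg_len > 0 and travelled + seg_len >= measure:
            t = (measure - travelled) / seg_len
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        travelled += seg_len
    raise ValueError("measure exceeds route length")

# Hypothetical route: a 5-unit segment followed by a 6-unit segment.
route = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
point = locate_along(route, 8.0)  # 3 units into the second segment
```

The feature itself stores only a route identifier and a measure; coordinates are derived on demand, which is why the network geometry can be revised without re-entering every referenced feature.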

DM-35 - Logical Data Models

A logical data model is created at the second of three levels of abstraction: conceptual, logical, and physical. A logical data model expresses the meaning and context of a conceptual data model and adds detail about data (base) structures, e.g., topologically organized records, relational tables, object-oriented classes, or extensible markup language (XML) construct tags. However, the logical data model is independent of any particular database management software product. Nonetheless, such a model is often constrained by a class of software language techniques for representation, which makes implementation with a physical data model easier. Complex entity types of the conceptual data model must be translated into sub-type/super-type hierarchies to clarify data contexts for the entity type, while avoiding duplication of concepts and data. Entities and records should have internal identifiers. Relationships can be used to express the involvement of entity types with activities or associations. A logical schema is formed from the above data organization. A schema diagram depicts the entity, attribute, and relationship detail for each application. The resulting logical data models can be synthesized using schema integration to support multi-user database environments, e.g., data warehouses for strategic applications and/or federated databases for tactical/operational business applications.
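
One way to picture a sub-type/super-type hierarchy with internal identifiers is the sketch below, which uses object-oriented classes as the logical structure. The entity names (`TransportSegment`, `Road`, `Railway`) and their attributes are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # simple source of internal identifiers

@dataclass
class TransportSegment:
    """Super-type: attributes shared by every kind of segment."""
    name: str
    length_m: float
    segment_id: int = field(default_factory=lambda: next(_next_id))

@dataclass
class Road(TransportSegment):
    """Sub-type: adds only what is specific to roads."""
    lanes: int = 2

@dataclass
class Railway(TransportSegment):
    """Sub-type: adds only what is specific to railways."""
    gauge_mm: int = 1435

road = Road("Main St", 120.0, lanes=4)
rail = Railway("Coast Line", 950.0)
```

Shared attributes are declared once on the super-type, so the sub-types avoid duplicating concepts, and each record receives an identifier that does not depend on its descriptive attributes.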

DM-31 - Mathematical models of vagueness: Fuzzy sets and rough sets
  • Compare and contrast the relative merits of fuzzy sets, rough sets, and other models
  • Differentiate between fuzzy set membership and probabilistic set membership
  • Explain the problems inherent in fuzzy sets
  • Create appropriate membership functions to model vague phenomena
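
A membership function for a vague phenomenon might look like the following sketch. It assumes a linear ramp for the class "steep slope"; the function name and the 10 and 30 degree thresholds are illustrative assumptions, not values from the source.

```python
def steep_slope_membership(slope_deg, lower=10.0, upper=30.0):
    """Degree (0.0 to 1.0) to which a slope belongs to the vague class 'steep'."""
    if slope_deg <= lower:
        return 0.0  # clearly not steep
    if slope_deg >= upper:
        return 1.0  # clearly steep
    # Partial membership rises linearly between the two thresholds.
    return (slope_deg - lower) / (upper - lower)

grade = steep_slope_membership(20.0)
```

The returned value is a degree of belonging, not a probability: a slope of 20 degrees is "steep" to degree 0.5 under these assumptions, rather than having a 50% chance of being steep.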
DM-57 - Metadata
  • Define “metadata” in the context of the geospatial data set
  • Use a metadata utility to create a geospatial metadata document for a digital database you created
  • Formulate metadata for a graphic output that would be distributed to the general public
  • Formulate metadata for a geostatistical analysis that would be released to an experienced audience
  • Compose data integrity statements for a geostatistical or spatial analysis to be included in graphic output
  • Identify software tools available to support metadata creation
  • Interpret the elements of an existing metadata document
  • Explain why metadata production should be integrated into the data production and database development workflows, rather than treated as an ancillary activity
  • Outline the elements of the U.S. geospatial metadata standard
  • Explain the ways in which metadata increases the value of geospatial data
DM-21 - Modeling three-dimensional (3-D) entities
  • Identify GIS application domains in which true 3-D models of natural phenomena are necessary
  • Illustrate the use of Virtual Reality Modeling Language (VRML) to model landscapes in 3-D
  • Explain how octrees are the 3-D extension of quadtrees
  • Explain how voxels and stack-unit maps that show the topography of a series of geologic layers might be considered 3-D extensions of field and vector representations respectively
  • Explain how 3-D models can be extended to additional dimensions
  • Explain the use of multi-patching to represent 3-D objects
  • Explain the difficulties in creating true 3-D objects in a vector or raster format
  • Differentiate between 2½-D representations and true 3-D models
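
The relationship between quadtrees and octrees can be sketched with a single child-selection routine that uses one bit per axis. The function name `child_index` and the bit ordering are assumptions made for this illustration; real implementations differ in how they number child cells.

```python
def child_index(point, center):
    """Index of the child cell containing `point`, one bit per axis.

    With 2-D inputs the result is in 0..3 (a quadtree quadrant);
    with 3-D inputs it is in 0..7 (an octree octant).
    """
    index = 0
    for axis, (p, c) in enumerate(zip(point, center)):
        if p >= c:
            index |= 1 << axis  # set this axis's bit when on the upper side
    return index

quadrant = child_index((7, 2), (5, 5))      # x above centre, y below
octant = child_index((7, 2, 9), (5, 5, 5))  # x above, y below, z above
```

Adding the third axis adds one bit, which doubles the number of children per node from four to eight; the recursive subdivision logic is otherwise unchanged.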
DM-19 - Modeling uncertainty
  • Differentiate among modeling uncertainty for entire datasets, for features, and for individual data values
  • Describe SQL extensions for querying uncertainty information in databases
  • Describe extensions to relational DBMS to represent different types of uncertainty in attributes, including both vagueness/fuzziness and error-based uncertainty
  • Discuss the role of metadata in representing and communicating dataset-level uncertainty
  • Create a GIS database that models uncertain information
  • Identify whether it is important to represent uncertainty in a particular GIS application
  • Describe the architecture of data models (both field- and object-based) to represent feature-level and datum-level uncertainty
  • Evaluate the advantages and disadvantages of existing uncertainty models based on storage efficiency, query performance, ease of data entry, and ability to implement in existing software
DM-67 - NoSQL Databases

NoSQL databases are open-source, schema-less, horizontally scalable and high-performance databases. These characteristics make them very different from relational databases, the traditional choice for spatial data. The four types of data stores in NoSQL databases (key-value store, document store, column store, and graph store) contribute to significant flexibility for a range of applications. NoSQL databases are well suited to handle typical challenges of big data, including volume, variety, and velocity. For these reasons, they are increasingly adopted by private industries and used in research. They have gained tremendous popularity in the last decade due to their ability to manage unstructured data (e.g. social media data).

DM-04 - Object-oriented DBMS
  • Describe the basic elements of the object-oriented paradigm, such as inheritance, encapsulation, methods, and composition
  • Evaluate the degree to which the object-oriented paradigm does or does not approximate cognitive structures
  • Explain how the principle of inheritance can be implemented using an object-oriented programming approach
  • Defend or refute the notion that the Extensible Markup Language (XML) is a form of object-oriented database
  • Explain how the properties of object orientation allow for combining and generalizing objects
  • Evaluate the advantages and disadvantages of object-oriented databases compared to relational databases, focusing on representational power, data entry, storage efficiency, and query performance
  • Implement a GIS database design in an off-the-shelf, object-oriented database
  • Differentiate between object-oriented programming and object-oriented databases
DM-80 - Ontology for Geospatial Semantic Interoperability

Geospatial data heterogeneity makes it difficult to share and reuse geospatial data and to retrieve geospatial information. Lack of semantic interoperability is one of the major problems facing GIS (Geographic Information Science/Systems) applications today. To solve geospatial data heterogeneity problems and support geospatial information retrieval and semantic interoperability over the Web, the use of an ontology is proposed, because an ontology is a formal, explicit description of concepts, or meanings of words, in a well-defined and unambiguous manner. Geospatial ontologies represent geospatial concepts and properties for use over the Web. OWL (Web Ontology Language) is an emerging language for defining and instantiating ontologies. OWL builds on RDF (Resource Description Framework) but adds more vocabulary for describing properties and classes. The downside of representing structured geospatial data in OWL and RDF is that it can result in inefficient data access. SPARQL (SPARQL Protocol and RDF Query Language) is recommended for general RDF queries, while GeoSPARQL is proposed as an extension of SPARQL for querying geospatial data. However, the runtime cost of GeoSPARQL queries can be high due to the fine-grained nature of RDF data models. There are several challenges to using ontologies for geospatial semantic interoperability, but these can be overcome through collaboration.

DM-36 - Physical Data Models

Constructs within a particular implementation of database management software guide the development of a physical data model, which is a product of a physical database design process. A physical data model documents how data are to be stored and accessed on the storage media of computer hardware. A physical data model is dependent on the specific data types and indexing mechanisms used within database management system software. Data types such as integers, reals, character strings, and many others can lead to different storage structures. Indexing mechanisms such as region-trees, hash functions, and others lead to differences in access performance. Physical data modeling choices about data types and indexing mechanisms related to storage structures refine the details of a physical database design. Data types associated with field, record, and file storage structures, together with the access mechanisms to those structures, foster (or constrain) the performance of a database design. Since all software runs on an operating system, field, record, and file storage structures must be translated into operating system constructs to be implemented. As such, all storage structures are contingent on the operating system and the particular hardware that host the data management software.
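
The effect of data type choices on storage can be illustrated with a small sketch that uses Python's standard `struct` module to lay out a fixed-width record. The record layout (an integer identifier, two coordinates, and a short name) is a hypothetical example and does not describe how any particular database management system stores its data.

```python
import struct

# Hypothetical point record: 32-bit integer id, two 64-bit floats,
# and an 8-byte character field. "<" means little-endian with no padding.
RECORD = struct.Struct("<i d d 8s")

packed = RECORD.pack(42, -122.33, 47.61, b"Seattle")
record_size = RECORD.size  # 4 + 8 + 8 + 8 bytes

feature_id, x, y, name = RECORD.unpack(packed)

# Choosing a 16-bit integer for the id changes the size of every record.
smaller_size = struct.calcsize("<h d d 8s")
```

Because every record has the same byte length, the n-th record in a file can be found by multiplying n by the record size, which is one simple way that field and record structures shape access performance.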