Data Management

Data management involves the theories and techniques for managing the entire data lifecycle: from data collection to data format conversion, from data storage to data sharing and retrieval, and on to data provenance, data quality control, and data curation for long-term archiving and preservation.

Topics in this Knowledge Area are listed thematically below. Existing topics are in regular font and linked directly to their original entries (published in 2006; these contain only Learning Objectives). Entries that have been updated and expanded are in bold. Forthcoming topics are italicized.

Spatial Databases
Spatial Database Management Systems
Use of Relational DBMSs
Object-Oriented DBMSs
Extensions of the Relational DBMS
Topological Relationships
Database Administration
Conceptual Data Models
Logical Data Models
Physical Data Models
NoSQL Databases
Problems with Large Spatial Databases
Representations of Spatial Objects
The Raster Data Model
Classical Vector Data Models
The Topological Model
The Spaghetti Model
The Network Model
Modeling 3D Entities
Field-Based Models
Fuzzy Models
Triangulated Irregular Network Models

Genealogical Relationships, Linkage, and Inheritance
Conflation & Related Spatial Data Integration Techniques
Standardization & Exchange Specifications
Spatial Access Methods
Data Retrieval Methods
Spatial Indexing
Space-driven Structures: Grid, linear quadtree, and z-ordering tree files
Data-driven structures: R-trees and cost models
Modeling Unstructured Spatial Data
Modeling Semi-Structured Spatial Data
Query Processing
Optimal I/O Algorithms
Spatial Joins
Complex Queries
Spatial Data Quality
Vagueness
Mathematical Models of Vagueness: Fuzzy and Rough sets
Error-based Uncertainty
Spatial Data Uncertainty

Georeferencing Systems
Approximating the Earth's Shape with Geoids
Geographic Coordinate Systems
Planar Coordinate Systems
Tessellated Referencing Systems
Linear Referencing Systems
Vertical Datums
Horizontal Datums
Georegistration
Spatial Data Infrastructures
Metadata
Content Standards
Data Warehouses
Spatial Data Infrastructures
U.S. National Spatial Data Infrastructure
Common Ontologies for Spatial Data & Their Applications

DM-57 - Metadata
  • Define “metadata” in the context of the geospatial data set
  • Use a metadata utility to create a geospatial metadata document for a digital database you created
  • Formulate metadata for a graphic output that would be distributed to the general public
  • Formulate metadata for a geostatistical analysis that would be released to an experienced audience
  • Compose data integrity statements for a geostatistical or spatial analysis to be included in graphic output
  • Identify software tools available to support metadata creation
  • Interpret the elements of an existing metadata document
  • Explain why metadata production should be integrated into the data production and database development workflows, rather than treated as an ancillary activity
  • Outline the elements of the U.S. geospatial metadata standard
  • Explain the ways in which metadata increases the value of geospatial data
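
As a rough illustration of interpreting and checking a metadata document, the sketch below treats a record as a Python dictionary keyed by the seven main sections of the FGDC Content Standard for Digital Geospatial Metadata (CSDGM), the U.S. geospatial metadata standard. The dictionary layout, the function names, and the sample record are hypothetical simplifications; real CSDGM records are nested XML documents, and validation tools check far more than section presence. Under CSDGM, Identification Information and Metadata Reference Information are mandatory, while the other sections are mandatory if applicable.

```python
from typing import Dict, List

# The seven main sections of FGDC CSDGM (FGDC-STD-001-1998).
CSDGM_SECTIONS = (
    "Identification Information",
    "Data Quality Information",
    "Spatial Data Organization Information",
    "Spatial Reference Information",
    "Entity and Attribute Information",
    "Distribution Information",
    "Metadata Reference Information",
)
# Sections that every CSDGM record must contain.
MANDATORY = ("Identification Information", "Metadata Reference Information")


def missing_sections(record: Dict[str, dict]) -> List[str]:
    """List the main sections absent (or empty) in a hypothetical dict-based record."""
    return [s for s in CSDGM_SECTIONS if not record.get(s)]


def has_mandatory_sections(record: Dict[str, dict]) -> bool:
    """True if both always-required sections are present and non-empty."""
    return all(record.get(s) for s in MANDATORY)


# Hypothetical record for a dataset you created.
record = {
    "Identification Information": {
        "Title": "Campus building footprints",
        "Abstract": "Building outlines digitized from orthoimagery.",
    },
    "Spatial Reference Information": {
        "Horizontal Datum Name": "North American Datum of 1983",
    },
    "Metadata Reference Information": {
        "Metadata Standard Name": "FGDC Content Standard for Digital Geospatial Metadata",
    },
}
```

Here `has_mandatory_sections(record)` is true, while `missing_sections(record)` reports the data quality, organization, entity/attribute, and distribution sections that a producer would still need to decide whether to fill in.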
DM-21 - Modeling three-dimensional (3-D) entities
  • Identify GIS application domains in which true 3-D models of natural phenomena are necessary
  • Illustrate the use of Virtual Reality Modeling Language (VRML) to model landscapes in 3-D
  • Explain how octrees are the 3-D extension of quadtrees
  • Explain how voxels and stack-unit maps that show the topography of a series of geologic layers might be considered 3-D extensions of field and vector representations, respectively
  • Explain how 3-D models can be extended to additional dimensions
  • Explain the use of multi-patching to represent 3-D objects
  • Explain the difficulties in creating true 3-D objects in a vector or raster format
  • Differentiate between 2½-D representations and true 3-D models
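
The relationship between quadtrees and octrees can be sketched as follows, assuming a point-region octree over a cubic extent: each node splits into 2^3 = 8 child cubes (one bit per axis), whereas a quadtree splits a square into 2^2 = 4 quadrants. The class name, the capacity of one point per leaf, and the minimum cell size are illustrative choices, not a standard implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]
MIN_HALF = 1e-9  # stop subdividing below this half-width (guards duplicate points)


@dataclass
class OctreeNode:
    center: Point
    half: float  # half the edge length of this cubic cell
    capacity: int = 1  # points a leaf may hold before it splits
    points: List[Point] = field(default_factory=list)
    children: Optional[List["OctreeNode"]] = None

    def _child_index(self, p: Point) -> int:
        cx, cy, cz = self.center
        # One bit per axis gives 8 children; a quadtree uses only x and y (4 children).
        return int(p[0] >= cx) | (int(p[1] >= cy) << 1) | (int(p[2] >= cz) << 2)

    def _subdivide(self) -> None:
        h = self.half / 2
        cx, cy, cz = self.center
        # Child i is offset by +h on each axis whose bit is set in i.
        self.children = [
            OctreeNode((cx + (h if i & 1 else -h),
                        cy + (h if i & 2 else -h),
                        cz + (h if i & 4 else -h)), h, self.capacity)
            for i in range(8)
        ]
        old, self.points = self.points, []
        for q in old:  # push existing points down into the new children
            self.children[self._child_index(q)].insert(q)

    def insert(self, p: Point) -> None:
        if self.children is not None:
            self.children[self._child_index(p)].insert(p)
            return
        self.points.append(p)
        if len(self.points) > self.capacity and self.half > MIN_HALF:
            self._subdivide()

    def leaf_count(self) -> int:
        if self.children is None:
            return 1
        return sum(c.leaf_count() for c in self.children)
```

For example, inserting (0.5, 0.5, 0.5) and (-0.5, -0.5, -0.5) into a root centered at the origin with half-width 1 forces one split into eight octants, with the two points landing in opposite corners.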
DM-19 - Modeling uncertainty
  • Differentiate among modeling uncertainty for entire datasets, for features, and for individual data values
  • Describe SQL extensions for querying uncertainty information in databases
  • Describe extensions to relational DBMS to represent different types of uncertainty in attributes, including both vagueness/fuzziness and error-based uncertainty
  • Discuss the role of metadata in representing and communicating dataset-level uncertainty
  • Create a GIS database that models uncertain information
  • Identify whether it is important to represent uncertainty in a particular GIS application
  • Describe the architecture of data models (both field- and object-based) to represent feature-level and datum-level uncertainty
  • Evaluate the advantages and disadvantages of existing uncertainty models based on storage efficiency, query performance, ease of data entry, and ability to implement in existing software
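
A minimal sketch of datum-level uncertainty, assuming two hypothetical representations: error-based uncertainty stored as a value with a standard error (interpreted as a normal distribution), and vagueness expressed as a fuzzy membership in the class "steep" with made-up breakpoints of 10° and 30°. The `query` function stands in for the kind of thresholded predicate an SQL extension for uncertainty might offer; it is not the syntax or API of any particular DBMS.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class UncertainValue:
    """Error-based uncertainty: a measured value and its standard error (> 0)."""
    value: float
    std_error: float

    def prob_greater_than(self, threshold: float) -> float:
        # P(true value > threshold), assuming normally distributed error
        return 1.0 - NormalDist(self.value, self.std_error).cdf(threshold)


def steep_membership(slope_deg: float, low: float = 10.0, high: float = 30.0) -> float:
    """Vagueness: linear fuzzy membership in 'steep' (hypothetical breakpoints)."""
    if slope_deg <= low:
        return 0.0
    if slope_deg >= high:
        return 1.0
    return (slope_deg - low) / (high - low)


# Hypothetical raster cells with an uncertain elevation and a crisp slope.
features = {
    "cell_a": {"elev": UncertainValue(102.0, 2.0), "slope": 35.0},
    "cell_b": {"elev": UncertainValue(98.0, 2.0), "slope": 20.0},
    "cell_c": {"elev": UncertainValue(100.0, 5.0), "slope": 5.0},
}


def query(feats, elev_threshold, min_prob, min_steep):
    """Cells probably above elev_threshold AND at least min_steep 'steep'."""
    return [fid for fid, f in feats.items()
            if f["elev"].prob_greater_than(elev_threshold) >= min_prob
            and steep_membership(f["slope"]) >= min_steep]
```

With `query(features, 100.0, 0.5, 0.5)`, only `cell_a` qualifies: its elevation exceeds 100 with probability of about 0.84 and it is fully "steep". Storing a standard error per value costs one extra column, while the fuzzy class is computed at query time, which illustrates the storage-versus-query trade-off the objectives mention.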
DM-67 - NoSQL Databases

NoSQL databases are typically schema-less, horizontally scalable, high-performance databases, and many of them are open source. These characteristics make them very different from relational databases, the traditional choice for spatial data. The four main types of NoSQL data stores (key-value stores, document stores, column stores, and graph stores) provide considerable flexibility across a range of applications. NoSQL databases are well suited to the typical challenges of big data, including volume, variety, and velocity. For these reasons, they are increasingly adopted in private industry and used in research, and they have gained tremendous popularity in the last decade because of their ability to manage unstructured data (e.g., social media data).
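
To make the four store types concrete, the sketch below stores one hypothetical geotagged post in each model using plain Python structures. It only mimics the shape of each data model; production systems (for example, Redis for key-value, MongoDB for documents, Apache Cassandra for wide-column, and Neo4j for graphs) add persistence, distribution across nodes, and their own query languages. All names and records here are illustrative.

```python
import json

# One hypothetical geotagged social media post, reused in all four models.
post = {"id": "p1", "user": "u42", "text": "Flooding on Main St",
        "lon": -84.39, "lat": 33.75}

# 1. Key-value store: the value is opaque and can only be fetched by its key.
kv_store = {"post:p1": json.dumps(post)}

# 2. Document store: self-describing documents whose fields can be queried.
#    Schema-less: documents in one collection may carry different fields.
documents = [
    post,
    {"id": "p2", "user": "u7", "text": "Nice sunset",
     "lon": -122.42, "lat": 37.77, "tags": ["sunset"]},
]


def find_in_bbox(docs, west, south, east, north):
    """Document-style query on fields: ids of posts inside a lon/lat box."""
    return [d["id"] for d in docs
            if west <= d["lon"] <= east and south <= d["lat"] <= north]


# 3. Column (wide-column) store: row key -> column family -> columns, so a
#    query can read only the family it needs (here, "geo").
column_store = {
    "p1": {"content": {"user": "u42", "text": "Flooding on Main St"},
           "geo": {"lon": -84.39, "lat": 33.75}},
}


def read_family(store, row_key, family):
    return store[row_key][family]


# 4. Graph store: nodes connected by typed edges, queried by traversal.
edges = [("u7", "FOLLOWS", "u42"), ("u42", "POSTED", "p1"), ("u7", "POSTED", "p2")]


def posts_by_followed(user):
    """Traverse FOLLOWS, then POSTED: posts written by accounts `user` follows."""
    followed = {dst for src, rel, dst in edges if src == user and rel == "FOLLOWS"}
    return [dst for src, rel, dst in edges if src in followed and rel == "POSTED"]
```

The second document's extra `tags` field needs no schema change, which is the flexibility the paragraph above refers to; the trade-off is that the application, not the database, must cope with documents whose fields differ.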

DM-04 - Object-oriented DBMS
  • Describe the basic elements of the object-oriented paradigm, such as inheritance, encapsulation, methods, and composition
  • Evaluate the degree to which the object-oriented paradigm does or does not approximate cognitive structures
  • Explain how the principle of inheritance can be implemented using an object-oriented programming approach
  • Defend or refute the notion that the Extensible Markup Language (XML) is a form of object-oriented database
  • Explain how the properties of object orientation allow for combining and generalizing objects
  • Evaluate the advantages and disadvantages of object-oriented databases compared to relational databases, focusing on representational power, data entry, storage efficiency, and query performance
  • Implement a GIS database design in an off-the-shelf, object-oriented database
  • Differentiate between object-oriented programming and object-oriented databases
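
The object-oriented elements listed above can be sketched in Python with a hypothetical class hierarchy for linear transportation features: encapsulation (state held behind methods and properties), inheritance (a `Highway` is a `Road`, which is a `LinearFeature`), methods (`length()`), and composition (a `RoadNetwork` holds roads). This is object-oriented programming only; an object-oriented DBMS would additionally persist, query, and manage transactions over such objects.

```python
import math
from abc import ABC, abstractmethod
from typing import Iterable, List, Tuple

Coord = Tuple[float, float]


class LinearFeature(ABC):
    """Base class: every linear feature has an ID and a polyline geometry."""

    def __init__(self, fid: str, vertices: Iterable[Coord]):
        self._fid = fid  # leading underscore: encapsulated state by convention
        self._vertices: List[Coord] = list(vertices)

    @property
    def fid(self) -> str:
        return self._fid  # read-only access through a property

    @abstractmethod
    def kind(self) -> str:
        """Subclasses must say what kind of feature they are."""

    def length(self) -> float:
        """Planar polyline length, a method inherited by every subclass."""
        return sum(math.dist(a, b) for a, b in zip(self._vertices, self._vertices[1:]))


class Road(LinearFeature):  # inheritance: a Road is-a LinearFeature
    def __init__(self, fid: str, vertices: Iterable[Coord], lanes: int):
        super().__init__(fid, vertices)
        self.lanes = lanes

    def kind(self) -> str:
        return "road"


class Highway(Road):  # a Highway is-a Road and inherits length()
    def __init__(self, fid: str, vertices: Iterable[Coord], lanes: int, route: str):
        super().__init__(fid, vertices, lanes)
        self.route = route

    def kind(self) -> str:
        return f"highway {self.route}"


class RoadNetwork:  # composition: a network has-a collection of roads
    def __init__(self, roads: Iterable[Road]):
        self._roads = list(roads)

    def total_length(self) -> float:
        return sum(r.length() for r in self._roads)


# Hypothetical usage
r = Road("r1", [(0.0, 0.0), (3.0, 4.0)], lanes=2)
h = Highway("h1", [(0.0, 0.0), (0.0, 10.0)], lanes=4, route="I-75")
net = RoadNetwork([r, h])
```

Because `Highway` inherits `length()` from `LinearFeature`, `net.total_length()` can treat both objects uniformly (giving 15.0 here), which is the generalization the objectives refer to.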
DM-36 - Physical Data Models

The constructs available within a particular implementation of database management software guide the development of a physical data model, which is the product of a physical database design process. A physical data model documents how data are to be stored on, and accessed from, the storage media of computer hardware. It depends on the specific data types and indexing mechanisms provided by the database management system software. Data types such as integers, reals, and character strings, among many others, lead to different storage structures. Indexing mechanisms such as region-trees and hash functions lead to differences in access performance. Physical data modeling choices about data types and indexing mechanisms refine the storage-structure details of a physical database design. The data types associated with field, record, and file storage structures, together with the mechanisms for accessing those structures, can both enable and constrain the performance of a database design. Because all software runs on an operating system, field, record, and file storage structures must be translated into operating system constructs to be implemented. As such, all storage structures are contingent on the operating system and the particular hardware that host the data management software.
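
The link between data types, storage structures, and access mechanisms can be illustrated with a hypothetical fixed-length record layout. The sketch packs each record with Python's `struct` module (a 4-byte integer, an 8-byte real, and a 20-byte character field), so record n starts at byte offset n × 32 and can be read directly. A Python `dict`, itself a hash table, stands in for a hash index and is contrasted with a sequential scan. Real DBMSs organize records into pages and use more elaborate structures; the layout and names here are assumptions for illustration.

```python
import struct
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical layout: parcel_id INTEGER (4 bytes), area REAL (8 bytes),
# owner CHAR(20). "<" = little-endian with no padding, so each record is 32 bytes.
RECORD = struct.Struct("<id20s")
Row = Tuple[int, float, str]


def pack_row(row: Row) -> bytes:
    pid, area, owner = row
    # "20s" truncates or null-pads the owner name to exactly 20 bytes
    return RECORD.pack(pid, area, owner.encode("ascii"))


def build_file(rows: Iterable[Row]) -> bytes:
    return b"".join(pack_row(r) for r in rows)


def record_count(data: bytes) -> int:
    return len(data) // RECORD.size


def read_record(data: bytes, recno: int) -> Row:
    # Fixed-length records: the byte offset is computed, not searched for.
    pid, area, owner = RECORD.unpack_from(data, recno * RECORD.size)
    return pid, area, owner.rstrip(b"\0").decode("ascii")


def scan_for(data: bytes, pid: int) -> Optional[Row]:
    # No index: read records one by one (O(n) record reads).
    for n in range(record_count(data)):
        row = read_record(data, n)
        if row[0] == pid:
            return row
    return None


def build_hash_index(data: bytes) -> Dict[int, int]:
    # parcel_id -> record number; dict lookups take O(1) time on average.
    return {read_record(data, n)[0]: n for n in range(record_count(data))}


# Hypothetical usage
rows = [(101, 450.5, "Lee"), (102, 1200.0, "Okafor"), (103, 80.25, "Garcia")]
data = build_file(rows)
index = build_hash_index(data)
```

Looking up parcel 102 through `index` reads one record, whereas `scan_for` may read every record; switching `owner` to a variable-length type would remove the computed offsets and require a different storage structure.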

DM-48 - Plane coordinate systems
  • Explain why plane coordinates are sometimes preferable to geographic coordinates
  • Identify the map projection(s) upon which UTM coordinate systems are based, and explain the relationship between the projection(s) and the coordinate system grid
  • Discuss the magnitude and cause of error associated with UTM coordinates
  • Differentiate the characteristics and uses of the UTM coordinate system from the Military Grid Reference System (MGRS) and the World Geographic Reference System (GEOREF)
  • Explain what State Plane Coordinate System (SPC) eastings and northings represent
  • Associate SPC coordinates and zone specifications with corresponding positions on a U.S. map or globe
  • Identify the map projection(s) upon which SPC coordinate systems are based, and explain the relationship between the projection(s) and the coordinate system grids
  • Discuss the magnitude and cause of error associated with SPC coordinates
  • Recommend the most appropriate plane coordinate system for applications at different spatial extents and justify the recommendation
  • Critique the U.S. Geological Survey’s choice of UTM as the standard coordinate system for the U.S. National Map
  • Describe the characteristics of the “national grids” of countries other than the U.S.
  • Explain what Universal Transverse Mercator (UTM) eastings and northings represent
  • Associate UTM coordinates and zone specifications with corresponding positions on a world map or globe
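
Associating a longitude with its UTM zone follows from the 6° zone width: zone 1 spans 180°W to 174°W, and each zone's central meridian is assigned a false easting of 500,000 m, with a false northing of 10,000,000 m used in the southern hemisphere. The sketch below computes only zone numbers, central meridians, and false northings; it ignores the special zones around Norway and Svalbard and does not perform the transverse Mercator projection itself. The function names are hypothetical.

```python
import math

FALSE_EASTING_M = 500_000.0  # assigned to every zone's central meridian
SOUTH_FALSE_NORTHING_M = 10_000_000.0  # keeps southern northings positive
CENTRAL_SCALE_FACTOR = 0.9996  # scale factor on each central meridian


def utm_zone(lon_deg: float) -> int:
    """Standard 6-degree zone number (Norway/Svalbard exceptions ignored)."""
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")
    zone = math.floor((lon_deg + 180.0) / 6.0) + 1
    return min(zone, 60)  # longitude +180 belongs to zone 60


def central_meridian(zone: int) -> float:
    if not 1 <= zone <= 60:
        raise ValueError("UTM zones are numbered 1 to 60")
    return zone * 6.0 - 183.0


def false_northing(lat_deg: float) -> float:
    if not -80.0 <= lat_deg <= 84.0:
        raise ValueError("UTM is defined from 80 S to 84 N")
    return 0.0 if lat_deg >= 0.0 else SOUTH_FALSE_NORTHING_M


def zone_label(lon_deg: float, lat_deg: float) -> str:
    false_northing(lat_deg)  # reuse the latitude range check
    return f"{utm_zone(lon_deg)}{'N' if lat_deg >= 0.0 else 'S'}"
```

For instance, longitude -84.39° falls in zone 16, whose central meridian is 87°W, so a point there lies east of the central meridian and has an easting somewhat greater than 500,000 m.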
DM-70 - Problems of large spatial databases
  • Describe emerging geographical analysis techniques in geocomputation derived from artificial intelligence (e.g., expert systems, artificial neural networks, genetic algorithms, and software agents)
  • Explain how to recognize contaminated data in large datasets
  • Outline the implications of complexity for the application of statistical ideas in geography
  • Explain what is meant by the term “contaminated data,” suggesting how it can arise
  • Describe difficulties in dealing with large spatial databases, especially those arising from spatial heterogeneity
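
One simple, robust way to flag possibly contaminated values is the modified z-score based on the median absolute deviation (MAD), with the commonly used cutoff of 3.5. The sample elevations, including a -9999 NoData code that leaked into the data, are hypothetical. Because of spatial heterogeneity, a single global threshold can mislabel genuine local extremes, so in practice such a test would usually be applied within local neighborhoods rather than across a whole large database.

```python
from statistics import median
from typing import List, Sequence


def modified_z_scores(values: Sequence[float]) -> List[float]:
    """Robust z-scores: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [0.6745 * (v - med) / mad for v in values]


def flag_contaminated(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Indices whose modified z-score magnitude exceeds the threshold."""
    return [i for i, z in enumerate(modified_z_scores(values)) if abs(z) > threshold]


# Hypothetical elevations (m) from neighboring cells; -9999 is a NoData code.
elevations = [212.0, 215.5, 210.8, 213.2, -9999.0, 214.1, 211.7]
```

Unlike a mean and standard deviation, the median and MAD are barely moved by the contaminating value itself, so `flag_contaminated(elevations)` isolates index 4 while leaving the genuine 215.5 m reading alone.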
DM-03 - Relational DBMS
  • Explain the advantage of the relational model over earlier database structures, including spreadsheets
  • Define the basic terms used in relational database management systems (e.g., tuple, relation, foreign key, SQL, relational join)
  • Discuss the efficiency and costs of normalization
  • Describe the entity-relationship diagram approach to data modeling
  • Explain how entity-relationship diagrams are translated into relational tables
  • Create an SQL query that extracts data from related tables
  • Describe the problems associated with failure to follow the first and second normal forms (including data confusion, redundancy, and retrieval difficulties)
  • Demonstrate how search and relational join operations provide results for a typical GIS query and other simple operations using the relational DBMS within a GIS software application
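
A minimal example of extracting data from related tables, using Python's built-in `sqlite3` module and two hypothetical tables. Owner names are stored once in an `owner` table and referenced from `parcel` through a foreign key, rather than repeated on every parcel row, which avoids the redundancy that normalization targets. The first query combines a search (`WHERE`) with a relational join; the second aggregates parcel area per owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE owner (
    owner_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE parcel (
    parcel_id INTEGER PRIMARY KEY,
    land_use  TEXT NOT NULL,
    area_m2   REAL NOT NULL,
    owner_id  INTEGER NOT NULL REFERENCES owner(owner_id)
);
INSERT INTO owner VALUES (1, 'City of Springfield'), (2, 'J. Rivera');
INSERT INTO parcel VALUES
    (101, 'park',        5200.0, 1),
    (102, 'residential',  640.0, 2),
    (103, 'residential',  710.5, 2);
""")

# Search + join: residential parcels with their owners' names.
residential = conn.execute("""
    SELECT p.parcel_id, o.name, p.area_m2
    FROM parcel AS p
    JOIN owner AS o ON o.owner_id = p.owner_id
    WHERE p.land_use = 'residential'
    ORDER BY p.parcel_id
""").fetchall()

# Join + aggregation: total parcel area per owner.
area_by_owner = conn.execute("""
    SELECT o.name, SUM(p.area_m2)
    FROM owner AS o
    JOIN parcel AS p ON p.owner_id = o.owner_id
    GROUP BY o.owner_id
    ORDER BY o.owner_id
""").fetchall()
```

In a GIS, the `parcel` table would typically also carry (or link to) geometry, but the join logic is the same.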
DM-60 - Spatial Data Infrastructures

Spatial data infrastructure (SDI) is the infrastructure that facilitates the discovery, access, management, distribution, reuse, and preservation of digital geospatial resources. These resources may include maps, data, geospatial services, and tools. As cyberinfrastructures, SDIs are similar to other infrastructures, such as water supplies and transportation networks, because they play fundamental roles in many aspects of society. These roles have become even more significant in today's big data age, when large volumes of geospatial data and Web services are available. From a technological perspective, SDIs mainly consist of data, hardware, and software. However, a truly functional SDI also requires the efforts of people, support from organizations, government policies, data and software standards, and more. In this chapter, we will present the concepts and values of SDIs, as well as a brief history of SDI development in the U.S. We will also discuss the components of a typical SDI, focusing specifically on three key components: geoportals, metadata, and search functions. Examples of existing SDI implementations will also be discussed.
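
A geoportal search function can be sketched, under simplifying assumptions, as a keyword match over metadata records combined with a bounding-box intersection test. The record fields and catalog entries below are hypothetical, boxes are (west, south, east, north) in decimal degrees, and boxes crossing the antimeridian are not handled. Operational SDIs typically expose such searches through catalog services backed by proper indexes rather than a linear scan.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north), degrees


@dataclass
class MetadataRecord:
    title: str
    keywords: List[str]
    bbox: BBox


def bbox_intersects(a: BBox, b: BBox) -> bool:
    """True if two boxes overlap (no antimeridian handling)."""
    a_w, a_s, a_e, a_n = a
    b_w, b_s, b_e, b_n = b
    return a_w <= b_e and b_w <= a_e and a_s <= b_n and b_s <= a_n


def search(catalog: Sequence[MetadataRecord], keyword: str, area: BBox) -> List[str]:
    """Titles whose title or keywords match AND whose extent overlaps `area`."""
    kw = keyword.lower()
    return [r.title for r in catalog
            if (kw in r.title.lower() or any(kw == k.lower() for k in r.keywords))
            and bbox_intersects(r.bbox, area)]


# Hypothetical catalog of metadata records.
catalog = [
    MetadataRecord("Georgia hydrography", ["hydrography", "rivers"],
                   (-85.6, 30.4, -80.8, 35.0)),
    MetadataRecord("Colorado land cover", ["land cover"],
                   (-109.1, 37.0, -102.0, 41.0)),
    MetadataRecord("Atlanta flood zones", ["flood", "hydrography"],
                   (-84.6, 33.6, -84.2, 33.9)),
]
atlanta = (-84.5, 33.7, -84.3, 33.8)
```

Searching for "hydrography" over the Atlanta box returns both the statewide and the city dataset, which shows why metadata quality (accurate keywords and extents) directly determines what users can discover.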
