Data Management

Data management involves the theories and techniques for managing the entire data lifecycle: from data collection to format conversion, from storage to sharing and retrieval, and on to data provenance, quality control, and curation for long-term archival and preservation.

Topics in this Knowledge Area are listed thematically below. Existing topics appear in regular font and link directly to their original entries (published in 2006; these contain only learning objectives). Entries that have been updated and expanded are in bold. Forthcoming topics are italicized.

 

Spatial Databases
  • Spatial Database Management Systems
  • Use of Relational DBMSs
  • Object-Oriented DBMSs
  • Extensions of the Relational DBMS
  • Topological Relationships
  • Database Administration
  • Conceptual Data Models
  • Logical Data Models
  • Physical Data Models
  • NoSQL Databases
  • Problems with Large Spatial Databases

Representations of Spatial Objects
  • Raster Data Models
  • Vector Data Models
  • Topological Models
  • Spaghetti Models
  • Network Models
  • Modeling 3D Entities
  • Fields in Space and Time
  • Fuzzy Models
  • Triangulated Irregular Network Models
  • Genealogical Relationships, Linkage, and Inheritance
  • Geospatial Data Conflation
  • Standardization & Exchange Specifications

Spatial Access Methods
  • Data Retrieval Methods
  • Spatial Indexing
  • Space-driven Structures: Grid, linear quadtree, and z-ordering tree files
  • Data-driven Structures: R-trees and cost models
  • Modeling Unstructured Spatial Data
  • Modeling Semi-Structured Spatial Data

Query Processing
  • Optimal I/O Algorithms
  • Spatial Joins
  • Complex Queries

Spatial Data Quality
  • Spatial Data Uncertainty
  • Modeling Uncertainty
  • Error-based Uncertainty
  • Vagueness
  • Mathematical Models of Vagueness: Fuzzy and Rough Sets

Georeferencing Systems
  • Approximating the Earth's Shape with Geoids
  • Geographic Coordinate Systems
  • Planar Coordinate Systems
  • Tessellated Referencing Systems
  • Linear Referencing Systems
  • Vertical Datums
  • Horizontal Datums
  • Georegistration
  • Map Projections

Spatial Data Infrastructures
  • Metadata
  • Content Standards
  • Data Warehouses
  • Spatial Data Infrastructures
  • U.S. National Spatial Data Infrastructure
  • Common Ontologies for Spatial Data & Their Applications

 

DM-35 - Logical Data Models

A logical data model occupies the second of three levels of abstraction: conceptual, logical, and physical. It expresses the meaning captured in a conceptual data model and adds detail about database structures, e.g., topologically organized records, relational tables, object-oriented classes, or Extensible Markup Language (XML) tags. However, a logical data model remains independent of any particular database management software product. Nonetheless, such a model is often constrained by a class of software representation techniques, which makes implementation in a physical data model easier. Complex entity types of the conceptual data model must be translated into sub-type/super-type hierarchies to clarify data contexts for each entity type while avoiding duplication of concepts and data. Entities and records should have internal identifiers. Relationships can express the involvement of entity types with activities or associations. A logical schema is formed from this data organization, and a schema diagram depicts the entity, attribute, and relationship detail for each application. The resulting logical data models can be synthesized through schema integration to support multi-user database environments, e.g., data warehouses for strategic applications and/or federated databases for tactical/operational business applications.
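As a sketch of the sub-type/super-type translation described above, the hypothetical tables below give a Facility super-type whose internal identifier is shared by School and Hospital sub-types, so each concept is stored once. All table and column names are illustrative, and SQLite stands in for whatever DBMS class constrains the logical design.

```python
import sqlite3

# Hypothetical sub-type/super-type hierarchy expressed as relational tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facility (
    facility_id INTEGER PRIMARY KEY,   -- internal identifier for the entity
    name        TEXT NOT NULL,
    geom_wkt    TEXT                   -- geometry kept abstract at this level
);
CREATE TABLE school (
    facility_id INTEGER PRIMARY KEY REFERENCES facility(facility_id),
    enrollment  INTEGER
);
CREATE TABLE hospital (
    facility_id INTEGER PRIMARY KEY REFERENCES facility(facility_id),
    bed_count   INTEGER
);
""")
conn.execute("INSERT INTO facility VALUES (1, 'Lincoln Elementary', 'POINT(-93.1 44.9)')")
conn.execute("INSERT INTO school VALUES (1, 420)")

# Joining sub-type to super-type recovers the full entity without duplication.
row = conn.execute(
    "SELECT f.name, s.enrollment FROM facility f JOIN school s USING (facility_id)"
).fetchone()
print(row)  # ('Lincoln Elementary', 420)
```

The shared primary key is what prevents the duplication of concepts and data that the entry warns about: attributes common to all facilities live only in the super-type table.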

DM-31 - Mathematical models of vagueness: Fuzzy sets and rough sets
  • Compare and contrast the relative merits of fuzzy sets, rough sets, and other models
  • Differentiate between fuzzy set membership and probabilistic set membership
  • Explain the problems inherent in fuzzy sets
  • Create appropriate membership functions to model vague phenomena
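The last objective above, creating membership functions for vague phenomena, can be sketched with a standard trapezoidal membership function. The class "moderate slope" and its breakpoints are hypothetical choices for illustration; unlike a probability, a fuzzy membership grade expresses degree of belonging, not likelihood.

```python
def trapezoidal(a, b, c, d):
    """Build a trapezoidal fuzzy membership function: membership rises from
    0 at a to 1 at b, stays 1 until c, and falls back to 0 at d."""
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)   # rising edge
        return (d - x) / (d - c)       # falling edge
    return mu

# Hypothetical vague class "moderate slope" over slope values in degrees.
moderate = trapezoidal(5, 10, 20, 30)
print(moderate(15))   # 1.0  (clearly moderate)
print(moderate(7.5))  # 0.5  (partially moderate)
print(moderate(40))   # 0.0  (not moderate at all)
```

Note the contrast with a crisp set, where membership could only be 0 or 1 and the boundary at, say, 10 degrees would be arbitrary.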
DM-57 - Metadata
  • Define “metadata” in the context of the geospatial data set
  • Use a metadata utility to create a geospatial metadata document for a digital database you created
  • Formulate metadata for a graphic output that would be distributed to the general public
  • Formulate metadata for a geostatistical analysis that would be released to an experienced audience
  • Compose data integrity statements for a geostatistical or spatial analysis to be included in graphic output
  • Identify software tools available to support metadata creation
  • Interpret the elements of an existing metadata document
  • Explain why metadata production should be integrated into the data production and database development workflows, rather than treated as an ancillary activity
  • Outline the elements of the U.S. geospatial metadata standard
  • Explain the ways in which metadata increases the value of geospatial data
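To make the objectives above concrete, here is an illustrative (not standard-conformant) metadata record with sections loosely modeled on the U.S. FGDC standard's identification, data quality, and metadata reference sections; real records use the standard's full element names and an XML encoding, and all values below are invented.

```python
# Hypothetical metadata record for an invented dataset.
record = {
    "identification": {
        "title": "Hypothetical county parcel boundaries",
        "originator": "Example County GIS Office",
        "abstract": "Parcel polygons digitized from plat maps.",
        "bounding_box": {"west": -93.8, "east": -92.7, "south": 44.5, "north": 45.2},
    },
    "data_quality": {"lineage": "Digitized 2021 from 1:1200 plat maps."},
    "metadata_reference": {"date": "2024-01-15", "contact": "gis@example.gov"},
}

REQUIRED = {"identification", "data_quality", "metadata_reference"}

def missing_sections(rec):
    """Report which required top-level sections a record still lacks."""
    return REQUIRED - rec.keys()

print(missing_sections(record))  # set()
```

A check like `missing_sections` is the kind of validation that is easy to run when metadata production is part of the data production workflow, and nearly impossible to do well after the fact.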
DM-21 - Modeling three-dimensional (3-D) entities
  • Identify GIS application domains in which true 3-D models of natural phenomena are necessary
  • Illustrate the use of Virtual Reality Modeling Language (VRML) to model landscapes in 3-D
  • Explain how octrees are the 3-D extension of quadtrees
  • Explain how voxels and stack-unit maps that show the topography of a series of geologic layers might be considered 3-D extensions of field and vector representations respectively
  • Explain how 3-D models can be extended to additional dimensions
  • Explain the use of multi-patching to represent 3-D objects
  • Explain the difficulties in creating true 3-D objects in a vector or raster format
  • Differentiate between 2.5-D representations and true 3-D models
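The quadtree-to-octree relationship in the objectives above can be sketched by counting subdivision cells: halving every axis of a cell gives 2^d children, so a quadtree node has 4 and an octree node has 8. The `subdivide` helper below is illustrative, not a full tree implementation.

```python
def subdivide(origin, size, dims):
    """Yield the origins of the 2**dims equal child cells produced by
    halving a square (dims=2) or cube (dims=3) along every axis."""
    half = size / 2
    for i in range(2 ** dims):
        # bit k of i selects the low or high half along axis k
        yield tuple(origin[k] + half * ((i >> k) & 1) for k in range(dims))

quad_children = list(subdivide((0.0, 0.0), 1.0, dims=2))       # quadtree node: 4 cells
oct_children = list(subdivide((0.0, 0.0, 0.0), 1.0, dims=3))   # octree node: 8 cells
print(len(quad_children), len(oct_children))  # 4 8
```

The same counting argument shows how such models extend to additional dimensions: a 4-D analogue would split each node into 16 cells.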
DM-19 - Modeling uncertainty
  • Differentiate among modeling uncertainty for entire datasets, for features, and for individual data values
  • Describe SQL extensions for querying uncertainty information in databases
  • Describe extensions to relational DBMS to represent different types of uncertainty in attributes, including both vagueness/fuzziness and error-based uncertainty
  • Discuss the role of metadata in representing and communicating dataset-level uncertainty
  • Create a GIS database that models uncertain information
  • Identify whether it is important to represent uncertainty in a particular GIS application
  • Describe the architecture of data models (both field- and object-based) to represent feature-level and datum-level uncertainty
  • Evaluate the advantages and disadvantages of existing uncertainty models based on storage efficiency, query performance, ease of data entry, and ability to implement in existing software
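One of the distinctions above, representing uncertainty for individual data values rather than whole datasets, can be sketched by attaching an error term to each stored value. The class name and the LiDAR example are hypothetical; real DBMS extensions would implement this as extended attribute types rather than application objects.

```python
from dataclasses import dataclass

@dataclass
class UncertainValue:
    """A single datum carrying its own error-based uncertainty."""
    value: float
    std_error: float

    def interval(self, k=2):
        """Return the value +/- k standard errors, e.g. for reliability filtering."""
        return (self.value - k * self.std_error, self.value + k * self.std_error)

# Hypothetical elevation measurement with 0.5 m standard error.
elevation = UncertainValue(value=312.4, std_error=0.5)
print(elevation.interval())  # approximately (311.4, 313.4)
```

Dataset-level uncertainty, by contrast, would live in metadata (a single accuracy statement for the whole layer), which is why the two levels call for different representations.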
DM-67 - NoSQL Databases

NoSQL databases are open-source, schema-less, horizontally scalable, high-performance databases. These characteristics make them very different from relational databases, the traditional choice for spatial data. The four types of data stores in NoSQL databases (key-value store, document store, column store, and graph store) give them significant flexibility across a range of applications. NoSQL databases are well suited to the typical challenges of big data, including volume, variety, and velocity. For these reasons, they are increasingly adopted in industry and used in research, and they have gained tremendous popularity in the last decade due to their ability to manage unstructured data (e.g., social media data).
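The four store types named above differ mainly in the shape of the data they hold. The toy in-memory analogues below only contrast those shapes; real systems of each type add persistence, distribution, and query languages, and all keys and values here are invented.

```python
# Key-value store: an opaque value behind a single key (cf. Redis-style stores).
kv_store = {"tile:10/512/384": b"...tile bytes..."}

# Document store: nested, schema-less documents (cf. GeoJSON in a document DB).
doc_store = {
    "feature/42": {
        "type": "Feature",
        "properties": {"name": "Lake A"},
        "geometry": {"type": "Point", "coordinates": [-93.2, 45.0]},
    },
}

# Column store: values grouped by column, so rows may be sparse.
column_store = {
    "name": {"row1": "Lake A", "row2": "Lake B"},
    "area_km2": {"row1": 3.2},          # row2 simply has no entry
}

# Graph store: nodes plus typed edges between them.
graph_store = {
    "nodes": {"A": {}, "B": {}},
    "edges": [("A", "flows_into", "B")],
}

print(doc_store["feature/42"]["properties"]["name"])  # Lake A
```

The schema-less character is visible in the column store: `row2` has a name but no area, and nothing forces every row to carry every attribute.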

DM-04 - Object-oriented DBMS
  • Describe the basic elements of the object-oriented paradigm, such as inheritance, encapsulation, methods, and composition
  • Evaluate the degree to which the object-oriented paradigm does or does not approximate cognitive structures
  • Explain how the principle of inheritance can be implemented using an object-oriented programming approach
  • Defend or refute the notion that the Extensible Markup Language (XML) is a form of object-oriented database
  • Explain how the properties of object orientation allows for combining and generalizing objects
  • Evaluate the advantages and disadvantages of object-oriented databases compared to relational databases, focusing on representational power, data entry, storage efficiency, and query performance
  • Implement a GIS database design in an off-the-shelf, object-oriented database
  • Differentiate between object-oriented programming and object-oriented databases
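The basic elements named in the first objective, inheritance, encapsulation, methods, and composition, can be sketched with hypothetical geographic feature classes; the class and attribute names below are invented for illustration.

```python
class Geometry:
    """Composition target: a geometry owned by a feature."""
    def __init__(self, wkt):
        self._wkt = wkt              # encapsulated state (not accessed directly)

    def as_wkt(self):                # method: behavior exposed instead of storage
        return self._wkt

class Feature:
    def __init__(self, name, geometry):
        self.name = name
        self.geometry = geometry     # composition: a Feature HAS-A Geometry

class Road(Feature):                 # inheritance: a Road IS-A Feature
    def __init__(self, name, geometry, lanes):
        super().__init__(name, geometry)
        self.lanes = lanes           # specialization added by the subclass

r = Road("Main St", Geometry("LINESTRING(0 0, 1 1)"), lanes=2)
print(r.name, r.lanes, r.geometry.as_wkt())
```

An object-oriented database would persist such objects and their relationships directly, whereas object-oriented programming (the last objective's contrast) only defines their in-memory behavior.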
DM-36 - Physical Data Models

Constructs within a particular implementation of database management software guide the development of a physical data model, the product of a physical database design process. A physical data model documents how data are to be stored on, and accessed from, the storage media of computer hardware. It depends on the specific data types and indexing mechanisms available within the database management system software. Data types such as integers, reals, and character strings, among many others, lead to different storage structures; indexing mechanisms such as region-trees and hash functions lead to differences in access performance. Physical data modeling choices about data types and indexing mechanisms thus refine the details of a physical database design. The data types associated with field, record, and file storage structures, together with the mechanisms for accessing those structures, both enable and constrain the performance of a database design. Since all software runs on an operating system, field, record, and file storage structures must be translated into operating system constructs to be implemented. As such, all storage structures are contingent on the operating system and the particular hardware that host the data management software.
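The performance effect of an indexing choice can be sketched by contrasting two access paths to the same stored records: a sequential scan (no index) and a hash index, one of the mechanism families mentioned above. The record layout is hypothetical, and Python dictionaries stand in for an on-disk hash structure.

```python
# Synthetic stored records standing in for a file of fixed-format records.
records = [{"id": i, "name": f"parcel-{i}"} for i in range(10_000)]

def scan(records, key):
    """Sequential scan: inspect every record until the key matches (O(n))."""
    for rec in records:
        if rec["id"] == key:
            return rec
    return None

# Hash index: an extra structure mapping key -> record position (O(1) average).
hash_index = {rec["id"]: pos for pos, rec in enumerate(records)}

def indexed_lookup(records, index, key):
    pos = index.get(key)
    return records[pos] if pos is not None else None

# Both paths return the same record; only the access cost differs.
assert scan(records, 9_999) == indexed_lookup(records, hash_index, 9_999)
```

The trade-off the physical designer weighs is exactly this: the index speeds exact-match lookups but costs extra storage and must be maintained on every insert and delete.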

DM-48 - Plane coordinate systems
  • Explain why plane coordinates are sometimes preferable to geographic coordinates
  • Identify the map projection(s) upon which UTM coordinate systems are based, and explain the relationship between the projection(s) and the coordinate system grid
  • Discuss the magnitude and cause of error associated with UTM coordinates
  • Differentiate the characteristics and uses of the UTM coordinate system from the Military Grid Reference System (MGRS) and the World Geographic Reference System (GEOREF)
  • Explain what State Plane Coordinate (SPC) system eastings and northings represent
  • Associate SPC coordinates and zone specifications with corresponding positions on a U.S. map or globe
  • Identify the map projection(s) upon which SPC coordinate systems are based, and explain the relationship between the projection(s) and the coordinate system grids
  • Discuss the magnitude and cause of error associated with SPC coordinates
  • Recommend the most appropriate plane coordinate system for applications at different spatial extents and justify the recommendation
  • Critique the U.S. Geological Survey’s choice of UTM as the standard coordinate system for the U.S. National Map
  • Describe the characteristics of the “national grids” of countries other than the U.S.
  • Explain what Universal Transverse Mercator (UTM) eastings and northings represent
  • Associate UTM coordinates and zone specifications with corresponding position on a world map or globe
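Associating UTM coordinates with positions, as the objectives above require, starts from the zone layout: UTM divides the globe into 60 north-south zones, each 6 degrees of longitude wide, numbered eastward from 180°W. The sketch below computes the standard zone number from longitude and deliberately ignores the Norway and Svalbard exception zones.

```python
def utm_zone(lon):
    """Standard UTM zone number for a longitude in decimal degrees
    (exception zones around Norway/Svalbard are not handled)."""
    return int((lon + 180) // 6) + 1

print(utm_zone(-93.2))  # 15 (central United States)
print(utm_zone(0.0))    # 31 (zone 31 spans 0 deg E to 6 deg E)
```

Converting the longitude/latitude pair to an easting and northing within that zone then requires the transverse Mercator projection equations, which is where a projection library would take over.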
DM-70 - Problems of Large Spatial Databases

Large spatial databases, often labeled geospatial big data, exceed the capacity of commonly used computing systems as a result of data volume, variety, velocity, and veracity. Additional problems also labeled with V's are cited, but these four are the most problematic and the focus of this chapter (Li et al., 2016; Panimalar et al., 2017). Sources include satellites, aircraft and drone platforms, vehicles, geosocial networking services, mobile devices, and cameras. Processing these data to extract useful information raises problems of query, analysis, and visualization. Data mining techniques and machine learning algorithms, such as deep convolutional neural networks, are often used with geospatial big data. The most obvious problem is handling the large data volumes, particularly for input and output operations, which requires parallel reads and writes of the data as well as high-speed computers, disk systems, and network transfer. Additional problems include the variety and heterogeneity of the data, which require advanced algorithms to handle different data types and characteristics and to integrate with other data. The velocity at which the data are acquired is also a challenge, especially with today's advanced sensors and the Internet of Things, whose millions of devices create data on temporal scales of microseconds to minutes. Finally, the veracity, or truthfulness, of large spatial databases is difficult to establish and validate, particularly for all data elements in the database.
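The volume problem described above is commonly attacked by splitting the data into chunks, processing chunks in parallel workers, and combining small per-chunk partial results rather than holding everything in memory. The sketch below uses synthetic numbers as stand-ins for, e.g., tiles of a large raster, and a thread pool as a stand-in for whatever parallel I/O machinery a real system provides.

```python
from concurrent.futures import ThreadPoolExecutor

def chunks(seq, size):
    """Yield fixed-size slices of the data, mimicking chunked reads."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def summarize(chunk):
    """Per-chunk partial result: (sum, count) is enough to compute a mean."""
    return sum(chunk), len(chunk)

data = list(range(1_000_000))          # synthetic "large" dataset
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(summarize, chunks(data, 100_000)))

total = sum(s for s, _ in partials)
count = sum(n for _, n in partials)
print(total / count)  # mean of the whole dataset from per-chunk partials
```

The key design point is that `summarize` returns a partial result that is both small and mergeable; the same pattern extends to distributed frameworks, where the chunks live on different machines.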
