Data Management

Data management involves the theories and techniques for managing the entire data lifecycle, from data collection to data format conversion, from data storage to data sharing and retrieval, to data provenance, data quality control and data curation for long-term data archival and preservation.

Topics in this Knowledge Area are listed thematically below. Existing topics are in regular font and linked directly to their original entries (published in 2006; these contain only Learning Objectives). Entries that have been updated and expanded are in bold. Forthcoming topics are italicized.

 

Spatial Databases | Spatial Access Methods | Georeferencing Systems
Spatial Database Management Systems | Data Retrieval Strategies | Earth's Shape, Sea Level, and the Geoid
Use of Relational DBMSs | Spatial Indexing | Geographic Coordinate Systems
Object-Oriented DBMSs | Space-driven Structures: Grid, linear quadtree, and z-ordering tree files | Planar Coordinate Systems
Relational DBMS and their Spatial Extensions | Data-driven structures: R-trees and cost models | Tessellated Referencing Systems
Topological Relationships | Modeling Unstructured Spatial Data | Linear Referencing
Database Administration | Modeling Semi-Structured Spatial Data | Vertical (Geopotential) Datums
Conceptual Data Models | | Horizontal (Geometric) Datums
Logical Data Models | Query Processing | Georegistration
Physical Data Models | Optimal I/O Algorithms | Map Projections
NoSQL Databases | Spatial Joins |
Problems with Large Spatial Databases | Complex Queries | Data Manipulation
Array Databases | Spatial Data Infrastructures | Point, Line, and Area Generalization
Representations of Spatial Objects | Metadata | Vector-to-Raster and Raster-to-Vector Conversions
Events and Processes | Content Standards | Raster Resampling
Raster Data Models | Data Warehouses | Coordinate Transformations
Vector Data Models | Spatial Data Infrastructures | Transaction Management
Topological Models | U.S. National Spatial Data Infrastructure |
Network Models | Ontology for Geospatial Semantic Interoperability |
Entity-based Models | Marine Spatial Data Infrastructure |
Modeling 3D Entities | Hydrographic Geospatial Data Standards |
Fields in Space and Time | |
Fuzzy Models | |
Triangular Irregular Network (TIN) Models | |
Genealogical Relationships, Linkage, and Inheritance | |
Geospatial Data Conflation | |

 

DM-91 - Marine Spatial Data Infrastructure

Marine Spatial Data Infrastructure (MSDI), the extension of terrestrial Spatial Data Infrastructure to the marine environment, is a type of cyberinfrastructure that facilitates the discovery, access, management, distribution, reuse, and preservation of hydrospatial data. MSDIs provide timely access to data from public and private organizations in marine-related disciplines such as hydrography, oceanography, and meteorology, and in maritime economic sectors. These data support applications such as safety of navigation, aquatic and marine activities, economic development, security and defence, scientific research, and marine ecosystem sustainability. This chapter discusses the main pillars of an MSDI, its importance for facilitating public processes such as Marine Spatial Planning and Coastal Zone Management, the wide range of stakeholders, implementation challenges, and future developments such as the FAIR design principles and new data sources and services.

DM-57 - Metadata
  • Define “metadata” in the context of the geospatial data set
  • Use a metadata utility to create a geospatial metadata document for a digital database you created
  • Formulate metadata for a graphic output that would be distributed to the general public
  • Formulate metadata for a geostatistical analysis that would be released to an experienced audience
  • Compose data integrity statements for a geostatistical or spatial analysis to be included in graphic output
  • Identify software tools available to support metadata creation
  • Interpret the elements of an existing metadata document
  • Explain why metadata production should be integrated into the data production and database development workflows, rather than treated as an ancillary activity
  • Outline the elements of the U.S. geospatial metadata standard
  • Explain the ways in which metadata increases the value of geospatial data
DM-21 - Modeling three-dimensional (3-D) entities
  • Identify GIS application domains in which true 3-D models of natural phenomena are necessary
  • Illustrate the use of Virtual Reality Modeling Language (VRML) to model landscapes in 3-D
  • Explain how octrees are the 3-D extension of quadtrees
  • Explain how voxels and stack-unit maps that show the topography of a series of geologic layers might be considered 3-D extensions of field and vector representations respectively
  • Explain how 3-D models can be extended to additional dimensions
  • Explain the use of multi-patching to represent 3-D objects
  • Explain the difficulties in creating true 3-D objects in a vector or raster format
  • Differentiate between 2½-D representations and true 3-D models
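
The relationship between quadtrees and their 3-D counterpart (octrees) named in the objectives above can be illustrated with a short sketch. It assumes a cell is split at its center point and that each axis contributes one bit to the child number; the function name `child_index` and that bit layout are illustrative choices, not part of the original entry.

```python
def child_index(point, center):
    """Return the number of the child cell that contains `point`.

    Each axis contributes one bit: the bit is set when the point lies on the
    upper side of the cell center along that axis. Two axes give 4 possible
    children (a quadtree split); three axes give 8 (an octree split).
    """
    index = 0
    for axis, (p, c) in enumerate(zip(point, center)):
        if p >= c:
            index |= 1 << axis
    return index


# Same routine, different dimensionality:
quad_child = child_index((0.75, 0.25), (0.5, 0.5))            # 2-D cell, 4 children
oct_child = child_index((0.75, 0.25, 0.9), (0.5, 0.5, 0.5))   # 3-D cell, 8 children
```

Adding the third coordinate only adds one more bit to the child number, which is the sense in which the 3-D structure extends the 2-D one.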
DM-67 - NoSQL Databases

NoSQL databases are open-source, schema-less, horizontally scalable and high-performance databases. These characteristics make them very different from relational databases, the traditional choice for spatial data. The four types of data stores in NoSQL databases (key-value store, document store, column store, and graph store) contribute to significant flexibility for a range of applications. NoSQL databases are well suited to handle typical challenges of big data, including volume, variety, and velocity. For these reasons, they are increasingly adopted by private industries and used in research. They have gained tremendous popularity in the last decade due to their ability to manage unstructured data (e.g. social media data).
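
As a rough illustration of what "schema-less" means for spatial data, the sketch below uses plain Python dictionaries as stand-ins for a document store and a key-value store. No real NoSQL product or client API is used; the records, field names, and the `in_bbox` helper are all hypothetical.

```python
import json

# Hypothetical records: two features with different attribute sets. A
# schema-less document store can hold both without a shared table schema.
documents = [
    {"_id": "poi-1", "name": "Harbor Light",
     "geometry": {"type": "Point", "coordinates": [-70.25, 43.65]}},
    {"_id": "post-9", "text": "Flooding on Main St", "hashtags": ["flood"],
     "geometry": {"type": "Point", "coordinates": [-70.26, 43.66]}},
]

# Key-value view: each document becomes an opaque serialized value behind one key.
kv_store = {doc["_id"]: json.dumps(doc) for doc in documents}


def in_bbox(doc, west, south, east, north):
    """True when the document's point geometry falls inside the bounding box."""
    lon, lat = doc["geometry"]["coordinates"]
    return west <= lon <= east and south <= lat <= north


# Document view: fields inside the document can be inspected and filtered.
hits = [d["_id"] for d in documents if in_bbox(d, -70.30, 43.60, -70.20, 43.70)]
```

The contrast is that the key-value view can only fetch a record by its key, while the document view can filter on the content of each record, here its geometry.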

DM-04 - Object-oriented DBMS
  • Describe the basic elements of the object-oriented paradigm, such as inheritance, encapsulation, methods, and composition
  • Evaluate the degree to which the object-oriented paradigm does or does not approximate cognitive structures
  • Explain how the principle of inheritance can be implemented using an object-oriented programming approach
  • Defend or refute the notion that the Extensible Markup Language (XML) is a form of object-oriented database
  • Explain how the properties of object orientation allow for combining and generalizing objects
  • Evaluate the advantages and disadvantages of object-oriented databases compared to relational databases, focusing on representational power, data entry, storage efficiency, and query performance
  • Implement a GIS database design in an off-the-shelf, object-oriented database
  • Differentiate between object-oriented programming and object-oriented databases
DM-80 - Ontology for Geospatial Semantic Interoperability

Geospatial data heterogeneity makes it difficult to share and reuse geospatial data and to retrieve geospatial information. Lack of semantic interoperability is one of the major problems facing GIS (Geographic Information Science/Systems) applications today. To address heterogeneity and to support geospatial information retrieval and semantic interoperability over the Web, the use of an ontology is proposed, because an ontology is a formal, explicit description of concepts or word meanings expressed in a well-defined and unambiguous manner. Geospatial ontologies represent geospatial concepts and properties for use over the Web. OWL (Web Ontology Language) is an emerging language for defining and instantiating ontologies. OWL builds on RDF (Resource Description Framework) but adds more vocabulary for describing properties and classes. The downside of representing structured geospatial data in OWL and RDF is that it can result in inefficient data access. SPARQL (SPARQL Protocol and RDF Query Language) is recommended for general RDF queries, while GeoSPARQL is proposed as an extension of SPARQL for querying geospatial data. However, the runtime cost of GeoSPARQL queries can be high due to the fine-grained nature of RDF data models. There are several challenges to using ontologies for geospatial semantic interoperability, but these can be overcome through collaboration.

DM-36 - Physical Data Models

Constructs within a particular implementation of database management software guide the development of a physical data model, which is a product of a physical database design process. A physical data model documents how data are to be stored and accessed on the storage media of computer hardware. It depends on the specific data types and indexing mechanisms used within the database management system software. Data types such as integers, reals, character strings, and many others can lead to different storage structures. Indexing mechanisms such as region-trees, hash functions, and others lead to differences in access performance. Physical data modeling choices about data types and indexing mechanisms refine the details of a physical database design. The data types associated with field, record, and file storage structures, together with the access mechanisms to those structures, can either foster or constrain the performance of a database design. Since all software runs on an operating system, field, record, and file storage structures must be translated into operating system constructs to be implemented. As such, all storage structures are contingent on the operating system and the particular hardware that host the data management software.
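
The effect of data type and index choices on storage and access can be sketched with Python's standard `struct` module. The record layout, field names, and sample values below are hypothetical, and the dictionary is only a stand-in for a hash index; an actual DBMS makes these decisions internally.

```python
import struct

# Hypothetical fixed-length record: a 4-byte integer id, two 8-byte reals
# (x, y), and a 16-byte character field. "<" fixes little-endian byte order
# and disables padding, so every record occupies the same number of bytes.
RECORD = struct.Struct("<idd16s")


def pack_record(fid, x, y, name):
    """Serialize one record into its fixed-length byte layout."""
    return RECORD.pack(fid, x, y, name.encode("ascii"))


rows = [(1, 500000.0, 4649776.0, "well_a"), (2, 500120.5, 4649800.25, "well_b")]
data = b"".join(pack_record(*row) for row in rows)  # stand-in for a file

# Stand-in for a hash index: maps an id directly to the record's byte offset.
index = {row[0]: position * RECORD.size for position, row in enumerate(rows)}


def fetch(fid):
    """Read one record by id through the index instead of scanning the file."""
    fid_, x, y, raw = RECORD.unpack_from(data, index[fid])
    return fid_, x, y, raw.rstrip(b"\x00").decode("ascii")
```

Changing a field's type (for example, 4-byte instead of 8-byte reals) changes `RECORD.size` and therefore every offset in the file, which is the kind of dependency a physical data model documents.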

DM-48 - Plane coordinate systems
  • Explain why plane coordinates are sometimes preferable to geographic coordinates
  • Identify the map projection(s) upon which UTM coordinate systems are based, and explain the relationship between the projection(s) and the coordinate system grid
  • Discuss the magnitude and cause of error associated with UTM coordinates
  • Differentiate the characteristics and uses of the UTM coordinate system from the Military Grid Reference System (MGRS) and the World Geographic Reference System (GEOREF)
  • Explain what State Plane Coordinates system (SPC) eastings and northings represent
  • Associate SPC coordinates and zone specifications with corresponding positions on a U.S. map or globe
  • Identify the map projection(s) upon which SPC coordinate systems are based, and explain the relationship between the projection(s) and the coordinate system grids
  • Discuss the magnitude and cause of error associated with SPC coordinates
  • Recommend the most appropriate plane coordinate system for applications at different spatial extents and justify the recommendation
  • Critique the U.S. Geological Survey’s choice of UTM as the standard coordinate system for the U.S. National Map
  • Describe the characteristics of the “national grids” of countries other than the U.S.
  • Explain what Universal Transverse Mercator (UTM) eastings and northings represent
  • Associate UTM coordinates and zone specifications with corresponding position on a world map or globe
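
For the objectives above that relate UTM zones to positions on the globe, the following sketch computes a zone number and its central meridian from a longitude. It assumes the regular 6-degree zone layout and ignores the special zone boundaries around Norway and Svalbard; the function names are illustrative.

```python
import math


def utm_zone(lon_deg):
    """Return the UTM zone number (1-60) for a longitude in degrees.

    Assumes regular 6-degree zones starting at 180 degrees West; the
    irregular zones near Norway and Svalbard are not handled.
    """
    lon = ((lon_deg + 180.0) % 360.0) - 180.0  # normalize to [-180, 180)
    return int(math.floor((lon + 180.0) / 6.0)) + 1


def central_meridian(zone):
    """Return the central meridian of a UTM zone in degrees east."""
    return zone * 6 - 183


zone = utm_zone(-75.0)                # longitude 75 degrees West
meridian = central_meridian(zone)     # meridian on which the zone is centered
```

Eastings within a zone are measured from this central meridian, which is conventionally assigned a false easting of 500,000 m so that all eastings in the zone are positive.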
DM-85 - Point, Line, and Area Generalization

Generalization is an important and unavoidable part of making maps because geographic features cannot be represented on a map without undergoing transformation. Maps abstract and portray features using vector (i.e., points, lines, and polygons) and raster (i.e., pixels) spatial primitives, which are usually labeled. These spatial primitives are subjected to further generalization when map scale is changed. Generalization is a contradictory process. On one hand, it alters the look and feel of a map to improve overall user experience, especially regarding map reading and interpretive analysis. On the other hand, generalization has documented quality implications and can sacrifice feature detail, dimensions, positions, or topological relationships. A variety of techniques are used in generalization, including selection, simplification, displacement, exaggeration, and classification. The techniques are automated through computer algorithms such as Douglas-Peucker and Visvalingam-Whyatt in order to enhance their operational efficiency and create consistent generalization results. As maps are now created easily and quickly, and used widely by both experts and non-experts owing to major advances in IT, it is increasingly important for virtually everyone to appreciate the circumstances, techniques, and outcomes of generalizing maps. This is critical to promoting better map design and production as well as socially appropriate uses.
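
As an illustration of how line simplification can be automated, here is a minimal recursive sketch of the Douglas-Peucker approach mentioned above. It assumes planar coordinates given as `(x, y)` tuples and measures distance to the infinite line through the two end points; the function names and the sample line are illustrative, not taken from the entry.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # a and b coincide: fall back to point distance
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping vertices farther than `tolerance` from the trend line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    max_dist, split = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > max_dist:
            max_dist, split = d, i
    if max_dist <= tolerance:
        # Every intermediate vertex is within tolerance: keep only the end points.
        return [first, last]
    # Otherwise keep the farthest vertex and simplify both halves separately.
    left = douglas_peucker(points[:split + 1], tolerance)
    right = douglas_peucker(points[split:], tolerance)
    return left[:-1] + right  # drop the duplicated split vertex


line = [(0, 0), (1, 0.1), (2, 0), (3, 3), (4, 0)]
simplified = douglas_peucker(line, 0.5)
```

With a tolerance of 0.5 the small wiggle at `(1, 0.1)` is dropped while the peak at `(3, 3)` is kept; a larger tolerance removes more vertices, which mirrors the loss of detail the paragraph above describes.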

DM-70 - Problems of Large Spatial Databases

Large spatial databases, often labeled geospatial big data, exceed the capacity of commonly used computing systems as a result of data volume, variety, velocity, and veracity. Additional problems, also labeled with V's, are cited, but these four are the most problematic and are the focus of this chapter (Li et al., 2016; Panimalar et al., 2017). Sources include satellites, aircraft and drone platforms, vehicles, geosocial networking services, mobile devices, and cameras. The problems in processing these data to extract useful information include query, analysis, and visualization. Data mining techniques and machine learning algorithms, such as deep convolutional neural networks, are often used with geospatial big data. The obvious problem is handling the large data volumes, particularly for input and output operations, which requires parallel reading and writing of the data as well as high-speed computers, disk services, and network transfer speeds. Additional problems of large spatial databases include the variety and heterogeneity of the data, which require advanced algorithms to handle different data types and characteristics, and integration with other data. The velocity at which the data are acquired is a challenge, especially with today's advanced sensors and the Internet of Things, which includes millions of devices creating data on short temporal scales of microseconds to minutes. Finally, the veracity, or truthfulness, of large spatial databases is difficult to establish and validate, particularly for all data elements in the database.
