Data Management

Data management comprises the theories and techniques for managing the entire data lifecycle: data collection and format conversion, data storage, data sharing and retrieval, data provenance, data quality control, and data curation for long-term archival and preservation.

Topics in this Knowledge Area are listed thematically below. Existing topics are in regular font and linked directly to their original entries (published in 2006; these contain only Learning Objectives). Entries that have been updated and expanded are in bold. Forthcoming topics are italicized.

 

Spatial Databases | Spatial Access Methods | Georeferencing Systems
Spatial Database Management Systems | Data Retrieval Strategies | Earth's Shape, Sea Level, and the Geoid
Use of Relational DBMSs | Spatial Indexing | Geographic Coordinate Systems
Object-Oriented DBMSs | Space-driven Structures: Grid, linear quadtree, and z-ordering tree files | Planar Coordinate Systems
Relational DBMS and their Spatial Extensions | Data-driven structures: R-trees and cost models | Tessellated Referencing Systems
Topological Relationships | Modeling Unstructured Spatial Data | Linear Referencing
Database Administration | Modeling Semi-Structured Spatial Data | Vertical (Geopotential) Datums
Conceptual Data Models |  | Horizontal (Geometric) Datums
Logical Data Models | Query Processing | Georegistration
Physical Data Models | Optimal I/O Algorithms | Map Projections
NoSQL Databases | Spatial Joins |
Problems with Large Spatial Databases | Complex Queries | Data Manipulation
Array Databases | Spatial Data Infrastructures | Point, Line, and Area Generalization
Representations of Spatial Objects | Metadata | Vector-to-Raster and Raster-to-Vector Conversions
Events and Processes | Content Standards | Raster Resampling
Raster Data Models | Data Warehouses | Coordinate Transformations
Vector Data Models | Spatial Data Infrastructures | Transaction Management
Topological Models | U.S. National Spatial Data Infrastructure |
Network Models | Ontology for Geospatial Semantic Interoperability |
Entity-based Models | Marine Spatial Data Infrastructure |
Modeling 3D Entities | Hydrographic Geospatial Data Standards |
Fields in Space and Time |  |
Fuzzy Models |  |
Triangular Irregular Network (TIN) Models |  |
Genealogical Relationships, Linkage, and Inheritance |  |
Geospatial Data Conflation |  |

 

DM-87 - Raster resampling
  • Evaluate methods used by contemporary GIS software to resample raster data on-the-fly during display
  • Select appropriate interpolation techniques to resample particular types of values in raster data (e.g., nominal using nearest neighbor)
  • Resample multiple raster data sets to a single resolution to enable overlay
  • Resample raster data sets (e.g., terrain, satellite imagery) to a resolution appropriate for a map of a particular scale
  • Discuss the consequences of increasing and decreasing resolution
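As an illustration of the nearest-neighbor objective above, the following is a minimal sketch in plain Python, not the method of any particular GIS package. The function name `resample_nearest` and the sample `land_cover` grid are assumptions made for the example. Each output cell takes the value of the source cell containing its centre, so nominal class codes are never averaged into values that do not exist in the legend.

```python
def resample_nearest(grid, new_rows, new_cols):
    """Resample a 2D list of cell values to new_rows x new_cols.

    Hypothetical helper: every output cell copies the source cell that
    contains its centre (nearest neighbor), so no new values are invented.
    """
    old_rows, old_cols = len(grid), len(grid[0])
    out = []
    for r in range(new_rows):
        # Map the centre of the output row back to a source row index.
        src_r = min(int((r + 0.5) * old_rows / new_rows), old_rows - 1)
        row = []
        for c in range(new_cols):
            src_c = min(int((c + 0.5) * old_cols / new_cols), old_cols - 1)
            row.append(grid[src_r][src_c])
        out.append(row)
    return out


# Nominal land-cover codes (assumed sample data): 1..4 are class labels.
land_cover = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]

coarse = resample_nearest(land_cover, 2, 2)  # decrease resolution
fine = resample_nearest(coarse, 4, 4)        # increase resolution again
```

Decreasing resolution discards detail, and increasing it afterwards only repeats existing cells; it does not recover information. In this deliberately blocky sample the round trip happens to reproduce the input, which would not hold for a grid with finer variation.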
DM-03 - Relational DBMS and their Spatial Extensions

The relational database management system (DBMS) is widely used in modern business systems. Entities and relationships from a data model are represented as relational tables. To store data in a relational database, a relation schema must be defined to specify the design and structure of the relations. Schema design generally uses database normalization to reduce data redundancy and maintain data integrity. Users retrieve and manage data in a relational database using Structured Query Language (SQL). To make spatial data fit the relational model, the basic data types of a relational database can be extended with custom vector geometry or raster data types. The result is the so-called spatial object-relational DBMS, in which vector geometries and/or rasters are manipulated as spatial objects through SQL queries. Adding spatial indexes to the relational database improves query performance.
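The idea of extending a relational engine with spatial capabilities can be sketched with Python's standard `sqlite3` module. This is only an illustration under stated assumptions: plain `x` and `y` columns stand in for a true geometry type, and `st_distance` is a hypothetical user-defined function registered for this example, not the API of any real spatial extension.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")


def st_distance(x1, y1, x2, y2):
    """Hypothetical spatial function: planar Euclidean distance."""
    return math.hypot(x2 - x1, y2 - y1)


# Extend the SQL dialect with the new function (4 arguments).
conn.create_function("st_distance", 4, st_distance)

# A relation schema with a key and NOT NULL constraints.
conn.execute(
    "CREATE TABLE city ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " x REAL NOT NULL,"
    " y REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO city (name, x, y) VALUES (?, ?, ?)",
    [("A", 0.0, 0.0), ("B", 3.0, 4.0), ("C", 10.0, 0.0)],
)

# An ordinary SQL query that now carries a spatial condition.
rows = conn.execute(
    "SELECT name FROM city"
    " WHERE st_distance(x, y, 0.0, 0.0) <= 5.0"
    " ORDER BY name"
).fetchall()
nearby = [name for (name,) in rows]
```

Here `nearby` holds the cities within 5 units of the origin. Because the function is evaluated row by row, the query is still a sequential scan; a production spatial extension would pair such functions with a spatial index.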

DM-60 - Spatial Data Infrastructures

A spatial data infrastructure (SDI) is the infrastructure that facilitates the discovery, access, management, distribution, reuse, and preservation of digital geospatial resources. These resources may include maps, data, geospatial services, and tools. As cyberinfrastructures, SDIs are similar to other infrastructures, such as water supplies and transportation networks, in that they play fundamental roles in many aspects of society. These roles have become even more significant in today’s big data age, when large volumes of geospatial data and Web services are available. From a technological perspective, SDIs mainly consist of data, hardware, and software. However, a truly functional SDI also needs the efforts of people, support from organizations, government policies, and data and software standards, among other things. In this chapter, we present the concepts and values of SDIs, as well as a brief history of SDI development in the U.S. We also discuss the components of a typical SDI, focusing on three key components: geoportals, metadata, and search functions. Examples of existing SDI implementations are also discussed.

DM-01 - Spatial Database Management Systems

A spatial database management system (SDBMS) is an extension, some might say a specialization, of a conventional database management system (DBMS). Every DBMS (and hence every SDBMS) uses a data model specification as a formalism for software design and for establishing rigor in data management. A data model has three components: 1) constructs, developed using data types, that form the data structures describing data; 2) operations that process those data structures to manipulate data; and 3) rules that establish the veracity of the structures and/or operations for validating data. Basic data types such as integers and real numbers are extended into spatial data types such as points, polylines, and polygons in spatial data structures. Operations are the capabilities that manipulate the data structures; when sequenced into operational workflows in specific ways, they generate information from data, and one might say that the new relationships constitute that information. Different data model designs result in different combinations of structures, operations, and rules, which combine into various SDBMS products. The products differ according to their underlying data models, which both enable and constrain the ability to store and manipulate data. Different SDBMS implementations support configurations for different user environments, including single-user and multi-user environments.

DM-66 - Spatial Indexing

A spatial index is a data structure that allows a spatial object to be accessed efficiently. It is a common technique in spatial databases. Without indexing, any search for a feature would require a "sequential scan" of every record in the database, resulting in much longer processing time. When a spatial index is constructed, the minimum bounding rectangle serves as an approximation of each object. The various types of spatial indices found in commercial and open-source databases yield measurable performance differences. Spatial indexing techniques play a central role in time-critical applications and in the manipulation of spatial big data.
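The role of the minimum bounding rectangle (MBR) can be sketched with a simple fixed-grid index. This is one space-driven structure chosen for brevity, not the index used by any particular database; the class `GridIndex`, the cell size, and the sample features are all assumptions for illustration.

```python
from collections import defaultdict


def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a vertex list."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def mbr_intersects(a, b):
    """True if two rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class GridIndex:
    """Hypothetical fixed-grid index keyed by the cells each MBR covers."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (i, j) -> ids of objects in that cell
        self.boxes = {}                # id -> MBR

    def _cells_for(self, box):
        s = self.cell_size
        for i in range(int(box[0] // s), int(box[2] // s) + 1):
            for j in range(int(box[1] // s), int(box[3] // s) + 1):
                yield (i, j)

    def insert(self, obj_id, box):
        self.boxes[obj_id] = box
        for cell in self._cells_for(box):
            self.cells[cell].add(obj_id)

    def query(self, window):
        # Filter step: only objects registered in cells the window touches.
        candidates = set()
        for cell in self._cells_for(window):
            candidates |= self.cells.get(cell, set())
        return sorted(i for i in candidates
                      if mbr_intersects(self.boxes[i], window))


features = {
    "road": [(1, 1), (4, 2)],
    "lake": [(12, 12), (15, 16)],
    "park": [(3, 3), (6, 7)],
}
index = GridIndex(cell_size=5)
for name, vertices in features.items():
    index.insert(name, mbr(vertices))

window = (0, 0, 5, 5)
indexed_hits = index.query(window)
# Sequential scan for comparison: test every record's MBR.
scan_hits = sorted(n for n, v in features.items()
                   if mbr_intersects(mbr(v), window))
```

Both approaches return the same features, but the indexed query never examines "lake". An MBR match is only a candidate; an exact geometry test would normally follow as a refinement step.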

DM-77 - Spatial Joins

Measuring (or querying) the relationships between spatial features is of particular utility within a GIS. A spatial join combines represented geographic objects and their associated attributes based on a spatial relationship test (or predicate). The spatial join method used depends on the relationship between the features and on how those features are represented in the GIS. Regardless of the software implementation, the results of a spatial join depend on a test condition such as adjacency, proximity, or a topological comparison among the represented geographic data. This topic discusses how spatial join operations can be used for different geographic problems.
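A spatial join driven by a predicate can be sketched as a nested loop in plain Python. The predicate here is point-in-polygon by ray casting; the function names, the attribute keys, and the sample schools and districts are assumptions for illustration, and real GIS software would typically use a spatial index rather than testing every pair.

```python
def point_in_polygon(pt, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, not closed."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(left, right, predicate):
    """Combine attributes of every left/right pair satisfying the predicate."""
    joined = []
    for a in left:
        for b in right:
            if predicate(a["geom"], b["geom"]):
                row = {k: v for k, v in a.items() if k != "geom"}
                row.update({k: v for k, v in b.items() if k != "geom"})
                joined.append(row)
    return joined


# Assumed sample data: two districts and three schools.
districts = [
    {"district": "North", "geom": [(0, 5), (10, 5), (10, 10), (0, 10)]},
    {"district": "South", "geom": [(0, 0), (10, 0), (10, 5), (0, 5)]},
]
schools = [
    {"school": "Elm", "geom": (2, 7)},
    {"school": "Oak", "geom": (8, 1)},
    {"school": "Far", "geom": (20, 20)},
]

result = spatial_join(schools, districts, point_in_polygon)
```

Each row of `result` carries the attributes of a school together with those of the district containing it. "Far" lies outside both districts and so produces no row, as in an inner join. Swapping in a different predicate (for example a distance threshold) changes the join without changing the loop.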

DM-18 - Spatio-temporal GIS
  • Describe extensions to relational DBMS to represent temporal change in attributes
  • Evaluate the advantages and disadvantages of existing space-time models based on storage efficiency, query performance, ease of data entry, and ability to implement in existing software
  • Create a GIS database that models temporal information
  • Utilize two different space-time models to characterize a given scenario, such as a daily commute
  • Describe the architecture of data models (both field and object based) to represent spatio-temporal phenomena
  • Differentiate the two types of temporal information to be modeled in databases: database (or transaction) time and valid (or world) time
  • Identify whether it is important to represent temporal change in a particular GIS application
  • Describe SQL extensions for querying temporal change
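The distinction between valid (world) time and database (transaction) time in the objectives above can be sketched as follows. The record layout `ParcelVersion`, the query helper `land_use_as_of`, and the sample parcel history are hypothetical and are not drawn from any specific space-time model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ParcelVersion:
    parcel_id: str
    land_use: str
    valid_from: date           # valid time: when the state began in the world
    valid_to: Optional[date]   # valid time: None means "still true"
    recorded_on: date          # transaction time: when the row was entered


def land_use_as_of(versions, parcel_id, world_date, known_by):
    """Land use on world_date according to what was recorded by known_by."""
    matches = [
        v for v in versions
        if v.parcel_id == parcel_id
        and v.recorded_on <= known_by
        and v.valid_from <= world_date
        and (v.valid_to is None or world_date < v.valid_to)
    ]
    if not matches:
        return None
    # The most recently recorded matching row supersedes earlier ones.
    return max(matches, key=lambda v: v.recorded_on).land_use


# Assumed history: a rezoning took effect in June 2010 but was only
# entered into the database in January 2011.
history = [
    ParcelVersion("P1", "farmland", date(2000, 1, 1), None,
                  date(2000, 2, 1)),
    ParcelVersion("P1", "farmland", date(2000, 1, 1), date(2010, 6, 1),
                  date(2011, 1, 15)),
    ParcelVersion("P1", "residential", date(2010, 6, 1), None,
                  date(2011, 1, 15)),
]

before_update = land_use_as_of(history, "P1", date(2010, 9, 1),
                               known_by=date(2010, 12, 31))
after_update = land_use_as_of(history, "P1", date(2010, 9, 1),
                              known_by=date(2011, 6, 1))
```

The same world date yields "farmland" or "residential" depending on which database state is consulted, which is why the two kinds of time are stored separately.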
DM-49 - Tessellated referencing systems
  • Explain the concept of a “quadtree”
  • Describe the octahedral quaternary triangulated mesh georeferencing system proposed by Dutton
  • Discuss the advantages of hierarchical coordinates relative to geographic and plane coordinate systems
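One way to see what a hierarchical coordinate offers is to compute a quadtree key for a point. The sketch below is a generic square quadtree, not Dutton's triangular mesh. The function name `quadkey`, the unit-square extent, and the quadrant numbering (0 = lower left, 1 = lower right, 2 = upper left, 3 = upper right) are assumptions for illustration.

```python
def quadkey(x, y, depth, extent=(0.0, 0.0, 1.0, 1.0)):
    """Return a string of quadrant digits locating (x, y) within extent.

    Each digit halves the cell in both directions, so a longer key means
    a smaller cell, and a coarse key is a prefix of every finer key
    for the same point.
    """
    xmin, ymin, xmax, ymax = extent
    key = ""
    for _ in range(depth):
        xmid = (xmin + xmax) / 2
        ymid = (ymin + ymax) / 2
        quadrant = 0
        if x >= xmid:
            quadrant += 1
            xmin = xmid
        else:
            xmax = xmid
        if y >= ymid:
            quadrant += 2
            ymin = ymid
        else:
            ymax = ymid
        key += str(quadrant)
    return key


coarse_key = quadkey(0.6, 0.2, 2)
fine_key = quadkey(0.6, 0.2, 3)
```

A single key encodes both location and resolution: truncating `fine_key` gives the coarser cell that contains it, which geographic or plane coordinate pairs cannot express on their own.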
DM-09 - The hexagonal model
  • Illustrate the hexagonal model
  • Explain the limitations of the grid model compared to the hexagonal model
  • Exemplify the uses (past and potential) of the hexagonal model
DM-15 - The network model
  • Define the following terms pertaining to a network: loops, multiple edges, the degree of a vertex, walk, trail, path, cycle, fundamental cycle
  • List definitions of networks that apply to specific applications or industries
  • Create an adjacency table from a sample network
  • Explain how a graph can be written as an adjacency matrix and how this can be used to calculate topological shortest paths in the graph
  • Create an incidence matrix from a sample network
  • Explain how a graph (network) may be directed or undirected
  • Demonstrate how attributes of networks can be used to represent cost, time, distance, or many other measures
  • Demonstrate how the star (or forward star) data structure, which is often employed when digitally storing network information, violates relational normal form, but allows for much faster search and retrieval in network databases
  • Discuss some of the difficulties of applying the standard process-pattern concept to lines and networks
  • Demonstrate how a network is a connected set of edges and vertices
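Several of the objectives above (adjacency matrix, incidence matrix, degree of a vertex, topological shortest path) can be sketched on a small undirected network. The function names and the four-vertex sample network are assumptions for illustration. Topological distance here means the number of edges on a shortest path, found by breadth-first search over the adjacency matrix.

```python
from collections import deque


def adjacency_matrix(vertices, edges, directed=False):
    """Square 0/1 matrix; entry [i][j] is 1 if an edge joins i to j."""
    idx = {v: i for i, v in enumerate(vertices)}
    m = [[0] * len(vertices) for _ in vertices]
    for u, v in edges:
        m[idx[u]][idx[v]] = 1
        if not directed:
            m[idx[v]][idx[u]] = 1
    return m


def incidence_matrix(vertices, edges):
    """Vertex-by-edge 0/1 matrix for an undirected network."""
    idx = {v: i for i, v in enumerate(vertices)}
    m = [[0] * len(edges) for _ in vertices]
    for j, (u, v) in enumerate(edges):
        m[idx[u]][j] = 1
        m[idx[v]][j] = 1
    return m


def topological_distance(matrix, start, goal):
    """Fewest edges between two vertex indices, or None if unconnected."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            return dist[u]
        for v, connected in enumerate(matrix[u]):
            if connected and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None


# Assumed sample network: four vertices, four edges.
vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]

adj = adjacency_matrix(vertices, edges)
inc = incidence_matrix(vertices, edges)
degrees = [sum(row) for row in adj]       # row sums give vertex degrees
a_to_d = topological_distance(adj, 0, 3)  # A -> C -> D
```

For an undirected network without loops or multiple edges, each row sum of the adjacency matrix is the degree of that vertex. Replacing the 0/1 entries with costs, times, or distances would call for a weighted shortest-path algorithm instead of breadth-first search.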
