Data Management

Data management involves the theories and techniques for managing the entire data lifecycle, from data collection to data format conversion, from data storage to data sharing and retrieval, to data provenance, data quality control and data curation for long-term data archival and preservation.

Topics in this Knowledge Area are listed thematically below. Existing topics are in regular font and linked directly to their original entries (published in 2006; these contain only Learning Objectives). Entries that have been updated and expanded are in bold. Forthcoming topics are italicized.

Spatial Databases | Spatial Access Methods | Georeferencing Systems
Spatial Database Management Systems | Data Retrieval Strategies | Earth's Shape, Sea Level, and the Geoid
Use of Relational DBMSs | Spatial Indexing | Geographic Coordinate Systems
Object-Oriented DBMSs | Space-driven Structures: Grid, linear quadtree, and z-ordering tree files | Planar Coordinate Systems
Relational DBMS and their Spatial Extensions | Data-driven structures: R-trees and cost models | Tesselated Referencing Systems
Topological Relationships | Modeling Unstructured Spatial Data | Linear Referencing
Database Administration | Modeling Semi-Structured Spatial Data | Vertical (Geopotential) Datums
Conceptual Data Models | | Horizontal (Geometric) Datums
Logical Data Models | Query Processing | Georegistration
Physical Data Models | Optimal I/O Algorithms | Map Projections
NoSQL Databases | Spatial Joins |
Problems with Large Spatial Databases | Complex Queries | Data Manipulation
Array Databases | Spatial Data Infrastructures | Point, Line, and Area Generalization
Representations of Spatial Objects | Metadata | Vector-to-Raster and Raster-to-Vector Conversions
Events and Processes | Content Standards | Raster Resampling
Raster Data Models | Data Warehouses | Coordinate Transformations
Vector Data Models | Spatial Data Infrastructures | Transaction Management
Topological Models | U.S. National Spatial Data Infrastructure |
Network Models | Ontology for Geospatial Semantic Interoperability |
Entity-based Models | Marine Spatial Data Infrastructure |
Modeling 3D Entities | Hydrographic Geospatial Data Standards |
Fields in Space and Time | |
Fuzzy Models | |
Triangular Irregular Network (TIN) Models | |
Genealogical Relationships, Linkage, and Inheritance | |
Geospatial Data Conflation | |

DM-81 - Array Databases

Array Databases are a class of NoSQL databases that store, manage, and analyze data whose natural structures are arrays. With the growth of large volumes of spatial data (e.g., satellite imagery), there is a pressing need for new ways to store and manipulate array data. Currently, several databases and platforms have extended their initial architectures to support multidimensional arrays. However, extending a platform to support multidimensional arrays comes at a performance cost compared to Array Databases, which specialize in the storage, retrieval, and processing of n-dimensional data.
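
As a rough illustration of what array-oriented storage means, the sketch below splits a 2D grid into fixed-size tiles and reads a window back by touching only the cells it needs. Partitioning into tiles (or chunks) is a common approach, but it is an assumption here and not something the entry describes; the names `split_into_tiles` and `read_window` are hypothetical and do not belong to any particular product.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]


def split_into_tiles(grid: Grid, tile_size: int) -> Dict[Tuple[int, int], Grid]:
    """Partition a 2D grid into square tiles keyed by (tile_row, tile_col)."""
    tiles: Dict[Tuple[int, int], Grid] = {}
    rows, cols = len(grid), len(grid[0])
    for r0 in range(0, rows, tile_size):
        for c0 in range(0, cols, tile_size):
            # Edge tiles may be smaller than tile_size.
            tiles[(r0 // tile_size, c0 // tile_size)] = [
                row[c0:c0 + tile_size] for row in grid[r0:r0 + tile_size]
            ]
    return tiles


def read_window(tiles: Dict[Tuple[int, int], Grid], tile_size: int,
                r_start: int, r_stop: int, c_start: int, c_stop: int) -> Grid:
    """Read the half-open window [r_start, r_stop) x [c_start, c_stop)."""
    window: Grid = []
    for r in range(r_start, r_stop):
        tile_row, local_r = divmod(r, tile_size)
        out_row = []
        for c in range(c_start, c_stop):
            tile_col, local_c = divmod(c, tile_size)
            out_row.append(tiles[(tile_row, tile_col)][local_r][local_c])
        window.append(out_row)
    return window
```

A real system would keep tiles on disk and add compression and indexing; the point of the sketch is only that a subarray query can be answered from the tiles that overlap it.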

DM-14 - Classic vector data models
  • Illustrate the GBF/DIME data model
  • Describe a Freeman-Huffman chain code
  • Describe the relationship of Freeman-Huffman chain codes to the raster model
  • Discuss the impact of early prototype data models (e.g., POLYVRT and GBF/DIME) on contemporary vector formats
  • Describe the relationship between the GBF/DIME and TIGER structures, the rationale for their design, and their intended primary uses, paying particular attention to the role of graph theory in establishing the difference between GBF/DIME and TIGER files
  • Discuss the advantages and disadvantages of POLYVRT
  • Explain what makes POLYVRT a hierarchical vector data model
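
To make the chain code objectives above more concrete, here is a minimal sketch of an 8-direction chain code in the Freeman style: a path through grid cells is stored as a start cell plus one direction code per step. The numbering used (0 = east, counting counterclockwise, with y increasing northward) is one common convention and is an assumption here; the function names are hypothetical.

```python
from typing import List, Tuple

Cell = Tuple[int, int]

# Assumed convention: 0 = E, 1 = NE, 2 = N, 3 = NW, 4 = W, 5 = SW, 6 = S, 7 = SE.
STEP_TO_CODE = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}
CODE_TO_STEP = {code: step for step, code in STEP_TO_CODE.items()}


def chain_encode(cells: List[Cell]) -> Tuple[Cell, List[int]]:
    """Encode a path of 8-connected grid cells as (start cell, direction codes)."""
    codes = []
    for (x0, y0), (x1, y1) in zip(cells, cells[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in STEP_TO_CODE:
            raise ValueError(f"cells {(x0, y0)} and {(x1, y1)} are not neighbors")
        codes.append(STEP_TO_CODE[step])
    return cells[0], codes


def chain_decode(start: Cell, codes: List[int]) -> List[Cell]:
    """Rebuild the list of cells from a start cell and direction codes."""
    cells = [start]
    for code in codes:
        dx, dy = CODE_TO_STEP[code]
        x, y = cells[-1]
        cells.append((x + dx, y + dy))
    return cells
```

Because every step moves between neighboring cells of a regular grid, the encoding is tied to the raster model: the path is expressed in cell units, not in arbitrary coordinates.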

DM-34 - Conceptual Data Models

Within an initial phase of database design, a conceptual data model is created as a technology-independent specification of the data to be stored within a database. This specification often takes the form of a formalized diagram. The process of conceptual data modeling is meant to foster shared understanding among data modelers and stakeholders when creating the specification. As such, a conceptual data model should be easily readable by people with little or no technical, computer-based expertise, because a comprehensive view of information is more important than a detailed view. In a conceptual data model, entity classes are categories of things (person, place, thing, etc.) that have attributes for describing the characteristics of the things. Relationships can exist between the entity classes. Entity-relationship diagrams have been and are likely to continue to be a popular way of characterizing entity classes, attributes, and relationships. Various notations for diagrams have been used over the years. The main intent of a conceptual data model and its corresponding entity-relationship diagram is to highlight the content and meaning of data within stakeholder information contexts, while postponing the specification of logical structure to the second phase of database design, called logical data modeling.

DM-58 - Content standards
  • Differentiate between a controlled vocabulary and an ontology
  • Describe a domain ontology or vocabulary (e.g., land use classification systems, surveyor codes, data dictionaries, place names, or benthic habitat classification systems)
  • Describe how a domain ontology or vocabulary facilitates data sharing
  • Define “thesaurus” as it pertains to geospatial metadata
  • Describe the primary focus of the following content standards: FGDC, Dublin Core Metadata Initiative, and ISO 19115
  • Differentiate between a content standard and a profile
  • Describe some of the profiles created for the Content Standard for Digital Geospatial Metadata (CSDGM)

DM-88 - Coordinate Transformations

Coordinate transformations are needed to align multiple GIS datasets to one coordinate system when the datasets use different coordinate systems. To transform coordinates, the properties of the source and target coordinate systems, such as datums, projection methods, and their measurement origins and units, should be identified carefully. On-the-fly projection, implemented in most GIS software and GIS data viewers, projects GIS datasets automatically without the need for manual coordinate transformations by users. The coordinate transformation mechanisms for vector and raster datasets differ because raster datasets require pixel value resampling during coordinate transformations. As a case study, eight GIS datasets were downloaded from multiple websites and reprojected to a single coordinate system in QGIS.
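
As one illustrative transformation (not the QGIS case study described in the entry), the sketch below converts geographic longitude and latitude in degrees to spherical Web Mercator coordinates in meters and back. It shows why units and origins of both systems matter: the source is angular, the target is linear. The function names are hypothetical, and the sphere radius is assumed to be the WGS 84 semi-major axis, as Web Mercator uses.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6378137.0  # assumed sphere radius (WGS 84 semi-major axis), in meters


def lonlat_to_web_mercator(lon_deg: float, lat_deg: float) -> Tuple[float, float]:
    """Project longitude/latitude in degrees to spherical Mercator x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def web_mercator_to_lonlat(x: float, y: float) -> Tuple[float, float]:
    """Invert the projection: x/y in meters back to longitude/latitude in degrees."""
    lon_deg = math.degrees(x / EARTH_RADIUS_M)
    lat_deg = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon_deg, lat_deg
```

This covers vector coordinates only and ignores datum shifts. Reprojecting a raster would additionally require resampling pixel values onto the new grid, and production work would normally rely on an established projection library.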

DM-02 - Data retrieval strategies
  • Analyze the relative performance of data retrieval strategies
  • Implement algorithms that retrieve geospatial data from a range of data structures
  • Describe the particular advantages of Morton addressing relative to geographic data representation
  • Discuss the advantages and disadvantages of different data structures (e.g., arrays, linked lists, binary trees, hash tables, indexes) for retrieving geospatial data
  • Compare and contrast direct and indirect access search and retrieval methods
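
The Morton addressing objective above can be illustrated with a short sketch. A Morton (Z-order) key interleaves the bits of a cell's column and row, so cells that are close in two dimensions tend to receive nearby one-dimensional keys. The function names and the choice of placing column bits in the even positions are assumptions made for this example.

```python
from typing import Tuple


def morton_encode(col: int, row: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of col and row into one Morton key."""
    key = 0
    for i in range(bits):
        key |= ((col >> i) & 1) << (2 * i)      # column bits go to even positions
        key |= ((row >> i) & 1) << (2 * i + 1)  # row bits go to odd positions
    return key


def morton_decode(key: int, bits: int = 16) -> Tuple[int, int]:
    """Recover (col, row) from a Morton key produced by morton_encode."""
    col = row = 0
    for i in range(bits):
        col |= ((key >> (2 * i)) & 1) << i
        row |= ((key >> (2 * i + 1)) & 1) << i
    return col, row
```

For example, the four cells of the 2 x 2 block at the origin receive keys 0 to 3, so a sorted index on the key keeps that block contiguous on disk.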

DM-59 - Data warehouses
  • Differentiate between a data warehouse and a database
  • Describe the functions that gazetteers support
  • Differentiate the retrieval mechanisms of data warehouses and databases
  • Discuss the appropriate use of a data warehouse versus a database

DM-62 - Database Administration

Organizations with a responsibility for maintaining large-scale, multi-user spatial databases often turn to server-based relational database management systems to achieve their goals. The administration of such databases has many dimensions. Industry standards in the areas of data storage and services should be researched and applied to ensure a sound, comprehensive database design as well as to promote interoperability with external entities. Data validation tools should be implemented to improve the accuracy and efficiency of data maintenance activities. Metadata should be maintained according to industry standards to protect the organization’s investment in data and to increase the likelihood of the data being located by clearinghouse and portal search tools. Database security strategies can prevent unauthorized access to data and lessen the chances of data loss due to accidental data corruption. Database performance should be monitored and strategies implemented to ensure that data can be retrieved from the system with acceptable response times. Finally, trends in the field, such as the increasing need to manage large volumes of data, call for spatial database managers to be knowledgeable about non-relational data models as well, such as NoSQL data models.

DM-44 - Earth's Shape, Sea Level, and the Geoid

C. F. Gauss set the modern definition of the shape of the Earth as the shape the oceans would adopt if they were entirely unperturbed and thus placid, a surface now called the geoid. This surface cannot be observed directly because the oceans have waves, tides, currents, and other perturbations. Nonetheless, the geoid is the ideal datum for heights, and the science of determining the location of the geoid for practical purposes is the topic of physical geodesy. The geoid is the central concept that ties together what the various kinds of height mean, how they are measured, and how they are inter-related.

DM-20 - Entity-based Models

As we translate real-world phenomena into data structures that we can store in a computer, we must determine the most appropriate spatial representation and how it relates to the characteristics of the phenomenon. All spatial representations are derivatives of graph theory and should therefore be described in such terms. Describing them this way helps in understanding the principles of low-level GIS operations. A constraint-driven approach allows the reader to evaluate implementations of the geo-relational principle in terms of the hierarchical level of mathematical space adopted.
