Data Management

Data management involves the theories and techniques for managing the entire data lifecycle, from data collection to data format conversion, from data storage to data sharing and retrieval, to data provenance, data quality control and data curation for long-term data archival and preservation.

Topics in this Knowledge Area are listed thematically below. Existing topics are in regular font and linked directly to their original entries (published in 2006; these contain only Learning Objectives). Entries that have been updated and expanded are in bold. Forthcoming topics are italicized.

Spatial Databases	Spatial Access Methods	Georeferencing Systems
Spatial Database Management Systems	Data Retrieval Strategies	Earth's Shape, Sea Level, and the Geoid
Use of Relational DBMSs	Spatial Indexing	Geographic Coordinate Systems
Object-Oriented DBMSs	Space-driven Structures: Grid, linear quadtree, and z-ordering tree files	Planar Coordinate Systems
Relational DBMS and their Spatial Extensions	Data-driven structures: R-trees and cost models	Tessellated Referencing Systems
Topological Relationships	Modeling Unstructured Spatial Data	Linear Referencing
Database Administration	Modeling Semi-Structured Spatial Data	Vertical (Geopotential) Datums
Conceptual Data Models		Horizontal (Geometric) Datums
Logical Data Models	Query Processing	Georegistration
Physical Data Models	Optimal I/O Algorithms	Map Projections
NoSQL Databases	Spatial Joins
Problems with Large Spatial Databases	Complex Queries	Data Manipulation
Array Databases	Spatial Data Infrastructures	Point, Line, and Area Generalization
Representations of Spatial Objects	Metadata	Vector-to-Raster and Raster-to-Vector Conversions
Events and Processes	Content Standards	Raster Resampling
Raster Data Models	Data Warehouses	Coordinate Transformations
Vector Data Models	Spatial Data Infrastructures	Transaction Management
Topological Models	U.S. National Spatial Data Infrastructure
Network Models	Ontology for Geospatial Semantic Interoperability
Entity-based Models	Marine Spatial Data Infrastructure
Modeling 3D Entities	Hydrographic Geospatial Data Standards
Fields in Space and Time
Fuzzy Models
Triangular Irregular Network (TIN) Models
Genealogical Relationships, Linkage, and Inheritance
Geospatial Data Conflation

DM-81 - Array Databases

Array Databases are a class of NoSQL databases that store, manage, and analyze data whose natural structures are arrays. With the growth of large volumes of spatial data (e.g., satellite imagery), there is a pressing need for new ways to store and manipulate array data. Currently, several databases and platforms have extended their initial architectures to support multidimensional arrays. However, extending a platform to support multidimensional arrays comes at a performance cost when compared with Array Databases, which specialize in the storage, retrieval, and processing of n-dimensional data.
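To make the idea of array-native storage concrete, the following minimal sketch stores an n-dimensional array as one flat list in row-major order, so that any cell can be located by arithmetic on its index rather than by searching rows of (index, value) records. The class name DenseArray and its methods are hypothetical and do not reproduce the design of any particular Array Database product; real systems add tiling, compression, and disk-based storage on top of this kind of addressing.

```python
from itertools import product


class DenseArray:
    """Hypothetical in-memory n-dimensional array kept in one flat list (row-major)."""

    def __init__(self, shape, fill=0):
        self.shape = tuple(shape)
        # strides[k] = number of flat cells skipped when index k grows by one
        self.strides = []
        step = 1
        for size in reversed(self.shape):
            self.strides.insert(0, step)
            step *= size
        self.cells = [fill] * step

    def _offset(self, index):
        """Translate an n-dimensional index into a position in the flat list."""
        if len(index) != len(self.shape):
            raise IndexError("index has the wrong number of dimensions")
        for i, size in zip(index, self.shape):
            if not 0 <= i < size:
                raise IndexError("index out of bounds")
        return sum(i * s for i, s in zip(index, self.strides))

    def get(self, index):
        return self.cells[self._offset(index)]

    def set(self, index, value):
        self.cells[self._offset(index)] = value

    def window(self, lower, upper):
        """Return values with lower[k] <= index[k] < upper[k], in row-major order."""
        ranges = [range(lo, hi) for lo, hi in zip(lower, upper)]
        return [self.get(idx) for idx in product(*ranges)]


# Illustrative data: 2 time steps of a 3-row by 4-column image.
scene = DenseArray((2, 3, 4))
for t, r, c in product(range(2), range(3), range(4)):
    scene.set((t, r, c), t * 100 + r * 10 + c)

# Subset: second time step, rows 0-1, columns 1-2.
subset = scene.window((1, 0, 1), (2, 2, 3))
```

The window call is the kind of subarray retrieval (for example, clipping one date and one area out of an image time series) that array-oriented storage is meant to serve directly.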

DM-14 - Classic vector data models

Illustrate the GBF/DIME data model
Describe a Freeman-Huffman chain code
Describe the relationship of Freeman-Huffman chain codes to the raster model
Discuss the impact of early prototype data models (e.g., POLYVRT and GBF/DIME) on contemporary vector formats
Describe the relationship between the GBF/DIME and TIGER structures, the rationale for their design, and their intended primary uses, paying particular attention to the role of graph theory in establishing the difference between GBF/DIME and TIGER files
Discuss the advantages and disadvantages of POLYVRT
Explain what makes POLYVRT a hierarchical vector data model
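As an aid to the chain code objectives above, here is a minimal sketch of the basic Freeman 8-direction chain code, which records a path through raster cells as a start cell followed by one direction code per step. The numbering convention (0 = east, increasing counterclockwise, y axis pointing up) and the function names are assumptions made for illustration; the Huffman-style compression of the code sequence implied by the name "Freeman-Huffman" is not shown.

```python
# Freeman 8-direction codes, assuming code 0 = east and codes increasing
# counterclockwise with the y axis pointing up. Other conventions exist.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def freeman_chain_code(cells):
    """Encode a path of grid cells as (start cell, list of direction codes)."""
    if not cells:
        raise ValueError("path must contain at least one cell")
    codes = []
    for (x0, y0), (x1, y1) in zip(cells, cells[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"cells {(x0, y0)} and {(x1, y1)} are not 8-neighbors")
        codes.append(DIRECTIONS[step])
    return cells[0], codes


def decode_chain_code(start, codes):
    """Rebuild the list of grid cells from a start cell and direction codes."""
    steps = {code: step for step, code in DIRECTIONS.items()}
    cells = [start]
    for code in codes:
        dx, dy = steps[code]
        x, y = cells[-1]
        cells.append((x + dx, y + dy))
    return cells


# Illustrative path: east, northeast, north, then west.
path = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 2)]
start, codes = freeman_chain_code(path)
```

Because every step moves to one of the eight neighboring cells, the encoding only makes sense on a regular grid, which is the link between chain codes and the raster model.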

DM-34 - Conceptual Data Models

In the initial phase of database design, a conceptual data model is created as a technology-independent specification of the data to be stored within a database. This specification often takes the form of a formalized diagram. The process of conceptual data modeling is meant to foster shared understanding among data modelers and stakeholders when creating the specification. As such, a conceptual data model should be easily readable by people with little or no technical, computer-based expertise, because a comprehensive view of information is more important than a detailed view. In a conceptual data model, entity classes are categories of things (person, place, thing, etc.) that have attributes describing the characteristics of those things. Relationships can exist between the entity classes. Entity-relationship diagrams have been and are likely to continue to be a popular way of characterizing entity classes, attributes, and relationships. Various notations for diagrams have been used over the years. The main intent of a conceptual data model and its corresponding entity-relationship diagram is to highlight the content and meaning of data within stakeholder information contexts, while postponing the specification of logical structure to the second phase of database design, called logical data modeling.

DM-58 - Content standards

Differentiate between a controlled vocabulary and an ontology
Describe a domain ontology or vocabulary (e.g., land use classification systems, surveyor codes, data dictionaries, place names, or benthic habitat classification systems)
Describe how a domain ontology or vocabulary facilitates data sharing
Define “thesaurus” as it pertains to geospatial metadata
Describe the primary focus of the following content standards: FGDC, Dublin Core Metadata Initiative, and ISO 19115
Differentiate between a content standard and a profile
Describe some of the profiles created for the Content Standard for Digital Geospatial Metadata (CSDGM)

DM-88 - Coordinate Transformations

Coordinate transformations are needed to align GIS datasets that use different coordinate systems to a single coordinate system. To transform coordinates, the properties of the source and target coordinate systems, such as datums, projection methods, and measurement origins and units, should be identified carefully. On-the-fly projection, implemented in most GIS software and GIS data viewers, projects GIS datasets automatically without the need for manual coordinate transformations by users. The coordinate transformation mechanisms for vector and raster datasets differ because raster datasets require pixel value resampling during coordinate transformations. As a case study, eight GIS datasets were downloaded from multiple websites and reprojected to a common coordinate system in QGIS.
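As a small worked example of transforming vector coordinates, the sketch below converts longitude and latitude in degrees to spherical Web Mercator coordinates in meters and back. It is an illustration under stated assumptions, not the method used in the case study: the function names are hypothetical, the spherical formulas are used with the WGS 84 semi-major axis as the sphere radius, and no datum shift is applied. Production work would normally rely on a projection library rather than hand-written formulas.

```python
import math

# Assumption: WGS 84 semi-major axis used as the radius of the projection sphere.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project geographic coordinates (degrees) to spherical Web Mercator (meters)."""
    if not -90.0 < lat_deg < 90.0:
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def web_mercator_to_lonlat(x, y):
    """Inverse of lonlat_to_web_mercator: meters back to degrees."""
    lon_deg = math.degrees(x / EARTH_RADIUS_M)
    lat_deg = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon_deg, lat_deg


# Illustrative point; any longitude/latitude pair within range would do.
x, y = lonlat_to_web_mercator(-122.3, 47.6)
lon, lat = web_mercator_to_lonlat(x, y)
```

A vector dataset is transformed by applying such a function to every vertex. A raster dataset additionally needs its pixel values resampled onto the new grid, which this sketch does not attempt.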

DM-02 - Data retrieval strategies

Analyze the relative performance of data retrieval strategies
Implement algorithms that retrieve geospatial data from a range of data structures
Describe the particular advantages of Morton addressing relative to geographic data representation
Discuss the advantages and disadvantages of different data structures (e.g., arrays, linked lists, binary trees, hash tables, indexes) for retrieving geospatial data
Compare and contrast direct and indirect access search and retrieval methods
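To illustrate the Morton addressing objective above, here is a minimal sketch that interleaves the bits of a cell's column and row into a single integer key, so that cells close together in two dimensions tend to receive nearby keys in a one-dimensional index. The function names, the 16-bit default, and the choice to place column bits in the even positions are assumptions made for illustration.

```python
def morton_encode(col, row, bits=16):
    """Interleave the bits of col and row; col occupies the even bit positions."""
    limit = 1 << bits
    if not (0 <= col < limit and 0 <= row < limit):
        raise ValueError("col and row must fit in the given number of bits")
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)
        code |= ((row >> i) & 1) << (2 * i + 1)
    return code


def morton_decode(code, bits=16):
    """Split a Morton code back into its (col, row) pair."""
    col = row = 0
    for i in range(bits):
        col |= ((code >> (2 * i)) & 1) << i
        row |= ((code >> (2 * i + 1)) & 1) << i
    return col, row


# Sorting the cells of a 2 x 2 grid by Morton code visits them in a Z pattern.
grid = [(c, r) for r in range(2) for c in range(2)]
z_order = sorted(grid, key=lambda cell: morton_encode(*cell))
```

Sorting records by such a key lets an ordinary one-dimensional structure, such as a sorted file or a B-tree, serve as a simple spatial index.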

DM-59 - Data warehouses

Differentiate between a data warehouse and a database
Describe the functions that gazetteers support
Differentiate the retrieval mechanisms of data warehouses and databases
Discuss the appropriate use of a data warehouse versus a database

DM-62 - Database Administration

Organizations with a responsibility for maintaining large-scale, multi-user spatial databases often turn to server-based relational database management systems to achieve their goals. The administration of such databases has many dimensions. Industry standards in the areas of data storage and services should be researched and applied to ensure a sound, comprehensive database design as well as to promote interoperability with external entities. Data validation tools should be implemented to improve the accuracy and efficiency of data maintenance activities. Metadata should be maintained according to industry standards to protect the organization’s investment in data and to increase the likelihood of the data being located by clearinghouse and portal search tools. Database security strategies can prevent unauthorized access to data and lessen the chances of data loss due to accidental data corruption. Database performance should be monitored and strategies implemented to ensure that data can be retrieved from the system with acceptable response times. Finally, trends in the field, such as the increasing need to manage large volumes of data, call for spatial database managers to be knowledgeable about non-relational data models as well, such as NoSQL data models.

DM-44 - Earth's Shape, Sea Level, and the Geoid

C. F. Gauss set the modern definition of the shape of the Earth: the shape the oceans would adopt if they were entirely unperturbed and thus placid, a surface now called the geoid. This surface cannot be observed directly because the oceans have waves, tides, currents, and other perturbations. Nonetheless, the geoid is the ideal datum for heights, and the science of determining the location of the geoid for practical purposes is the topic of physical geodesy. The geoid is the central concept that ties together what the various kinds of height mean, how they are measured, and how they are inter-related.

DM-20 - Entity-based Models

As we translate real world phenomena into data structures that we can store in a computer, we must determine the most appropriate spatial representation and how it relates to the characteristics of such a phenomenon. All spatial representations are derivatives of graph theory and should therefore be described in such terms. This then helps to understand the principles of low-level GIS operations. A constraint-driven approach allows the reader to evaluate implementations of the geo-relational principle in terms of the hierarchical level of mathematical space adopted.
