data mining

DM-70 - Problems of Large Spatial Databases

Large spatial databases, often labeled geospatial big data, exceed the capacity of commonly used computing systems because of their data volume, variety, velocity, and veracity. Other problems, also labeled with V’s, are cited, but these four are the most problematic and are the focus of this chapter (Li et al., 2016; Panimalar et al., 2017). Sources include satellites, aircraft and drone platforms, vehicles, geosocial networking services, mobile devices, and cameras. The problems in processing these data to extract useful information include query, analysis, and visualization. Data mining techniques and machine learning algorithms, such as deep convolutional neural networks, are often used with geospatial big data. The most obvious problem is handling the large data volumes, particularly for input and output operations, which require parallel reading and writing of the data as well as high-speed computers, disk storage, and network transfer. Additional problems of large spatial databases include the variety and heterogeneity of the data, which require advanced algorithms to handle different data types and characteristics and to integrate them with other data. The velocity at which the data are acquired is also a challenge, especially with today’s advanced sensors and the Internet of Things, which includes millions of devices creating data on temporal scales from microseconds to minutes. Finally, the veracity, or truthfulness, of large spatial databases is difficult to establish and validate, particularly for every data element in the database.
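One common way to address the volume problem is to partition the data so that the pieces can be read and processed in parallel, then merge the partial results (a map-reduce pattern). The Python sketch below illustrates that pattern on an in-memory list of points binned into square grid cells. The function names, chunk size, and cell size are hypothetical, and a thread pool stands in for the distributed file systems and cluster frameworks used in practice. Threads suit I/O-bound reads; CPU-bound work in CPython would normally use processes instead.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Tuple

Point = Tuple[float, float]  # (x, y) in projected map units (assumed)


def grid_cell(p: Point, cell_size: float) -> Tuple[int, int]:
    """Return the (column, row) index of the square grid cell containing p."""
    return (int(p[0] // cell_size), int(p[1] // cell_size))


def count_chunk(chunk: List[Point], cell_size: float) -> Counter:
    """Map step: count points per grid cell within a single partition."""
    return Counter(grid_cell(p, cell_size) for p in chunk)


def chunked(points: List[Point], chunk_size: int) -> Iterator[List[Point]]:
    """Split the dataset into fixed-size partitions (stand-ins for file blocks)."""
    for start in range(0, len(points), chunk_size):
        yield points[start:start + chunk_size]


def parallel_cell_counts(points: List[Point], cell_size: float,
                         chunk_size: int = 1000, workers: int = 4) -> Counter:
    """Process partitions concurrently, then merge (reduce) the partial counts."""
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda c: count_chunk(c, cell_size),
                            chunked(points, chunk_size))
        for partial in partials:
            total.update(partial)  # Counter.update adds counts cell by cell
    return total


if __name__ == "__main__":
    # Synthetic 100 x 100 lattice with one point per unit square.
    pts = [(i % 100 + 0.5, i // 100 + 0.5) for i in range(10_000)]
    counts = parallel_cell_counts(pts, cell_size=10.0)
    print(len(counts), sum(counts.values()))  # prints: 100 10000
```

Because each partition is counted independently, the same merge step works whether partitions come from threads, processes, or separate machines.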

AM-38 - Pattern recognition
  • Differentiate among machine learning, data mining, and pattern recognition
  • Explain the principles of pattern recognition
  • Apply a simple spatial mean filter to an image as a means of recognizing patterns
  • Construct an edge-recognition filter
  • Design a simple spatial mean filter
  • Explain the outcome of an artificial intelligence analysis (e.g., edge recognition), including a discussion of what the human did not see that the computer identified and vice versa
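The mean-filter and edge-recognition objectives can be illustrated by sliding two 3 × 3 kernels across a small raster. This pure-Python sketch assumes a grayscale image stored as a list of rows, skips border pixels rather than padding them, and uses the Sobel operator as one common choice of edge filter; the function and constant names are hypothetical.

```python
import math
from typing import List

Image = List[List[float]]

MEAN_3X3 = [[1 / 9] * 3 for _ in range(3)]       # equal weights: local average
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # responds to left-right change
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # responds to top-bottom change


def apply_kernel(img: Image, kernel: Image) -> Image:
    """Slide a 3x3 kernel over the image (cross-correlation, no padding).

    The output is 2 rows and 2 columns smaller than the input because
    border pixels lack a full 3x3 neighborhood.
    """
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(1, rows - 1):
        row = []
        for c in range(1, cols - 1):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += kernel[dr + 1][dc + 1] * img[r + dr][c + dc]
            row.append(acc)
        out.append(row)
    return out


def sobel_magnitude(img: Image) -> Image:
    """Edge strength: Euclidean magnitude of the two Sobel gradients."""
    gx = apply_kernel(img, SOBEL_X)
    gy = apply_kernel(img, SOBEL_Y)
    return [[math.hypot(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(gx, gy)]


if __name__ == "__main__":
    step = [[0, 0, 0, 10, 10, 10] for _ in range(5)]  # vertical step edge
    print(apply_kernel(step, MEAN_3X3)[0])  # ramp of roughly [0, 3.33, 6.67, 10]
    print(sobel_magnitude(step)[0])         # [0.0, 40.0, 40.0, 0.0]
```

On this step raster the mean filter blurs the sharp step into a ramp, while the Sobel magnitude is non-zero only in the two columns straddling the step, which is what a human would describe as the edge.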
AM-37 - Knowledge discovery
  • Explain how spatial data mining techniques can be used for knowledge discovery
  • Explain how a Bayesian framework can incorporate expert knowledge in order to retrieve all relevant datasets given an initial user query
  • Explain how visual data exploration can be combined with data mining techniques as a means of discovering research hypotheses in large spatial datasets
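One way to read the Bayesian-retrieval objective is through Bayes’ rule: an expert supplies a prior probability that a dataset is relevant to a query concept, plus the likelihood of seeing each indicator term in the metadata of relevant versus irrelevant datasets, and the posterior ranks the catalog. The sketch below uses a naive Bayes simplification (terms treated as independent). The catalog, terms, probabilities, and function names are invented for illustration and are not the specific framework the objective refers to.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical expert knowledge for one query concept (e.g., "flood risk"):
# term -> (P(term in metadata | relevant), P(term in metadata | not relevant)).
EXPERT_LIKELIHOODS: Dict[str, Tuple[float, float]] = {
    "flood": (0.9, 0.05),
    "elevation": (0.6, 0.2),
    "river": (0.7, 0.1),
    "precipitation": (0.5, 0.15),
}
PRIOR_RELEVANT = 0.1  # hypothetical expert prior P(relevant)


def posterior_relevance(keywords: Set[str],
                        likelihoods: Dict[str, Tuple[float, float]],
                        prior: float) -> float:
    """P(relevant | presence/absence of each indicator term), naive Bayes."""
    p_rel, p_not = prior, 1.0 - prior
    for term, (p_t_rel, p_t_not) in likelihoods.items():
        if term in keywords:
            p_rel *= p_t_rel
            p_not *= p_t_not
        else:  # absence of an expected term is evidence too
            p_rel *= 1.0 - p_t_rel
            p_not *= 1.0 - p_t_not
    return p_rel / (p_rel + p_not)


def retrieve(catalog: Dict[str, Set[str]],
             likelihoods: Dict[str, Tuple[float, float]],
             prior: float, threshold: float = 0.5) -> List[str]:
    """Datasets whose posterior meets the threshold, most probable first."""
    scores = {name: posterior_relevance(kw, likelihoods, prior)
              for name, kw in catalog.items()}
    kept = [name for name, p in scores.items() if p >= threshold]
    return sorted(kept, key=lambda name: scores[name], reverse=True)


if __name__ == "__main__":
    catalog = {
        "dem_30m": {"elevation", "terrain"},
        "river_gauges": {"river", "flood", "discharge"},
        "rainfall_grid": {"precipitation", "climate"},
        "road_network": {"roads", "transport"},
    }
    print(retrieve(catalog, EXPERT_LIKELIHOODS, PRIOR_RELEVANT))
    # ['river_gauges']
    print(retrieve(catalog, EXPERT_LIKELIHOODS, PRIOR_RELEVANT, threshold=0.005))
    # ['river_gauges', 'dem_30m', 'rainfall_grid']
```

Lowering the threshold illustrates the trade-off in "retrieve all relevant datasets": a recall-oriented setting also returns the elevation and rainfall layers at the cost of precision.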
AM-36 - Data mining approaches
  • Describe how data mining can be used for geospatial intelligence
  • Explain how the analytical reasoning techniques, visual representations, and interaction techniques that make up the domain of visual analytics have a strong spatial component
  • Demonstrate how cluster analysis can be used as a data mining tool
  • Interpret patterns in space and time using Dorling and Openshaw’s geographical analysis machine (GAM) demonstration of disease incidence diffusion
  • Differentiate between data mining approaches used for spatial and non-spatial applications
  • Explain how spatial statistics techniques are used in spatial data mining
  • Compare and contrast the primary types of data mining: summarization/characterization, clustering/categorization, feature extraction, and rule/relationship extraction
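The geographical analysis machine (GAM) named in the objectives is an exhaustive cluster search: circles of several radii are centered on a regular grid, the cases and population at risk inside each circle are counted, and circles whose case counts are improbably high under a uniform rate are kept for mapping. The following is a minimal pure-Python sketch of that idea. It assumes areal units reduced to centroids with case and population counts and tests each circle with a Poisson upper tail against the study-wide rate. The grid spacing of 0.2 × radius, the significance level, the minimum-case filter, and all names are assumptions, not parameters taken from the original GAM.

```python
import math
from typing import List, Tuple

Area = Tuple[float, float, int, int]  # (x, y, cases, population at risk)
Hit = Tuple[float, float, float, int, float, float]  # (cx, cy, r, cases, expected, p)


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected), computed as 1 - CDF.

    Adequate for the small expected counts used here; exp(-expected)
    underflows for very large expected values, where a log-space or
    library routine would be safer.
    """
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def grid_positions(lo: float, hi: float, step: float) -> List[float]:
    """Evenly spaced positions from lo up to (about) hi."""
    n = int(math.floor((hi - lo) / step)) + 1
    return [lo + i * step for i in range(n)]


def gam_search(areas: List[Area], radii: List[float], spacing: float = 0.2,
               alpha: float = 0.002, min_cases: int = 2) -> List[Hit]:
    """Flag circles whose case count is improbably high for their population."""
    rate = sum(a[2] for a in areas) / sum(a[3] for a in areas)  # study-wide rate
    xs = [a[0] for a in areas]
    ys = [a[1] for a in areas]
    hits: List[Hit] = []
    for r in radii:
        step = spacing * r  # grid spacing as a fraction of radius, so circles overlap
        for cx in grid_positions(min(xs), max(xs), step):
            for cy in grid_positions(min(ys), max(ys), step):
                inside = [a for a in areas if math.hypot(a[0] - cx, a[1] - cy) <= r]
                cases = sum(a[2] for a in inside)
                pop = sum(a[3] for a in inside)
                if cases < min_cases or pop == 0:
                    continue
                expected = pop * rate
                p = poisson_upper_tail(cases, expected)
                if p < alpha:
                    hits.append((cx, cy, r, cases, expected, p))
    return hits


if __name__ == "__main__":
    # 10 x 10 synthetic areas, 1 case per 1000 people, plus a planted cluster.
    cluster = {(6, 6), (6, 7), (7, 6), (7, 7)}
    areas = [(x, y, 10 if (x, y) in cluster else 1, 1000)
             for x in range(10) for y in range(10)]
    for cx, cy, r, cases, expected, p in gam_search(areas, radii=[1.5])[:3]:
        print(f"circle ({cx:.1f}, {cy:.1f}) r={r}: {cases} cases vs {expected:.2f} expected")
```

Because thousands of overlapping circles are tested, some significant circles can arise by chance; the GAM approach relies on mapping the flagged circles and looking for places where many of them pile up.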
DM-70 - Problems of large spatial databases
  • Describe emerging geographical analysis techniques in geocomputation derived from artificial intelligence (e.g., expert systems, artificial neural networks, genetic algorithms, and software agents)
  • Explain how to recognize contaminated data in large datasets
  • Outline the implications of complexity for the application of statistical ideas in geography
  • Explain what is meant by the term “contaminated data,” suggesting how it can arise
  • Describe difficulties in dealing with large spatial databases, especially those arising from spatial heterogeneity
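For the contaminated-data and spatial-heterogeneity objectives, one simple screen compares each observation with its spatial neighbors rather than with the whole dataset. When the data have a spatial trend, a value that is ordinary globally can be anomalous locally. The sketch below assumes point observations with one numeric attribute. It takes the difference between each value and the median of its k nearest neighbors and scores those differences with the robust modified z-score (median and median absolute deviation, cutoff 3.5). The function names, k, and the cutoff are illustrative choices. The brute-force neighbor search is quadratic, so a large database would need a spatial index such as a k-d tree.

```python
import math
import statistics
from typing import List, Tuple

Obs = Tuple[float, float, float]  # (x, y, attribute value)


def modified_z_scores(values: List[float]) -> List[float]:
    """Robust z-scores based on the median and median absolute deviation (MAD)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad > 0:
        scale = mad / 0.6745  # MAD rescaled to estimate a normal standard deviation
    else:
        # Fallback when over half the values equal the median: scale the mean
        # absolute deviation by sqrt(pi/2), its normal-theory ratio to sigma.
        scale = math.sqrt(math.pi / 2) * statistics.fmean([abs(v - med) for v in values])
        if scale == 0:
            return [0.0] * len(values)
    return [(v - med) / scale for v in values]


def local_differences(obs: List[Obs], k: int = 4) -> List[float]:
    """Difference between each value and the median of its k nearest neighbors."""
    diffs = []
    for i, (x, y, v) in enumerate(obs):
        # Brute-force neighbor search; ties in distance are broken by value.
        neighbors = sorted((math.hypot(x - ox, y - oy), ov)
                           for j, (ox, oy, ov) in enumerate(obs) if j != i)
        diffs.append(v - statistics.median([ov for _, ov in neighbors[:k]]))
    return diffs


def flag_spatial_outliers(obs: List[Obs], k: int = 4, cutoff: float = 3.5) -> List[int]:
    """Indices of observations that differ sharply from their neighborhood."""
    scores = modified_z_scores(local_differences(obs, k))
    return [i for i, s in enumerate(scores) if abs(s) > cutoff]


if __name__ == "__main__":
    # 5 x 5 grid with an east-west trend (value = 10x) and one corrupted cell.
    grid = [(x, y, 60.0 if (x, y) == (2, 2) else 10.0 * x)
            for x in range(5) for y in range(5)]
    print(flag_spatial_outliers(grid))  # [12], the corrupted cell at (2, 2)
    print(max(abs(z) for z in modified_z_scores([v for _, _, v in grid])))  # about 2.70
```

In this example the corrupted cell has a global modified z-score of about 2.7 and would pass a global screen. Its local difference of 40 scores about 5.4, making it the only observation flagged.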