Generalization is an important and unavoidable part of making maps because geographic features cannot be represented on a map without undergoing transformation. Maps abstract and portray features using vector (i.e. points, lines and polygons) and raster (i.e. pixels) spatial primitives, which are usually labeled. These spatial primitives are subjected to further generalization when map scale is changed. Generalization is a contradictory process. On one hand, it alters the look and feel of a map to improve overall user experience, especially regarding map reading and interpretive analysis. On the other hand, generalization has documented quality implications and can sacrifice feature detail, dimensions, positions or topological relationships. A variety of techniques are used in generalization, including selection, simplification, displacement, exaggeration and classification. The techniques are automated through computer algorithms such as Douglas-Peucker and Visvalingam-Whyatt in order to enhance their operational efficiency and create consistent generalization results. As maps are now created easily and quickly, and used widely by both experts and non-experts owing to major advances in IT, it is increasingly important for virtually everyone to appreciate the circumstances, techniques and outcomes of generalizing maps. This is critical to promoting better map design and production as well as socially appropriate uses.
- Database Generalization vs. Cartographic Generalization
- Data Enrichment and Generalization
- Generalization Techniques for Point, Line, and Area Data
- Generalization Challenges, Evaluation, and Research Outlook
Cartography: Process of designing, making and using tangible maps.
Map use: Involves reading, analyzing and interpreting maps (Kimerling et al. 2012).
Generalization: A process that modifies the information presented on a map or contained in a spatial database based on factors like purpose, users, uses and scale.
Database: A structured collection of digital spatial data for a given purpose.
Map scale: Provides the conversion factor between the dimensions of map and real world features.
Most information about geographic phenomena (e.g. buildings, railroads and administrative boundaries) is location-based, and cartographic maps are the most powerful and common way of displaying and sharing that information. Cartographic maps come in all shapes and sizes and are now readily available in digital format due to rapid developments in, and easy user access to, IT. Digital technology renders maps highly manipulable, enabling the user reading, analyzing or interpreting these graphical representations to generate in-depth ideas about the content (Kimerling et al. 2012). Maps are not reproductions of reality. In the real world, there is no reduction in detail, and many geographic features are very large and inherently complex. Designing and making maps that emulate this level of detail and complexity is neither feasible nor beneficial. Thus, maps abstract and represent select geographic phenomena to meet particular needs, drawing upon vector (i.e. points, lines and polygons) and raster (i.e. pixels) spatial primitives that are usually labeled. Maps foster understanding of not only the locations and arrangements of geographic phenomena but also their interrelationships.
The reduction of real-world detail in creating maps results in well-documented changes in one or more properties like geometry, size, dimensionality, and distances or angles between features especially on flat maps. Scale is a concept that exists only on maps and influences and controls how much detail is presented to users. Scale defines the conversion factor between the dimensions of map and real-world features. Thus, maps produced at different scales generally show different amounts of information and levels of generalization. This is exemplified by a small-scale map of a country that essentially exhibits considerably more generalization than a large-scale map representing a city.
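The role of scale as a conversion factor can be illustrated with a little arithmetic. The following is a minimal sketch, assuming scale is expressed as a representative fraction 1:N; the function names and the sample distances are illustrative and do not come from any particular GIS package.

```python
def ground_distance_m(map_mm: float, scale_denominator: float) -> float:
    """Ground distance in metres for a length measured on the map in millimetres."""
    return map_mm * scale_denominator / 1000.0


def map_distance_mm(ground_m: float, scale_denominator: float) -> float:
    """Map length in millimetres needed to draw a ground distance given in metres."""
    return ground_m * 1000.0 / scale_denominator


# A 10 km (10,000 m) road segment drawn at two hypothetical scales.
city_scale = 25_000         # 1:25,000, a comparatively large scale
country_scale = 1_000_000   # 1:1,000,000, a comparatively small scale
print(map_distance_mm(10_000, city_scale))     # 400.0 mm
print(map_distance_mm(10_000, country_scale))  # 10.0 mm
```

The same road occupies 40 times less length on the small-scale map, which is why small-scale maps leave less room for detail and call for more generalization.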
Generalization modifies map content (e.g. refines feature geometries) in a manner that is purposeful, measured and coordinated (Kraak and Ormeling 2010). Generalization benefits from robust data models and operates on the primitives used to model geographic phenomena. In practice, generalization is a contradictory process that is prompted by several circumstances and factors, and results in various implications (McMaster and Shea 1992). On one hand, generalization is crucial in delivering maps that offer high user experience. In particular, decreasing the amount and complexity of information helps to optimize the readability and interpretive analysis of maps (McMaster and Shea 1992). This also assists in reducing the user’s cognitive load, which is important since our eye-brain visual knowing system is, by itself, weak (Bunch and Lloyd 2006). On the other hand, feature details are lost, albeit in a controlled way, when generalization is implemented. This can potentially impact the breadth and depth of knowledge that can be gained from maps.
The benefits of generalization far outweigh the drawbacks, though, and this process remains an integral component of map making. Both experts and untrained individuals need to appreciate the main reasons for generalizing maps, how this process is implemented and the results of different generalization techniques. This is crucial particularly in today’s digital world where different kinds of maps are now created easily and efficiently, used and maintained by the masses through social media (e.g. Facebook), neogeographic tools (e.g. OpenStreetMap and Google Maps) and other mainstream technologies (Figure 1). Developing knowledge about how maps are created and generalized can potentially encourage appropriate and ethical uses.
Figure 1. A Google Maps navigation map commonly created by the public showing travel times for different modes of transportation. The interactive map includes information about road construction activities, the option to use a satellite image to provide locational context and access to detailed street level imagery. Source: author.
Another technology that has greatly advanced how maps are planned, produced and utilized in a digital context is GIS. GIS integrates comprehensive tools for building, managing, analyzing and generalizing map content. In fact, many cartographic maps are currently compiled from thematic data layers kept in GIS databases. The datasets are being collected and updated at incredible speeds such that storing, processing and maintaining current and historical information is now a challenging big data problem. Although this is increasingly changing, many users including private companies, government agencies and non-governmental organizations lack the requisite IT power to successfully handle big data, and find themselves in need of generalizing their databases.
Database generalization resembles cartographic (i.e. map) generalization in being a purpose-oriented process that attempts to carefully exclude non-essential elements from data about geographic phenomena. On one hand, database generalization entails computational thinking which, inter alia, involves abstracting and decomposing reality into several ‘things’ that are intelligible and manageable in digital technology (Weibel and Jones 1998; Weibel and Dutton 1999; Mackaness 2008). On the other hand, this process can be used to generate a new database that is smaller in size or lower in detail than the original without compromising the basic meaning of the content (Weibel and Dutton 1999). Small databases are generally better than their large counterparts because they take up less storage space, are more portable, are easier and cheaper to maintain, and result in more efficient computational operations (e.g. retrieval and updating of data records).
Maps assembled from thematic data layers stored in GIS databases usually undergo additional generalization to create new and effective maps that may subsequently get stored and maintained in the databases. This implies that cartographic and database generalization processes complement each other in the production of maps. Of particular importance to the former is the overall look and feel of a map which can make or break user experience. The look and feel of a map is a function of many factors including the producer’s map design skills and his or her generalization experience with manual or automated techniques. The map medium is also an important consideration in cartographic generalization since single scale analog maps can have different design and generalization specifications than zoomable digital maps which enable users to select map content and modify display scale in real-time.
There is an urgent need to produce up-to-date, accurate and readily usable maps in quick, cheap and easy ways. This requirement has partly motivated and enabled automated generalization. Commonly cited prerequisites for effective automated generalization include deep user- and computer-based understanding plus robust formalizations of geographic phenomena (Mackaness et al. 2014, Hu 2017). This understanding can be achieved in many ways including extracting (e.g. via ESRI’s GeoEnrichment web service), synthesizing and analyzing ‘enriched’ datasets maintained in spatial databases. Here, enrichment describes the process of adding something that enhances one or more qualities of geographic data primarily to inform and support generalization. That ‘something’ can take various forms like variables (e.g. feature attributes), other geographic data (Figure 2), refinements in feature geometries, and explicitly modelled spatial and semantic contexts plus relationships between geographic phenomena (Neun 2007).
Figure 2. Rivers data enriched by draping on shaded relief can lead to improved understanding of stream drainage. Image sources: author.
Research on some of the concepts above has occurred under broad topics like geographic ontologies, semantics and cognition, and identified or produced several spatial data models, methods, rules and tools that can help enrich geographic data and facilitate automated generalization. One approach involves integrating geotagged social media especially tweets and photos to better capture the sense of place in spatial databases (Hu 2017). Analyzing these and other datasets can reveal the levels of social significance of different places (e.g. villages and cities) to consider when seeking to selectively include or omit specific places from a map as scale is altered. Another approach entails using object-oriented and hierarchical data model abstractions like aggregation and specialization to respectively model HAS-A and IS-A types of relations between geographic phenomena. This can aid, for example, in removing boundaries between contiguous polygons (e.g. counties) with the same attribute value (e.g. state).
There are also other relational concepts (e.g. spatial and temporal topology) that are valuable in building accurate context-aware databases. For example, predefining spatial topology rules like ‘area features must not overlap’ provides a useful tool for assessing and controlling the quality of automated generalization. Proximity methods based on Euclidean, Manhattan or Hausdorff distances can also be applied in real-time to determine the spatial context of geographic phenomena including identifying collocated points according to some predefined cluster tolerance. It is also important to recognize, relate and concurrently generalize the data layers of inextricably linked geographic phenomena to prevent erroneous outcomes such as misalignments between the courses of rivers and the ‘Vs’ in topographic contours (Mackaness et al. 2014).
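The proximity measures mentioned above can be sketched in a few lines. This is an illustrative example only, assuming planar coordinates stored as (x, y) tuples; the function names and the sample tolerance are hypothetical. The Hausdorff distance shown is the discrete version computed between vertex sets, which only approximates the distance between continuous lines.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """Distance measured along axis-parallel steps."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def hausdorff(a, b):
    """Symmetric discrete Hausdorff distance between two point sets."""
    def directed(src, dst):
        # Farthest that any point of src lies from its nearest point in dst.
        return max(min(euclidean(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))


def collocated_pairs(points, tolerance):
    """Index pairs of points that fall within the cluster tolerance of each other."""
    return [(i, j)
            for i in range(len(points))
            for j in range(i + 1, len(points))
            if euclidean(points[i], points[j]) <= tolerance]


print(euclidean((0, 0), (3, 4)))   # 5.0
print(manhattan((0, 0), (3, 4)))   # 7
print(collocated_pairs([(0, 0), (0.5, 0), (10, 10)], tolerance=1.0))  # [(0, 1)]
```

The pairwise search is quadratic in the number of points; a production system would normally use a spatial index, which is omitted here for brevity.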
There are many techniques for generalizing point, line and area data. These are identified and grouped (e.g. based on aspects of mapped geographic phenomena they modify) in several taxonomies that have been put forward, refined and extended over the years by various authors following improved knowledge, understanding and implementation technology for generalization (Foerster et al. 2007, Roth et al. 2011, Stanislawski et al. 2014). The taxonomies differ in such things as total numbers, terminology and definitions of techniques considered important (Roth et al. 2011). Below, we briefly discuss some of the most common techniques and note that they vary in purpose, driving factors, implementation approach, rate of use, outcomes and consequences.
There are no hard and fast rules about sequencing the techniques for generalizing cartographic maps (Regnauld and McMaster 2007). A reasonable and common starting point is selection which prompts map makers to carefully pick geographic features to map. The selection process is user-oriented and largely influenced by map scale and predefined map uses. For example, a map for visualizing the spatial distribution of national high schools does not necessarily need to incorporate middle or elementary schools even though such information might be readily available or easy to gather and depict.
Depending on map scale or purpose, not all selected features should or can be drawn effectively on a map. Map makers make decisions to remove features or certain parts of features using criteria such as length, size, societal importance or potential impact on the capability of the map to successfully tell the intended spatial story or facilitate knowledge construction. A map maker might choose to delete all oil wells with low cumulative production volumes, discard all short tributaries, or remove small lakes from a map. Elimination results in fewer features which helps reduce the complexity, overload and time required to mentally or computationally process map information (Weibel and Dutton 1999).
Figure 3. Illustrating the results of feature elimination (left = before, right = after). Data source: Wyoming Oil and Gas Conservation Commission. Maps source: author.
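Attribute-based elimination of the kind described above amounts to a threshold filter. The sketch below assumes features are held as dictionaries; the field names, well identifiers, production figures and threshold are all made up for illustration and are not taken from the Wyoming dataset.

```python
def eliminate(features, attribute, minimum):
    """Keep only the features whose attribute value meets the minimum threshold."""
    return [f for f in features if f[attribute] >= minimum]


# Hypothetical oil wells with cumulative production in barrels.
wells = [
    {"id": "W1", "cum_production": 120_000},
    {"id": "W2", "cum_production": 900},
    {"id": "W3", "cum_production": 45_000},
    {"id": "W4", "cum_production": 150},
]

kept = eliminate(wells, "cum_production", minimum=1_000)
print([w["id"] for w in kept])  # ['W1', 'W3']
```

The same function could filter tributaries by length or lakes by area; choosing the threshold remains a cartographic judgment tied to map scale and purpose.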
5.3 Simplification and smoothing
Simplification applies to linear and areal features. The vertices which form and shape these features can be grouped into key and non-key points. Simplification creates new features by discarding non-key points with overall minimal distortion to the original shapes. Tolerances defined by distances and angles between feature points provide the basis for identifying the points to weed out (Regnauld and McMaster 2007). Unlike smoothing, simplification typically retains the acute angles which give some features a jagged appearance. One motivation is visual: making displayed data look less complex. Another is computational: saving computer storage space (Shea and McMaster 1989). Widely used algorithms for simplifying and smoothing line and area features include Douglas-Peucker (Douglas and Peucker 1973) and Visvalingam-Whyatt (Visvalingam and Whyatt 1992). The former is particularly effective at generalizing cultural features (e.g. roads), but natural features (e.g. rivers) and polygon boundaries are better simplified using the Wang-Muller algorithm, which preserves polygon area (Wang and Muller 1998). The Douglas-Peucker and Visvalingam-Whyatt algorithms are computationally efficient and quite good at minimizing linear deformations and disagreements between original and derived lines (Shi and Cheung 2006). For example, Mike Bostock applied the Visvalingam-Whyatt algorithm to polygon data and produced impressive results from both cartographic and database generalization perspectives. Through barely noticeable but important geometrical modifications, Bostock was able to shrink the file size of a detailed boundary of the coterminous U.S. states by 95%.
Figure 4. Illustrating the results of polygon simplification and line smoothing (left = before, right = after). Images source: author.
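To make the two simplification algorithms named above more concrete, here is a compact sketch of each. It assumes a line is a list of (x, y) tuples in planar coordinates. Douglas-Peucker keeps the vertex farthest from the chord between the end points whenever that distance exceeds a tolerance, then recurses on both halves. Visvalingam-Whyatt repeatedly removes the vertex whose triangle with its two neighbours has the smallest area. Both versions favour readability over speed, and the tolerance values in the usage lines are arbitrary.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:  # a and b coincide
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Recursively keep vertices that lie farther than tolerance from the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax > tolerance:
        left = douglas_peucker(points[:index + 1], tolerance)
        right = douglas_peucker(points[index:], tolerance)
        return left[:-1] + right  # drop the shared vertex once
    return [first, last]


def triangle_area(a, b, c):
    """Area of the triangle formed by three consecutive vertices."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2


def visvalingam_whyatt(points, min_area):
    """Drop the least significant vertex until every triangle reaches min_area."""
    pts = list(points)
    while len(pts) > 2:
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        smallest = min(areas)
        if smallest >= min_area:
            break
        del pts[areas.index(smallest) + 1]
    return pts


line = [(0, 0), (1, 0.1), (2, 0), (3, 2), (4, 0)]
print(douglas_peucker(line, tolerance=0.5))    # [(0, 0), (2, 0), (3, 2), (4, 0)]
print(visvalingam_whyatt(line, min_area=0.5))  # [(0, 0), (2, 0), (3, 2), (4, 0)]
```

On this small line both methods discard the same near-collinear vertex, but on longer or more sinuous lines they generally retain different vertices because one measures distance and the other measures area.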
Displacement resolves issues that occur when several features are congested in a map area. The technique moves features from their true map positions so that they are more distinct and can easily be distinguished. The key is to keep the movements as small as possible so as to minimize concomitant errors especially in the positions and relationships between features.
Figure 5. Illustrating the results of feature displacement (left = before, right = after). Images source: author.
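The idea of moving congested features apart by the smallest possible amount can be sketched for the simplest case of two point symbols. This is a toy illustration under stated assumptions, not an operational displacement algorithm: it treats each pair in isolation, splits the movement equally, and ignores other nearby features. The function name and the minimum separation are hypothetical.

```python
import math


def displace_pair(p, q, min_separation):
    """Move two points apart along the line joining them until they are
    min_separation apart, sharing the movement equally between them."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    d = math.hypot(dx, dy)
    if d >= min_separation:
        return p, q  # already legible; leave true positions untouched
    if d == 0:
        ux, uy = 1.0, 0.0  # coincident points: pick an arbitrary direction
    else:
        ux, uy = dx / d, dy / d
    shift = (min_separation - d) / 2  # smallest move that resolves the conflict
    return ((p[0] - ux * shift, p[1] - uy * shift),
            (q[0] + ux * shift, q[1] + uy * shift))


print(displace_pair((0, 0), (2, 0), min_separation=4))  # ((-1.0, 0.0), (3.0, 0.0))
```

Each point moves by only half of the shortfall, which keeps the positional error introduced by displacement as small as this simple scheme allows.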
Classification organizes geographic data in classes or groups. The classes are defined by qualitative (e.g. landcover type) or quantitative (e.g. population) feature properties, the closeness of quantitative property values (e.g. values between 10,000 and 20,000) or the functional roles (e.g. administrative boundaries) of features (Shea and McMaster 1989). Classified data are generally easier to visually analyze, interpret and comprehend because of less attribute data that needs to be dealt with. Methods for classifying quantitative data such as quantile and standard deviation are chosen based on the distribution of data and significantly impact the spatial patterns portrayed on maps. Data on existing maps can also be re-classified; for example, grouping hierarchical classes like deciduous, coniferous and mixed forest lands into a broader but useful forest landcover class.
Figure 6. Classification methods can dramatically change how the data appear, especially in choropleth maps such as these. Maps source: author.
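The effect of the classification method on the resulting pattern can be seen with a small numeric example. The sketch below compares equal-interval and quantile class breaks on a skewed set of made-up values. The quantile rule used here (the upper bound of each class is the value at rank ceil(n*i/k)) is one simple convention among several, and the function names are illustrative.

```python
import math
from bisect import bisect_left


def equal_interval_breaks(values, k):
    """Upper bounds of the first k-1 classes, each spanning an equal value range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def quantile_breaks(values, k):
    """Upper bounds of the first k-1 classes, each holding about n/k observations."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[math.ceil(n * i / k) - 1] for i in range(1, k)]


def classify(value, breaks):
    """Class index for a value; a value equal to a break falls in the lower class."""
    return bisect_left(breaks, value)


# Hypothetical, strongly skewed attribute values (one outlier).
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]

ei = equal_interval_breaks(values, 3)   # [34.0, 67.0]
qu = quantile_breaks(values, 3)         # [3, 6]
print([classify(v, ei) for v in values])  # [0, 0, 0, 0, 0, 0, 0, 0, 2]
print([classify(v, qu) for v in values])  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

With equal intervals, the outlier pushes almost every observation into the lowest class, whereas quantiles spread the observations evenly across classes. The two choropleth maps built from these classes would look very different even though the data are identical.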
5.6 Exaggeration and Enhancement
Although different, exaggeration and enhancement techniques similarly draw upon the map maker’s familiarity with real-world features. Not all features are equal, and exaggeration gives prominence to more important features by drawing them disproportionately on maps. An example is depicting ancient ruins much larger than they truly are so as to highlight them as symbols of national pride. Enhancement adds extra detail to help map users better appreciate the nature of geographic features. Both enhancement and exaggeration modify feature symbology but for different reasons, the former to augment visible aspects and improve feature recognition, and the latter to highlight the non-visible (e.g. societal value).
Figure 7. Illustrating the results of enhancing and exaggerating line and polygon features (left = before, right = after). Images source: author.
The aggregation technique creates a single feature by combining two or more features that may or may not be contiguous. The boundaries between contiguous regions are dissolved to create a simple polygon, or retained to build a complex multipart feature. Non-contiguous features can only be combined when they fall within a predefined threshold distance or share the same value (e.g. John Doe) for a given property (e.g. owner). Points can also be combined to define areal extents of some variables, geomask sensitive data or enhance the display and analysis of data. An example is aggregating a group of fruit tree points into an orchard landuse region.
Figure 8. Illustrating the results of aggregating point and polygon features (left = before, right = after). Images source: author.
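Distance-based aggregation of points, as in the orchard example, can be sketched by chaining together points that lie within a threshold distance of one another and then deriving an areal extent for each group. This minimal example assumes planar (x, y) tuples and uses a bounding box as a stand-in for the aggregated region; a real workflow would more likely build a convex or concave hull. All names, coordinates and the threshold are illustrative.

```python
import math


def aggregate_points(points, threshold):
    """Group points so that each point is within threshold of at least one
    other member of its group (single-linkage grouping via union-find)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


def bounding_box(group):
    """Simple areal extent (xmin, ymin, xmax, ymax) for a group of points."""
    xs, ys = [p[0] for p in group], [p[1] for p in group]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical fruit tree locations forming two separate stands.
trees = [(0, 0), (1, 0), (1, 1), (10, 10), (11, 10)]
orchards = aggregate_points(trees, threshold=1.5)
print([bounding_box(g) for g in orchards])  # [(0, 0, 1, 1), (10, 10, 11, 10)]
```

Five tree points become two orchard regions, trading individual locations for a simpler areal representation.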
Many features undergo transformation when map scale is reduced. Through collapsing, features can change from one size to another and/or become a new type of geometry sometimes with unintended consequences. For example, a narrow river polygon may be converted into a single line feature giving some users the impression that the river is not navigable. Similarly, changing building footprints or small islands into points or reducing multiline roads to single lines may alter people’s perceptions of those features since we tend to make connections between the sizes of features and their values, levels of importance or functions (ESRI 1996).
Figure 9. Illustrating the results of collapsing features (left = before, right = after). Images source: author.
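Collapsing a small polygon, such as a building footprint or island, to a point can be sketched with the shoelace formula for area and centroid. The example assumes a simple, non-self-intersecting ring with non-zero area, given as (x, y) tuples without a repeated closing vertex; the area threshold and function names are hypothetical.

```python
def polygon_area_centroid(ring):
    """Area and centroid of a simple polygon ring using the shoelace formula."""
    a = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5  # signed area; the sign depends on vertex order
    return abs(a), (cx / (6 * a), cy / (6 * a))


def collapse(ring, min_area):
    """Return the polygon unchanged if it is large enough to draw at the target
    scale, otherwise replace it with a point at its centroid."""
    area, centroid = polygon_area_centroid(ring)
    if area < min_area:
        return ("point", centroid)
    return ("polygon", ring)


footprint = [(0, 0), (2, 0), (2, 2), (0, 2)]  # a 2 x 2 building, area 4
print(collapse(footprint, min_area=10))  # ('point', (1.0, 1.0))
print(collapse(footprint, min_area=1))   # ('polygon', [(0, 0), (2, 0), (2, 2), (0, 2)])
```

The change in geometry type is exactly what can alter a reader's perception of the feature, so the threshold should be chosen with the map's purpose in mind.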
Data layers digitized in a GIS often require refinement such as making a running track appear more rectangular with rounded corners (ESRI 1996). Refinement fine-tunes a map layer by subtracting or adding features or feature detail. Features that rank low in importance or relevancy to specific purposes are usually the first to be weeded out when users become overwhelmed by map information, or effective map use is hampered. An example of adding feature detail is including lower order streams to densify and improve the look of a hydrographic network. Refinement also enables subtle changes to be made to features that are not large enough to draw proportionally on maps.
Figure 10. Illustrating the results of refining a digitized running track and a hydrographic network (left = before, right = after). Images source: author.
Typification is based on the basic sampling principle that suggests that a subset of the features in a map layer can adequately capture and represent the essential qualities of the entire population. The goal is to ensure that a map of representative features provides a good idea about the general arrangement, spread or connections between features in the entire dataset. Sampled features especially points (e.g. address points geocoded by zip code) may be rearranged slightly to closely mirror the picture or structure of the population (ESRI 1996).
Figure 11. Illustrating the results of typification using 50% of the original points (left = before, right = after). The heat maps illustrate that the point density pattern is largely the same before and after typification. Data source: Wyoming Oil and Gas Conservation Commission. Images source: author.
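One simple way to approximate typification for points is to thin them cell by cell on a regular grid, so that dense areas stay relatively dense and sparse areas stay relatively sparse. The sketch below is only one possible approach and is not the method used to produce Figure 11. The cell size, the keep fraction and the function name are assumptions, and the optional step of slightly rearranging the retained points is omitted.

```python
import math
from collections import defaultdict


def typify(points, cell_size, keep_fraction=0.5):
    """Keep roughly keep_fraction of the points in every grid cell (at least one
    per occupied cell) so the overall density pattern is retained."""
    cells = defaultdict(list)
    for x, y in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((x, y))

    kept = []
    for members in cells.values():
        n_keep = max(1, round(len(members) * keep_fraction))
        step = len(members) / n_keep
        # Take evenly spaced members rather than just the first few.
        kept.extend(members[int(i * step)] for i in range(n_keep))
    return kept


# Hypothetical points: four in one cell (dense) and two in another (sparse).
points = [(0.1, 0.1), (0.2, 0.8), (0.7, 0.3), (0.9, 0.9), (1.2, 1.2), (1.8, 1.6)]
print(typify(points, cell_size=1.0))  # [(0.1, 0.1), (0.7, 0.3), (1.2, 1.2)]
```

Half of the points are retained, and the dense cell still holds twice as many points as the sparse one, mirroring the relative density of the original layer.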
Generalization affects and alters the content of spatial databases and cartographic maps. Although it is now largely automated through various methods and tools (e.g. Web services), generalization remains a challenging problem. This is partly due to the difficulty involved “in finding a compromise between the choice of … [geographic phenomena] …, their form and detail of representation, and the space available in which to display them” (Mackaness et al. 2014, p3). Generalization is also prone to quality concerns (Weibel and Dutton 1999; Regnauld and McMaster 2007). Muller (1991) identified three quality dimensions that are commonly sacrificed when spatial databases and cartographic maps are generalized, that is, accuracy, completeness and consistency. Simplified lines, for example, embed locational and geometrical inaccuracies that can cause topological problems like co-located, misaligned and intersecting lines (Shi and Cheung 2006). These problems often end up in spatial databases where (generalized) data layers are usually kept. Depending on purpose, the databases may also be considered incomplete when, for example, minor roads are deleted from an urban road network dataset.
Another generalization issue is that users are not usually given specific information about the techniques applied or the extent to which these modify the features in a given map. This can potentially mislead some users who might measure the dimensions of exaggerated features and take them literally as truth. At the same time, many people appreciate that maps are generalized spatial representations, and do not, for example, interpret a 1mm wide road symbol on a 1:50,000 scale map to be 50m wide in reality.
As shown through the figures in §3, visual methods can be used to quickly identify and judge the inaccuracies of cartographic generalization (Weibel and Dutton 1999). At the same time, there is growing research on more reliable quantitative methods for evaluating the outputs of generalization. Stoter et al. (2014), for example, evaluate generalization within the context of map use placing emphasis on map reading and how this activity is affected by factors like amount, distribution and complexity of content. The authors also discuss other areas of evaluation that involve defining, iteratively tweaking and then applying specific generalization parameters.
On the whole, the field of generalization has experienced great progress owing to advanced research in Web cartography, data enrichment, generalization algorithms and other areas. Although there are notable successes apparent in, for example, (1) widespread implementations of automated generalization, (2) widely used auto-generalized maps, (3) generalized maps that effectively preserve feature geometries (e.g. river sinuosity) and densities, and (4) the growing number of richer spatial databases, Burghardt et al. (2014) identify some areas that demand further research. These are concerned with but not limited to how to create and employ powerful interactive and statistical tools to effectively solicit user requirements for custom maps, how to develop reusable generalization algorithms, and how to successfully implement real-time and on-demand generalization.
Bunch, R. L., & Lloyd, R. E. (2006). The cognitive load of geographic information. The Professional Geographer, 58, 209-220.
Burghardt, D., Duchêne, C., & Mackaness, W. (2014). Conclusion: Major Achievements and Research Challenges in Generalisation. In: Burghardt, D., Duchêne, C., Mackaness, W. (Eds), Abstracting Geographic Information in a Data Rich World. Lecture Notes in Geoinformation and Cartography. Springer, Cham.
Douglas, D. H., & Peucker, T. K. (1973). Algorithms for the reduction of the number of points required to represent a digitised line or its caricature. The Canadian Cartographer, 10(2), 112-122.
ESRI (1996). Automation of map generalization: The cutting-edge technology. Technical paper available at: http://downloads.esri.com/support/whitepapers/ao_/mapgen.pdf.
Foerster, T., Stoter, J., & Kobben, B. (2007). Towards a formal classification of generalization operators. Proceedings of the 23rd International Cartographic Conference, Moscow, Russia.
Hu, Y. (2017). Geospatial Semantics. In Huang, B., Cova, T. J. and Tsou M-H., (Eds), Comprehensive Geographic Information Systems, Elsevier. Oxford, UK.
Kimerling, A. J., Muehrcke, J. O., Buckley, A. R., & Muehrcke, P. C. (2012). Map Use: Reading and Analysis, 7th ed., Esri Press Academic, Redlands, CA.
Kraak, M.-J., & Ormeling, F. (2010). Cartography: Visualization of Spatial Data. (3rd ed.). New York: Routledge.
Mackaness, W. A. (2008). Generalization of spatial databases. In Wilson, J. P. and Fotheringham, A. S. (eds), The handbook of geographic information science. Malden: Blackwell Publishing, 222-238.
Mackaness, W.A., Burghardt, D., Duchêne, C. (2014). Map generalisation: Fundamental to the modelling and understanding of geographic space, In: Burghardt, D., Duchêne, C., and Mackaness, W. (Eds.), Abstracting Geographic Information in a Data Rich World, Lecture Notes in Geoinformation and Cartography, Springer International Publishing.
McMaster, R. B. (1987). The geometric properties of numerical generalization. Geographical Analysis, 19(4), 330-346.
McMaster, R. B., & Shea, K. S. (1992). Generalization in Digital Cartography. Resource Publication in Geography, Washington D.C., Association of American Geographers.
Muller, J. C. (1991) Generalization of Spatial Databases. In Maguire, D. J., Goodchild, M., and Rhind, D., (eds), Geographical Information Systems: London, Longman Scientific, p. 457-475.
Neun, M. (2007). Data enrichment for adaptive map generalisation using web services. PhD Thesis , Department of Geography, University of Zurich.
Regnauld, N., & McMaster, R. B. (2007). A synoptic view of generalisation operators. In W.A. Mackaness, A. Ruas, L.T. Sarjakoski (eds), Generalisation of geographic information: cartographic modelling and applications. Oxford, UK: Elsevier, 37-66.
Roth, R. E., Brewer, C. A., & Stryker, M. S. (2011). A typology of operators for maintaining legible map designs at multiple scales. Cartographic Perspectives, 68, 29-64.
Shea, K. S., & McMaster, R. B. (1989). Cartographic generalization in a digital environment: When and How to Generalize. AutoCarto 9, 56-65.
Shi, W., & Cheung, C. K. (2006). Performance evaluation of line simplification algorithms for vector generalization. The Cartographic Journal, 43(1), 27-44. DOI: 10.1179/000870406X93490
Stanislawski, L. V., Buttenfield, B. P., Bereuter, P., Savino, S., & Brewer C. A. (2014). Generalization operators, In: Burghardt, D., Duchêne, C., and Mackaness, W. (Eds.), Abstracting Geographic Information in a Data Rich World, Lecture Notes in Geoinformation and Cartography, Springer International Publishing.
Stoter, J., Zhang, X., Stigmar, H., & Harrie, L. (2014). Evaluation in generalisation, In: Burghardt, D., Duchêne, C., and Mackaness, W. (Eds.), Abstracting Geographic Information in a Data Rich World, Lecture Notes in Geoinformation and Cartography, Springer International Publishing.
Visvalingam, M., & Whyatt, J. D. (1992). Line generalisation by repeated elimination of the smallest area. Cartographic Information Systems Research Group (CISRG) Discussion Paper 10, The University of Hull.
Wang, Z., & Muller, J. C. (1998). Line generalization based on analysis of shape characteristics. Cartography and Geographic Information Systems, 25(1), 3-15.
Weibel, R., & Dutton, G. (1999). Generalizing spatial data and dealing with multiple representations. In Longley, P., Goodchild, M. F., Maguire, D. J., and Rhind, D. W., (eds), Geographical Information Systems: New York, John Wiley, p. 125-156.
Weibel, R., & Jones, C. B. (1998). Computational perspectives on map generalization. GeoInformatica, 2(4), 307-315.
- Discuss generalization as it relates to cartographic maps and spatial databases
- Describe the circumstances under which maps may be generalized
- Explain the complementary relationship between database generalization and cartographic generalization in the context of GIS
- Discuss the different generalization techniques for point, line and polygon data
- Describe how the Douglas-Peucker algorithm is used to simplify linear features.
- Discuss the quality dimensions which are sacrificed when maps and spatial databases are generalized.
- What is generalization and why is it important in designing, making and using maps?
- Discuss generalization in the context of spatial databases and cartographic maps.
- Discuss the argument that cartographic generalization is a contradictory process.
- With the aid of suitable examples, describe the different techniques and associated challenges of generalizing point, line and area data.
- Describe the conceptual solution of the Douglas-Peucker line simplification algorithm.