Michigan Land Use Data Automation
and Dissemination Project (MLDADP)

In Fulfillment of the Digital Library Associate Award, December 1996

By Alison Atkins
School of Information
University of Michigan


Table of Contents

Introduction
Project Background
Definition of the Problem
Aim of Study
Project Benefits
Methodology
Future Plans for MLDADP
Findings/Results
Conclusions
Appendices
Definition List
Scripts
Interface Images
Bibliography

Internal links to the definition list (indicated with an *) provide more in-depth explanations of the terms used in this paper.

Introduction:

In recent years, Geographic Information System (GIS*) technology has become more universal. The appearance of desktop, icon-based products has meant that the general population will be more likely to use geospatial data* and GIS technology (Strasser, p. 278). But even as the software becomes easier to apply, GIS users are faced with the greater obstacle of obtaining the digital formats of the spatial data they require to perform their work (Strasser, p. 281).

The goal of my Digital Library Associate (DLA) project is to encourage the use of geospatial data by facilitating access to certain digital datasets at the University of Michigan Map Library. Specifically, I created a simple interface* for using a GIS to convert Michigan land use data* into a GIS-ready* format. The program I created, called the Michigan Land Use Data Automation and Dissemination Project (MLDADP), displays a menu-interface from which Map Library patrons can customize data selection. (For a visual example of the menu interface, see interface images.) MLDADP serves as a model for two levels of application: enhancing general access to geospatial data at the University of Michigan, and promoting public access (for non-GIS experts) to this information on a larger scale, through the Internet.

Project Background:

Review of the literature.

Finding published material which discusses "access to digital spatial data" is very difficult. The amount of digital information available on the Internet is constantly expanding, and at a rapid pace. As Bergen notes (p. 306), the World Wide Web's interactive capabilities make it an appropriate forum for data distribution. Consequently, the medium of the Web is the most useful repository for networking geospatial data and for obtaining information about this process.

Examples of Web sites providing access to geospatial data.

There are both individual and collaborative efforts to provide networked access to geospatial data. Individually, some academic institutions, government agencies, and private corporations all provide some level of GIS or mapping service. New sites appear rapidly, and because they offer such varying levels of access to geospatial data, it is almost impossible to generalize their services. Provided below are a few examples of the variety of sites that provide access to geospatial data. For a more complete look at the kinds of geospatial data available on the Internet, please refer to Starting the Hunt: Guide To On-line And Mostly Free U.S. Geospatial and Attribute Data. This excellent resource for locating geospatial information on the Web is organized by data subject, physical region, and many other categories. Although it is not searchable and it does not provide metadata*, it is easier to use than most other catalog sites.

Many sites, including those below, offer a wide array of geospatial data services, from cataloging and searching geospatial metadata, to providing map-creation services, or allowing users to download GIS-ready datasets. Pseudo-GIS services, like the mapping service from CIESIN, allow users to download an image, but not the data that generates the coverage. These applications, although needed, do not provide access to GIS-ready datasets. Mapping and metadata sites, therefore, will not be addressed in this paper. Focusing instead on sites that provide access to GIS-ready data, I have found that they usually suffer from one of the following failings:

The following examples illustrate the seeming inability to combine a simple interface and customized access to geospatial information.

U.S. Census Bureau
The Census Bureau offers several levels of access to geospatial data, but does so in a very static manner. Access tools developed by the Census Bureau include: 1990 Census Lookup, which allows a user to customize data extraction from the latest census but does not provide it in a GIS-ready format; an interactive site which creates maps based on your area of interest; and a Data Extraction System (DES) providing customized extraction of census or survey data in a GIS-ready format. The DES service, although promising, assumes a high level of technical knowledge and is difficult to use. Users must jump through several hoops to get at the information they need, and the Census Bureau states upfront that no human assistance will be provided in downloading the data.

Maps of Montana
Part of the Natural Resources Information System (NRIS) from the State of Montana Library, this site provides geospatial data on three levels, through a relatively simple interface. Users can download map images, metadata, or the raw data from a variety of datasets. This site provides access through a simple interface, but offers no customization abilities. This static retrieval method leads to several problems. First, users are restricted to downloading data for the entire state, which sometimes amounts to files as large as 8 Mb. Second, all the data is in state plane coordinates, meaning many users will need to convert it to decimal degrees (a more universal coordinate system) before it can be used.

Scientific Assessment and Strategy Team
(SAST, Division of the U.S. Geological Survey) As their Web site states, "the SAST Data Distribution System allows Internet Users the opportunity to download geographic data determined by their own specific needs and their own choices." In order to provide this high level of customization, the SAST site actually provides a Web front-end to ARC/INFO*. Users interact with the GIS to create a customized dataset. Although the high level of customization is potentially very useful, this site's technical equipment requirements are restrictive, and its difficult interface will hinder general access.

The amount of geospatial information available through the Web is daunting, and so are the technical solutions to making it available. However, in the midst of putting up their information, it appears as if some providers have forgotten that user interests should come first. As GIS applications become more prevalent, data providers need a formula for providing users with both a simple interface and a method of customizing data retrieval.

What role is the government playing?

The National Spatial Data Infrastructure (NSDI), mandated by the Clinton-Gore administration, is leading a collaborative effort to collect and provide access to geospatial data. According to its Web site, the NSDI "is conceived to be an umbrella of policies, standards, and procedures under which organizations and technologies interact to foster more efficient use, management, and production of geospatial data." Under an Executive Order of the President, beginning January 1995, all government agencies are required to document and make their digital data available to the public (Nebert 1).

The Federal Geographic Data Committee (FGDC) is providing leadership in the development of the NSDI, working together with government agencies, nonprofit organizations, and the private sector. Three key components of the FGDC's work are to:

  1. Assist in the discovery and documentation of geospatial data through implementation of the National Geospatial Data Clearinghouse. The Clearinghouse provides technical solutions (such as a Z39.50 protocol) for facilitating discovery of spatial data on the Net (Nebert 1). Current search prototypes from the Clearinghouse provide an interface which allows users to specify needed boundaries by latitude-longitude coordinates or by state name. The immediate goal of the Clearinghouse is not necessarily to provide online access to geospatial data, but to create an inventory of spatial data holdings, complete with metadata documentation (Nebert 2).

  2. Develop content standards for digital geospatial metadata*. The purpose of the FGDC's metadata standards is twofold: to help organizations keep track of the content and quality of their data, and to help prospective users determine what data exist, in what format, and how it may be accessed. This standardization ensures that metadata files can be searched as a catalog. (See the FGDC's site for a more comprehensive description of metadata standards.)

  3. Encourage the placement of digital data files on the Net, to promote retrieval and use (Nebert 3). This is perhaps the most useful of the FGDC's goals, but also the least fulfilled. Douglas D. Nebert, Chief of the Spatial Data Support Unit at the U.S. Geological Survey, discusses the architecture of a spatial data server in Status of the National Geospatial Data Clearinghouse on the Internet. In this paper, Nebert outlines how spatial datasets could be stored, searched, and accessed from within ARC/INFO, a long-term goal for spatial data service which would eliminate duplication and maintenance problems (Nebert 2).

The NSDI provides access to three types of geospatial information: catalogs of metadata, maps, and digital spatial datasets. According to Nebert, most users of these Internet resources will download map images rather than digital datasets, "because [the users] lack the GIS/mapping tools to render their own maps from raw data" (Nebert 3). Nebert's assumption is unfounded, and detrimental to the development of methods for accessing raw digital geospatial data on the Web. Within academia and the private sector, there is a growing clientele for the raw data used in GIS applications, and the current state of access does not meet this clientele's needs.

What is MIRIS data?

The Michigan Resource Information System (MIRIS) is an effort to create a "statewide computerized database of information pertinent to land utilization, management, and resource protection activities" (DNR, p. 1). This effort was initially known as the Michigan Resource Inventory Program, established under the Michigan Resource Inventory Act, 1979 PA 204.

MIRIS data includes a considerable amount of information about land and water resources in Michigan. The information is organized in a variety of formats. The University of Michigan has had access to the print versions of the base maps and landcover features since the late 1980s. Base maps were digitized primarily from U.S. Geological Survey (USGS) 7.5-minute quadrangles and contain the following features: state/federal highways, county roads, local streets, vehicular trails, railroads, airports/landing fields, lakes, perennial/intermittent streams and drains, county/township/city boundaries, and U.S. Public Land Survey section corners, lines and numbers. Land Cover maps describe land uses in seven major categories: urban, agricultural, nonforested, forested, water, wetlands, and barren. Each of these classifications has two to twenty subdivisions, with the greatest number of categories belonging to the urban classification.

The MIRIS data was compiled by numerous federal, state, and local agencies and therefore the scale and accuracy are not consistent for all the coverages*. MIRIS and other mapping agencies periodically review and correct inconsistencies as new editions of the data are released. The University of Michigan has permission to use and distribute the MIRIS data, and now participates in the IMAGIN datasharing program (Improving Michigan Access to Geographic Information Networks).

Land use files for the state of Michigan are stored by township, a smaller division of counties developed nationally in the 18th century. Washtenaw County, for example, is made up of twenty townships (DNR).

Definition of the Problem:

There are many problems associated with finding and accessing digital spatial data, some of which are listed below:

Although a lack of technological solutions may be the greatest inhibitor of network access to geospatial data in its raw form, this problem will continue to be placed on the back burner as long as people assume that the need for GIS-ready data is small. As Nebert notes, conventional wisdom suggests that geospatial data users do not have access to GIS or mapping tools, and, therefore, need access to maps and other images they are not able to create on their own. Until the deficiency of raw, GIS-ready data is understood, the development of geospatial collections will be slow and inconsistent.

Aim of Study: How MLDADP will help solve these problems

The purpose of this project was to develop a strategy to encourage the use and transfer of spatial data. As a case study for increasing access to geospatial data, I initially intended to provide Web access to MIRIS land use data for Washtenaw County which the University Map Library has permission to disseminate. The Map Library received this data in IGDS format. This format, however, is not universally accepted, and rendered the data unusable for mapping and analysis with the applications available at the University. In order to be useful, the MIRIS data required complicated conversions to change its format and projection*.

In the midst of the conversion stages, I realized that what users need is not only enhanced access to digital geospatial data, provided in part by current interactive Web sites, but the ability to customize the needed data for their own purposes. As discussed above, several government and private sector web sites attempt to provide online access to geospatial data, but do so without giving the user the ability to customize his or her request. This static method of disseminating data may lead to users having to download much more information than they need. Perhaps a user only needs a fourth of the coverage provided by the Census Bureau; once obtained, polygonal coverages* are not easy to manipulate. One cannot just cut out the unwanted sections.

To experiment with user customization, the focus of my project switched from trying to provide Web access to MIRIS data to adding user customization options to the conversion and transfer process. I wanted to create an expert system which would limit the number of steps users have to perform to get to the data they need, and to make the complicated ARC/INFO* processes transparent. I wrote a script creating a menu-driven GUI* through which users are able to select which townships they want converted to ARC format. I then composed another series of scripts automating the conversion process so that desktop GIS users need not learn ARC/INFO (a more difficult to use, command-line application) in order to convert the necessary data to the correct projection. The data are then readily moved into a simpler GIS package, ArcView, where data can be manipulated and images moved to the Web.

Project Benefits:

This project benefits the University community on several levels.

MLDADP benefits are not limited to the University community. Simplifying access to GIS data puts this technology in the hands of community leaders and public citizens, where before it was restricted to the realm of the researcher or GIS expert. Increasing and simplifying access to this data will promote GIS use by allowing community leaders to put GIS technology to work for the community at large.

The successful implementation of this project will undoubtedly lead to the development of other projects which make use of the knowledge gained, both on a local and national level. For ideas of further development, see Future Plans, below.

Methodology:

Overview

The implementation of MLDADP occurred in a series of stages. From the initiation of this project to its completion, my purpose and product changed dramatically. My original intent was to convert the MIRIS data to ARC format, and then provide Web access to this data. My focus later changed to customizing the process of data retrieval by allowing users to choose which townships they wanted converted. To complete this goal, I developed a menu interface and composed scripts to automate the complicated process of conversion. In essence, I created an expert system which allows the user flexibility to request what she needs while making invisible the complex ARC/INFO processes behind the conversions. (See Interface Images.)

Initial Vision

At the outset, I had planned to simply convert the MIRIS data for Washtenaw County from IGDS to ARC format, store all this data in a table, and allow users to access it from the World Wide Web. I experimented with storing tables on a web page and downloading them from different platforms.

Karl Longstreth, head Map Librarian at the University of Michigan, and technical supervisor of my project, quickly showed me that the conversion to Arc format was only the first of many steps necessary to make the data usable. Longstreth suggested I contact John Fay, GIS lab manager at the School of Natural Resources and the Environment (SNRE), who had successfully converted all of the state MIRIS data from IGDS to ARC format. Fay shared an AML* script he had written to perform the conversions. This script later became the template for all the other scripts I composed.

Once I had gotten Fay's script to perform the IGDS to ARC conversions for Washtenaw County, Longstreth and I began experimenting with and formalizing the steps necessary to complete the process. Through several weeks of trial and error, we found a series of steps which led to a completed, usable coverage of Washtenaw County. These steps included (ARC commands given in parentheses):

  1. converting raw data from IGDS to Arc format (IGDSARC)

  2. appending the individual townships of the county together, since the conversion happens at the level of the township rather than the county (MAPJOIN)

  3. erasing the township boundary lines in the complete county coverage and merging the contiguous land use units (DISSOLVE)

  4. changing the projection/coordinate system of the data from state plane coordinates (based on the North American Datum of 1927) to a geographic (or unprojected) coordinate system of decimal degrees. Choosing decimal degree coordinates makes the converted townships more transferable between GIS applications. It is the default projection for the Map Library's basic GIS software, ArcView. (PROJECT)

  5. rebuilding the topology of the land use coverages (CLEAN)
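The five steps above form an ordered pipeline. The following Python sketch is only an illustration of that ordering, not the AML used in the project. It uses the ARC command names given in parentheses above; the function name `plan_conversion`, the scope labels, and the township names are hypothetical, and no ARC/INFO command arguments are shown because the paper does not list them.

```python
# Ordered conversion steps, using only the ARC command names given in the text.
# The third field is a hypothetical label for what each step operates on.
PIPELINE = [
    ("IGDSARC", "convert raw IGDS data to ARC format", "per township"),
    ("MAPJOIN", "append the township coverages together", "joined coverage"),
    ("DISSOLVE", "erase township lines, merge contiguous land use units", "joined coverage"),
    ("PROJECT", "state plane coordinates to decimal degrees", "joined coverage"),
    ("CLEAN", "rebuild the topology of the coverage", "joined coverage"),
]


def plan_conversion(townships):
    """Return the ordered list of (command, target) pairs for the selected townships."""
    if not townships:
        raise ValueError("at least one township must be selected")
    plan = []
    for command, _description, scope in PIPELINE:
        if scope == "per township":
            # The IGDS-to-ARC conversion happens once for every township.
            plan.extend((command, name) for name in townships)
        else:
            # Every later step runs once, on the single joined coverage.
            plan.append((command, "joined coverage"))
    return plan


if __name__ == "__main__":
    for command, target in plan_conversion(["township_a", "township_b"]):
        print(f"{command:8s} {target}")
```

The sketch makes one point explicit: only the first step is repeated per township, while the remaining four act on the joined coverage.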

    Changing the focus.

    Once I had formalized the steps necessary to convert all of the Washtenaw County data, I was ready to move on to the rest of the state. My focus changed, however, based on a visit from a library user. A patron came into the Map Library and wanted to use MIRIS data for four adjacent townships in Washtenaw County. Once all the county townships have been merged into a single coverage, it becomes very difficult to isolate the needed townships. This is because land use is represented by polygons, which cross over political boundaries such as township lines. After the township lines have been eliminated, separating townships would result in pulling apart polygons (imagine pulling puzzle pieces apart), rather than dividing the coverages along neat, square lines. At that stage, we were only able to offer the data in an all-or-nothing package, as land use for the entire county.
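The effect described above can be reproduced on a toy grid. In this minimal Python sketch (an illustration only, not part of MLDADP), each cell carries a made-up township label and land use code. Polygons are formed by grouping edge-adjacent cells, first with township lines kept and then with them dissolved. The grid values and the function names `polygons` and `townships_of` are assumptions introduced for the example.

```python
from collections import deque

# Toy grid: each cell is (township, land_use). Township "T1" covers the left
# two columns and "T2" the right two. All values are invented for illustration.
GRID = [
    [("T1", "forest"), ("T1", "forest"), ("T2", "forest"), ("T2", "urban")],
    [("T1", "farm"),   ("T1", "forest"), ("T2", "forest"), ("T2", "urban")],
    [("T1", "farm"),   ("T1", "farm"),   ("T2", "farm"),   ("T2", "urban")],
]


def polygons(grid, key):
    """Group edge-adjacent cells with equal key(cell) into polygons (flood fill)."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    result = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen:
                continue
            target = key(grid[r][c])
            cells, queue = [], deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and key(grid[ny][nx]) == target):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            result.append(cells)
    return result


def townships_of(grid, cells):
    """Return the set of township labels touched by one polygon."""
    return {grid[y][x][0] for y, x in cells}


# Township lines kept: a polygon never crosses a township boundary.
before = polygons(GRID, key=lambda cell: cell)
# Township lines dissolved: only land use separates polygons.
after = polygons(GRID, key=lambda cell: cell[1])
# Polygons that now straddle more than one township.
straddling = [p for p in after if len(townships_of(GRID, p)) > 1]

print(len(before), len(after), len(straddling))  # prints: 5 3 2
```

In this toy case, two of the three dissolved polygons span both townships, so neither township can be recovered without cutting through a polygon.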

    In order to provide the needed data I retraced my steps, this time for the four townships rather than the entire county. This sparked the idea that I should generate a script that would automate the entire process, incorporating the conversion of the data from IGDS to Arc format, join relevant coverages, dissolve borders, change coordinates to decimal degrees, and rebuild topology. This way, patrons could indicate (through a web or Arc interface) which county or township data they needed, and the ARC/INFO conversions would be processed automatically.

    The benefits of this decision were instantly obvious. Users need flexibility; my criticisms of current web sites focused on the problem that users had no option to customize the data they receive. What I realized from looking at current Web access to geospatial data, and through my own work on this project, was that in order to obtain a high level of customization, one cannot offer pre-defined datasets. Instead, the access system must serve as a front-end to a GIS that customizes data on the fly.

    Another benefit of MLDADP is that it solves the problem of inefficient storage. The coverage for the entire county is a very large file (12Mb). An expert GIS user would not have to oversee the conversion of data for the entire state. Neither would large amounts of converted and underutilized data be taking up server space. Instead, townships and counties are processed as needed.

    Writing the scripts.

    ARC/INFO has its own programming language, Arc Macro Language (AML), which allows users to write programs that automate routines. Using Fay's initial script as a template, and working through the AML manual on my own, I was able to construct a series of scripts that automated the conversion process described in the Methodology section, above.

    In the current incarnation of this project, the user has to interface with ARC/INFO only once, to initiate the program. Once begun, the user manipulates the generated menu to customize his or her data needs. The necessary townships are processed by ARC/INFO (according to the methodology described above) and the converted coverage is stored in networked space, accessible by the patron from a desktop GIS program on a remote computer.

    Initially, I saw two obstacles to constructing a working script. When I first ran Fay's script to convert the townships of Washtenaw County, I had to enter all the township file names into a text document. I told the script to use this file so that it would know which townships to process. Giving users control over which townships to process meant that the list of file names would change for every user. The first obstacle I faced was how user input could determine and change the list of needed townships and how that file could be read from within the script.

    My second concern was how to write a script for the MAPJOIN command. The question here is how to read in a list of townships to be appended when the system doesn't know how long the list is or if the coverages are even adjacent. I later discovered non-adjacent townships would not upset the MAPJOIN process. Apparently, non-neighboring townships can still be combined into a single coverage.

    The trial and error process is a necessary part of composing scripts, since many AML commands are not extensively documented. Through this experimentation process, I discovered that there was a single solution to these two concerns. I created a dummy file, which opens with the script's initiation. Selecting townships from the menu results in that particular township's raw file name being written to the dummy file. Ending the selection process closes the dummy file. The file is then opened, read to retrieve township file names, and closed each time a sub-script runs. At the end of the entire process, the dummy file is deleted so that a fresh copy can be used the next time the script is run. (For a more detailed explanation of how these routines operate, see the AML scripts written to perform these tasks.)
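The life cycle of the dummy file can be sketched in Python. This is a hedged illustration of the pattern described above, not a translation of the AML scripts: the class `SelectionFile`, the function `run_conversion`, and the township names are hypothetical, and each sub-step here only reads the list rather than calling ARC/INFO.

```python
import os
import tempfile


class SelectionFile:
    """Hypothetical stand-in for the dummy file that records menu selections."""

    def __init__(self):
        # The file is created and opened when the script starts.
        fd, self.path = tempfile.mkstemp(suffix=".txt")
        self._handle = os.fdopen(fd, "w", encoding="utf-8")

    def select(self, township_file):
        # Each menu selection writes one raw file name to the file.
        self._handle.write(township_file + "\n")

    def finish_selection(self):
        # Ending the selection process closes the file.
        self._handle.close()

    def read(self):
        # Each sub-script opens the file, reads the names, and closes it again.
        with open(self.path, encoding="utf-8") as handle:
            return [line.strip() for line in handle if line.strip()]

    def delete(self):
        # Remove the file so the next run starts with a fresh copy.
        if not self._handle.closed:
            self._handle.close()
        if os.path.exists(self.path):
            os.remove(self.path)


def run_conversion(selected, substeps=("IGDSARC", "MAPJOIN", "DISSOLVE", "PROJECT", "CLEAN")):
    """Record the selections, let every sub-step re-read them, then clean up."""
    selection = SelectionFile()
    try:
        for name in selected:
            selection.select(name)
        selection.finish_selection()
        log = []
        for step in substeps:
            townships = selection.read()  # list length is not known in advance
            log.append((step, townships))
        return log
    finally:
        selection.delete()


if __name__ == "__main__":
    for step, townships in run_conversion(["township_a", "township_b"]):
        print(step, townships)
```

Because every sub-step reads the list from the file rather than from a fixed argument, the same scripts work for any number of selected townships, which addresses both obstacles with one mechanism.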

    Future Plans for MLDADP:

    The creation of this automated system has made explicit some of the questions and issues involved in providing access to digital spatial data. Yet, its development is incomplete. Writing the scripts and creating the menu interface was not a static intellectual process; I continue to develop ideas for improving this project. Some of these ideas would serve, initially, only the project itself, making MIRIS data more accessible and offering the converted data in a greater variety of finished products. Other future plans for the project contribute to the collaborative efforts of the Federal Geographic Data Committee by participating in the National Clearinghouse and helping formalize national metadata standards.

    Ideas for future development of MLDADP include:

    Findings/Results:

    Through the conversion and refining of the Washtenaw County MIRIS data, I discovered the technical difficulties involved in the conversion process, described above. As each obstacle was overcome, I was able to develop routines (scripts) to more efficiently serve patrons' needs. This project has been extremely useful because it serves current patron requests for land use data, it provides a model for future development of digital data at the Map Library, and it offers some intellectual criticism of the current state of digital access.

    Findings include:

    • Although access to digital data is important, the data may be useless unless users are granted flexibility.

    • In order to obtain a high level of customization, geospatial providers cannot rely on pre-defined datasets. Instead, their access method should serve as a front-end to a GIS which customizes data on the fly.

    • This project allows users to look at a dataset outside of the boundaries in which it was initially conceived, and to do so through a much simpler interface than would otherwise be required.

    • We have created an expert system from which users can access and manipulate a very complex GIS to make the data they need usable in a less formidable system.

    • MLDADP bridges the technology gap between beginning and advanced GIS users, allowing novices to get the information they need without having to learn advanced skills.

    • The technical obstacles to realizing customized access over the Web are large, but not insurmountable. Now that the seed program is operational, expansion and refinement may be more quickly accomplished.

    • We have developed a model from which the remaining MIRIS data may be converted. Adding more customization features will enlarge MLDADP's usefulness at the University level.

    • MLDADP ensures the most efficient use and storage of large amounts of data. By storing only raw data, and converting and customizing coverages on the fly, the Map Library does not have to guess what users will need, nor store any underutilized conversions.

    Next Steps and Future Studies

    Future studies for MLDADP include testing the efficacy of this model on a larger scale. Its interface should be tested and expanded, within the campus setting, and in the realm of the Net.

    Ideas for future studies can be organized under three main topics:

    • Test the actual demand for GIS-ready data. How does the need for raw geospatial data match what Nebert describes as the need for images and mapping tools?

    • Determine how well the current Web access tools work: Are they easy to navigate or frustrating to users? Do they provide the kinds and formats of geospatial data that are needed?

    • Transfer MLDADP's scripts to a Web front-end which will initialize ARC/INFO and perform the needed conversions. This step will ensure that desktop GIS users without access to ARC/INFO will have equal access to the same kinds of GIS-ready data.

    Conclusions

    This project is important to encouraging access to GIS data. My investigation into how private sector and government sites are providing network access to geospatial data has shown that there is much room for improvement. Data providers cannot assume GIS applications are too complex for the general Internet user, because doing so will only discourage their use. Instead, GIS proponents must realize the importance of making quality data accessible over the Net. In order to promote our own work, we must encourage widespread use of the Internet as a means of collecting and transferring geospatial data. Before the Internet can be an efficient means of accessing geospatial data, technological developments must occur on two fronts: making GIS applications easier to use, and making the data necessary to these applications available. Furthermore, these improvements must happen synchronously, since one encourages the other.

    The completion of MLDADP is well-timed, since all federal agencies are now required to offer information over the Internet (Nebert 1). This project will remind those agencies working with geospatial information that the need for GIS-ready data is very real and should be encouraged. Because network access to federal government data will eliminate human assistance, these sites must offer clear and simple interfaces to their information.

    The work I have done on MLDADP is not narrow in its scope. It has provided a model technical solution for encouraging patron use of the digital geospatial data at the Map Library. It has created a seed project from which the University may initiate several new methods of providing access to digital data. More importantly, however, MLDADP will serve to remind other data providers to refocus on providing access -- to keep their users' needs at the forefront and offer customization features whenever possible. Their users will be happier, and their data will be better utilized.

    Finally, MLDADP has bridged the GIS technology gap between the GIS experts or academicians and the community leaders or public citizens whose use of GIS might lead them to better decision-making practices. By simplifying access to GIS-ready data, this project gives desktop GIS users access to the same data available to more technologically-privileged GIS users. Putting GIS power in the hands of the public will offer many benefits: it will encourage the market for GIS products and data, and it will give community leaders better decision-making tools.

    Appendices:


    Paper by Alison Atkins
    Last Update 12/17/96
    Document URL: http://www.iit.edu/~atkins/DLA/
    Copyright 1996, Alison Atkins