Introduction to Geographic Information Systems
ERE 450/550
Lab Exercise 2
Lab due Friday September 29 at 5pm in Bray 410
In this lab exercise you will examine, review, assess, and prepare
the data that you’ll need to address a hypothetical problem in the city of
NOTE: In order to be
successful with these labs, read the exercise over IN ADVANCE of doing it; read
and follow directions carefully; be deliberate; keep track of GIS procedures in
your notebook; also in your notebook, keep track of filenames and where data is
stored; and finally anticipate and prepare for problems with the software….and
sometimes my directions. (By the way, if you see problems with directions
please let me know ASAP so that I can alert the class)
If you haven’t
noticed already, these exercises can get wordy with discussion of GIS concepts
and questions (that’s part of the class goal…you’re not here just to learn
software), but embedded within the exercise are simple GIS steps and processes
that you’ll have to learn so that you can do them again more efficiently in the
future.
If you want to have
clear and concise software instruction, don’t use these exercises; use your
notebook.
Step 1: The Big
Issue-Problem, Questions, Purpose, Objectives
Hypothetical
Scenario:
Elizabeth Nifkin (yes, she’s a relative of Eustis B. Nifkin,
and also a member of the “green mafia” – another fun title given to ESF alumni)
was recently hired as an environmental planner for the city of
How many people are near or in flood zones within the City
of
They would like information on home ownership relative to renters; they would like to know population densities of these areas; they would like the Meadowbrook area examined in more detail; and they would also like to know the zip codes of these areas for future mailings.
Where are the Armories, Railroads, and Roads relative to these zones? They want to know, so that they can evaluate evacuation and supply issues.
They would like to know about the dams above residential areas.
They would like to know the watersheds that are affected (because they are also concerned about the potential for pollution loading during a flooding event)
They would like you to map the wetlands, so that they can assess those areas beneficial to buffering the negative effects of flooding.
They would like you to assess potential soil loss using the Revised Universal Soil Loss Equation.
Finally, they would like an idea of relative urban temperature issues that will need to be addressed.
Since the purpose and objectives (Step 1 of the GIS process) are already done for you, you now need to start the project with the next step of the GIS process: Data Management. What this means is gathering information about the data you will need to do the project.
The specific objective of this exercise is to examine, review, and evaluate the data and to prepare for the next lab. To enhance your understanding of GIS concepts, you’ll also be directed to answer some questions along the way.
IMPORTANT INFO BEFORE YOU START
The deliverables (Step 4: Output) for this lab are attached and include
a table (fill-it-in), a spatial model (fill-it-in), and the answer sheet. Those
deliverables capture your Step 3: Analysis (asking and answering questions about
the data). Note that many of the blanks
have been filled-in for you. In addition, the data is on the Greenfield/PROJDATA folder. That way, if you stop in the middle of
the exercise, you don’t have to take all that stuff home. You should, however, have a reference in your
notebook so that you can pick up where you left off. You can also check your work against what’s
on the data folder to see if you’re doing okay.
One more thing... check the website often for notices, announcements, and helpful hints that may be posted regarding this and future labs.
Step 2: Data
Management
Before you begin, think about this a bit. Using the issues
above, what are some of the datasets that you’re pretty sure you’ll need? The first question explicitly asks about the
flood zones and population within the City of
Folder Management
A big part of data management is setting up the data storage structure using folder management. You are about to download a bunch of GIS data, so now is the time to organize. You will set up a folder and subfolders to hold all this downloaded data. This also makes it easier to figure out what you need to save before leaving the lab sessions. You don’t have to save your data from this lab, but it is so much easier to just save your whole project folder directly from the C drive to your portable media. This is pretty important, because an ArcMap project does not save your data; it only stores pointers to where the data is kept.
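If you prefer to script the folder setup, a minimal Python sketch looks something like the following. The folder names (LAB2, CENSUS, HYDRO, DEM) are only placeholders that I made up for illustration, and the example builds them in a temporary scratch location rather than on the C drive.

```python
from pathlib import Path
import tempfile


def make_project_folders(root, subfolders):
    """Create a project folder with one subfolder per data theme."""
    root = Path(root)
    created = []
    for name in subfolders:
        folder = root / name
        # parents=True also creates the project root if it is missing.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Hypothetical layout; substitute the folder names you actually use in lab.
with tempfile.TemporaryDirectory() as scratch:
    folders = make_project_folders(Path(scratch) / "LAB2", ["CENSUS", "HYDRO", "DEM"])
    print([f.name for f in folders])
```

Whatever names you choose, write them in your notebook so that you can find the data again.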
If you need more help on how to use Windows Explorer, use Exercise 1 and your notes from Exercise 1.
Data Gathering: a.k.a
Data Acquisition
You’re going to download a bunch of data at one time using the “basket” feature on the CUGIR webpage. This will produce one big zipped file of all the datasets that you will then have to unzip individually. Use the checkbox to the left of the datasets, but do not download the basket until you have them all. You can easily go backward and forward on the webpages that you’ll need to navigate to get all of the required datasets, and then “update” your basket as you go.
You should also begin filling-in your table using the metadata and info links associated with each dataset. You may not be able to finish the “Vector or Raster” field in the table from the first part of this exercise, but after looking and manipulating the data sets throughout this exercise, you’ll be able to complete it. You should also use the steps of the exercise to “fill-in-the-blanks” of the spatial model.
Some of the table information will be pretty hard to assess, and Date is going to be one of those tough ones. We are ultimately concerned about the currentness of the data or, in other words, how “fresh” it is. For example, the first dataset you’ll add to the basket is the freshwater wetlands. If you look at the metadata, it has a publication date of 1999, but is that really how current it is? If you scroll down through the metadata, you’ll find a link under “Currentness Reference” that refers to specific county work. When you click on that link and look for Onondaga County, you’ll find that despite the publication date, the most recent update to the Onondaga County freshwater wetlands is 1994, so that’s the date I put in the table. Yes, this is a pain in the butt; however, metadata includes all those darn readme files and any other documentation that may accompany a dataset. For some dates (like the DEMs), you may have to enter “unknown.” As you do these labs, you’ll find the information you need is in some of these other places. This is maybe where you start cursing me (it certainly won’t be the last time)….and that’s okay; I’ve been there, and I understand and appreciate what you’re going through….and I hope that you are beginning to appreciate the role of a GIS person.
Why am I worried about dates? Well duh, “the one constant in life is change.” There are actual physical changes, like a stream system constantly changing its path, and there are changes in detection, interpretation, and science. I’m really appreciating the service of CUGIR over the New York State GIS clearinghouse lately. Later in the exercise, when you download the big CUGIR zip file, you’ll see a readme html document that has links to check on updates to the data sets you downloaded…pretty cool…they really know how to serve up the GIS!
By the way, the other item that may make you a little crazy is looking for the scale, but again, we’re looking for limits, so do the best you can to find the most appropriate scale denominator. The freshwater wetland data is most appropriately represented at the 1:24,000 scale, as is most data created with the 7.5 minute USGS quad sheets as the source document. Look for the source scale denominator within Lineage first, and then if it’s not there, you may have to look somewhere else in the metadata documentation. You may also witness one of the big concepts (in the lineage) that I’ll repeat over and over in class: when doing analysis with datasets of multiple scales or resolutions, you’ll always be “stuck” with the smallest scale and/or the coarsest resolution.
Let me explain that a little, and you should think it through. If I combine two data sets, one at 1:100,000 and one at 1:24,000, the scale of the analysis is the smaller scale, 1:100,000. That’s because the 1:100,000 data is the limit on accuracy. The 1:100,000 data has been generalized more than the 1:24,000 data, to a point where features are represented more abstractly and may be shifted from their actual location. A good example is a railroad, road, and river running next to each other, separated by only 1m. If we drew those features at their true positions at that scale, they would literally be on top of each other. So, they are artificially separated to provide relative location, not absolute location. The same problem may also exist in the 1:24,000 scale data, but it would be less abstract and more accurate than the 1:100,000. The same concept works with raster data as well. Resolution refers to the smallest cell that can be representative of an area. If we take the example of a picnic table in your backyard that is 2m long and 1m wide, I may be able to pick that up in a raster image of 1m resolution (it would be 1 cell by 2 cells); however, if I had a 30m resolution image, the color and shape of the picnic table would definitely get overwhelmed by the generalized colors and shapes of the surrounding area.
So, here’s the deal. When you combine 30m and 1m resolution data in an analysis, you are stuck with the 30m resolution as your resolution of analysis. Again, I will repeat this over and over and over in class. To maintain accuracy, you can go from large to small scale, but you can’t go from small to large scale. And, you can go from high or fine resolution to low or coarse resolution, but you cannot go from coarse resolution to fine resolution.
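The “stuck with” rule can be written down as a tiny Python sketch. The function names are mine and are not part of any GIS package; the only idea shown is that the limit is the largest scale denominator (the smallest scale) and the largest cell size (the coarsest resolution).

```python
def limiting_scale(denominators):
    """Return the scale denominator that limits a combined analysis.

    A larger denominator means a smaller (more generalized) scale,
    so the limiting scale is the one with the largest denominator.
    """
    return max(denominators)


def limiting_resolution(cell_sizes_m):
    """Return the coarsest cell size (in meters) among the rasters."""
    return max(cell_sizes_m)


# The two examples from the text: 1:24,000 with 1:100,000, and 1m with 30m.
print(f"1:{limiting_scale([24000, 100000]):,}")  # 1:100,000
print(f"{limiting_resolution([1, 30])} m")       # 30 m
```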
Despite my pronouncement, we still may do analysis that goes the “wrong way,” so to speak, in scale or resolution. But, it’s like anything else in science. We sometimes over generalize in order to simplify. The best thing to do is to acknowledge and disclose these limitations, so that we don’t do something stupid in policy when considering the real complexity and unpredictability of life. Okay, I guess it’s time to get some work done.
For the resolution and scale fields in the table, fill out resolution for raster data and scale for vector. If there is a scale for raster data or a resolution for vector data, I will not hold you responsible for that information on the table. In most cases, we are concerned about scale for vector data and resolution for raster data.
NOTE: As you do the
following, check the info and metadata links to fill in your table.
Adding Metadata in
ArcCatalog
This is an important step so that the metadata is kept with your dataset. These days, we primarily use .xml documents to keep track of the metadata. For this exercise, you do not have to add the metadata to each dataset, but try a few using the steps below. For future work, you can always add it when you need to, or you can look at the reference on the website. Your table serves as a nice summary of the metadata.
Question 1: What is
the “Geographic coordinate system name” under “Horizontal coordinate system?”
(Put your answer on the answer sheet)
Question 2: What are
the Geographic Coordinate Units? Are
these units of measurement planar (Cartesian, rectangular, or projected) or
spherical (angular, geographic, or geodetic)?
Question 3: Look
under Geodetic Model. “What is the Horizontal Datum Name?” What is the “Ellipsoid
Name?” Is an ellipsoid a spheroid? Is a spheroid always an ellipsoid?
Question 4: What is a
datum? (You may use the very simple one- to two-word definition.)
Adding metadata can be tricky at times. You may have to change the style sheet (upper left), and sometimes you have to import the metadata as a different format. This example at least gives you an idea on how to use this feature. If I cannot get the import metadata feature to work, I at least download metadata (save as the HTML) and any other accompanying documentation to the folder where the data is kept.
Thankfully, the CUGIR data came with metadata files in the XML format; however, the stuff that you downloaded from the NYS GIS clearinghouse didn’t always have the metadata accompanying the data set. That’s when I have to “save as” the HTML document and then keep it within the folder with the data of interest. Don’t worry, you don’t have to do that for this exercise, because you’re using a “handy-dandy” table to keep track of some of the more important parts of the metadata.
NOTE: There is one small, but very inconvenient problem with using the ArcCatalog imported metadata that you’ll see later in the exercise. So, use the webpage metadata and info links to fill in your table.
IMPORTANT: To stay
safe, limit file names of the converted interchange files and DEMS to 8
characters or less with NO SPACES!!!! Also, keep track of these new file names
in your notebook. Finally, make the names intuitive. For example, I named the
s4140201 file SEN_SHYD so that I would remember it as Seneca watershed, surface
water hydrography. I named the stream network file SEN_NHYD. You may use longer filenames for the
shapefiles, but don’t go over 12 characters (including spaces/underscores). In the past we were restricted to 8 character
filenames and field names in GIS, so I tend to stick to that rule because I’ve
seen some trouble with longer file names….even when they “should work.”
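If you want a quick way to check a name before you type it into the software, here is a minimal Python sketch of the rule above. The function name is hypothetical, and the check is deliberately conservative: it rejects spaces for every kind of file, even though the 12-character limit for shapefiles is stated as counting spaces.

```python
def is_safe_name(name, max_len=8):
    """True if name is non-empty, has no spaces, and is at most max_len characters."""
    return 0 < len(name) <= max_len and " " not in name


print(is_safe_name("SEN_SHYD"))                  # True: 8 characters, no spaces
print(is_safe_name("SEN SHYD"))                  # False: contains a space
print(is_safe_name("SENECA_HYDRO"))              # False: 12 characters is over the limit of 8
print(is_safe_name("SENECA_HYDRO", max_len=12))  # True: within the shapefile limit
```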
Importing and
Converting Interchange (a.k.a Export) Files
Interchange or export files are ASCII text files created for quick download and easy management of ArcINFO Coverages (just another data format of GIS). They will not “show-up” in ArcCatalog or ArcMap until they are converted or imported.
Question 5: What are
the four data sets within this ArcINFO coverage?
Question 6: What are
the five data sets within this ArcINFO coverage?
Hopefully, you also took note of the icons used to symbolize these types of data, especially note the difference between tic, arc, node, route, polygon, and labels. There are other types of data that can be in these coverages as well, such as annotation. (To take a look at that one, check out the flood coverage.)
ArcINFO coverages are topologically referenced data sets (topology is the ability to know what’s next to what) that hold feature and attribute data within a three-folder system. A root folder, or workspace, holds both a feature folder and an info folder, and the files in each are associated with each other. If these files are separated, the whole thing falls apart; that’s why we use ArcCatalog and NOT Windows Explorer to move GIS data sets around. Tics hold known ground coordinates of the data; nodes are basically the endpoints of lines (where arcs begin, end, or connect); arcs are lines; polygons are made up of connecting lines; labels are points within polygons created to reference those polygons (a similar data structure to labels is used to make just points); annotation is additional descriptive notes of a data set; routes are used for network and connectivity analysis (like Mapquest’s directions). Next, you’ll take a look at the structure for shapefiles. I’ll discuss more issues about GIS data formats throughout this exercise and in class.
ArcCatalog vs.
Windows Explorer: More on GIS data structure
In this section, I will demonstrate why it’s easier to use ArcCatalog over Windows Explorer to manage GIS data. As you saw above with ArcINFO coverages, the biggest issue here is the GIS data structure.
Question 7: How many
shapefiles are shown in ArcCatalog?
Question 8: How many files
(other than the zip files) are there in the CENSUS folder shown in Windows
Explorer?
Shapefiles are non-topological vector data sets that will have at
least 3 and up to 11 files that make up one complete data set. They are VERY common, because they are cheap
on memory and easy to create and manage. The three minimum files are the .shp
file, which stores the shapes as lists of vertices (points that make up a line)
as binary code, the .shx file, which stores the index of the shapes for
locating the values in space, and finally, the .dbf, which stores the table or
attribute values for each one of the spatial features.
So the reason to use ArcCatalog instead of Windows Explorer to manage GIS data is so that you don’t screw up the data management structure. By copying and moving one shapefile in ArcCatalog, you automatically grab all of those separate files. This is also the advantage when dealing with other GIS folder/data structures.
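To see why one shapefile in ArcCatalog shows up as several files in Windows Explorer, here is a small Python sketch that groups a folder listing by base name and keeps only the groups that contain the three required files. The filenames in the listing are made up for illustration; they are not the files in your CENSUS folder.

```python
from collections import defaultdict
from pathlib import PurePath

# The three files every shapefile must have.
REQUIRED = {".shp", ".shx", ".dbf"}


def group_shapefiles(filenames):
    """Map each shapefile base name to the set of extensions found for it."""
    groups = defaultdict(set)
    for name in filenames:
        path = PurePath(name)
        groups[path.stem].add(path.suffix.lower())
    # Keep only groups containing all three mandatory components.
    return {stem: exts for stem, exts in groups.items() if REQUIRED <= exts}


# Hypothetical folder listing.
listing = ["roads.shp", "roads.shx", "roads.dbf", "roads.prj",
           "tracts.shp", "tracts.shx", "tracts.dbf", "readme.txt"]
shapefiles = group_shapefiles(listing)
print(len(listing), "files,", len(shapefiles), "shapefiles")  # 8 files, 2 shapefiles
```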
Question 9: Look for
the two files that have the .prj extension (you may have to change your Windows
Explorer View to see Details). What are they (names)? This will be important to
understand the next part of the exercise.
Defining Spatial
Reference
In this section, you’ll learn how to apply the GIS concept of spatial or geographic references in the software.
IMPORTANT NOTE: Use the webpage links for metadata and info to check the datums, ellipsoids or spheroids, projections, and coordinate systems of the data sets you have. Make sure that you rely on the metadata on the web to fill in the table, because you’re about to see a problem with the metadata spatial reference tab in ArcCatalog.
Question 10: According
to ArcCatalog, what is the “Geographic coordinate system name” for each of the
shapefiles? (Provide both the file name and its associated coordinate system)
Question 11: What are
the Horizontal Datum Name and Ellipsoid Name for GCS_North_American_1983? (Use
the Details link)
Question 12: What are
the Horizontal Datum Name and Ellipsoid Name for GCS_Assumed_Geographic_1?
Question 13:
According to the web link to the metadata of these four datasets, what is the
horizontal datum name? (and the one you used to fill in the table)
Question 14: Is the
North American Datum of 1983 (NAD83) a local or global (a.k.a. geocentric)
datum?
Question 15: Is the
North American Datum of 1927 (NAD27) a local or global datum?
So, what’s the big deal, and why do I care that these are different? First, have you heard that line about assuming? “Assume makes an xxx out of ‘u’ and ‘me.’” In this case, it really could, because if I had to do analysis between two datasets with two different datums, the GIS would locate x and y coordinates according to their respective datums, which would take the same place in space and map it in two different places…..NOT GOOD! Around here, the difference between the NAD27 and NAD83 datums is around 30 meters. Don’t worry, there are ways to convert one datum to another (although not perfectly), and I’ll show you that later.
The other issue deals specifically with the spatial reference metadata. Even if you import the xml document of metadata, the spatial reference information will not overwrite what is on the “spatial tab” (it is actually kept in a separate file). Sometimes this file does not exist, and sometimes it’s incorrect. You will fix this problem. Most of the datasets you’ll use will have spatial reference information stored in a .prj file, or embedded with the data (like ArcINFO coverages or ArcINFO GRIDS); however, most census data available on the web do not have the files that make spatial reference easy. Yes, county boundaries and census blocks do, but the other two do not. The way to deal with this is to “define the projection” and create those .prj files.
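A quick way to spot which shapefiles need the “define the projection” step is to look for a .shp file with no matching .prj file. Here is a minimal Python sketch of that check; the function name and the filenames in the example are hypothetical.

```python
from pathlib import PurePath


def missing_prj(filenames):
    """Return base names of shapefiles that have no .prj sidecar file."""
    stems_by_ext = {}
    for name in filenames:
        path = PurePath(name)
        stems_by_ext.setdefault(path.suffix.lower(), set()).add(path.stem)
    shp = stems_by_ext.get(".shp", set())
    prj = stems_by_ext.get(".prj", set())
    return sorted(shp - prj)


# Hypothetical listing: only "blocks" comes with a .prj file.
listing = ["blocks.shp", "blocks.shx", "blocks.dbf", "blocks.prj",
           "water.shp", "water.shx", "water.dbf"]
print(missing_prj(listing))  # ['water']
```

Keep in mind that this only tells you whether the file exists. It cannot tell you whether the spatial reference stored in it is correct, so you still need to check the metadata.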
By the way, if you want to see help on this tool use the Show Help button in the lower right hand corner of a tool window. If it’s already open it will say Hide Help. Note that you can get even more information from help by clicking on the Help icon above the help window. To learn software, and especially GIS, help documentation is usually VERY useful. On the job, I use Help all the time to learn more about GIS stuff.
I would not suggest checking the box that says to close that window automatically upon completion, because when problem solving, we use all the information we can get, and the window reveals what the software is doing.
Converting DEMs
The 4 digital elevation models need to be converted to something we can use, similar to the way you dealt with the interchange files…and actually these DEMs are also only ASCII text files. First, we have to make sure we have the correct Extension “on.” The Extensions are supplemental software for ArcGIS…and yes, they usually cost extra money. In this case, you will use Spatial Analyst, the extension used to deal with raster GIS. You will convert the text file (“.dem”) to an ArcINFO GRID data format. The GRID format is very similar to the ArcINFO coverage format with a three-folder system.
Question 16: What is
the Grid Coordinate System Name? What are the planar distance units? What is
the resolution (with the correct distance units)?
Question 17: What is
the Altitude Datum Name? What is the Altitude Resolution? What are the Altitude
Distance Units?
Question 18: Is the vertical
datum used here for z-values (elevation) based mostly on an ellipsoid or a
geoid?
Before we go further, there is another small problem. What were the units of vertical distance? Let’s check that using ArcCatalog.
I don’t know about you, but I seriously doubt that these are decimeters (1/10 of a meter). I know for example that Pompey Hill is around 1700 ft of elevation. Converted to decimeters that would be well over 5100 decimeters. The number I looked at near that area was just over 500 units. Aha! The units are not decimeters anymore; they are actually meters. The software converted them. Good to know, eh? That could really screw this stuff up….by a factor of 10. If I wanted to keep decimeters as the unit of elevation, I would simply change the z-factor to 10.
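The arithmetic behind that sanity check can be sketched in a few lines of Python. I am assuming the international foot (0.3048 m), and the function names are mine; the z-factor here is just a multiplier applied to every elevation value.

```python
FT_TO_M = 0.3048  # international foot, exact by definition


def feet_to_meters(feet):
    """Convert an elevation in feet to meters."""
    return feet * FT_TO_M


def scale_z(value, z_factor):
    """Multiply an elevation value by a z-factor to change its units."""
    return value * z_factor


pompey_m = feet_to_meters(1700)
print(round(pompey_m, 1))            # 518.2 (meters)
print(round(scale_z(pompey_m, 10)))  # 5182 (decimeters, using a z-factor of 10)
```

A cell value just over 500 near Pompey Hill therefore only makes sense if the units are meters.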
One more issue before we leave the DEMs. According to the metadata, USGS quad sheet contours were used to create this data. Generally, the accuracy of USGS contours is +/- half of the contour interval, so in most cases between 2.5 and 10 feet depending upon the contour interval for the individual quad map used.
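Here is the half-interval rule as a short Python sketch. The contour intervals of 5, 10, and 20 feet are my assumption, inferred from the 2.5 to 10 foot range quoted above; check the metadata for the interval of the quad you are actually using.

```python
FT_TO_M = 0.3048  # international foot, exact by definition


def contour_error_ft(interval_ft):
    """Expected vertical error: plus or minus half the contour interval."""
    return interval_ft / 2


# Assumed contour intervals in feet (not taken from the metadata).
for interval in (5, 10, 20):
    err_ft = contour_error_ft(interval)
    err_m = round(err_ft * FT_TO_M, 2)
    print(f"interval {interval} ft: +/- {err_ft} ft = +/- {err_m} m")
```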
Question 19: Given
the known error discussed above, would it be wise to claim that an analysis
using this data was accurate within a decimeter?
Evaluation of Competing
Data
You downloaded a few hydrography (water) datasets that you’d better take a look at before we decide which one to use. You downloaded two different data sets of streams and rivers. One was the Hydrography (Census 2000) tgr36067lkH.shp, and the others were the Large Scale Hydrography Network and Surface, i.e. the imported/converted s4140201 and n4140201 .e00 files (whatever you named them after importing them). First, take a look at the table that you’ve been filling in based on the metadata.
Question 20: What is
the scale of the Census Hydrography data? What is the scale of the Large Scale
Hydrography data?
Question 21: Which
one of those has the larger scale?
Question 22: Based on
scale alone, which data set would you keep if you wanted greater accuracy?
Now, take a look at the two data sets simultaneously in ArcMap to see how they compare.
By the way, you were not prompted about a problem with spatial reference, thanks to ArcMap’s projection on the fly feature. If you did the define projection step as directed, both datasets have the same datum, NAD83, even though they have different units of distance (large scale is UTM meters and census is Decimal Degrees). The units of distance and coordinates in the Map or Data View of ArcMap (look at the lower left) will default to those of the first dataset loaded.
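To get a feel for why those two kinds of units behave so differently, here is a rough Python sketch of how many meters one degree covers on the ground. It assumes a perfectly spherical Earth with a mean radius of 6,371 km, which is only an approximation, and it uses 43 degrees north as a round number for the latitude of central New York.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean radius of a spherical Earth (an approximation)


def meters_per_degree_latitude():
    """Ground distance covered by one degree of latitude on a sphere."""
    return math.radians(1) * EARTH_RADIUS_M


def meters_per_degree_longitude(lat_deg):
    """Ground distance covered by one degree of longitude at a given latitude."""
    return meters_per_degree_latitude() * math.cos(math.radians(lat_deg))


print(round(meters_per_degree_latitude() / 1000))     # 111 (km)
print(round(meters_per_degree_longitude(0) / 1000))   # 111 (km, at the equator)
print(round(meters_per_degree_longitude(43) / 1000))  # 81 (km, at about 43 degrees north)
```

A meter is always a meter, but the ground distance covered by a degree of longitude shrinks as you move away from the equator.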
Question 23: Are UTM
meters a projected or geographic coordinate system?
Question 24: Are
Decimal Degrees a projected or geographic coordinate system?
With some minor exceptions, the two different data sets have
comparable amounts and locations of the water features within
The U.S. Department of Commerce (Census) is accountable for one dataset, and the New York State Department of Environmental Conservation put together the large-scale hydrography. Based on the scale issue from above, and now the agency, the theme of this data (water features), and the most local governmental jurisdiction, the Large-Scale Hydrography would be a better fit for our purposes in this GIS project.
Fitting the Data to
the Geographic Scope
Now it’s time to manipulate these datasets so that they fit our geographic area of focus and reduce our memory needs…and you thought this lab should be over by now, eh? Again, I’m really not doing this to torture you, but to give you a thorough understanding of how the concepts that we’ll talk about in lecture apply to actually doing GIS work. After you survive this semester, you will have no excuse for not anticipating the amount of time and money that you may have to spend to get GIS work done in the future, especially the second step of the GIS process and generally the most expensive step, Data Management and Acquisition.
While a lot of this work will only apply within the boundary
of the city of
First some terminology: when you put together smaller vector data sets to make a bigger one, you’ll merge them; when you put together smaller raster data sets to make a bigger one you’ll mosaic them; finally when you want to make smaller vector or raster datasets from larger ones you’ll clip them, or you may export them based on selection of specific features of interest.
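Those three terms can be illustrated with a toy Python sketch that uses plain lists in place of real GIS data. This is only a picture of the idea: the function names are mine, and the real tools also have to deal with coordinates, spatial reference, overlapping tiles, and attribute tables.

```python
def merge(*feature_lists):
    """Merge: append the features of several vector datasets into one."""
    merged = []
    for features in feature_lists:
        merged.extend(features)
    return merged


def mosaic_side_by_side(left, right):
    """Mosaic: join two rasters with equal row counts, left to right."""
    if len(left) != len(right):
        raise ValueError("tiles must have the same number of rows")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def clip(raster, row_start, row_stop, col_start, col_stop):
    """Clip: keep only the cells inside a rectangular window."""
    return [row[col_start:col_stop] for row in raster[row_start:row_stop]]


# Two made-up 2x2 raster tiles that sit next to each other.
west = [[1, 2], [3, 4]]
east = [[5, 6], [7, 8]]
full = mosaic_side_by_side(west, east)
print(full)                    # [[1, 2, 5, 6], [3, 4, 7, 8]]
print(clip(full, 0, 1, 1, 3))  # [[2, 5]]
print(merge(["stream A"], ["stream B", "stream C"]))
```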
Mosaic’ing the DEMs.
Merging and Clipping (VECTOR)
the Hydrography
Clipping (RASTER) the
Classified Land Cover
Exporting Selected
Features
Clipping (RASTER) the
DEM with adjustment for the correct Spatial Reference
This one will be a little different because we want this to be fairly accurate, and I want to show you how to change projection and convert a datum.
The Syracuse DEM has a projected coordinate system of UTM
meters in Zone 18 based on the NAD27 datum (shorthand for North American Datum
of 1927). You’re going to change the spatial reference of the
While you have ArcMap open, play with some of the ArcMap features for symbology at this point. There are a couple of different ways to see through the SYRUTM27. Double click on the color below the name of the Layer SYRUTM27. The symbol selector window will open. On the right under options, you can use the drop down arrow and select no color. Now make the outline thicker by changing the outline width to 1, and change the outline color to black. Change the color of the raster data (the clipped DEM) by clicking once on the color and changing the color ramp. The other way to change the ability to “see through” a layer is to double click on the name of the layer to get the properties window, select the display tab, and change the transparency percentage.
Use the icons in ArcCatalog to help you fill-in the field for raster or vector data in the table.
Don’t worry about saving your project, and you can close ArcCatalog.
If you have to continue filling-in the table, spatial model, or answer sheet don’t do the next step.
I will give you data sets to start with on the next lab, so you don’t have to worry about saving these. Please delete your work off the hard drive.
Whew! There are some minor things to do now to prep for the next exercise, but you’ll do them then.
Name _______________________________________ Lab Section ________
ERE 450/550 Lab Exercise 2 Answer Sheet – Lab due 9/29 at 5pm
Hours spent doing the lab_______________ Score_____/44
Question 1: What is
the “Geographic coordinate system name:” under “Horizontal coordinate system?”
(Put your answer on the answer sheet) (1)
Question 2: What are
the Geographic Coordinate Units? Are
these units of measurement planar (Cartesian, rectangular, or projected) or
spherical (angular, geographic, or geodetic)? (2)
Question 3: Look
under Geodetic Model. “What is the Horizontal Datum Name?” What is the
“Ellipsoid Name?” Is an ellipsoid a spheroid? Is a spheroid an ellipsoid? (4)
Question 4: What is a
datum? (You may use the very simple one to two word definition.) (1)
Question 5: What are
the four data sets within this ArcINFO coverage? (4)
Question 6: What are
the five data sets within this ArcINFO coverage? (5)
Question 7: How many
shapefiles are shown in ArcCatalog? (1)
Question 8: How many
files (other than the zip files) are there in the CENSUS folder shown in
Windows Explorer? (1)
Question 9: Look for
the two files that have the .prj extension (you may have to change your Windows
Explorer View to see Details). What are they (names)? This will be important to
understand the next part of the exercise. (2)
Question 10: What is
the geographic coordinate system name for each of the shapefiles? (Provide file
name and the coordinate system) (4)
Question 11: What are
the Horizontal Datum Name and Ellipsoid Name for GCS_North_American_1983? (Use
the Details link) (2)
Question 12: What are
the Horizontal Datum Name and Ellipsoid Name for GCS_Assumed_Geographic_1? (2)
Question 13:
According to the web link to the metadata of these four datasets, what is the
horizontal datum name? (and the one you used to fill in the table, right?) (1)
Question 14: Is the
North American Datum of 1983 (NAD83) a local or global (a.k.a. geocentric)
datum? (1)
Question 15: Is the
North American Datum of 1927 (NAD27) a local or global datum? (1)
Question 16: What is
the Grid Coordinate System Name and zone? What are the planar distance units?
What is the resolution (with the correct distance units)? (3)
Question 17: What is
the Altitude Datum Name? What is the Altitude Resolution (with the correct
distance units)? What are the Altitude Distance Units? (3)
Question 18: Is the
vertical datum used here based on an ellipsoid, or a geoid? (1)
Question 19: Given
the known error discussed above, would it be wise to claim that an analysis
using this data was accurate within a decimeter? (1)
Question 20: What is
the scale of the Census Hydrography data? What is the scale of the Large Scale
Hydrography data? (2)
Question 21: Which
one of those has the greater scale? (1)
Question 22: Based on
scale alone, which data set would you keep if you wanted greater accuracy? (1)
Question 23: Are UTM
meters a projected or geographic coordinate system? (1)
Question 24: Are
Decimal Degrees a projected or geographic coordinate system? (1)
NAME__________________________________________ Lab Section_______________ Score __/ 7

Name ________________________________________ Lab Section __________________ Score___/ 56