Lesson 4 Practice Exercises

Lesson 4 Practice Exercises jed124 Fri, 11/08/2019 - 16:51

These practice exercises will give you some more experience applying the Lesson 4 concepts. They are designed to prepare you for some of the techniques you'll need to use in your Project 4 script.

Download the data for the practice exercises

Both exercises involve opening a file and parsing text. In Practice Exercise A, you'll read some coordinate points and make a polygon from those points. In Practice Exercise B, you'll work with dictionaries to manage information that you parse from the text file.

Example solutions are provided for both practice exercises. You'll get the most value out of the exercises if you make your best attempt to complete them on your own before looking at the solutions. In any case, the patterns shown in the solution code can help you approach Project 4.

Lesson 4 Practice Exercise A

Lesson 4 Practice Exercise A jed124 Fri, 11/08/2019 - 16:54

This practice exercise is designed to give you some experience writing geometries to a shapefile. You have been provided two things:

A text file MysteryStatePoints.txt containing the coordinates of a state boundary.
An empty polygon shapefile that uses a geographic coordinate system.

The objective

Your job is to write a script that reads the text file and creates a state boundary polygon out of the coordinates. When you successfully complete this exercise, you should be able to preview the shapefile in Pro and see the state boundary.

Tips

If you're up for the challenge of this script, go ahead and start coding. But if you're not sure how to get started, here are some tips:

In this script, there is no header line to read. You are allowed to use the values of 0 and 1 directly in your code to refer to the longitude and latitude, respectively. This should actually make the file easier to process.
Before you start looping through the coordinates, create an empty list (or Array object) to hold all the (x,y) tuples (or Point objects) of your polygon.
Loop through the coordinates and create a (x,y) tuple (or Point object) from each coordinate pair. Then add the tuple (or Point object) to your list (or Array object).
Once you're done looping, create an insert cursor on your shapefile. Insert a new row, assigning your list (or a Polygon object built from your Array) to the SHAPE@ field.

Lesson 4 Practice Exercise A Solution

Lesson 4 Practice Exercise A Solution jed124 Sat, 05/23/2020 - 11:39

Here's one way you could approach Lesson 4 Practice Exercise A, with comments to explain what is going on. If you find a more efficient way to code a solution, please share it through the discussion forums.

# Reads coordinates from a text file and writes a polygon

import arcpy
import csv

arcpy.env.workspace = r"C:\PSU\geog485\L4\Lesson4PracticeExerciseA"

shapefile = "MysteryState.shp"
pointFilePath = arcpy.env.workspace + r"\MysteryStatePoints.txt"

# Open the file of coordinate pairs (one pair per line)
with open(pointFilePath, "r") as pointFile:
    csvReader = csv.reader(pointFile)

    # This list will hold a clockwise "ring" of coordinate pairs
    #  that will form a polygon 
    ptList = []

    # Loop through each coordinate pair 
    for coords in csvReader:
        # Append coords to list as a tuple
        ptList.append((float(coords[0]),float(coords[1])))

    # Create an insert cursor and apply the Polygon to a new row
    with arcpy.da.InsertCursor(shapefile, ("SHAPE@",)) as cursor:
        cursor.insertRow((ptList,))

Alternatively, an arcpy Array containing a sequence of Point objects could be passed to the insertRow() method rather than a list of coordinate pairs. Here is how that approach might look:

# Reads coordinates from a text file and writes a polygon

import arcpy
import csv

arcpy.env.workspace = r"C:\PSU\geog485\L4\Lesson4PracticeExerciseA"

shapefile = "MysteryState.shp"
pointFilePath = arcpy.env.workspace + r"\MysteryStatePoints.txt"
spatialRef = arcpy.Describe(shapefile).spatialReference

# Open the file 
with open(pointFilePath, "r") as pointFile:
    csvReader = csv.reader(pointFile)

    # This Array object will hold a clockwise "ring" of Point
    #  objects, thereby making a polygon.
    polygonArray = arcpy.Array()

    # Loop through each coordinate pair and make a Point object
    for coords in csvReader:

        # Create a point, assigning the X and Y values from your list    
        currentPoint = arcpy.Point(float(coords[0]),float(coords[1]))

        # Add the newly-created Point to your Array    
        polygonArray.add(currentPoint)

    # Create a Polygon from your Array
    polygon = arcpy.Polygon(polygonArray, spatialRef)

    # Create an insert cursor and apply the Polygon to a new row
    with arcpy.da.InsertCursor(shapefile, ("SHAPE@",)) as cursor:
        cursor.insertRow((polygon,))

Below is a video offering some line-by-line commentary on the structure of these solutions. Note that the scripts shown in the video don't use the float() function as shown in the solutions above. The csv reader returns every value as a string, and although the script may run successfully without float(), we've found that on some systems the coordinates are not converted to numbers unless they are explicitly cast as floats.
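To see why the cast matters, here is a minimal sketch that uses only the standard library. The two coordinate pairs are made up for illustration (they are not taken from MysteryStatePoints.txt), and io.StringIO stands in for the opened text file.

```python
import csv
import io

# Hypothetical sample data standing in for the contents of the points file
sampleText = "-109.05,37.0\n-102.04,37.0\n"

rawPairs = []
ptList = []
for coords in csv.reader(io.StringIO(sampleText)):
    # csv.reader yields each value as a string
    rawPairs.append((coords[0], coords[1]))
    # float() converts the text to numbers before the pair is stored
    ptList.append((float(coords[0]), float(coords[1])))

print(rawPairs)
print(ptList)
```

The first list holds text such as "-109.05", while the second holds numeric values that can safely be used as coordinates.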

Video: Solution to Lesson 4 Practice Exercise A (11:02)

Transcript of the Solution to Lesson 4, Practice Exercise A

This video walks through the solution to Lesson 4, practice exercise A where you have a text file with some coordinates that looks like this. And you need to loop through them, parse them out, and then create a polygon out of it to create the shape of a US state.

This is a very basic list of coordinates. There's no header here. So, we're just going to blast right through this as if it were a csv file with two columns and no header.

In order to do this, we're going to use arcpy, which we import in line 3, and the csv module for Python, which we import in line 4. In lines 8 and 9, we set up some variables referencing the files that we're going to work with. So, we're working with the mystery state shapefile that was provided for you.

This shapefile is in the WGS84 geographic coordinate system. However, there is no shape or polygon within this shapefile yet. You will add that. And then, line 9 references the path to the text file containing all the coordinate points.

In line 12, we open up that text file that has the points. The first parameter in the open method is pointFilePath, which we created in line 9. And then the second parameter is r in quotes. And that means we're opening it in read mode. We're just going to read this file. We're not going to write anything into it.

And in line 13, we then pass that file into the reader() method so that we can create this csvReader object that will allow us to iterate through all the rows in the file in an easy fashion. Before we start doing that though, in line 17, we create an empty list which we’ll use to store the coordinate pairs we read in from the text file.

We know, in this case, we're just going to create one geometry. It's going to be one polygon created from one list of coordinates. So, it's OK to create that list before we start looping. And we, actually, don't want to create it inside the loop because we would be creating multiple lists, and we just need one of them in this case.

In line 20, we actually start looping through the rows in the file. Each row is going to be represented here by a variable called coords. Inside the loop, we’re going to get the X coordinate and the Y coordinate and append them to the list as a tuple. It’s very important that the first value in this tuple be the X and the second value the Y.

Now, I think a lot of us typically say “latitude/longitude” when talking about coordinates, with the word latitude coming first. It’s important to remember that latitude is the Y value and longitude is the X. So in defining this tuple, we want to make sure we’re supplying the longitude first, then the latitude.

Looking at our text file, we see that the first column is the longitude and the second column is the latitude. (We know that because latitudes don’t go over 90.) So the coordinates are in the correct X/Y order; we just need to keep them that way.
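The "latitudes don't go over 90" reasoning can be turned into a quick sanity check. The sketch below is only an illustration: looksLikeLonLat is a hypothetical helper, the sample pairs are made up, and the check can only confirm the order when at least one longitude lies beyond plus or minus 90 degrees, as is the case for much of the United States.

```python
def looksLikeLonLat(pairs):
    """Return True if every second value could be a latitude and
    at least one first value is too large to be a latitude."""
    secondCouldBeLat = all(abs(y) <= 90 for x, y in pairs)
    firstMustBeLon = any(abs(x) > 90 for x, y in pairs)
    return secondCouldBeLat and firstMustBeLon

# Hypothetical sample pairs in (longitude, latitude) order
samplePairs = [(-109.05, 37.0), (-102.04, 41.0)]
# The same pairs with the order accidentally reversed
swappedPairs = [(y, x) for x, y in samplePairs]

print(looksLikeLonLat(samplePairs))
print(looksLikeLonLat(swappedPairs))
```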

Back to our script, as we iterate through the CSV Reader object, remember that on each iteration, the Reader object simply gives us a list of the values from the current line of the file, which we’re storing in the coords variable. So if we want the longitude value, that’ll be the item from the list at position 0. If we want the latitude, that’ll be the item at position 1. So, we use coords[0] to get the longitude and coords[1] to get the latitude.

Now, going back to our file, you'll see that in the first column or column 0, that's the longitude. So, we have coords[0] in our code. And then, the next piece is the latitude. So, we use coords[1] for that.

So we’re only doing one thing in this loop: appending a new X/Y coordinate pair to the list on each iteration.

Now, line 25 is the first statement after the loop. We know that because that line of code is no longer indented under the for statement. At this point, we’ve got the coordinates of the mystery state in a list, so we’re ready to add a polygon to the shapefile. To add new features to a feature class, we need to use an InsertCursor. And that’s what we’re doing there on line 25: opening up a new insert cursor.

That cursor needs a couple of parameters in order to get started. It needs the feature class that we're going to modify, which is referenced by the variable shapefile. Remember, we created that back in line 8.

And then, for the second parameter, we need to supply a tuple of fields that we're going to modify. In our case, we're only going to modify the geometry. We're not going to modify any attributes, so the tuple we supply has just one item, or token: SHAPE with the @ sign.

And then, our final line 26 actually inserts the row. And it inserts just the list of points we populated in our loop. Because the shapefile was created to hold polygons, a polygon geometry will be created from our list of points. When this is done, we should be able to go into ArcGIS Pro and verify that we indeed have a polygon inside of that shapefile.

One little quirky thing to note about this code is in the last line, where I supplied the tuple of values corresponding to the tuple of fields that was associated with the cursor. When you’re supplying just a single value as we were here, it’s necessary to follow that value up with a comma. If my cursor was defined with more than one field, say 2 fields, then my values tuple would not need to end in a comma like this. It’s only when specifying just a single value that the values tuple needs this trailing comma.
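The trailing-comma rule is plain Python behavior and can be checked without arcpy. In this small sketch, "NAME" is a hypothetical second field used only to show the two-item case.

```python
notATuple = ("SHAPE@")              # parentheses alone just group the string
oneItemTuple = ("SHAPE@",)          # the trailing comma makes a one-item tuple
twoItemTuple = ("SHAPE@", "NAME")   # no trailing comma needed for two items

print(type(notATuple).__name__)
print(type(oneItemTuple).__name__)
print(len(twoItemTuple))
```

The same rule applies to the values passed to insertRow(): a single value followed by a comma inside the parentheses is a one-item tuple, while the same value without the comma is just the value itself.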

This is a good place to remind you that you’re not limited to setting the value of a single field. For example, if there were a Name field in this shapefile, and we happened to know the name of the state we were adding, we could add Name to the fields tuple and then when doing insertRow() we could supply the name of the state we’re adding in addition to the point list.

Now, the script we just walked through uses the approach to geometry creation that’s recommended whenever possible because it offers the best performance. That approach works when you’re dealing with simple, single-part features and the coordinates are in the same spatial reference as the feature class you’re putting the geometry into. It also requires ArcGIS Desktop 10.6 or later or ArcGIS Pro 2.1 or later, because that’s when support for this approach was added.

So for example, this approach would *not* work for adding the state of Hawaii as a single feature since that would require inserting a multi-part polygon. It also wouldn’t work if our empty shapefile was in, say, the State Plane Coordinate System rather than WGS84 geographic coordinates.

In those cases, you’d have no choice but to use this alternate solution that I'm going to talk through now.

In this version of the script, note that in line 10, we get the spatial reference of the MysteryState shapefile using the Describe method. We’ll go on to use this SpatialReference object later in the script when we create the Polygon object. We didn’t bother to get the spatial reference of the shapefile in the other version of the script because that method of creating geometries doesn’t support supplying a spatial reference.

The next difference in this second script comes on line 18. Instead of creating a new empty Python list, we create an arcpy Array object.

Then inside the loop, instead of putting the coordinates into a tuple and appending the tuple to the list, we create an arcpy Point object, again in X/Y order. Then, we use the Array’s add() method to add the Point object to the Array.

After the loop is complete and we’ve populated the Array, we’re ready to create an object of the Polygon class. The Polygon constructor requires that we specify an Array of Points, which is in our polygonArray variable. Optionally, we can supply a SpatialReference object, which we’re doing here with our spatialRef variable.

Supplying that spatial reference doesn’t really have any effect in this instance because our coordinates and feature class are in the same spatial reference, but you could see the difference between these two if you created an empty feature class that was defined to use a different spatial reference, say one of the State Plane coordinate systems for New Mexico. Running this version of the script would result in the data being properly aligned because we’re telling arcpy how our data is projected, so it can re-project it behind the scenes into the State Plane coordinates. With the other version of the script, the data can't be properly aligned because that version doesn’t specify a coordinate system.

In any case, this version of the script finishes up by passing the Polygon object to the insertRow() method rather than a list of coordinate pairs.

So, those are two ways you can solve this exercise.

Credit: S. Quinn and J. Detwiler © Penn State is licensed under CC BY-NC-SA 4.0.

Lesson 4 Practice Exercise B

Lesson 4 Practice Exercise B jed124 Mon, 11/11/2019 - 12:39

This practice exercise does not do any geoprocessing or GIS, but it will help you get some experience working with functions and dictionaries. The latter will be especially helpful as you work on Project 4.

The objective

You've been given a text file of (completely fabricated) soccer scores from some of the most popular teams in Buenos Aires. Write a script that reads through the scores and prints each team name, followed by the maximum number of goals that team scored in a game, for example:

River: 5
Racing: 4
etc.

Keep in mind that the maximum number of goals scored might have come during a loss.

You are encouraged to use dictionaries to complete this exercise. This is probably the most efficient way to solve the problem. You'll also be able to write at least one function that will cut down on repeated code.

I have purposefully kept this text file short to make things simple to debug. This is an excellent exercise in using the debugger, especially to watch your dictionary as you step through each line of code.

This file is space-delimited; therefore, you must explicitly set up the CSV reader to use a space as the delimiter instead of the default comma. The syntax is as follows:

csvReader = csv.reader(scoresFile, delimiter=" ")
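To see what the delimiter setting changes, here is a minimal sketch that reads the same text twice. The sample text is an assumption made for illustration: the column names match those used in the solution code on this page, the column order is assumed, and io.StringIO stands in for the opened Scores.txt file.

```python
import csv
import io

# Hypothetical sample lines in a space-delimited layout
sampleText = "Winner WG Loser LG\nBoca 2 Independiente 1\n"

# With the default comma delimiter, each line comes back as one item
defaultRows = list(csv.reader(io.StringIO(sampleText)))
# With a space delimiter, each line is split into its four values
spaceRows = list(csv.reader(io.StringIO(sampleText), delimiter=" "))

print(defaultRows[1])
print(spaceRows[1])
```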

Tips

If you want a challenge, go ahead and start coding. Otherwise, here are some tips that can help you get started:

Create variables for all your items of interest, including winner, winnerGoals, loser, and loserGoals and assign them appropriate values based on what you parsed out of the line of text. Because there are often ties in soccer, the term "winner" in this context means the first score given and the term "loser" means the second score given.
Review section 4.17 on dictionaries in the Zandbergen text. You want to make a dictionary that has a key for each team, and an associated value that represents the team's maximum number of goals. If you looked at the dictionary in the debugger, it would look like {'River': 5, 'Racing': 4, etc.}
You can write a function that takes in three things: the key (team name), the number of goals, and the dictionary name. This function should then check if the key has an entry in the dictionary. If not, a key should be added and its value set to the current number of goals. If a key is found, you should perform a check to see if the current number of goals is higher than the value associated with that key. If so, you should set a new value. Notice how many "ifs" appear in the preceding sentences.

Lesson 4 Practice Exercise B Solution

Lesson 4 Practice Exercise B Solution jed124 Mon, 11/11/2019 - 12:40

This practice exercise is a little trickier than previous exercises. If you were not able to code a solution, study the following solution carefully and make sure you know the purpose of each line of code.

The code below refers to the "winner" and "loser" of each game. Because some games end in a tie, these terms really mean the first team listed and the second team listed.

# Reads through a text file of soccer (football)
#  scores and reports the highest number of goals
#  in one game for each team

# ***** DEFINE FUNCTIONS *****

# This function checks if the number of goals scored
#  is higher than the team's previous max.
def checkGoals(team, goals, dictionary):
    # Check if the team has a key in the dictionary
    if team in dictionary:
        # If a key was found, check goals against team's current max
        if goals > dictionary[team]:
            dictionary[team] = goals
        else:
            pass
    # If no key found, add one with current number of goals
    else:
        dictionary[team] = goals

# ***** BEGIN SCRIPT BODY *****

import csv

# Open the text file of scores
scoresFilePath = "C:\\Users\\jed124\\Documents\\geog485\\Lesson4\\Lesson4PracticeExercises\\Lesson4PracticeExerciseB\\Scores.txt"
with open(scoresFilePath) as scoresFile:
    # Read the header line and get the important field indices
    csvReader = csv.reader(scoresFile, delimiter=" ")
    header = next(csvReader)
    
    winnerIndex = header.index("Winner")
    winnerGoalsIndex = header.index("WG")
    loserIndex = header.index("Loser")
    loserGoalsIndex = header.index("LG")
    
    # Create an empty dictionary. Each key will be a team name.
    #  Each value will be the maximum number of goals for that team.
    maxGoalsDictionary = {}
    
    for row in csvReader:
    
        # Create variables for all items of interest in the line of text    
        winner = row[winnerIndex]
        winnerGoals = int(row[winnerGoalsIndex])
        loser = row[loserIndex]
        loserGoals = int(row[loserGoalsIndex])
        
        # Check the winning number of goals against the team's max
        checkGoals(winner, winnerGoals, maxGoalsDictionary)
        
        # Also check the losing number of goals against the team's max    
        checkGoals(loser, loserGoals, maxGoalsDictionary)

    # Print the results
    for key in maxGoalsDictionary:
        print(key + ": " + str(maxGoalsDictionary[key]))

Below is a video offering some line-by-line commentary on the structure of this solution.

Video: Solution to Lesson 4 Practice Exercise B (13:03)

Transcript of the Solution to Lesson 4, Practice Exercise B

This video describes one possible solution for Lesson 4 practice exercise B where you are reading the names of soccer teams in Buenos Aires, looking at their scores, and then compiling a report about the top number of goals scored by each team over the course of the games covered by the file.

In order to maintain all this information, it's helpful to use a dictionary, which is a way of storing information in the computer's memory based on key-value pairs. So, the key in this case will be the name of the team, and the value will be the maximum number of goals found for that team as we read the file line by line. Now, this file is sort of like a comma separated value file, though the delimiter in this case is a space. The file does have a header, which we'll use to pull out information.

The header is organized in terms of winner, winner goals, loser, and loser goals. Although, really, there are some ties in here. So, we might say first score and second score rather than winner and loser. For our purposes, it doesn't matter who won or lost because the maximum number of goals might have come during a loss or a tie.

So, this solution is a little more complex than some of the other practice exercises. It involves a function which you can see beginning in line 9, but I'm not going to describe the function just yet. I'll wait until we get to the point where we need the logic that's in that function. So, I'll start explaining this solution by going to line 23, where we import the Python CSV module.

Now, there's nothing in this script that uses ArcGIS or arcpy geometries or anything like that, so I don't import arcpy at all. But you will do that in Project 4, where you'll use a combination of the techniques used here along with ArcGIS geometries and really put everything together from both the practice exercises. In line 26, we set up a variable representing the path to the scores text file. And in line 27, we actually open the file. By default, I'm opening it here in read mode. The opening mode parameter is not specifically supplied here.

In line 29, we create the CSV reader object. You should be familiar with this from the other examples in the lesson and the other practice exercise. One thing that's different here is, as a second parameter, we can specify the delimiter using this type of syntax-- delimiter equals, and then the space character.

Again, this file does have a header. So in line 30, we'll read the header, and then we figure out the index positions of all of the columns in the file. That's what's going on in lines 32 through 35.

Now, we know that these columns are in a particular order. But writing it in this way where we use the header.index method makes the script a little more flexible in case the column order had been shifted around by somebody, which could easily happen if somebody had previously opened this file in a spreadsheet program and moved things around.
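The flexibility that header.index() provides can be seen with a small sketch. The shuffled sample text below is made up for illustration, and io.StringIO stands in for the opened file.

```python
import csv
import io

# Hypothetical file contents with the columns moved around
shuffledText = "Loser LG Winner WG\nIndependiente 1 Boca 2\n"

csvReader = csv.reader(io.StringIO(shuffledText), delimiter=" ")
header = next(csvReader)

# Look up each column position by name instead of hard-coding it
winnerIndex = header.index("Winner")
winnerGoalsIndex = header.index("WG")

row = next(csvReader)
print(row[winnerIndex], int(row[winnerGoalsIndex]))
```

Even though the Winner column is now third rather than first, the lookups still return the winning team and its goals.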

In line 39, we're going to create a blank dictionary to keep track of each team and the maximum number of goals they've scored. We'll refer to that dictionary frequently as we read through the file.

In line 41, we begin a loop that actually starts reading data in the file below the header. And so, lines 44 through 47 are pulling out those four pieces of information-- basically, the two team names and the number of goals that each scored. Note that the int() function is used to convert the number of goals in string format to integers. Now, when we get a team name and a number of goals, we need to check it against our dictionary to see if the number of goals scored is greater than that team's maximum that we’ve encountered so far. And we need to do this check for both the winner and the loser-- or in other words, the first team and the second team listed in the row.

To avoid repeating code, this is a good case for a function. Because we're going to use the same logic for both pieces so why not write the code just once in a function? So, in lines 50 and 53, you'll see that I'm invoking a function called checkGoals. And I pass in three things. I pass in the team name, I pass in the number of goals, and the dictionary.

This function is defined up here in line 9. Line 9 defines a function called checkGoals, and I create variables here for those three things that the function needs to do its job -- the team, the number of goals, and the dictionary. Line 11 performs a check to see if the team already has a key in the dictionary. If it does, then we need to look at the number of goals that have been stored for the team and check it against the score that was passed into the function to see if that maximum number of goals needs to be updated.

So in line 13, that check is occurring. And if indeed the number of goals passed into the function is greater than the maximum that we’ve run into so far, then in line 14, we update the team’s entry in the dictionary, setting it equal to what was passed into the goals variable. If the number of goals passed into the function is not greater than our maximum, then we don't want to do anything. So that's what's in line 16 where it says pass. The pass is just a keyword that means don't do anything here. And really, we could eliminate the else clause altogether if we wanted to.

Now, if the team has never been read before, and it doesn't have an entry in the dictionary, then we're going to jump down to line 18. We're going to add the team to the dictionary, and we're going to set its max goals to the number passed into the goals variable. No need to check against another number, since there’s nothing in the dictionary yet for that team.
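As a side note on the remark that the else clause could be eliminated, the whole function can also be collapsed into a single statement. This is only one possible alternative, not the approach used in the video; checkGoalsCompact is a hypothetical name, and the third call below uses a made-up score.

```python
def checkGoalsCompact(team, goals, dictionary):
    # dict.get returns the stored max, or the current goals if the
    #  team has no key yet, so max() covers both cases in one line
    dictionary[team] = max(goals, dictionary.get(team, goals))

maxGoals = {}
checkGoalsCompact("River", 0, maxGoals)  # first time seen: stored as 0
checkGoalsCompact("River", 2, maxGoals)  # higher than 0: updated to 2
checkGoalsCompact("River", 1, maxGoals)  # lower than 2: left unchanged
print(maxGoals)
```

The longer if/else version is easier to step through in a debugger, which is a reasonable argument for keeping it while you are learning.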

Now, what we can do to get a better feel for how this script works is to run it with the debugging tools. So I’ve inserted a breakpoint on the first line inside the checkGoals function, and I'm going to run the script up until the first time this function gets called. And I can click on the Variables window here to follow along with what’s happening to my variables.

I can see that there are global and local variables. The local variables are the ones defined within the checkGoals function since that's where the script execution is currently paused. If I look at the globals list, I can find the row variable, which holds the list of values from the first line of the file. The variables I want to focus on though are the ones defined as part of the function, the locals -- the dictionary, goals, and team variables.

So on this first call to the function we've passed in Boca as the team and 2 as the number of goals. And right now, there's nothing in our dictionary. So when evaluating line 11, we’d expect PyScripter to jump down to line 19 because the team does not have a key in the dictionary yet. So I'll see what happens by clicking on the Step button here. And we see that we do in fact jump down to line 19. And if I step again, I will be adding that team to the dictionary. I can check on that; note that it's jumped out of the function because that was the last line of the function. It's jumped back to the main body of the script, which is another call to the checkGoals function, this time for the losing team. While we're paused here, if I scroll down in the list of locals I can find the maxGoalsDictionary and I can see that it now has an entry for Boca with a value of 2.

If I hit Resume again, it will run to the breakpoint again. So now it's dealing with data for the losing team from that first match -- Independiente with 1 goal. Again, because Independiente doesn't have a key in the dictionary, we would expect the script to jump down to line 19, and indeed that is what happens. So when I hit Step again, it's going to add an entry to the dictionary for that second team.

I’m going to hit the Resume button a couple more times for that second game, adding two more teams to the dictionary. Now I’m going to pause as we’ve hit a line where we have teams that we have encountered before. On this current call to the function, the team is River, and they scored 2 goals in this particular match. Looking at the dictionary, we can see that River's current maximum is 0. So, in this case we would expect on line 11 to find that yes, the team is already in the dictionary and so we'd expect it to jump down to line 13 instead of line 19. And that's what happens. So now we're going to check -- Is the value in the goals variable greater than their current maximum, which is 0. And indeed it is, so we’ll execute line 14, updating the dictionary with a new maximum goals for that team.

And we could continue stepping through the script, but hopefully, you get the feel for how the logic plays out now.

So as you're working on Project 4, and you're working with dictionaries in this manner, it's a good idea to keep the debugger open and watch what's happening, and you should be able to tell if your dictionary is being updated in the way that you expect.

So to finish out this script, once we have our dictionary all built, after this for loop is finished, then we're going to loop through the dictionary and print each key and each value. And that can be done using a simple for loop, like in line 56. In this case, the variable key represents a key in the dictionary. And in line 57, we print out that key, we print a colon and a space, and then we print the associated value with that key. If you want to pull a value out of a dictionary, you use square brackets, and you pass in the key name. And so, that's what we're doing there. So running this all the way through should produce a printout in the Python Interpreter of the different teams, as well as the maximum number of goals found for each.
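A slightly different way to write the final loop is to ask the dictionary for its keys and values together using items(). The sketch below is an alternative, not what the video shows, and the dictionary contents are illustrative values only.

```python
# Illustrative values; the real dictionary is built while reading the file
maxGoalsDictionary = {"Boca": 2, "Independiente": 1}

lines = []
for team, goals in maxGoalsDictionary.items():
    # An f-string formats the number without an explicit str() call
    lines.append(f"{team}: {goals}")

print("\n".join(lines))
```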