GEOG 485: GIS Programming and Software Development

Overview

Bill Gates is credited with saying he would "hire a lazy person to do a difficult job" with the justification that "a lazy person will find an easy way to do it." GEOG 485 doesn't teach the lazy way to get the job done, but it does teach the scripting way - which is arguably even better. You've probably heard the "give a fish"/"teach to fish" saying? That's the gist of GEOG 485: to equip you, in an ArcGIS context, with the ModelBuilder and Python scripting skills to make your repetitive geoprocessing tasks easier, quicker and automatic, so you can focus on the more interesting (and potentially more valuable) work that you (and your employers) really want you to be doing.

The course focuses on solving geographic problems by modifying and automating generic Geographic Information System (GIS) software through programming. In GEOG 485, students use the Python programming language to write and modify scripts that add functionality to desktop GIS tools and automate geospatial analysis processes. No previous programming experience is assumed. Core topics include object-oriented programming, component object model technologies, object model diagrams, loops, if-then constructs, and modular code design. The course situates these topics in the geospatial workflow by integrating them with maps, layers, spatial data tables, and spatial analysis methods.

Students who successfully complete the course can automate repetitive GIS tasks, customize GIS interfaces, and share their geospatial software development work with others. 

GEOG 485 is a required course in Penn State's Postbaccalaureate Certificate in Geospatial Programming and Web Map Development and one of several electives students may choose as their final course leading to Penn State's Postbaccalaureate Certificate in Geographic Information Systems. It can also be applied toward the Penn State Geospatial Intelligence Graduate Certificate and Master of Geographic Information Systems degree.

Learn more about GEOG 485, GIS Programming and Software Development (1 min, 22 sec)

JIM DETWILER: Hi there, and welcome to GIS Programming and Automation. I'm Jim Detwiler, one of a small team of instructors who teaches this course. This has always been one of our most popular courses, probably because who doesn't want to learn how to automate some of the tedious, repetitive tasks that they have to deal with at work. This course requires no programming experience and I think we've done a pretty good job of meeting the needs of folks that are novice programmers, while at the same time, still having something good to offer for people that do have some experience.

What you're going to learn in this course is how to do Python programming in the context of Esri's desktop GIS software. Though it will set you up well for using it in other contexts, such as web application development or doing GIS automation in an open source context, for example, QGIS. My hope is that by the end of the course, you will be ready to apply what you've learned immediately at work, to automate some of the workflows that you have there. With that, thank you for watching. If you have any questions at all, please don't hesitate to contact me.

Credit: © Penn State is licensed under CC BY-NC-SA 4.0

Want to join us? Students who register for this Penn State course gain access to assignments and instructor feedback and earn academic credit. For more information, visit Penn State's Online Geospatial Education Program website. Official course descriptions and curricular details can be reviewed in the University Bulletin.

This course is offered as part of the Repository of Open and Affordable Materials at Penn State. You are welcome to use and reuse materials that appear on this site (other than those copyrighted by others) subject to the licensing agreement linked to the bottom of this and every page.

Lesson 1: Introduction to GIS modeling and Python

The links below provide an outline of the material for this lesson. Be sure to carefully read through the entire lesson before returning to Canvas to submit your assignments.

Lesson 1 Overview

Welcome to Geography 485. Over the next ten weeks, you'll work through four lessons and a final project dealing with ArcGIS automation in Python. Each lesson will contain readings, examples, and projects. Since the lessons are two weeks long, you should plan between 20 - 30 hours of work to complete them, although this number may vary depending on your prior programming experience. See the Course Schedule section of this syllabus, below, for a schedule of the lessons and course projects.

As with GEOG 483 and GEOG 484, the lessons in this course are project-based with key concepts embedded within. However, because of the nature of computer programming, there is no way this course can follow the step-by-step instructional design of the previous courses. You will probably find the course to be more challenging than our courses on GIS fundamentals. For that reason, it is more important than ever that you stay on schedule and take advantage of the course message boards and private email. It's quite likely that you will get stuck somewhere during the course, so before getting hopelessly frustrated, please seek help from me or your classmates!

I hope that by now you have reviewed our Orientation and Syllabus for an important course site overview. Before we begin our first project, let me share some important information about the textbook and a related Esri course.

Textbook and Readings

The textbook for this course is Python Scripting for ArcGIS Pro by Paul A. Zandbergen. As you read through Zandbergen's book, you'll see material that closely parallels what is in the Geog 485 lessons. This isn't necessarily a bad thing; when you are learning a subject like programming, it can be helpful to have the same concept explained from two angles.

My advice about the readings is this: Read the material on the Geog 485 lesson pages first. If you feel like you have a good understanding from the lesson pages, you can skim through some of the more lengthy Zandbergen readings. If you struggled with understanding the lesson pages, you should pay close attention to the Zandbergen readings and try some of the related code snippets and exercises. I suggest you plan about 1 - 2 hours per week of reading if you are going to study the chapters in detail.

In all cases, you should get a copy of the textbook because it is a relevant and helpful reference.

Note:

The Zandbergen textbook is up to the 3rd Edition as of Summer 2024. The free copy of the book available through the PSU library is the 2nd Edition.  Differences between the two editions are relatively minor and you may assume that the section numbers referenced here in the lessons are applicable to both editions unless otherwise noted.

You may see that in Esri's documentation, shapefiles are also referred to as "feature classes." When you see the term "feature class," consider it to mean a vector dataset that can be used in ArcGIS.

Esri Virtual Campus Course Python for Everyone

There is a free Esri Virtual Campus course, Python for Everyone, that introduces a lot of the same things you'll learn this term in Geog 485. Python for Everyone consists of a series of short videos and exercises, some of which might help toward the projects. If you want to get a head start, or you want some reinforcement of what we're learning from a different point of view, it would be worth your time to complete that Virtual Campus course.

All you need in order to access the course is an Esri Global Account, which you can create for free. You do not need to obtain an access code from Penn State.

The course moves through ideas very quickly and covers a range of concepts that we'll spend 10 weeks studying in depth, so don't worry if you don't understand it all immediately or if it seems overwhelming. You might find it helpful to quickly review the course again near the end of Geog 485 to review what you've learned.

Questions?

If you have any questions now or at any point during this week, please feel free to post them to the Lesson 1 Discussion Forum. (To access the forums, return to Canvas via the Canvas link. Once in Canvas, you can navigate to the Modules tab, and then scroll to the Lesson 1 Discussion Forum.) While you are there, feel free to post your own responses if you are able to help a classmate.

Now, let's begin Lesson 1.

Lesson 1 Checklist

This lesson is two weeks in length. (See the Calendar in Canvas for specific due dates.) To finish this lesson, you must complete the activities listed below. You may find it useful to print this page so that you can follow along with the directions.

  1. Download the Lesson 1 data and extract it to C:\PSU\Geog485 or your own desired path.
  2. Work through the online sections of Lesson 1.
  3. Read the assigned pages from the Zandbergen textbook. In the online lesson pages, I have inserted instructions on the sections to read from Zandbergen when it is most appropriate to read them.
  4. Complete Project 1, Part I and submit the deliverables to the course drop box in Canvas.
  5. Complete Project 1, Part II and submit the deliverables to the course drop box in Canvas.
  6. Complete the Lesson 1 Quiz in Canvas.

Do items 1 - 3 (including any of the practice exercises you want to attempt) during the first week of the lesson. You will need the second week to concentrate on the project and quiz.

Lesson Objectives

By the end of this lesson, you should:

  • be able to create automated workflows in ArcGIS Pro ModelBuilder;
  • be familiar with the PyScripter development environment;
  • know some of Python’s most important basic data types and how to use variables in Python scripts;
  • be familiar with the basic concepts of object-oriented programming (e.g., classes and inheritance);
  • be able to use arcpy tool functions to achieve basic spatial analysis tasks; and
  • know how to create a simple script tool and pass a parameter to a script.

1.1.1 The need for GIS automation

A geographic information system (GIS) can manipulate and analyze spatial datasets with the purpose of solving geographic problems. GIS analysts perform all kinds of operations on data to make it useful for solving a focused problem. This includes clipping, reprojecting, buffering, merging, mosaicking, extracting subsets of the data, and hundreds of other operations. In the ArcGIS software used in this course, these operations are known as geoprocessing and they are performed using tools.

Successful GIS analysis requires selecting the most appropriate tools to operate on your data. ArcGIS uses a toolbox metaphor to organize its suite of tools. You pick the tools you need and run them in the proper order to make your finished product.

Suppose you’re responsible for selecting sites for a chain restaurant. You might use one tool to select land parcels along a major thoroughfare, another tool to select parcels no smaller than 0.25 acres, and other tools for other selection criteria. If this selection process were limited to a small area, it would probably make sense to perform the work manually.

However, let’s suppose you’re responsible for carrying out the same analysis for several areas around the country. Because this scenario involves running the same sequence of tools for several areas, it is one that lends itself well to automation. There are several major benefits to automating tasks like this:

  • Automation makes work easier. Once you automate a process, you don't have to put in as much effort remembering which tools to use or the proper sequence in which they should be run.
  • Automation makes work faster. A computer can open and execute tools in sequence much faster than you can accomplish the same task by pointing and clicking.
  • Automation makes work more accurate. Any time you perform a manual task on a computer, there is a chance for error. The chance multiplies with the number and complexity of the steps in your analysis, as well as the fatigue incurred by repeating the task many times. In contrast, once an automated task is configured, a computer can be trusted to perform the same sequence of steps every time for a potentially endless number of cycles.

The ArcGIS platform provides several ways for users to automate their geoprocessing tasks. These options differ in the amount of skill required to produce the automated solution and in the range of scenarios that each can address. The text below touches briefly on these automation options, in order from requiring the least coding skill to the most.

The first option is to construct a model using ModelBuilder. ModelBuilder is an interactive program that allows the user to “chain” tools together, using the output of one tool as input in another. Perhaps the most attractive feature of ModelBuilder is that users can automate rather complex GIS workflows without the need for programming. You will learn how to use ModelBuilder early in this course.

Some automation tasks require greater flexibility than is offered by ModelBuilder, and for these scenarios it's recommended that you write short computer programs, or scripts. The bulk of this course is concerned with script writing.

A script typically executes some sequential procedure of steps. Within a script, you can run GIS tools individually or chain them together. You can insert conditional logic in your script to handle cases where different tools should be run depending on the output of the previous operation. You can also include iteration, or loops, in a script to repeat a single action as many times as needed to accomplish a task.
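
To make these three ideas concrete, here is a minimal sketch in plain Python. It does not call any GIS tools; the study areas and parcel counts are made up for illustration, and the function name next_step is hypothetical.

```python
# Hypothetical study areas with made-up parcel counts. In a real script,
# these counts would come from running GIS tools, not from typed-in values.
parcel_counts = {"Denver": 120, "Austin": 0, "Boise": 45}


def next_step(area, count):
    """Describe what the script should do next for one study area."""
    # Conditional logic: the action depends on the result of an earlier step.
    if count == 0:
        return area + ": no candidate parcels, skipping"
    return area + ": " + str(count) + " candidate parcels, continuing"


# Iteration: repeat the same sequence of steps once per study area.
messages = []
for area, count in parcel_counts.items():
    messages.append(next_step(area, count))

for message in messages:
    print(message)
```

The loop runs the same steps for every study area, and the if statement decides what happens to each one. Scripts that drive GIS tools follow the same pattern.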

Although ArcGIS supports various scripting languages for working with its tools, Esri emphasizes Python in its documentation and includes Python with the ArcGIS Pro and ArcGIS Server installations. In this course, we’ll be working strictly with Python for this reason, as well as the fact that Python can be used for many other file and data manipulation tasks outside of ArcGIS. You’ll learn the basics of the Python language, how to write a script, and how to manipulate and analyze GIS data using scripts. Finally, you’ll apply your new Python knowledge to a final project, where you write a script of your choosing that you may be able to apply directly to your work.

A more recently developed automation option on the ArcGIS platform is the ArcGIS API for Python (API stands for Application Programming Interface). The API is designed to work with the web-based architecture of ArcGIS Enterprise and with other web-based data. It provides programmatic administration of ArcGIS Enterprise, as well as methods for consuming and analyzing web-based GIS data, and it integrates with popular data science packages such as pandas and NumPy and with Jupyter Notebook environments. Using the Python API in a Jupyter Notebook environment is a topic in our Advanced Python class, GEOG 489.

For geoprocessing tasks that require support for user interaction with the map or other UI elements, the ArcGIS Pro SDK (Software Development Kit) offers the ability to add custom tools to the Pro interface. The Pro SDK requires programming in the .NET framework using a .NET language such as Visual Basic .NET or C#. Working with this SDK's object model provides greater flexibility in terms of what can be built, as compared to writing Python scripts around Esri's geoprocessing framework. The tradeoff is a higher level of complexity involved in the coding.

Finally, developers who want to create their own custom GIS applications, typically focused on delivering much narrower functionality than the one-size-fits-all ArcGIS Pro, can develop apps using the ArcGIS Maps SDKs (previously called Runtime SDKs). The Maps SDKs make it possible to author apps for Windows, Mac, or Linux desktop machines, as well as for iOS and Android mobile devices, again involving a greater level of effort than your typical Python geoprocessing script. A native SDK for macOS existed in the past, but it has been retired; programmers can instead use the Maps SDK for Java to develop for macOS.

This first lesson will introduce you to concepts in both model building and script writing. We’ll start by just getting familiar with how tools run in ArcGIS and how you can use those tools in the ModelBuilder interface. Then, we’ll cover some of the basics of Python and see how the tools can be run within scripts.

1.2.1 Exploring the toolbox

The ArcGIS software that you use in this course contains hundreds of tools that you can use to manipulate and analyze GIS data. Back before ArcGIS had a graphical user interface (GUI), people would access these tools by typing commands. Nowadays, you can point and click your way through a whole hierarchy of toolboxes using the Catalog window in ArcGIS Pro.

Although you may have seen them before, let’s take a quick look at the toolboxes:

  1. Open a new or existing project in ArcGIS Pro.
  2. If the Geoprocessing pane isn't visible, click the Analysis tab, then click Tools.
  3. In the Geoprocessing pane, click the Toolboxes tab heading. Notice that the tools are organized into toolboxes and toolsets. For example, the IDW and Spline tools can be found in the Interpolation toolset within the Spatial Analyst toolbox. Sometimes it’s faster to use the Find Tools box at the top of the Geoprocessing pane to find the tool you need instead of browsing this tree.
  4. Let’s examine a tool. Expand Analysis Tools > Proximity > Buffer, and double-click the Buffer tool to open it.

    You've probably seen this tool in past courses, but this time, really pay attention to the components that make up the user interface. Specifically, you’re looking at a dialog with many fields. Each geoprocessing tool has required inputs and outputs. Those are indicated by the red asterisks. They represent the minimum amount of information you need to supply in order to run a tool. For the Buffer tool, you’re required to supply an input features location (the features that will be buffered) and a buffer distance. You’re also required to indicate an output feature class location (for the new buffered features).

    Many tools also have optional parameters. You can modify these if you want, but if you don’t supply them, the tool will still run using default values. For the Buffer tool, optional parameters are the Side Type, End Type, Method, and Dissolve Type. Optional parameters are typically specified after required parameters.

  5. Hover your mouse over any of the tool parameters. You should see a blue "info" icon to the left of the parameter. Moving your mouse over that icon will show a brief description of the parameter in a pop-out window.

    If you’re not sure what a parameter means, this is a good way to learn. For example, viewing the pop-out documentation for the End Type parameter will show you an explanation of what this parameter means and list the two options: Round and Flat.

    If you need even more help, each tool is more expansively documented in the ArcGIS Pro web-based help system. You can access a tool's documentation in this system by clicking on the blue ? icon in the upper-right of the tool dialog, which will open the help page in your default web browser.

  6. Open the web-based help page for the Buffer tool. As mentioned, the help page will provide extensive documentation of the tool's usage. In the upper right of the help page, you should see a list of links that are shortcuts to various sections of the tool's documentation.
  7. Click the Parameters link to jump to that section. There you should find that the information is divided between two tabs (Dialog and Python), with the Dialog tab displayed by default. The Dialog tab focuses on providing help to those who are executing the tool through the ArcGIS Pro GUI, while the Python tab focuses on providing help to those who are invoking it through a Python script.
  8. Click on the Python tab, since that's our focus in this class. You should see the name, a description, and the data type associated with each of the tool's parameters. That information is certainly helpful, but often even more helpful is the Code sample section that appears below the parameter list. Every tool's help page has these programming examples showing ways to run the tool from within a script, and they will be extremely valuable to you as you complete the assignments in this course.
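
The split between required and optional parameters that you saw in the Buffer dialog carries over to Python. The sketch below is one way to lay that out, not the only way. It assumes the Lesson 1 data is in C:\PSU\Geog485, the output name us_cities_buffer is hypothetical, and the distance string "5 Miles" is an assumption about how the unit is spelled in your version of Pro. The import is wrapped in try/except so the sketch still loads on a machine without ArcGIS Pro.

```python
try:
    import arcpy  # only available inside an ArcGIS Pro Python environment
except ImportError:
    arcpy = None  # lets the sketch load on machines without ArcGIS Pro

# Required parameters: the ones marked with red asterisks in the tool dialog.
required = {
    "in_features": r"C:\PSU\Geog485\Lesson1.gdb\USA\us_cities",
    # Hypothetical output name chosen for this sketch.
    "out_feature_class": r"C:\PSU\Geog485\Lesson1.gdb\us_cities_buffer",
    "buffer_distance_or_field": "5 Miles",
}

# Optional parameters: anything left out falls back to the tool's default.
optional = {
    "dissolve_option": "ALL",
    "method": "PLANAR",
}

if arcpy is not None:
    # The ** syntax passes each dictionary entry as a named argument,
    # which makes it clear which value goes with which parameter.
    arcpy.analysis.Buffer(**required, **optional)
```

The parameter names used as dictionary keys are the ones listed on the Python tab of the Buffer help page, so the help page is the place to check them.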

1.2.2 Environments for accessing tools

You can access ArcGIS geoprocessing tools in several different ways:

  • Sometimes you just need to run a tool once, or you want to experiment with a tool and its parameters. In this case, you can open the tool directly from the Geoprocessing pane and use the tool’s graphical user interface (GUI, pronounced gooey) to fill in the parameters.
  • ModelBuilder is also a GUI application where you can set up tools to run in a given sequence, using the output of one tool as input to another tool.
  • If you’re familiar with the tool and want to use it quickly in Pro, you may prefer the Python window approach. You type the tool name and required parameters into a command window. You can use this window to run several tools in a row and declare variables, effectively doing simple scripting.
  • If you want to run the tool automatically, repeatedly, or as part of a greater logical sequence of tools, you can run it from a script. Running a tool from a script is the most powerful and flexible option.

We’ll start with the simplest of these cases, running a tool from its GUI, and work our way up to scripting.

1.2.3 Running a tool from its GUI

Let’s start by opening a tool from the Catalog pane and running it using its graphical user interface (GUI).

  1. If, by chance, you still have the Buffer tool open from the previous section, close it for now so you can add some data.
  2. Download the Lesson 1 data and extract Lesson1.gdb.zip into your C:\PSU\Geog485 (or other desired) folder.  The zip file should extract to a file geodatabase called Lesson1.gdb, which contains all of the data used in Lesson 1.
  3. Open a new project in Pro with a new empty map if you don't have one open already.
  4. Add the us_boundaries and us_cities feature classes from the USA feature dataset to your map.
  5. Open the Geoprocessing pane if necessary and browse to the Buffer tool as you did in the previous section.
  6. Double-click the Buffer tool to open it.
  7. Examine the first required parameter: Input Features. Click the Browse button and browse to the path of your cities dataset C:\PSU\Geog485\Lesson1.gdb\USA\us_cities. Notice that once you do this, a name is automatically supplied for the Output Feature Class (and the output path is your project's default workspace). The software does this for your convenience only, and you can change the name/path if you want.

    A more convenient way to supply the Input Features is to just select the cities map layer from the dropdown menu. This dropdown automatically contains all the layers in your map. However, in this example, we browsed to the path of the data because it’s conceptually similar to how we’ll provide the paths in the command line and scripting environments.

  8. Now you need to supply the Distance parameter for the buffer. For this run of the tool, set a Linear unit of 5 Statute Miles. When we run the tool from the other environments, we’ll make the buffer distance slightly larger, so we know that we got distinct outputs.
  9. The rest of the parameters are optional. The Side Type parameter applies only to lines and polygons, so it is not even available for setting in the GUI environment when working with city points. However, change the Dissolve Type to Dissolve all output features into a single feature. This combines overlapping buffers into a single polygon.
  10. Leave the Method set to Planar. We're not going to take the earth's curvature into account for this simple example.
  11. Click Run to execute the tool.
  12. The tool should take just a few seconds to complete. Examine the output that appears on the map, and do a “sanity check” to make sure that buffers appear around the cities, and they appear to be about 5 miles in radius. You may need to zoom in to a single state in order to see the buffers.
  13. Click on Pro's Analysis tab, then on History (in the Geoprocessing button group). This will open the History tab in Pro's Catalog pane, which lists messages about successes or failures of all recent tools that you've run.
  14. Hover over the Buffer tool entry in this list to see a pop-out window. This window lists the tool parameters, the time of completion, and any problems that occurred when running the tool (see Figure 1.1). These messages can be a big help later when you troubleshoot your Python scripts. The text of these messages is available whether you run the tool from the GUI, from the Python window in Pro, or from scripts.

Figure 1.1 Screen capture showing the Buffer tool and all messages.
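
As a rough preview of how the same messages can be read from a script, the sketch below runs a comparable buffer and prints the tool messages. Treat it as an illustration under assumptions: the data is assumed to be in C:\PSU\Geog485, the helper output_name and its naming scheme are hypothetical, the output is written to Lesson1.gdb rather than your project's default workspace, and the unit string "5 Miles" may need adjusting for your version of Pro.

```python
try:
    import arcpy  # only available inside an ArcGIS Pro Python environment
except ImportError:
    arcpy = None  # lets the sketch load on machines without ArcGIS Pro


def output_name(base, miles):
    """Build a distinct output name per buffer distance (hypothetical scheme)."""
    return base + "_buffer_" + str(miles) + "mi"


gdb = r"C:\PSU\Geog485\Lesson1.gdb"
cities = gdb + r"\USA\us_cities"
out_fc = gdb + "\\" + output_name("us_cities", 5)

if arcpy is not None:
    try:
        arcpy.analysis.Buffer(cities, out_fc, "5 Miles",
                              dissolve_option="ALL", method="PLANAR")
        # All messages from the most recent tool run, as in the History pop-out.
        print(arcpy.GetMessages())
    except arcpy.ExecuteError:
        # Severity 2 returns only the error messages.
        print(arcpy.GetMessages(2))
```

Putting the distance in the output name is one simple way to keep the results of different runs apart.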

1.2.4 Modeling with tools

When you work with geoprocessing, you’ll frequently want to use the output of one tool as the input into another tool. For example, suppose you want to find all fire hydrants within 200 meters of a building. You would first buffer the building, then use the output buffer as a spatial constraint for selecting fire hydrants. The output from the Buffer tool would be used as an input to the Select by Location tool.
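
In Python, the hydrant example might look something like the sketch below. It is an illustration only: the layer names buildings and hydrants and the output name buildings_buffer_200m are hypothetical, and the code assumes it runs where those layers exist, such as the Python window of a map that contains them. The variable building_buffers is what ties the two tools together.

```python
try:
    import arcpy  # only available inside an ArcGIS Pro Python environment
except ImportError:
    arcpy = None  # lets the sketch load on machines without ArcGIS Pro

# Hypothetical layer and output names; substitute your own data.
buildings = "buildings"
hydrants = "hydrants"
building_buffers = "buildings_buffer_200m"

if arcpy is not None:
    # Step 1: buffer the buildings. The output is written to building_buffers.
    arcpy.analysis.Buffer(buildings, building_buffers, "200 Meters")

    # Step 2: the output of step 1 becomes an input here, acting as the
    # spatial constraint for selecting hydrants.
    arcpy.management.SelectLayerByLocation(hydrants, "INTERSECT", building_buffers)
```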

A set of tools chained together in this way is called a model. Models can be simple, consisting of just a few tools, or complex, consisting of many tools and parameters and occasionally some iterative logic. Whether big or small, the benefit of a model is that it solves a unique geographic problem that cannot be addressed by one of the “out-of-the-box” tools.

In ArcGIS, modeling can be done either through the ModelBuilder graphical user interface (GUI) or through code, using Python. To keep our terms clear, we’ll refer to anything built in ModelBuilder as a “model” and anything built through Python as a “script.” However, it’s important to remember that both things are doing modeling.

1.3.1 Why learn ModelBuilder?

ModelBuilder is Esri’s graphical interface for making models. You can drag and drop tools from the Catalog pane into the model and “connect” them, specifying the order in which they should run.

Although this is primarily a programming course, we’ll spend some time in ModelBuilder during the first lesson for two reasons:

  • ModelBuilder is a nice environment for exploring the ArcGIS tools, learning how tool inputs and outputs are used, and visually understanding how GIS modeling works. When you begin using Python, you will not have the same visual assistance to see how the tools you’re using are connected, but you may still want to draw your model on a whiteboard in a similar fashion to what you saw in ModelBuilder.
  • ModelBuilder can frequently reduce the amount of Python coding that you need to do. If your GIS problem does not require advanced conditional and iterative logic, you may be able to get your work done in ModelBuilder without writing a script. ModelBuilder also allows you to export any model to Python code, so if you get stuck implementing some tools within a script, it may be helpful to make a simple working model in ModelBuilder, then export it to Python to see how ArcGIS would construct the code. (Exporting a complex model is not recommended for beginners because ModelBuilder tends to produce a large amount of verbose Python code.)

1.3.2 Opening and exploring ModelBuilder

Let’s get some practice with ModelBuilder to solve a real scenario. Suppose you are working on a site selection problem where you need to select all areas that fall within 10 miles of a major highway and 10 miles of a major city. The selected area cannot lie in the ocean or outside the United States. Solving the problem requires that you make buffers around both the roads and the cities, intersect the buffers, then clip to the US outline. Instead of manually opening the Buffer tool twice, followed by the Intersect tool, then the Clip tool, you can set this up in ModelBuilder to run as one process.
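
For comparison, here is a rough sketch of how the same four tools could be chained in Python. You don't need to run it now; it is only meant to show that the output of each step feeds the next. The output names are hypothetical, the unit string "10 Miles" is an assumption, and the code assumes the three Lesson 1 layers are already in the current map, as they would be in Pro's Python window.

```python
try:
    import arcpy  # only available inside an ArcGIS Pro Python environment
except ImportError:
    arcpy = None  # lets the sketch load on machines without ArcGIS Pro

# Hypothetical names for the intermediate and final outputs.
cities_buffer = "cities_buffer"
roads_buffer = "roads_buffer"
intersected = "intersected_buffers"
suitable = "suitable_land"

if arcpy is not None:
    # Steps 1 and 2: buffer the cities and the roads by 10 miles each.
    arcpy.analysis.Buffer("us_cities", cities_buffer, "10 Miles", dissolve_option="ALL")
    arcpy.analysis.Buffer("us_roads", roads_buffer, "10 Miles", dissolve_option="ALL")

    # Step 3: keep only the area covered by both buffers.
    arcpy.analysis.Intersect([cities_buffer, roads_buffer], intersected)

    # Step 4: clip the result to the US outline.
    arcpy.analysis.Clip(intersected, "us_boundaries", suitable)
```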

  1. Create a new Pro project called Lesson1Practice in the C:\PSU\Geog485 folder.
  2. Add the us_cities, us_roads, and us_boundaries feature classes from the Lesson1 file geodatabase to a new map.
  3. In the Catalog pane, right-click your Lesson1Practice toolbox and click New > Model. You’ll see ModelBuilder appear in the middle of the Pro window.
  4. Under the ModelBuilder tab, click the Properties button.
  5. For the Name, type SuitableLand and for the Label, type Find Suitable Land. The label is what everyone will see when they open your tool from the Catalog. That’s why it can contain spaces. The name is what people will use if they ever run your model from Python. That’s why it cannot contain spaces.
  6. Click OK to dismiss the model Properties dialog.

    You now have a blank canvas on which you can drag and drop the tools. When creating a model (and when writing Python scripts), it’s best to break your problem into manageable pieces. The simple site selection problem here can be thought of as four steps:

    • Buffer the cities
    • Buffer the roads
    • Intersect the buffers
    • Clip to the US boundary

    Let’s tackle these items one at a time, starting with buffering the cities.

  7. With ModelBuilder still open, go to the Catalog pane > Geoprocessing tab and browse to Analysis Tools > Proximity.
  8. Click the Buffer tool and drag it onto the ModelBuilder canvas. You’ll see a gray rectangular box representing the buffer tool and a gray oval representing the output buffers. These are connected with a line, showing that the Buffer tool will always produce an output data set.

    In ModelBuilder, tools are represented with boxes and variables are represented with ovals. Right now, the Buffer tool, at center, is gray because you have not yet supplied the required parameters. Once you do this, the tool and the variable will fill in with color.

  9. In your ModelBuilder window, double-click the Buffer box. The tool dialog here is the same as if you had opened the Buffer directly from the Catalog pane. This is where you can supply parameters for the tool.
  10. For Input Features, browse to the path of your us_cities feature class on disk. The Output Feature Class will populate automatically.
  11. For Distance [value or field], enter 10 Statute Miles.
  12. For Dissolve Type, select Dissolve all output features..., then click OK to close the Buffer dialog. The model elements (tools and variables) should be filled in with color, and you should see a new element to the left of the tool representing the input us_cities feature class.
  13. An important part of working with ModelBuilder is supplying clear labels for all the elements. This way, if you share your model, others can easily understand what will happen when it runs. Supplying clear labels also helps you remember what the model does, especially if you haven’t worked with the model for a while.

    In ModelBuilder, right-click the us_cities element (blue oval, at far left) and click Rename. Name this element "US Cities."

  14. Right-click the Buffer tool (yellow-orange box, at center) and click Rename. Name this “Buffer the cities.”
  15. Right-click the buffer output element (green oval, at far right) and click Rename. Name this “Buffered cities.” Your model should look like this.

    blue oval labeled US Cities to a yellow-orange box labeled Buffer the cities to a green oval labeled Buffered Cities
    Figure 1.2 The model's appearance following step 15, above.
  16. Save your model (ModelBuilder tab > Save). This is the kind of activity where you want to save often.
  17. Practice what you just learned by adding another Buffer tool to your model. This time, configure the tool so that it buffers the us_roads feature class by 10 miles. Remember to set the Dissolve type to Dissolve all output features... and to add meaningful labels. Your model should now look like this.

    blue oval labeled US Roads to a yellow box labeled buffer the roads to a green oval labeled Buffered roads under similar diagram about cities
    Figure 1.3 The model's appearance following step 17, above.
  18. The next task is to intersect the buffers. In the Catalog pane's list of toolboxes, browse to Analysis Tools > Overlay and drag the Intersect tool onto your model. Position it to the right of your existing Buffer tools.
  19. Here’s the pivotal moment when you chain the tools together, setting the outputs of your Buffer tools as the inputs of the Intersect tool. Click the Buffered cities element and drag over to the Intersect element. If you see a small menu appear, click Input Features to denote that the buffered cities will act as inputs to the Intersect tool. An arrow will now point from the Buffered cities element to the Intersect element.
  20. Use the same process to connect the Buffered roads to the Intersect element. Again, if prompted, click Input Features.
  21. Rename the output of the Intersect operation "Intersected buffers." If the text runs onto multiple lines, you can click and drag the edges of the element to resize it. You can also rearrange the elements on the page however you like. Because models can get large, ModelBuilder contains several navigation buttons for zooming in and zooming to the full extent of the model in the View button group on the ribbon. Your model should now look like this:

    Buffered cities and buffered roads intersect at a yellow box which leads to a green oval labeled intersected buffers
    Figure 1.4 The model's appearance following step 21, above.
  22. The final step is to clip the intersected buffers to the outline of the United States. This prevents any of the selected area from falling outside the country or in the ocean. In the Catalog pane, browse to Analysis Tools > Extract and drag the Clip tool into ModelBuilder. Position this tool to the right of your existing tools.
  23. As you did above, set the Intersected buffers as an input to the Clip tool, choosing Input Features when prompted. Notice that even when you do this, the Clip tool is not ready to run (it’s still shown as a gray rectangle, located at right). You need to supply the clip features, which is the shape to which the buffers will be clipped.
  24. In ModelBuilder (not in the Catalog pane), double-click the Clip tool. Set the Clip Features by browsing to the path of us_boundaries, then click OK to dismiss the dialog. You’ll notice that a blue oval appeared, representing the Clip Features (US Boundaries).
  25. Set meaningful labels for the remaining elements. Below is an example of how you can label and arrange the model elements.

    Green oval (intersected buffers) & a blue oval (US boundaries) lead to a yellow box labeled Clip which leads to a green oval (Suitable Land)
    Figure 1.5 The completed model with the clip tool included.
  26. Double-click the final output element (named "Suitable Land" in the image above) and set the path to C:\PSU\Geog485\Lesson1.gdb\USA\suitable_land. This is where you can expect your model output feature class to be written to disk.
  27. Right-click Suitable Land and click Add to display.
  28. Save your model again.
  29. Test the model by clicking the Run button. You’ll see the by-now-familiar geoprocessing message window, which will report any errors that occur. ModelBuilder also gives you a visual cue of which tool is running by turning the tool red. (If the model crashes, try closing ModelBuilder and running the model by double-clicking it from the Catalog pane. You'll get a message that the model has no parameters. This is okay [and true, as you'll learn below]. Go ahead and run the model anyway.)
  30. When the model has finished running (it may take a while), examine the output on the map. Zoom into Washington state to verify that the Clip has worked on the coastal areas. The output should look similar to this.

    The model output in ArcGIS Pro. Suitable land is often along waterways and has major roads running through it
    Figure 1.6 The model's output in ArcGIS Pro.

That’s it! You’ve just used ModelBuilder to chain together several tools and solve a GIS problem.

You can double-click this model anytime in the Catalog pane and run it just as you would a tool. If you do this, you’ll notice that the model has no parameters; you can’t change the buffer distance or input features. The truth is, our model is useful for solving this particular site-selection problem with these particular datasets, but it’s not very flexible. In the next section of the lesson, we’ll make this model more versatile by configuring some of the variables as input and output parameters.

1.3.3 Model parameters

Most tools, models, and scripts that you create with ArcGIS have parameters. Input parameters are values with which the tool (or model or script) starts its work, and output parameters represent what the tool gives you after its work is finished.

A tool, model, or script without parameters is only good in one scenario. Consider the model you just built that used the Buffer, Intersect, and Clip tools. This model was hard-coded to use the us_cities, us_roads, and us_boundaries feature classes and output a feature class called suitable_land. In other words, if you wanted to run the model with other datasets, you would have to open ModelBuilder, double-click each element (US Cities, US Roads, US Boundaries, and Suitable Land), and change the paths that were written directly into the model. You would have to follow a similar process if you wanted to change the buffer distances, too, since those were hard-coded to 10 miles.

Let’s modify that model to use some parameters, so that you can easily run it with different datasets and buffer distances.

  1. If it's not open already, open the project C:\PSU\Geog485\Lesson1Practice.aprx in ArcGIS Pro.
  2. In the Catalog window, find the model you created earlier in the lesson, which should be under Toolboxes > Lesson1Practice.atbx > Find Suitable Land.
  3. Right-click the model Find Suitable Land and click Copy. Now, right-click the Lesson1Practice toolbox and click Paste. This creates a new copy of your model that you can work with to create model parameters. Using a copy of the model like this allows you to easily start over if you make a mistake.
  4. Rename the copy of your model Find Suitable Land With Parameters or something similar.
  5. In your Lesson 1 toolbox, right-click Find Suitable Land With Parameters and click Edit. You'll see the model appear in ModelBuilder.
  6. Right-click the element US Cities (should be a blue oval) and click Parameter. This means that whoever runs the model must specify the cities dataset to use before the model can run.
  7. You need a more general name for this parameter now, so right-click the US Cities element and click Rename. Change the name to just "Cities."
  8. Even though you "parameterized" the cities, your model still defaults to using the C:\PSU\Geog485\Lesson1.gdb\USA\us_cities dataset. This isn't going to make much sense if you share your model or toolbox with other people because they may not have the same us_cities feature class, and even if they do, it probably won't be sitting at the same path on their machines.

    To remove the default dataset, double-click the Cities element and delete the path, then click OK. Some of the elements in your model may turn gray. This signifies that a value has to be provided before the model can successfully run.

  9. Now you need to create a parameter for the distance of the buffer to be created around the cities. Right-click the element that you named "Buffer the cities" and click Create Variable > From Parameter > Distance [value or field].
  10. The previous step created a new element Distance [value or field]. Rename this element to "Cities buffer distance" and make it a model parameter. (Review the steps above if you're unsure about how to rename an element or make it a model parameter.) For this element, you can leave the default at 10 miles. Your model should look similar to this, although the title bar of your window may vary:

    A gray oval labeled Cities and a light blue oval labeled Cities Buffer Distance lead to a gray box labeled Buffer the cities, which leads to a gray oval labeled buffered cities.

    A blue oval labeled US roads leads to a yellow box labeled Buffer the roads, which leads to a green oval labeled Buffered roads.

    The gray oval labeled Buffered cities and the green oval labeled Buffered roads both lead to a gray box labeled Intersect. This gray box leads to a gray oval labeled Intersected buffers.

    The gray oval labeled Intersected buffers and a blue oval labeled US Boundaries lead to a gray box labeled Clip.

    The gray box labeled Clip leads to a gray oval labeled Suitable land.

  11. Repeating what you learned above, rename the US Roads element to "Roads," make it a model parameter, and remove the default value.
  12. Repeating what you learned above, make a parameter for the Roads buffer distance. Leave the default at 10 miles.
  13. Repeating what you learned above, rename the US Boundaries element to Boundaries, make it a model parameter, and remove the default value. Your model should look like this (notice the five parameters indicated by "P"s):

    A gray oval labeled Cities and a light blue oval labeled Cities buffer distance lead to a gray box labeled Buffer the cities which leads to a gray oval labeled Buffered cities.

    A gray oval labeled Roads and a light blue oval labeled Roads buffer distance both lead to a gray box labeled Buffer the roads, which leads to a gray oval labeled Buffered roads.

    The gray oval labeled Buffered cities and the gray oval labeled Buffered roads both lead to a gray box labeled Intersect. This gray box leads to a gray oval labeled Intersected buffers.

    The gray oval labeled Intersected buffers and a white oval labeled Boundaries lead to a white box labeled Clip.

    The gray box labeled Clip leads to a gray oval labeled Suitable land.

  14. Save and close your model.
  15. Double-click your model Lesson 1 > Find Suitable Land With Parameters and examine the tool dialog. It should look similar to this:

    Screen capture to show the model interface with parameters.
    Figure 1.9 The model interface, or tool dialog, for the model "Find Suitable Land With Parameters."

    People who run this model will be able to browse to any cities, roads, and boundaries datasets, and will be able to control the buffer distance. The red asterisks indicate parameters that must be supplied with valid values before the model can run.

  16. Test your model by supplying the us_cities, us_roads, and us_boundaries feature classes for the model parameters. If you like, you can try changing the buffer distance.

The above exercise demonstrated how you can expose values as parameters using ModelBuilder. You need to decide which values you want the user to be able to change and designate those as parameters. When you write Python scripts, you'll also need to identify and expose parameters in a similar way.
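As a rough preview of how the same idea can look in Python, the sketch below wraps the site-selection steps in a function whose arguments play the role of the five model parameters. The function name, argument names, and messages are hypothetical, and the function only describes the steps as text instead of running any geoprocessing tools, so it can be tried without ArcGIS.

```python
def describe_suitable_land_steps(cities, roads, boundaries,
                                 cities_distance="10 Miles",
                                 roads_distance="10 Miles"):
    """Return a list of text descriptions, one per step of the model.

    Hypothetical illustration only: no geoprocessing tools are run.
    """
    steps = [
        "Buffer " + cities + " by " + cities_distance,
        "Buffer " + roads + " by " + roads_distance,
        "Intersect the two buffer outputs",
        "Clip the intersection to " + boundaries,
    ]
    return steps


# The three datasets must be supplied; the distances default to 10 miles,
# just like the defaults left on the two distance parameters in the model.
default_steps = describe_suitable_land_steps("us_cities", "us_roads", "us_boundaries")
for step in default_steps:
    print(step)

# Changing one parameter does not require editing the function itself.
custom_steps = describe_suitable_land_steps("us_cities", "us_roads", "us_boundaries",
                                            cities_distance="25 Miles")
print(custom_steps[0])
```

The required arguments correspond to the parameters marked with red asterisks in the tool dialog, while the arguments with default values correspond to the buffer distances that were left at 10 miles.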

1.3.4 Advanced geoprocessing and ModelBuilder concepts

By now, you've had some practice with ModelBuilder, and you're about ready to get started with Python. This page of the lesson contains some optional advanced material that you can read about ModelBuilder. This is particularly helpful if you anticipate using ModelBuilder frequently in your employment. Some of the items are common to the ArcGIS geoprocessing framework, meaning that they also apply when writing Python scripts with ArcGIS.

Managing intermediate data

GIS analysis sometimes gets messy. Most of the tools that you run produce an output dataset, and when you chain many tools together, those datasets start piling up on disk. Esri has programmed ModelBuilder's default behavior such that when a model is run from a graphical user interface (GUI), all datasets besides the final output -- referred to as intermediate data -- are automatically deleted. If, on the other hand, the model is run from ModelBuilder, intermediate datasets are left in their specified locations.

When running a model on another file system, specifying paths as we did above can be problematic, since the folder structure is not likely to be the same. This is where the concept of the scratch geodatabase (or scratch folder for file-based data like shapefiles) environment variable can come in handy. A scratch geodatabase is one that is guaranteed to exist on all ArcGIS installations. Unless the user has changed it, the scratch geodatabase will be found at C:\Users\<user>\Documents\ArcGIS\scratch.gdb on Windows 7/8. You can specify that a tool write to the scratch geodatabase by using the %scratchgdb% variable in the path. For example, %scratchgdb%\myOutput.
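In a Python script, the same location is available through the arcpy.env.scratchGDB environment setting. The following is a minimal sketch of building an output path from it. The fallback folder used when arcpy cannot be imported is an assumption made purely so the path logic can be tried on a machine without ArcGIS, and myOutput is the example name from the paragraph above.

```python
import os

try:
    import arcpy  # available only in an ArcGIS Python environment
    scratch_gdb = arcpy.env.scratchGDB
except ImportError:
    # Assumption for illustration: a stand-in location so the rest of the
    # sketch still runs where ArcGIS is not installed.
    scratch_gdb = os.path.join(os.path.expanduser("~"), "scratch.gdb")

# Python counterpart of writing %scratchgdb%\myOutput in a ModelBuilder path.
output_path = os.path.join(scratch_gdb, "myOutput")
print(output_path)
```

Because the path is assembled at run time, the script does not depend on a folder structure that exists only on the author's machine.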

The following topics from Esri go into more detail on intermediate data and are important to understand as you work with the geoprocessing framework. I suggest reading them once now and returning to them occasionally throughout the course. Some of the concepts in them are easier to understand once you've worked with geoprocessing for a while.

Looping in ModelBuilder

Looping, or iteration, is the act of repeating a process. A main benefit of computers is their ability to quickly repeat tasks that would otherwise be mundane, cumbersome, or error-prone for a human to repeat and record. Looping is a key concept in computer programming, and you will use it often as you write Python scripts for this course.

ModelBuilder contains a number of elements called Iterators that can do looping in various ways. The names of these iterators, such as For and While, mimic the types of looping that you can program in Python and other languages. In this course, we'll focus on learning iteration in Python, which may actually be just as easy as learning how to use a ModelBuilder iterator.

To take a peek at how iteration works in ModelBuilder, you can visit the ArcGIS Pro ModelBuilder help book for model iteration. If you're having trouble understanding looping in later lessons, ModelBuilder might be a good environment to visualize what a loop does. You can come back and visit this book as needed.
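For a first glimpse of the Python counterparts of the For and While iterators, here is a small sketch that uses plain Python only. The variable names and values are made up for illustration, and loops are covered properly later in the course.

```python
# A for loop repeats its indented block once for each item in a sequence.
distances = [5, 10, 15]
labels = []
for miles in distances:
    labels.append(str(miles) + " Miles")
print(labels)

# A while loop repeats its indented block as long as its condition stays true.
count = 0
while count < 3:
    count = count + 1
print(count)
```

The for loop runs three times, once per distance, and the while loop stops as soon as count reaches 3.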

Readings

Read Zandbergen Chapter 3.1 - 3.6, and 3.8 to reinforce what you learned about geoprocessing and ModelBuilder.

1.4.1 Introducing Python Using the Python Window in ArcGIS

The best way to introduce Python may be to look at a little bit of code. Let’s take the Buffer tool which you recently ran from the Geoprocessing pane and run it in the ArcGIS Python window. This window allows you to type a simple series of Python commands without writing full permanent scripts. The Python Window is a great way to get a taste of Python.

This time, we’ll make buffers of 15 miles around the cities.

  1. Open a new empty map in ArcGIS Pro.
  2. Add the us_cities dataset from the Lesson 1 data.
  3. Under the View tab, click the Python window button (you can also access this from the Analysis tab, but there you have to make sure to use the drop-down arrow to open the Python window rather than a Python Notebook).
  4. Type the following in the Python window (Don't type the >>>. These are just included to show you where the new lines begin in the Python window.)

    >>> import arcpy
    >>> arcpy.Buffer_analysis("us_cities", "us_cities_buffered", "15 miles", "", "", "ALL")
    
  5. Zoom in and confirm that the buffers were created.

You’ve just run your first bit of Python. You don’t have to understand everything about the code you wrote in this window, but here are a few important things to note.

The first line of the script -- import arcpy -- tells the Python interpreter (which was installed when you installed ArcGIS) that you’re going to work with some special scripting functions and tools included with ArcGIS. Without this line of code, Python knows nothing about ArcGIS, so you'll put it at the top of all ArcGIS-related code that you write in this class. You technically don't need this line when you work with the Python window in ArcGIS Pro because arcpy is already imported, but I wanted to show you this pattern early; you'll use it in all the scripts you write outside the Python window.

The second line of the script actually runs the tool. You can type arcpy, plus a dot, plus any tool name to run a tool in Python. Notice here that you also put an underscore followed by the name of the toolbox that includes the Buffer tool. This is necessary because some tools in different toolboxes actually have the same name (like Clip, which is a tool for clipping vectors in the Analysis toolbox or a tool for clipping rasters in the Data Management toolbox).

After you typed arcpy.Buffer_analysis, you typed all the parameters for the tool. Each parameter was separated by a comma, and the whole list of parameters was enclosed in parentheses. Get used to this pattern, since you'll follow it with every tool you run in this course.

In this code, we also supplied some optional parameters, leaving empty quotes where we wanted to take the default values, and truncating the parameter list at the final optional parameter we wanted to set.
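An alternative to the empty quotes, offered here as a sketch and not as the style this lesson requires, is to name the optional parameter you want to set. The parameter name dissolve_option comes from the syntax section of Esri's Buffer tool reference. The output name us_cities_buffered_kw is hypothetical, and the check for arcpy is there only so the snippet does not fail outside an ArcGIS Python environment.

```python
try:
    import arcpy
except ImportError:
    arcpy = None  # not running inside an ArcGIS Python environment

in_features = "us_cities"                    # layer already added to the map
out_feature_class = "us_cities_buffered_kw"  # hypothetical output name
distance = "15 Miles"

if arcpy is not None:
    # Required parameters are given by position; the one optional parameter
    # we care about is given by name, so no empty quotes are needed.
    arcpy.Buffer_analysis(in_features, out_feature_class, distance,
                          dissolve_option="ALL")
else:
    print("arcpy is not available, so the Buffer tool was not run.")
```

Naming the parameter makes it easier to see at a glance which option was changed from its default.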

How do you know the syntax, or structure, of the parameters to enter? For example, for the buffer distance, should you enter 15MILES, '15MILES', 15 Miles, or '15 Miles'? The best way to answer questions like these is to return to the Geoprocessing tool reference help topic for the Buffer tool. All of the topics in this reference section have Usage and Code Sample sections to help you understand how to structure the parameters. Optional parameters are enclosed in braces, while the required parameters are not. From the example in this topic, you can see that the buffer distance should be specified as '15 miles'. Because this value is text, or a string, you need to surround it with quotes. Python accepts either single or double quotes, which is why the code you typed earlier could use "15 miles".
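A quick check in any Python console shows how quoting behaves. This sketch uses plain Python only, and the variable names are made up for illustration.

```python
single_quoted = '15 miles'
double_quoted = "15 miles"

# Both spellings create exactly the same string value.
print(single_quoted == double_quoted)

# The quotes mark the value as text; they are not part of the text itself.
print(double_quoted)
print(type(double_quoted))
```

Without any quotes, Python would try to interpret 15 miles as code and report a syntax error.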

You might have noticed that the Python window helps you by popping up different options you can type for each parameter. This is called autocompletion, and it can be very helpful if you're trying to run a tool for the first time, and you don't know exactly how to type the parameters. You might have also noticed that some code pops up like (Buffer() analysis and Buffer3D() 3d) when you were typing in the function name. You can use your up/down arrows to highlight the alternatives. If you selected Buffer() analysis, it will appear in your Python window.

Please note that if you use code completion, your code will sometimes look slightly different. Esri reorganized how the functions are arranged within arcpy; they work the same but are in a slightly different place. The "old" way still works, though, so you might see inconsistencies in this class, online forums, Esri's documentation, etc. In this example, arcpy.Buffer_analysis(...) has changed to arcpy.analysis.Buffer(...), reflecting that the Buffer tool is located within the Analysis toolbox in Pro.

There are a couple of differences between writing code in the Python window and writing code in some other program, such as Notepad or PyScripter. In the Python window, you can reference layers in the map document by their names only, instead of their file paths. Thus, we were able to type "us_cities" instead of something like "C:\\data.gdb\\us_cities". We were also able to make up the name of a new layer "us_cities_buffered" and get it added to the map by default after the code ran. If you're going to use your code outside the Python window, make sure you use the full paths.
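The doubled backslashes in a path like the one above are there because Python treats a single backslash inside a normal string as the start of an escape sequence. The sketch below, in plain Python, shows two equivalent ways of writing the same example path. The path itself is the illustrative one from the paragraph above, not a real dataset.

```python
# Each \\ in a normal string stands for one literal backslash.
escaped = "C:\\data.gdb\\us_cities"

# The r prefix makes a "raw" string, in which backslashes are taken literally.
raw = r"C:\data.gdb\us_cities"

print(escaped)
print(escaped == raw)
```

Either form works when you pass a full path to a tool; the raw string is often easier to read and to paste from File Explorer.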

When you write more complex scripts, it will be helpful to use an integrated development environment (IDE), meaning a program specifically designed to help you write and test Python code. Later in this course, we’ll explore the PyScripter IDE.

Earlier in this lesson, you saw how tools can be chained together to solve a problem using ModelBuilder. The same can be done in Python, but it’s going to take a little groundwork to get to that point. For this reason, we’ll spend the rest of Lesson 1 covering some of the basics of Python.

Readings

Zandbergen covers the Python window and some things you can do with it in Chapter 2.

2nd Edition: 2.7, and 2.9-2.13

3rd Edition: 2.8, and 2.10-2.14

1.4.2 What is Python?

Python is a language that is used to automate computing tasks through programs called scripts. In the introduction to this lesson, you learned that automation makes work easier, faster, and more accurate. This applies to GIS and many other areas of computer science. Learning Python will make you a more effective GIS analyst, but Python programming is a technical skill that can be beneficial to you even outside the field of GIS.

Python is a good language for beginning programming. Python is a high-level language, meaning you don’t have to understand the “nuts and bolts” of how computers work in order to use it. Python syntax (how the code statements are constructed) is relatively simple to read and understand. Finally, Python requires very little overhead to get a program up and running.

Python is an open-source language, and there is no fee to use it or deploy programs with it. Python can run on Windows, Linux, Unix, and Mac operating systems with very little change in syntax.

In ArcGIS, Python can be used for coarse-grained programming, meaning that you can use it to easily run geoprocessing tools such as the Buffer tool that we just worked with. You could code all the buffer logic yourself, using more detailed, fine-grained programming with the ArcGIS Pro SDK, but this would be time consuming and unnecessary in most scenarios; it’s easier just to call the Buffer tool from a Python script using one line of code.

In addition to the Esri help, which describes all of the parameters of a function and how to access them from Python, you can also get the Python syntax (the structure of the language) for a tool as follows:

  1. Run the tool interactively (e.g., Buffer) from the Geoprocessing pane with your input data, output data and any other relevant parameters (e.g., distance to buffer).
  2. Beneath the tool parameters, you should see a notification of the completed running of the tool. Right-click on that notification. (Alternatively, you can click the History button under the Analysis tab to obtain a list of run tools.)
  3. Pick "Copy Python command."
  4. Paste the code into the Python window or PyScripter (see next section) to see how you would code in Python the same operation you just ran in Pro.

1.4.3 Installing PyScripter

PyScripter is an easy IDE to install for ArcGIS Pro development. ArcGIS Pro 3.0 changed the way that Pro manages Python environments, giving more control to the user. The steps below were written using Pro 3.1 as a reference. If you have an earlier version of Pro, the process is similar, but the output destination (step 4.1) will be set for you. Before installing PyScripter, clone Pro's default Python environment by following the steps below, and please let the instructor know if you run into any trouble.

  1. In the ArcGIS Pro Settings window (often the window that opens when you start Pro), click on Package Manager on the left side to open Pro's Python Package manager.
  2. Next to the Active Environment near the top right, click on the gear to open a window listing the Python environments.
  3. Find the environment named arcgispro-py3 (Default) and click on the symbol that looks like two pages or click on the ... to expand a menu to select clone.
  4. A window will pop up to allow you to name the new environment. The default destination path that Pro automatically sets is ok.
    1. The destination should be something like C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3-clone, or C:\Users\<username>\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone. Make a note of this path (you can highlight the path and ctrl+c to copy it to your clipboard) because you will need it for PyScripter.
  5. Wait while it clones the environment.
    1. If the cloning fails with an error message saying that a Python package couldn't be installed, you may need to run ArcGIS Pro as an Administrator (right-click the ArcGIS Pro icon and choose Run as administrator) and repeat the steps above (it's also helpful to mouse over that error box in Pro and see if it gives you any additional details).
  6. When the cloning is done, the new environment "arcgispro-py3-clone" (or whatever you choose to call it - but we'll be assuming it's called the default name) can be activated by clicking on the ... and selecting Activate.
  7. You may need to restart Pro to get this change to take effect, and you may see a message telling you to restart. If not, then click the ok button to get back to Pro.

Now perform the following steps to install PyScripter:

  1. Download the installer from PyScripter's SourceForge download page, or download the install exe from the course files (64bit version or 32bit version). If you have problems with one version, you can try the other version. Let the instructor know if you have any problems during the installation.
  2. Follow the steps to install it, checking the additional shortcuts if you want those features. Create a desktop shortcut does what it says it does. A Quick Launch shortcut is placed in the Task Bar, and the Add 'Edit with PyScripter' to File Explorer context menu option adds the option to use PyScripter in the menu that appears when you right-click a file.
  3. Click the Install button and follow the prompts. Check the Launch PyScripter box and click Finish.
  4. Once PyScripter opens, we need to point it to the cloned environment we made. To do this, find the Python 3.x (64-bit) on the bottom bar. Your Python might be different, and that is ok for now. Click on the Python 3.x text and it will open a window listing all of the Python environments it found. If your clone is not listed, you need to click on the Add a new Python version button (gear with a plus) and navigate to where it was created. You can get this path by referring to the Package Manager in Pro. Once you get to the parent folder of the environment, select it and it will add it to the list of Unregistered Versions.
  5. Double click on the environment to activate it. If you get an Abort Error, click ok to close the prompt.
  6. Verify that your environment is active. It will have a large arrow next to it. Close the Python Versions window. Note that the below image is pointing to the default ArcGIS Pro Python environment. You can use this environment, but adding packages to it has the potential to break Pro. It is recommended that you use a cloned environment.

    Conda 3.9 with arrow to the left indicating active
    Figure 1.21 PyScripter setting ArcGIS Pro conda Python environment
  7. It is a good idea to close the application and restart it to ensure that it saves your activated environment. If the settings revert to the defaults, repeat steps 4 through 7, and it should save.

    PyScripter start view
    Figure 1.22 PyScripter Installation

If you are familiar with another IDE, you're welcome to use it instead of PyScripter (just verify that it is using Python 3!), but we recommend that you still install PyScripter to be able to work through the following sections and the sections on debugging in Lesson 2.

1.4.4 Exploring PyScripter

Here’s a brief explanation of the main parts of PyScripter. Before you begin reading, be sure to have PyScripter open, so you can follow along.

When PyScripter opens, you’ll see a large text editor in the right side of the window. We'll come back to this part of the PyScripter interface in a moment. For now, focus on the pane in the bottom called the Python Interpreter. If this window is not open, and not listed as a tab along the bottom, you can open it by going to the top menu and selecting View> IDE Windows> then select Python Interpreter. This console is much like the Python interactive window we saw earlier in the lesson. You can type a line of Python at the In >>> prompt, and it will immediately execute and print the result if there is a printable result. This console can be a good place to practice with Python in this course, and whenever you see some Python code next to the In >>> prompt in the lesson materials, this means you can type it in the console to follow along.

We can experiment here by typing "import arcpy" to import arcpy or running a print statement.

>>> import arcpy

>>> print("Hello World")
Hello World

You might have noticed while typing in that second example a useful function of the Python Interpreter: code completion. This is where PyScripter, like Pro's Python window, is smart enough to recognize that you're entering a function name, and it provides you with information about the parameters that function takes. If you missed it the first time, enter print( in the Python Interpreter window and wait a second (or less), and the print function's parameters will appear. This also works for arcpy functions (or those from any library that you import). Try it out with arcpy.Buffer_analysis.

Now let's return to the right side of the window, the Editor pane. It will contain a blank script file by default (module1.py). I say it's a blank script file because, while there is text in the file, that text is delimited by special characters that cause it to be ignored when the script is executed. We'll discuss these special characters further later, but for now, it's sufficient to note that PyScripter automatically inserts the character encoding of the file, the time it was created, and the login name of the user running PyScripter. You can add the actual Python statements that you'd like to be executed beneath these bits of documentation. (You can also remove the function and if statement along with the documentation, if you like.)
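To give a sense of the general shape of such a starting file, here is a simplified sketch. It is not a copy of PyScripter's actual template, whose header wording may differ on your machine; the header lines and the message printed below are placeholders.

```python
# -----------------------------------------------------------------
# Name:     module1
# Purpose:  Lines that begin with # are comments. Python ignores
#           them when the script is executed.
# -----------------------------------------------------------------

def main():
    # Statements placed inside this function run when main() is called.
    print("Hello from main")


# This if statement calls main() only when the file is run as a script.
if __name__ == '__main__':
    main()
```

Until you are comfortable with functions, it is fine to delete everything and type your statements directly into the file, one per line.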

Among the nice features of PyScripter's editor (and other Python IDEs) is its color coding of different Python language constructs. Spacing and indentation, which are important in Python, are also easy to keep track of in this interface. Lastly, note that the Editor pane is a tabbed environment; additional script files can be loaded using File > New or File > Open.

Above the Editor pane, a number of toolbars are visible by default. The File, Run, and Debug toolbars provide access to many commonly used operations through a set of buttons. The File and Run toolbars contain tools for loading, saving, and running scripts. Finally, the Debug toolbar contains tools for carefully reviewing your code line-by-line to help you detect errors. The Debug toolbar is extremely valuable to you as a programmer, and you’ll learn how to use it later in this course. This toolbar is one of the main reasons to use an Integrated Development Environment (IDE) instead of writing your code in a simple text editor like Notepad.

1.5.1 Working with variables

It’s time to get some practice with some beginning programming concepts that will help you write some simple scripts in Python by the end of Lesson 1. We’ll start by looking at variables.

Remember your first introductory algebra class where you learned that a letter could represent any number, like in the statement x + 3? This may have been your first exposure to variables. (Sorry if the memory is traumatic!) In computer science, variables represent values or objects you want the computer to store in its memory for use later in the program.

Variables are frequently used to represent not only numbers, but also text and “Boolean” values (True or False). A variable might be used to store input from the program’s user, to store values returned from another program, to represent constant values, and so on.

Variables make your code readable and flexible. If you hard-code your values, meaning that you always use the literal value, your code is useful only in one particular scenario. You could manually change the values in your code to fit a different scenario, but this is tedious and exposes you to a greater risk of making a mistake (suppose you forget to change a value). Variables, on the other hand, allow your code to be useful in many scenarios and are easy to parameterize, meaning you can let users change the values to whatever they need.

To see some variables in action, open PyScripter and type this in the Python Interpreter:

>>> x = 2

You’ve just created, or declared, a variable, x, and set its value to 2. In some statically typed programming languages, such as Java, you would be required to tell the program that you were creating a numerical variable, but Python infers this when it sees the 2.

When you hit Enter, nothing happens, but the program now has this variable in memory. To prove this, type:

>>> x + 3

You see the answer of this mathematical expression, 5, appear immediately in the console, proving that your variable was remembered and used.

You can also use the print function to write the results of operations. We’ll use this a lot when practicing and testing code.

>>> print (x + 3)
5

Variables can also represent words, or strings, as they are referred to by programmers. Try typing this in the console:

>>> myTeam = "Nittany Lions"
>>> print (myTeam)
Nittany Lions

In this example, the quotation marks tell Python that you are declaring a string variable. Python is a powerful language for working with strings. A very simple example of string manipulation is to add, or concatenate, two strings, like this:

>>> string1 = "We are "
>>> string2 = "Penn State!"
>>> print (string1 + string2)
We are Penn State!

You can include a number in a string variable by putting it in quotes, but you must thereafter treat it like a string; you cannot treat it like a number. For example, this results in an error:

>>> myValue = "3"
>>> print (myValue + 2)
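
The error occurs because Python will not add a string and a number together. As a minimal sketch of how you might work around this, the built-in int() and str() functions convert a value from one type to the other. The variable names asNumber and asText are just illustrative choices.

```python
myValue = "3"

# Adding a string and a number raises a TypeError
try:
    result = myValue + 2
except TypeError as err:
    print("Error:", err)

# Convert the string to an integer to do arithmetic
asNumber = int(myValue) + 2
print(asNumber)

# Or convert the number to a string to concatenate text
asText = myValue + str(2)
print(asText)
```

The first conversion treats "3" as the number 3, while the second glues the characters together.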

In these examples, you’ve seen the use of the = sign to assign the value of the variable. You can always reassign the variable. For example:

>>> x = 5
>>> x = x - 2
>>> print (x)
3
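
Because Python works out a variable's type from the value assigned to it, reassigning a variable can also change its type. The short sketch below uses the built-in type() function to show this. The variable isHomeGame is a made-up example of a Boolean value.

```python
x = 5                      # an integer
myTeam = "Nittany Lions"   # a string
isHomeGame = True          # a Boolean (hypothetical example value)

print(type(x))
print(type(myTeam))
print(type(isHomeGame))

# Reassigning x to text changes the type that x refers to
x = "five"
print(type(x))
```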

When naming your variables, the following tips will help you avoid errors.

  • Variable names are case-sensitive. myVariable is a different variable than MyVariable.
  • Variable names cannot contain spaces.
  • Variable names cannot begin with a number.
  • A recommended practice for Python variables is to name the variable beginning with a lower-case letter, then begin each subsequent word with a capital letter. This is sometimes known as camel casing. For example: myVariable, mySecondVariable, roadsTable, bufferField1, etc.
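  • Variable names cannot be any of the special Python reserved words, such as "import" or "if." (In Python 3, "print" is a built-in function rather than a reserved word, but you should still avoid using it as a variable name.)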
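
If you are unsure whether a name is legal, one way to check is sketched below. It uses the standard library's keyword module and the string method isidentifier(). The helper function isUsableName and the sample names are hypothetical and for illustration only.

```python
import keyword

def isUsableName(name):
    """Return True if name follows Python's rules for variable names."""
    # isidentifier() rejects spaces and leading digits;
    # iskeyword() rejects reserved words such as import
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["myVariable", "my variable", "1stRoad", "import", "roadsTable"]:
    print(candidate, "->", isUsableName(candidate))
```

Note that this check only tests whether Python will accept the name. It says nothing about whether the name is meaningful.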

Make variable names meaningful so that others can easily read your code. This will also help you read your code and avoid making mistakes. A good rule of thumb for naming is to use nouns for variables and classes, and verbs for functions. For example, a variable named country_feature indicates the variable contains a country feature whereas a plural country_features may be a list of country features. The function name get_country_features() indicates a process to get country features.

You’ll get plenty of experience working with variables throughout this course and will learn more in future lessons.

Readings

Read Zandbergen section 4.5 on variables and naming.

  • Chapter 4 covers the basics of Python syntax, loops, strings and other things which we will look at in more detail in Lesson 2, but feel free to read ahead a little now if you have time or come back to it at the end of Lesson 1 to prepare for Lesson 2.
  • You'll note that the section on variable naming refers to a Python style guide, which recommends against the use of camel casing. As the text notes, camel casing is a popular naming scheme for a lot of developers, and you'll see it used throughout our course materials. If you're a novice developer working in an environment with other developers, we recommend that you find out how variable naming is done by your colleagues and get in the habit of following their lead.

1.5.2 Objects and object-oriented programming

The number and string variables that we worked with in the last section represent data types that are built into Python. Variables can also represent other things, such as GIS datasets, tables, rows, and the geoprocessor that we saw earlier that can run tools. All of these things are objects that you use when you work with ArcGIS in Python.

In Python, everything is an object. All objects have:

  • a unique ID, or location in the computer’s memory;
  • a set of properties that describe the object;
  • a set of methods, or things that the object can do.

One way to understand objects is to compare performing an operation in a procedural language (like C or FORTRAN) to performing the same operation in an object-oriented language. We'll pretend that we are writing a program to make a peanut butter and jelly sandwich. If we were to write the program in a procedural language, it would flow something like this:

  1. Go to the refrigerator and get the jelly and bread.
  2. Go to the cupboard and get the peanut butter.
  3. Take out two slices of bread.
  4. Open the jars.
  5. Get a knife.
  6. Put some peanut butter on the knife.
  7. etc.

If we were to write the program in an object-oriented language, it might look like this:

  1. mySandwich = Sandwich.Make
  2. mySandwich.Bread = Wheat
  3. mySandwich.Add(PeanutButter)
  4. mySandwich.Add(Jelly)

In the object-oriented example, the bulk of the steps have been eliminated. The sandwich object "knows how" to build itself, given just a few pieces of information. This is an important feature of object-oriented languages known as encapsulation.

Notice that you can define the properties of the sandwich (like the bread type) and perform methods (remember that these are actions) on the sandwich, such as adding the peanut butter and jelly.

1.5.3 Classes

The reason it’s so easy to "make a sandwich" in an object-oriented language is that some programmer, somewhere, already did the work to define what a sandwich is and what you can do with it. He or she did this using a class. A class defines how to create an object, the properties and methods available to that object, how the properties are set and used, and what each method does.

A class may be thought of as a blueprint for creating objects. The blueprint determines what properties and methods an object of that class will have. A common analogy is that of a car factory. A car factory produces thousands of cars of the same model that are all built on the same basic blueprint. In the same way, a class produces objects that have the same predefined properties and methods.
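
To make the blueprint idea concrete, here is a minimal sketch of what a sandwich class could look like in Python. The Sandwich class, its properties, and its methods are all hypothetical and exist only for illustration.

```python
class Sandwich:
    """Hypothetical blueprint for sandwich objects (illustration only)."""

    def __init__(self, bread="White"):
        self.bread = bread      # a property describing the object
        self.fillings = []      # another property

    def add(self, filling):
        """A method: something the sandwich can do."""
        self.fillings.append(filling)

    def describe(self):
        """Return a short text description of the sandwich."""
        return self.bread + " bread with " + " and ".join(self.fillings)

# Create an object from the class, set a property, and call methods
mySandwich = Sandwich()
mySandwich.bread = "Wheat"
mySandwich.add("peanut butter")
mySandwich.add("jelly")
print(mySandwich.describe())
```

Every object created from this class gets the same properties and methods, just as every car from the factory follows the same blueprint.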

In Python, you can take advantage of pre-written modules, which are groupings of classes and functions that help with your program. You import these modules into your code to tell your program what objects you’ll be working with and then use their pre-written methods to get a result. For example, the first line of most of the scripts you write in this course will be:

import arcpy

Here, you're using the import keyword to tell your script that you’ll be working with the arcpy module, which is provided as part of ArcGIS. After importing this module, you can create objects and processes that leverage ArcGIS functionality in your scripts.

Other modules that you may import in this course are os (allows you to work with the operating system), random (allows for generation of random numbers), csv (allows for reading and writing of spreadsheet files in comma-separated value format), and math (allows you to work with advanced math operations). These modules are included with Python, but they aren't imported by default. A best practice for keeping your scripts fast is to import only the modules that you need for that particular script. For example, although it might not cause any errors in your script, you wouldn't include import arcpy in a script not requiring any ArcGIS functions.
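
As a small sketch of importing only what a script needs, the example below uses three of the standard modules mentioned above and does not import arcpy, since no ArcGIS functionality is required. The path is the one from this lesson's later example and is used here only as text.

```python
import math
import os
import random

featureClass = "C:/Data/USA/USA.gdb/Boundaries"

# os: pull the last part (the feature class name) off the path
print(os.path.basename(featureClass))

# math: compute a square root
print(math.sqrt(16))

# random: generate a whole number from 1 to 6
diceRoll = random.randint(1, 6)
print(diceRoll)
```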

Readings

Read Zandbergen section 5.9 (Classes) for more information about classes.

1.5.4 Inheritance

Another important feature of object-oriented languages is inheritance. Classes are arranged in a hierarchical relationship, such that each class inherits its properties and methods from the class above it in the hierarchy (its parent class or superclass). A class also passes along its properties and methods to the class below it (its child class or subclass). A real-world analogy involves the classification of animal species. As a species, we have many characteristics that are unique to humans. However, we also inherit many characteristics from classes higher in the class hierarchy. We have some characteristics as a result of being vertebrates. We have other characteristics as a result of being mammals. To illustrate the point, think of the ability of humans to run. Our bodies respond to our command to run not because we belong to the "human" class, but because we inherit that trait from some class higher in the class hierarchy.
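
The species analogy can be sketched in Python as follows. The Vertebrate, Mammal, and Human classes and their methods are hypothetical and greatly simplified. The point is only that a Human object can use methods defined in the classes above it.

```python
class Vertebrate:
    def hasBackbone(self):
        return True

class Mammal(Vertebrate):       # Mammal inherits from Vertebrate
    def run(self):
        return "running"

class Human(Mammal):            # Human inherits from Mammal
    def speak(self):
        return "Hello"

person = Human()
print(person.speak())           # defined in Human
print(person.run())             # inherited from Mammal
print(person.hasBackbone())     # inherited from Vertebrate
```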

Back in the programming context, the lesson to be learned is that it pays to know where a class fits into the class hierarchy. Without that piece of information, you will be unaware of all of the operations available to you. This information about inheritance can often be found in the documentation for the module or the application's API. For example, Esri documents the classes for the arcpy Python module here: https://pro.arcgis.com/en/pro-app/latest/arcpy/classes/alphabetical-list-of-arcpy-classes.htm and ArcGIS Pro's SDK for C# here: https://pro.arcgis.com/en/pro-app/latest/sdk/api-reference.

1.5.5 Python syntax

Every programming language has rules about capitalization, white space, how to set apart lines of code and procedures, and so on. Here are some basic syntax rules to remember for Python:

  • Python is case-sensitive, both in variable names and in the names that are built into the language. That means it’s important whether you use upper or lower-case. The all lower-case "print" is a built-in function that will print a value, while "Print" is unrecognized by Python and will return an error. Likewise, arcpy is very sensitive about case and will return an error if you try to run a tool without capitalizing the tool name.
  • You end a Python statement by pressing Enter and literally beginning a new line. (In some other languages, a special character, such as a semicolon, denotes the end of a statement.) It’s okay to add empty lines to divide your code into logical sections.
  • If you have a long statement that you want to display on multiple lines for readability, you can use a line continuation character, which in Python is a backslash (\). (A statement enclosed in parentheses, brackets, or braces can also span multiple lines without one.)
  • Indentation is required in Python to logically group together certain lines, or blocks, of code. You should indent your code four spaces inside loops, if/then statements, and try/except statements. In most programming languages, developers are encouraged to use indentation to logically group together blocks of code; however, in Python, indentation of these language constructs is not only encouraged, but required. Though this requirement may seem burdensome, it does result in greater readability.
  • You can add a comment to your code by beginning the line with a # character. Comments are lines that you include to explain what the code is doing. Comments are ignored by Python when it runs the script, so you can add them at any place in your code without worrying about their effect. Comments help others who may have to work with your code in the future, and they may even help you remember what the code does. The intended audience for comments is developers.
  • You can add documentation to your code by surrounding a block of text with """ (3 consecutive double quotes) on either side. As with the # character, text surrounded by """ is ignored when Python runs the script. Unlike comments, which are sprinkled throughout scripts, documentation strings are generally found at the beginning of scripts. Their intended audience is end users of the script (non-developers).
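
The rules above can be seen together in the short sketch below, which does not require arcpy. The elevation values are made up for illustration.

```python
"""Counts how many elevations exceed a cutoff (illustration only)."""

# Hypothetical sample values; not from any course dataset
elevations = [3400, 3600, 2900]
cutoffElevation = 3500

# A long statement continued on a second line with a backslash
total = elevations[0] + elevations[1] \
        + elevations[2]

# Indentation groups the lines that belong to the loop and the if statement
highCount = 0
for elevation in elevations:
    if elevation > cutoffElevation:
        highCount = highCount + 1

print(total)
print(highCount)
```

Notice the documentation string at the top, the comments beginning with #, the blank lines between logical sections, and the four-space indentation inside the loop and the if statement.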

1.6.1 Introductory Python examples

Let’s look at a few example scripts to see how these rules are applied. The first example script is accompanied with a walkthrough video that explains what happens in each line of the code. You can also review the main points about each script after reading the code.

1.6.2 Example: Printing the spatial reference of a feature class

This first example script reports the spatial reference (coordinate system) of a feature class stored in a geodatabase. If you want to use the USA.gdb referenced in this example, you can run the code yourself.

# Opens a feature class from a geodatabase and prints the spatial reference

import arcpy

featureClass = "C:/Data/USA/USA.gdb/Boundaries"

# Describe the feature class and get its spatial reference   
desc = arcpy.Describe(featureClass)
spatialRef = desc.spatialReference

# Print the spatial reference name
print (spatialRef.name)

This may look intimidating at first, so let’s go through what’s happening in this script, line by line. Watch this video to get a visual walkthrough of the code.

Video: Walk-Through of First Python Script (5:33)

The purpose of this video is to walk through the first Python script in this course, which will be especially helpful if you're a beginner with Python. I have PyScripter open here, which is the development environment, or code editor, that we recommend you use throughout the class. The first couple lines of this script are a comment. And it says what we're going to do with this script, which is to look at a feature class inside of a geodatabase and then print the spatial reference of that feature class. In line four, we import the arcpy site package. Now if your script has anything at all to do with geoprocessing or looking at Esri datasets, this is what you're going to put at the top of your script. So you can get in the habit of using import arcpy at the top of your scripts in this class. In line six, we create a string variable. We call this variable featureClass, but we could really call it anything.

What to call the variable is up to us. What do we assign to the variable? Here, it is the path of the feature class that we want to look at, and we put that in quotes, which makes it a string in the eyes of Python. When you make a string, PyScripter is going to help you out by putting that text in orange so that it's easy to spot in your code. Notice in the path we use forward slashes. This is different from the common backslash that you would see typically when you define a path. The backslash in Python is a reserved character, so you have to use either two backslashes or a forward slash. You're going to see us use both in this course. In this case, the path is pointing at a file geodatabase. That's why you see the USA.gdb. Inside that file geodatabase is a feature class called boundaries. This is probably a polygon or a line feature class, who knows?

If you didn't know the exact path to get, you could open the Catalog view in ArcGIS Pro, navigate to that feature class and then you would see in the upper location bar the exact path to use. You could also navigate to it in your File Explorer in Windows. In line 9, we call a method in arcpy, the arcpy Describe method, which returns to us what's called a Describe object. We've mentioned in the course material that everything in Python is an object. Objects have properties that describe them, and methods that are things that they can do. If you want to learn about this particular object, the Describe object, you could look it up and you should look it up in the ArcGIS Pro Help. In this case, since we're describing a feature class, the Describe object that we're working with is going to have a spatialReference property.

Now, before we move off of line 9, I want to point out the parentheses there. When you call the method Describe, you need to put something inside that parentheses, which is the path of the feature class that you would like to Describe. Now, we don't have to type in that full path here because we made a variable earlier to store the path. This is where you would plug in the featureClass variable. Python is going to see that as the path stored in that variable.

After line 9, we've got a Describe object stored in our desc variable. That Describe object has a property which is the spatialReference, so in line 10, that's what we're getting. desc is the name of the Describe object, and spatialReference gets us, returns to us, a SpatialReference object. Now we can't print out an object, we can't do print(spatialRef). If you tried to do that, PyScripter is going to tell you that the thing you're trying to print is an object, which isn't going to be very helpful. We actually need to get a property from the SpatialReference object. The property we're going to get is the name. That's why in line 13 we do spatialRef.name. By this time we've drilled down far enough that we've made our way to a string. spatialRef.name gives you back a string, the name of the spatial reference. That's something that we are able to print to the Python Interpreter window.

Now I'm going to run this script. To run a script I just click on the Run button, which looks like a Play icon on the top. We'll see later in the course that scripts sometimes require arguments or pieces of information, like a path name or a number, in order to do their job. It's possible to supply such arguments, but this is a pretty simple script that doesn't require any kind of input, so we can just go ahead and click on the Run button. Now, anytime you run a script that includes a print statement, you should look to the bottom of the PyScripter application window to the area labeled as the Python Interpreter.

Before your print statement output, you'll see a message that indicates that you've re-initialized the Python interpreter. Then comes your output, which in this case is the word geographic, meaning that this particular feature class has a geographic coordinate system. The first time you run a script in any particular PyScripter session, it's going to take several seconds for the interpreter to load into memory. However, if you go on to run any other arcpy scripts, to re-run this script or another script, they should all run more quickly.

Credit: S. Quinn, J. Detwiler © Penn State is licensed under CC BY-NC-SA 4.0.

Again, notice that:

  • A comment begins the script to explain what’s going to happen.
  • Case sensitivity is applied in the code. "import" is all lower-case. So is "print". The module name "arcpy" is always referred to as "arcpy," not "ARCPY" or "Arcpy". Similarly, "Describe" is capitalized in arcpy.Describe.
  • The variable names featureClass, desc, and spatialRef that the programmer assigned are short, but intuitive. By looking at the variable name, you can quickly guess what it represents.
  • The script creates objects and uses a combination of properties and methods on those objects to get the job done. That’s how object-oriented programming works.

Trying the example for yourself

The best way to get familiar with a new programming language is to look at example code and practice with it yourself. See if you can modify the script above to report the spatial reference of a feature class on your computer. In my example, the feature class is in a file geodatabase; you’ll need to modify the structure of the featureClass path if you are using a shapefile (for example, you'll put .shp at the end of the file name, and you won't have .gdb in your path).
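
When you change the path, keep in mind the rule about slashes. The sketch below shows three equivalent ways to write the same Windows path as a Python string, plus a shapefile path for comparison. The shapefile path is hypothetical, and the r prefix (a raw string) is a standard Python feature that this lesson has not otherwise used.

```python
# Three ways to write the same geodatabase path
forwardSlashes = "C:/Data/USA/USA.gdb/Boundaries"
doubleBackslashes = "C:\\Data\\USA\\USA.gdb\\Boundaries"
rawString = r"C:\Data\USA\USA.gdb\Boundaries"

# A hypothetical shapefile path: .shp at the end and no .gdb folder
shapefilePath = "C:/Data/USA/Boundaries.shp"

# The doubled backslashes are stored as single backslashes
print(doubleBackslashes)
print(doubleBackslashes == rawString)
print(shapefilePath.endswith(".shp"))
```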

Follow this pattern to try the example:

  1. Open PyScripter and click File > New.
  2. Paste the code above into the new script file and modify it to fit your data (change the path).
  3. Save your script as a .py file.
  4. Click the Run button to run the script. Look at the IPython console to see the output from the print function. The print function does not actually cause a hard copy to be printed! Note that the script will take several seconds to run the first time because of the need to import the arcpy module and its dependencies. Subsequent runs of arcpy-dependent scripts during your current PyScripter session will not suffer from this lag.

Readings

We'll take a short break and do some reading from another source. If you are new to Python scripting, it can be helpful to see the concepts from another point of view.

Read parts of Zandbergen chapters 4 & 5. This will be a valuable introduction to Python in ArcGIS, on how to work with tools and toolboxes (very useful for Project 1), and also on some concepts which we'll revisit later in Lesson 2 (don't worry if the bits we skip over seem daunting - we'll explain those in Lesson 2).

  • Chapter 4 deals with the fundamentals of Python. We will need a few of these to get started, and we'll revisit this chapter in Lesson 2. For now, read sections 4.1-4.7, reviewing 4.5, which you read earlier.
  • Chapter 5 talks about working with arcpy and functions - read sections 5.1-5.2 and 5.4-5.7.

1.6.3 Example: Performing map algebra on a raster

Here’s another simple script that finds all cells over 3500 meters in an elevation raster and makes a new raster that codes all those cells as 1. Remaining values in the new raster are coded as 0. This type of “map algebra” operation is common in site selection and other GIS scenarios.

Something you may not recognize below is the expression Raster(inRaster). This function just tells ArcGIS that it needs to treat your inRaster variable as a raster dataset so that you can perform map algebra on it. If you didn't do this, the script would treat inRaster as just a literal string of characters (the path) instead of a raster dataset.

# This script uses map algebra to find values in an
#  elevation raster greater than 3500 (meters).

import arcpy
from arcpy.sa import *

# Specify the input raster
inRaster = "C:/PSU/Geog485/Lesson1.gdb/foxlake"
cutoffElevation = 3500 

# Check out the Spatial Analyst extension
arcpy.CheckOutExtension("Spatial")

# Make a map algebra expression and save the resulting raster
outRaster = Raster(inRaster) > cutoffElevation
outRaster.save("C:/PSU/Geog485/Lesson1.gdb/foxlake_hi_10")

# Check in the Spatial Analyst extension now that you're done
arcpy.CheckInExtension("Spatial")

Begin by examining this script and trying to figure out as much as you can based on what you remember from the previous scripts you’ve seen.

The main points to remember on this script are:

  • Unlike other tools/functions we've seen so far, map algebra functions and the Raster class aren't in the arcpy module, but in the arcpy.sa sub-module. Thus, this script imports both arcpy and arcpy.sa.
  • The script uses an alternative syntax for importing modules: from <module> import <class> or *. When a module is imported with this syntax, any classes imported can be invoked without prefixing the parent module. In other words, using from arcpy.sa import * allows us to refer to the Raster class like this:
    outRaster = Raster(inRaster) > cutoffElevation
    whereas using import arcpy.sa would require this syntax when referring to the Raster class:
    outRaster = arcpy.sa.Raster(inRaster) > cutoffElevation
  • You may notice that PyScripter may flag the from statement and later the statement that invokes the Raster class with a warning icon. Your script will still run, but the warning appears because it can be poor practice to import all classes/functions from a module. It is often better to import only those items that are needed for the task at hand. In this case, changing * to Raster would be the better practice and would eliminate the warning. We've used the * notation to be consistent with Esri's code samples, but also encourage you to limit what you import from modules when possible.
  • Notice the lines of code that check out the Spatial Analyst extension before doing any map algebra and check it back in after finishing. In ArcGIS Pro, the checkout step is required only if you're operating in an environment where extension licenses are shared (i.e., using a Concurrent Use license). If you were to run this script on an educational copy of the software, the checkout and checkin statements could be omitted. We've left them in because they don't hurt, and they may be needed in your work environment.
  • Because each line of code takes some time to run, avoid putting unnecessary code between checkout and checkin. This allows others in your organization to use the extension if licenses are limited. The extension automatically gets checked back in when your script ends, thus some of the Esri code examples you will see do not check it in. However, it is a good practice to explicitly check it in, just in case you have some long code that needs to execute afterward, or in case your script crashes and against your intentions "hangs onto" the license.
  • inRaster begins as a string, but is then used to create a Raster object once you run Raster(inRaster). A Raster object is a special object used for working with raster datasets in ArcGIS. It's not available in just any Python script: you can use it only if you import the arcpy module at the top of your script.
  • cutoffElevation is a number variable that you declare early in your script and then use later on when you build the map algebra expression for your outRaster.
  • Your expression outRaster = Raster(inRaster) > cutoffElevation is saying, in plain terms, "Make a new raster and call it outRaster. Do this by taking all the cells of the raster dataset at the path of inRaster that are greater than the number I assigned to the variable cutoffElevation."
  • outRaster is also a Raster object, but you have to call the method outRaster.save() in order to make it permanent on disk. The save() method takes one argument, which is the path to which you want to save.
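
If the logic of the map algebra expression is hard to picture, the plain-Python sketch below applies the same greater-than test to a tiny grid of made-up elevation values. It is only an illustration of the cell-by-cell idea. It does not use arcpy and is not a description of how the Spatial Analyst extension works internally.

```python
# A hypothetical 2 x 2 "raster" of elevations in meters
grid = [[3400, 3550],
        [3700, 3100]]
cutoffElevation = 3500

# Build a new grid: 1 where the cell exceeds the cutoff, 0 elsewhere
outGrid = []
for row in grid:
    outRow = []
    for cell in row:
        if cell > cutoffElevation:
            outRow.append(1)
        else:
            outRow.append(0)
    outGrid.append(outRow)

print(outGrid)
```

With a Raster object, the single expression Raster(inRaster) > cutoffElevation stands in for all of this looping.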

Now try to run the script yourself using the FoxLake digital elevation model (DEM) in your Lesson 1 geodatabase. If it doesn’t work the first time, verify that:

  • you have supplied the correct input and output paths;
  • your path name contains forward slashes (/) or double backslashes (\\), not single backslashes (\);
  • you have the Spatial Analyst extension installed and enabled. To check this, from ArcGIS Pro, click on the Project tab > Licensing and ensure Spatial Analyst is enabled;
  • you do not have any of the datasets open in ArcGIS products;
  • the output data does not exist yet. If you want to be able to overwrite the output, you need to add the line arcpy.env.overwriteOutput = True. This line can be placed immediately after import arcpy.

You can experiment with this script using different values in the map algebra expression (try 3000 for example).

Readings

ArcGIS Pro edition:

Read the sections of Chapter 5 that talk about environment variables and licenses (5.11 & 5.13) which we covered in this part of the lesson.

1.6.4 Example: Creating buffers

Think about the previous example where you ran some map algebra on an elevation raster. If you wanted to change the value of your cutoff elevation to 2500 instead of 3500, you had to open the script itself and change the value of the cutoffElevation variable in the code.

This third example is a little different. Instead of hard-coding the values needed for the tool (in other words, literally including the values in the script) we’ll use some user input variables, or parameters. This allows people to try different values in the script without altering the code itself. Just like in ModelBuilder, parameters make your script available to a wider audience.

The simple example below just runs the Buffer tool, but it allows the user to enter the path of the input and output datasets as well as the distance of the buffer. The user-supplied parameters make their way into the script with the arcpy.GetParameterAsText() function.

Examine the script below carefully, but don't try to run it yet. You'll do that in the next part of the lesson.

# This script runs the Buffer tool. The user supplies the input
# and output paths, and the buffer distance.

import arcpy
arcpy.env.overwriteOutput = True

try:
    # Get the input parameters for the Buffer tool
    inPath = arcpy.GetParameterAsText(0)
    outPath = arcpy.GetParameterAsText(1)
    bufferDistance = arcpy.GetParameterAsText(2)

    # Run the Buffer tool
    arcpy.Buffer_analysis(inPath, outPath, bufferDistance)

    # Report a success message
    arcpy.AddMessage("All done!")

except:
    # Report an error message
    arcpy.AddError("Could not complete the buffer")

    # Report any error messages that the Buffer tool might have generated
    arcpy.AddMessage(arcpy.GetMessages())

Again, examine the above code line by line and figure out as much as you can about what the code does. If necessary, print the code and write notes next to each line. Here are some of the main points to understand:

  • GetParameterAsText() is a function in the arcpy module. Notice that it takes a zero-based integer (0, 1, 2, 3, etc.) as an argument. If you’re going to go ahead and make a tool out of this script, as we are going to do on the next page of this lesson, then it’s important you define the parameters in the same order you want them to appear in the tool’s dialog.
  • When we called the Buffer tool in this script, we supplied only three parameters. By not supplying any more, we accepted the default values for the rest of the tool’s parameters (Side Type, End Type, etc.).
  • The try and except blocks of code are a way that you can prevent your script from crashing if there is an error. Your script attempts to run all of the code in the try block. If the script cannot continue for some reason, it jumps down and runs the code in the except block. Inserting try/except blocks like this is a good practice to follow once you think you've gotten all the errors out of your script, or when you want to make sure your code will run a certain line at the end, no matter what happens.

    When you are first writing and debugging your script, sometimes it's more useful to leave out try/except and let the code crash because the error messages reported in the console sometimes give you better clues on how to diagnose the problem in your code. Suppose you put a print statement in your except block saying "There was an error. Please try again." For the end user of your script, this is nicer than seeing a nasty error message; however, as a programmer debugging the script, you want to see the (red) error message to get any insight you can about what went wrong.

    Projects that you submit in this course require error handling using try/except in order to receive full credit.

  • The arcpy.AddMessage() and arcpy.AddError() functions are ways of sending additional messages to the user of the tool. Whenever you run a tool, the geoprocessor prints messages, which you have probably seen before (for example, “Executed (Buffer) successfully. End time: Sat Oct 05 07:37:31 2019”). You have the power to add more messages through these functions. The messages have differing levels of severity, which is why AddMessage and AddError are separate functions. Sometimes people choose to view only the errors instead of all the messages, or they do a quick visual scan of the messages for errors.

    When you use arcpy.GetMessages(), you get all the messages generated by the tool itself. These will tell you things such as whether the user entered invalid parameters. Notice in this script the somewhat complex syntax you have to use to first get the messages, then add them: arcpy.AddMessage(arcpy.GetMessages()). If this line of code is confusing to understand, remember that the order of functions works like math operations: you start by working inside the parentheses first to get the messages, then you add them.
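
The try/except behavior described above can be tried without ArcGIS. The following is a minimal sketch in plain Python, not the lesson's buffer script: run_step is a hypothetical function, and a division stands in for a tool call that might fail. The except block reports a friendly message for the end user and also keeps the full traceback for the programmer, while the finally block shows one way to make sure a certain line runs at the end, no matter what happens.

```python
import traceback

def run_step(divisor):
    """Run a step that may fail and collect the messages it produces."""
    messages = []
    try:
        # Raises ZeroDivisionError when divisor is 0
        result = 100 / divisor
        messages.append("Result: " + str(result))
    except Exception:
        # Friendly message for the end user...
        messages.append("There was an error. Please try again.")
        # ...plus the full traceback for the programmer debugging the script
        messages.append(traceback.format_exc())
    finally:
        # Runs whether or not the try block succeeded
        messages.append("All done!")
    return messages

for line in run_step(0):
    print(line)
```

Calling run_step(4) instead skips the except block entirely, but the final "All done!" message is still added.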

Readings

ArcGIS Pro edition:

Read the section of Chapter 5 that talks about working with tool messages (5.12) for another perspective on handling tool output.

1.7.1 Making a script tool

User input variables that you retrieve through GetParameterAsText() make your script very easy to convert into a tool in ArcGIS. A few people know how to alter Python code, a few more can run a Python script and supply user input variables, but almost all ArcGIS users know how to run a tool. To finish off this lesson, we’ll take the previous script and make it into a tool that can easily be run in ArcGIS.

Before you begin this exercise, I strongly recommend that you scan the ArcGIS help topic Adding a script tool. You likely will not understand all the parts of this topic yet, but it will give you some familiarity with script tools that will be helpful during the exercise.

Follow these steps to make a script tool:

  1. Copy the code from Lesson 1.6.4 "Example: Creating Buffers" into a new PyScripter script and save it as buffer_user_input.py.
  2. Open your Lesson1Practice.aprx Pro project and display the Catalog window.
  3. Expand the nodes Toolboxes > Lesson1Practice.atbx.
  4. Right-click Lesson1Practice.atbx and click New > Script.
  5. Fill in the Name and Label properties for your Script tool as shown below, depending on your version of ArcGIS Pro: The first image below shows the interface and the information you have to enter under General in this and the next step if you have a version earlier than version 2.9. The other two images show the interface and information that needs to be entered in this and the next step under General and under Execution if you have version 2.9 or higher.

    A screen capture to show the Add Script dialog box for version earlier than 2.9

    Figure 1.10a Entering information for your script tool if you have a version < 2.9.

    A screen capture to show the Add Script dialog box for version 2.9 or higher (General tab)

    Figure 1.10b Entering information for your script tool if you have a version ≥ 2.9: General tab.

    A screen capture to show the Add Script dialog box for version 2.9 or higher (Execution tab)

    Figure 1.10c Entering information for your script tool if you have a version ≥ 2.9: Execution tab.
  6. For version <2.9, click the folder icon for Script File on the General tab and browse to your buffer_user_input.py file. For version 2.9 or higher, do the same on the Execution tab; the code from the script file will be shown in the window below.
  7. All versions: On the left side of the dialog, click on Parameters. You will now enter the parameters needed for this tool to do its job, namely inPath, outPath, and bufferMiles. You will specify the parameters in the same order, except you can give the parameters names that are easier to understand.
  8. Working in the first row of the table, enter Input Feature Class in the Label column, then Tab out of that cell. Note that the Name column is automatically filled in with the same text, except that the spaces are replaced by underscores.
  9. Next, click in the Data Type column and choose Feature Class from the subsequent dialog. Here is one of the huge advantages of making a script tool. Instead of accepting any string as input (which could contain an error), your tool will now enforce the requirement that a feature class be used as input. ArcGIS will help you by confirming that the value entered is a path to a valid feature class. It will even supply the users of your tool with a browse button, so they can browse to the feature class.

    A screen capture to show "setting the first parameter" in the Add Script dialog box

    Figure 1.11 Choosing "Feature Class."
  10. Just as you did in the previous steps, add a second parameter labeled Output Feature Class. The data type should again be Feature Class.
  11. For the Output Feature Class parameter, change the Direction property to Output.
  12. Add a third parameter named Buffer Distance. Choose Linear Unit as the data type. This data type will allow the user of the tool to select both the distance value and the units (for example, miles, kilometers, etc.).
  13. Set the Buffer Distance parameter's Default property to 5 Miles. (You may need to scroll to the right to see the Default column.) Your dialog should look like what you see below:

    A screen capture to show the Add Script dialog box and the tool after setting all parameters

    Figure 1.12 Setting the Default property to "5 Miles."
  14. Click OK to finish creation of the new script tool and open it by double-clicking it.
  15. Try out your tool by buffering any feature class on your computer. Notice that once you supply the input feature class, an output feature class path is suggested for you. This is because you specifically set Output Feature Class as an output parameter. Also, when the tool is complete, examine the Details box (visible if you hover your mouse over the tool's "Completed successfully" message) for the custom message "All done!" that you added in your code.

    A screen capture of the Buffer Script Tool dialog box to show a custom geoprocessing message

    Figure 1.13 The tool is complete.

This is a very simple example, and obviously, you could just run the out-of-the-box Buffer tool with similar results. Normally, when you create a script tool, it will be backed with a script that runs a combination of tools and applies some logic that makes those tools uniquely useful.

There’s another benefit to this example, though. Notice the simplicity of our script tool dialog compared to the main Buffer tool:

Screen captures of the Buffer Script Tool and Buffer dialog boxes to show both dialogs compared

Figure 1.14 Comparison of our script tool with the main buffer tool.

At some point, you may need to design a set of tools for beginning GIS users where only the most necessary parameters are exposed. You may also do this to enforce quality control if you know that some of the parameters must always be set to certain defaults, and you want to avoid the scenario where a beginning user (or a rogue user) might change the required values. A simple script tool is effective for simplifying the tool dialog in this way.

Readings: ArcGIS Pro edition

Read Zandbergen 3.9 - 3.10 to reinforce what you learned during this lesson about scripts and script tools.

Lesson 1 Practice Exercises

Each lesson in this course includes some simple practice exercises with Python. These are not submitted or graded, but they are highly recommended if you are new to programming or if the project initially looks challenging. Lessons 1 and 2 contain shorter exercises, while Lessons 3 and 4 contain longer, more holistic exercises. Each practice exercise has an accompanying solution that you should carefully study. If you want to use the USA.gdb referenced in some of the solutions you can find it here.

Remember to choose File > New in PyScripter to create a new script (or click the empty page icon). You can name the scripts something like Practice1, Practice2, etc. To execute a script in PyScripter, click the "play" icon.

Practice 1: Say hello

Create a string variable called x and assign it the value "Hello". Display the contents of the x variable in the Console.

Practice 1 Solution

Solution:

x = "Hello"
print (x)

Explanation:

The first line of the script tells the Python code interpreter to set aside space in the computer's memory to hold a value, to refer to that space in memory by the name x and to store the text string "Hello" in that space. The memory space is typically referred to as a variable. Unlike some languages that require declaring a variable and defining the type of data it will hold before assigning a value to it, Python allows variables to be defined without explicitly specifying their type. The data type (string, number, etc.) is inferred by the Python code interpreter based on the kind of value you're assigning to the variable.

The second line of the script tells the interpreter to print the contents of the x variable to the Python Interpreter.

Note that this already simple script could be simplified even further by doing away with the x variable and plugging "Hello" into the print statement directly as literal text:

print ("Hello")

Note also that this script does not include lines that import arcpy since the script does not require any ArcGIS functionality.
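
If you'd like to see the type inference described above for yourself, here is a small sketch using Python's built-in type() function. The variable names first_type and second_type are made up for this illustration.

```python
x = "Hello"
first_type = type(x).__name__    # the interpreter inferred a string

x = 90
second_type = type(x).__name__   # the same name now refers to an integer

print(first_type)
print(second_type)
```

Notice that x was never declared with a type. The interpreter worked it out each time from the value being assigned.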

Practice 2: Concatenate two strings

Create a string variable called first and assign to it your first name. Likewise, create a string variable called last and assign to it your last name. Concatenate (merge) the two strings together, making sure to also include a space between them.

Practice 2 Solution

Solution:

first = "James"
last = "Franklin"
full = first + " " + last

print (full)

Explanation:

In this script, two string variables are used to hold the two name pieces. The two pieces are merged together using the concatenation character (+ in Python). A space is inserted between them using a space within quotes. 

If you wanted to display the name pieces last name first and with a comma in between, you would change the 'full' statement as follows:

full = last + ", " + first

Practice 3: Pass a value to a script as a parameter

Example 1.6.4 shows the use of the arcpy.GetParameterAsText() function. This function is typically used in conjunction with an ArcGIS script tool that has been designed to prompt the user to enter the required parameters. However, it is possible to supply arguments to the script from within PyScripter. To do so, you would select Run > Configuration per file, check the Command line options checkbox, then enter the arguments something like this:

C:\PSU\geog485\Lesson1.gdb\USA\us_cities C:\PSU\geog485\Lesson1.gdb\USA\us_cities_buffered "10 miles"

Note that the arguments to the two feature class parameters are unquoted, while the argument to the buffer distance string parameter is provided in double quotes.
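
To see why the quotes matter, the sketch below uses Python's shlex module to split an argument string roughly the way a command line does. Treat it as an approximation: shlex follows Unix shell rules, so the exact behavior of PyScripter on Windows may differ in its details, and the paths are written with forward slashes here because shlex treats a backslash as an escape character.

```python
import shlex

# The arguments from the text, with forward slashes (see the note above)
command_line = ('C:/PSU/geog485/Lesson1.gdb/USA/us_cities '
                'C:/PSU/geog485/Lesson1.gdb/USA/us_cities_buffered '
                '"10 miles"')

# With the quotes, "10 miles" stays together as a single argument
quoted_args = shlex.split(command_line)
for index, value in enumerate(quoted_args):
    print(index, value)

# Without the quotes, the space splits the distance into two arguments
unquoted_args = shlex.split(command_line.replace('"', ''))
print(len(quoted_args), len(unquoted_args))
```

The zero-based index printed next to each argument corresponds to the number you would pass to GetParameterAsText() to retrieve it.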

For this exercise, write a script that accepts a single string value using the GetParameterAsText method. The value entered should be a name, and that name should be concatenated with the literal string "Hi, " and displayed in the console. Test the script from within PyScripter, entering a name (in double quotes) in the Command line options text box as outlined above prior to clicking the Run button.

Practice 3 Solution

Solution:

import arcpy

name = arcpy.GetParameterAsText(0)
print ("Hi, " + name)

Explanation:

There are other means of obtaining user input from the PyScripter IDE, but this course is focused on executing scripts from within ArcGIS, so the exercise asked you to use arcpy's GetParameterAsText function. That means that, unlike the first two exercises, this script requires importing arcpy. As mentioned, the script can be executed from within PyScripter by entering the name parameter as a command line option. GetParameterAsText is zero-based, so the script is set up to get parameter 0 (the first and only parameter).

Note that it would also be acceptable to do away with the name variable as follows:

print ("Hi, " + arcpy.GetParameterAsText(0))

Practice 4: Report the geometry type of a feature class

Example 1.6.2 demonstrates the use of the Describe function to report the spatial reference of a feature class. The Describe function returns an object that has a number of properties that can vary depending on what type of object you described. A feature class has a spatialReference property by virtue of the fact that it is a type of Dataset. (Rasters are another type of Dataset and also have a spatialReference property.)

The Describe function's page in the Help lists the types of objects that the function can be used on. Clicking the Dataset properties link pulls up a list of the properties available when you describe a dataset, spatialReference being just one of several.

For this exercise, use the Describe function again; this time, to determine the type of geometry (point, polyline or polygon) stored in a feature class. I won't tell you the name of the property that returns this information. But I will give you the hint that feature classes have this mystery property not because they're a type of Dataset as with the spatialReference property, but because they're objects of the type FeatureClass.

Practice 4 Solution

Solution:

import arcpy

fc = "C:/PSU/Geog485/Lesson1.gdb/USA/us_boundaries"

desc = arcpy.Describe(fc)
shapeType = desc.shapeType

print ("The geometry type of " + fc + " is " + shapeType)

Explanation:

This script begins like the one in example 1.6.2 since both require using the Describe function. As mentioned in the exercise instructions, you can see that the Describe function provides access to a spatialReference property by looking at the Describe function's help page for Datasets. In other words, if the object you're describing is a type of Dataset, it has a spatialReference property.

In this case, you were asked to determine the geometry type of a feature class and given the hint that the property you need belongs to objects of the type FeatureClass. This hint should have led you to click the FeatureClass properties link on the Describe function's Help page and scan through the list of properties until you found shapeType. The solution above reads the shapeType property, stores the return value in a variable, and then inserts that value into a user-friendly print message that also includes the name of the feature class.

Practice 5: Do some simple math and report your results

Create a variable score1 and assign it a value of 90. Then create a variable score2 and assign it a value of 80. Compute the sum and average of these values and output the following messages to the Console, replacing each x with the value you computed:
The sum of these scores is x.
Their average is x.

In constructing your messages, you're probably going to want to concatenate text strings with numbers. We'll cover this more in Lesson 2, but this requires converting the numeric values into strings; otherwise, Python raises an error (a TypeError) when it tries to join a string and a number. The str() function can be used to do this conversion. For example, I might output the score1 value like this:
print('Score 1 is ' + str(score1))
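
If you're curious what happens without str(), the following sketch deliberately makes the mistake inside a try block so that the script keeps running. The variable name message is made up for this illustration, and the exact wording of the error text can vary between Python versions.

```python
score1 = 90

try:
    # Joining a string and a number with + fails at run time
    message = "Score 1 is " + score1
except TypeError as err:
    message = "TypeError: " + str(err)

print(message)

# Converting the number first avoids the problem
print("Score 1 is " + str(score1))
```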

Practice 5 Solution

Solution:

score1 = 90
score2 = 80

sum = score1 + score2
avg = sum / 2

print("The sum of these scores is " + str(sum) + ".")
print("Their average is " + str(avg) + ".")

#Alternatively...
#score1 = 90
#score2 = 80
#
#print("The sum of these scores is " + str(score1 + score2) + ".")
#print("Their average is " + str((score1 + score2) / 2) + ".")

Explanation:

This script begins with a couple of statements in which the score1 and score2 variables are created and assigned values.  There are two approaches shown to displaying the required messages.  The first shows creating variables called sum and avg to store the two simple statistics requested.  The values held in those variables are then inserted into a pair of print() statements using string concatenation.  Note the use of the str() function to convert the numbers into strings for the purpose of concatenating with the message text. 

The second approach ("commented out" using the # character) shows how the script could be written without bothering to create the sum and avg variables.  The required math can be embedded right within the print() statements.  Note that in the statement that outputs the average the addition of the values held in score1 and score2 is enclosed in parentheses.  If these parentheses were omitted, the script would still run, but would produce an incorrect average because the order of operations would cause score2 to be divided by 2 first, then score1 would be added to that quotient.
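
You can confirm the effect of the parentheses with a quick experiment. In this sketch, the variable names with_parentheses and without_parentheses are made up for the illustration.

```python
score1 = 90
score2 = 80

# Addition happens first because of the parentheses: 170 / 2
with_parentheses = (score1 + score2) / 2

# Division happens first: 80 / 2 is 40.0, then 90 is added
without_parentheses = score1 + score2 / 2

print(with_parentheses)      # 85.0
print(without_parentheses)   # 130.0
```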

A couple of other points can be made about this script:

You might be thinking that the second approach is pretty slick and that you should be striving to write your scripts in the fewest lines possible.  However, shorter scripts don't necessarily translate to higher performing ones.  In this instance, there is practically no difference.  And the second approach makes it a bit harder for another coder (or yourself at some later date) to interpret what's going on.  That's a very important point to take into consideration when writing code.

You'll often see, including in the ArcGIS documentation, the format() method of Python strings used to insert values into larger strings, eliminating the need for the concatenation operator.  For example, we could display our sum message like so:
print('The sum is {}.'.format(sum))
With the format() method, curly braces are used within a string as placeholders for values to be inserted.  The string (enclosed in single or double quotes) is then followed by .format(), with the desired value plugged into the parentheses.  Note that the values supplied are automatically converted into a string format and any number of placeholders can be employed.  For example, we could combine the two messages like so:
print('The sum is {}. The average is {}.'.format(sum, (score1 + score2) / 2))
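
The placeholders can also be numbered or named, and a placeholder can carry a format specification such as :.2f to control the number of decimal places. The sketch below illustrates both options. The names total, average, by_position, and by_name are made up for this example (total is used rather than sum so that Python's built-in sum() function is left alone).

```python
score1 = 90
score2 = 80
total = score1 + score2
average = total / 2

# Numbered placeholders refer to the arguments by position
by_position = 'The sum is {0}. The average is {1}.'.format(total, average)

# Named placeholders, with :.2f rounding the average to two decimal places
by_name = 'The sum is {total}. The average is {avg:.2f}.'.format(total=total, avg=average)

print(by_position)   # The sum is 170. The average is 85.0.
print(by_name)       # The sum is 170. The average is 85.00.
```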

Project 1, Part I: Modeling precipitation zones in Nebraska

Suppose you're working on a project for the Nebraska Department of Agriculture and you are tasked with making some maps of precipitation in the state. Members of the department want to see which parts of the state were relatively dry and wet in the past year, classified in zones. All you have is a series of weather station readings of cumulative rainfall for 2018 that you've obtained from within Nebraska and surrounding areas. This is a feature class of points called Precip2018Readings. It is in the Nebraska feature dataset in your Lesson1 geodatabase.

Precip2018Readings is a fictional dataset created for this project. The locations do not correspond to actual weather stations. However, the measurements are derived from real 2018 precipitation data created by the PRISM Climate Group at Oregon State University.

You need to do several tasks in order to get this data ready for mapping:

  • Interpolate a precipitation surface from your points. This creates a raster dataset with estimated precipitation values for your entire area of interest. You've already planned for this, knowing that you are going to use inverse distance weighted (IDW) interpolation. Click the following link to learn how the IDW technique works. You've also selected your points to include some areas around Nebraska to avoid edge effects in the interpolation.
  • Reclassify the interpolated surface into an ordinal classification of precipitation "zones" that delineate relatively dry, medium, and wet regions.
  • Create vector polygons from the zones.
  • Clip the zone polygons to the boundary of Nebraska.
    Precipitation readings to an interpolated surface to a reclassified surface and vectorized zones which results in precipitation zones
    Figure 1.15 Mapping the data.
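
For a feel of how IDW arrives at an estimate, here is a minimal sketch of the general technique in plain Python. It is not the ArcGIS IDW tool's implementation (there is no search radius or barrier handling, for instance), the function name idw_estimate is hypothetical, and the two readings are made-up numbers. Each reading is weighted by 1 divided by its distance raised to a power, so nearer readings count for more.

```python
import math

def idw_estimate(target, readings, power=2):
    """Estimate a value at target (x, y) from readings given as (x, y, value)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in readings:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0:
            # The target sits exactly on a reading, so use that reading
            return value
        # Nearer readings get larger weights
        weight = 1 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Two made-up readings, 1 and 2 units away from the origin
readings = [(1, 0, 10.0), (0, 2, 20.0)]

print(idw_estimate((0, 0), readings, power=1))   # about 13.33
print(idw_estimate((0, 0), readings, power=2))   # 12.0
```

Raising the power from 1 to 2 pulls the estimate toward the nearer reading's value of 10, because the influence of the farther reading drops off more quickly.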

It's very possible that you'll want to repeat the above process in order to test different IDW interpolation parameters or make similar maps with other datasets (such as next year's precipitation data). Therefore, the above series of tasks is well-suited to ModelBuilder. Your job is to create a model that can complete the above series of steps without you having to manually open four different tools.

Model parameters

Your model should have these (and only these) parameters:

  1. Input precipitation readings- This is the location of your precipitation readings point data. This is a model parameter so that the model can be easily re-run with other datasets.
  2. Power- An IDW setting specifying how quickly the influence of surrounding points decreases as you move away from the point to be interpolated.
  3. Search radius- An IDW setting determining how many surrounding points are included in the interpolation of a point. The search radius can be fixed at a certain distance, including whatever number of points happen to fall within, or its distance can vary in order for it to always include a minimum number of points. When you use ModelBuilder, you don't have to set up any of these choices; ModelBuilder does it for you when you set the Search Radius as a model parameter.
  4. Zone boundaries- This is a table allowing the user of the model to specify the zone boundaries. By default, the table should be configured as shown in Figure 1.16 below: Precipitation values of 0 - 30000 will result in a reclassification of 1 (to correspond with Zone 1), 30000 - 60000 will result in a classification of 2 (to correspond with Zone 2), and so on. The way to get this table is to make a variable from the Reclassification parameter of the Reclassify tool and set it as a model parameter.
  5. Output precipitation zones- This is the location where you want the output dataset of clipped vector zones to be placed on disk.
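
The logic behind the zone boundaries table can be sketched in a few lines of plain Python. This only illustrates the idea for a single value and is not how the Reclassify tool is implemented. The function name zone_for is hypothetical, the third zone for values above 60000 is an assumption based on the "and so on" in the description, and sending a value that falls exactly on a boundary to the lower zone is also an assumption.

```python
from bisect import bisect_left

# Upper bounds of Zone 1 and Zone 2, taken from the description above
ZONE_BREAKS = [30000, 60000]

def zone_for(value):
    """Return the zone number for a single precipitation value."""
    # bisect_left counts how many boundaries lie below the value
    return bisect_left(ZONE_BREAKS, value) + 1

for reading in [15000, 45000, 75000]:
    print(reading, zone_for(reading))
```

In the model itself, the Reclassify tool applies the table to every cell of the interpolated raster, so no code like this is needed there.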

As you build your model, you will need to configure some settings that will not be exposed as parameters. These include the clip feature, which is the Nebraska StateBoundary feature class. There are many other settings such as "Z Value field" and "Input barrier polyline features" (for IDW) or "Reclass field" (for Reclassify) that should not be exposed as parameters. You should just set these values once when you build your model. If you ever ask someone else to run this model, you don't want them to be overwhelmed with choices stemming from every tool in the model; you should just expose the essential things they might want to change.

For this particular model, you should assume that any input dataset will conform to the same schema as your Precip2018Readings feature class. For example, an analyst should be able to submit similar datasets Precip2019Readings, Precip2020Readings, etc. for more recent years with the same fields, field names, and data types. However, he or she should not expect to provide any feature class with a different set of fields and field names, etc. As you might discover, handling all types of feature class schemas would make your model more complex than we want for this assignment.

Important: Given the scenario of wishing to re-run the model for other years of data, it would be a good idea to set default values for the exposed model parameters. Therefore, we are asking you to set default values for all parameters that are exposed as model parameters including the Power value, Search radius value, and Zone boundaries classification table. When you double-click the model to run it, the interface should look like the following:

A screen capture of the Create Precipitation Zones dialog box
Figure 1.16 The model interface.

Running the model with the exact parameters listed above should result in the following (I have symbolized the zones in Pro with different colors to help distinguish them). This is one way you can check your work:

Model output with parameters shown above
Figure 1.17 The completed model output.

Once you are done, take a screenshot of the layout of your final model in ModelBuilder (similar to Figure 1.5 in Section 1.3.2) to include in your homework submission.

Tips

The following tips may help you as you build your model:

  • Your model needs to include the following tools in this order: IDW (from the Spatial Analyst toolbox), Reclassify, Raster to Polygon, Clip (from the Analysis toolbox).
  • An easy way to find the tools you need in Pro is to go to Analysis > Tools, then type the name of the tool you want in the search box. Be careful when multiple tools have the same name. You'll typically be using tools from the Spatial Analyst toolbox in this assignment.
  • Once you drag and drop a tool onto the ModelBuilder canvas, double-click it and set all the parameters the way you want. These will be the default settings for your model.
  • If there is a certain parameter for a tool that you want to expose as a model parameter, right-click the tool in the ModelBuilder canvas, then click Create Variable > From Parameter and choose the parameter. Once the oval appears for the variable, right-click it and click Parameter.

Project 1, Part II: Creating contours for the Fox Lake DEM

The second part of Project 1 will help you get some practice with Python. At the end of Lesson 1, you saw three simple scripting examples; now your task is to write your own script. This script will create vector contour lines from a raster elevation dataset. Don't forget that the ArcGIS Pro Help can indeed be helpful if you need to figure out the syntax for a particular command.

Earlier in the lesson, you were introduced to the Fox Lake DEM in your Lesson 1 geodatabase. It represents elevation in the Fox Lake Quadrangle, Utah. Write a script that uses the Contour tool in the Spatial Analyst toolbox to create contour lines for the quadrangle. The contour interval should be 25 meters, and the base contour should be 0. Remember that the native units of the DEM are meters, so no unit conversions are required.

Running the script should immediately create a feature class of contour lines on disk.

Follow these guidelines when writing the script:

  • The purpose of this exercise is just to get you some practice writing Python code. Therefore, you are not required to use arcpy.GetParameterAsText() to get the input parameters. Go ahead and hard-code the values (such as the path name to the dataset).
  • Consequently, you are not required to create a script tool for this exercise. This will be required in Project 2.
  • Your code should run correctly from PyScripter. For full credit, it should also contain comments, attempt to handle errors, and use legal and intuitive variable names.
  • You should assume that you're writing your script for an organization that has a limited number of concurrent Spatial Analyst licenses available.

Deliverables

The deliverables for Project 1 are:

  • The .atbx file of the toolbox containing your Part I model. The easiest way to find it is to hover your mouse over the toolbox in the Catalog window and note its location in the box that appears.
  • A screenshot showing the layout of your final model for Part I (the model in the ModelBuilder editor, not the tool interface and not the produced output of your model).
  • The .py file containing your Part II script.
  • A detailed written narrative that explains in plain English what the various lines of your script do.  In this document, we will be looking for you to demonstrate that you understand how the code you're submitting works. You may find it useful to emulate the practice exercise solution explanations.  Alternatively, you may record a video in which you walk the grader through your solution beginning to end.
  • A short writeup (about 400 words) describing how you approached the two problems, any setbacks you faced, and how you dealt with those. Both writeups described here will be required on all projects.

Important: Successful delivery of the above requirements is sufficient to earn 95% on the project. The remaining 5% is reserved for efforts that go "over and above" the minimum requirements. For Part I, this could include (but is not limited to) analysis of how different input values affect the output, substitution of some other interpolation method instead of IDW (for example Kriging), documentation for your model parameters that guides the end user in what to input, or demonstration of how your model was successfully run on a different input dataset. For Part II, in addition to the Contour tool, you could run some other tool that also takes a DEM as an input.

As a general rule throughout the course, full credit in the "over and above" category requires the implementation of 2-4 different ideas, with more complex ideas earning more credit. Note that for future projects, we won't be listing off ideas as we've done here. Otherwise, it wouldn't really be an over and above requirement.

Finishing Lesson 1

To complete Lesson 1, please zip all your Project 1 deliverables (for parts I and II) into one file and submit them to the Project 1 Drop Box in Canvas. Then take the Lesson 1 Quiz if you haven't taken it already.

Lesson 2: Python and programming basics

The links below provide an outline of the material for this lesson. Be sure to carefully read through the entire lesson before returning to Canvas to submit your assignments.

Lesson 2 Overview

In Lesson 1, you received an introduction to Python. Lesson 2 builds on that experience, diving into Python fundamentals. Many of the things you'll learn are common to programming in other languages. If you already have coding experience, this lesson may contain some review.

This lesson has a relatively large amount of reading from the course materials, the Zandbergen text, and the ArcGIS help. I believe you will get a better understanding of the Python concepts as they are explained and demonstrated from several different perspectives. Whenever the examples use the IPython console, I strongly suggest that you type in the code yourself as you follow the examples. This can take some time, but you'll be amazed at how much more information you retain if you try the examples yourself instead of just reading them.

At the end of the lesson, you'll be required to write a Python script that applies many of the things you've learned during the lesson. This will go much faster if you've taken the time to read all the required text and work through the examples. These assignments will build on the previous lesson's concepts, but may require that they be applied slightly differently.

Lesson 2 Checklist

Lesson 2 covers Python fundamentals (many of which are common to other programming languages) and gives you a chance to practice these in a project. To complete this lesson, you are required to do the following:

  1. Download the Lesson 2 data and extract it to C:\PSU\Geog485\Lesson2.
  2. Work through the online sections of the lesson.
  3. Read the remainder of Zandbergen chapters 4-6 that we didn't cover in Lesson 1 and sections 7.1 - 7.5. In the online lesson pages, I have inserted instructions about when it is most appropriate to read each of these chapters. There is more reading in this lesson than in a typical week. If you are new to Python, please plan some extra time to read these chapters. There are also some readings this week from the ArcGIS Help.
  4. Complete Project 2 and upload its deliverables to the Lesson 2 drop box in Canvas. The deliverables are listed in the Project 2 description page.
  5. Complete the Lesson 2 Quiz in Canvas.

Do items 1 - 3 (including any of the practice exercises you want to attempt) during the first week of the lesson. You will need the second week to concentrate on the project and quiz.

Lesson Objectives

By the end of this lesson, you should:

  • understand basic Python syntax for conditional statements and program flow control (if-else, comparison operators, for loop, while loop);
  • be familiar with more advanced data types (strings, lists), string manipulation, and casting between different types;
  • know how to debug code and how to use the debugger;
  • be able to program basic scripts that use conditional statements and loops to automate tasks.

2.1 More Python fundamentals

In Lesson 1, you saw your first Python scripts and were introduced to the basics, such as importing modules, using arcpy, working with properties and methods, and indenting your code in try/except blocks. In the following sections, you'll learn about more Python programming fundamentals such as working with lists, looping, if/then decision structures, manipulating strings, and casting variables.

Although this might not be the most thrilling section of the course, it's probably the most important section for you to spend time understanding and experimenting with on your own, especially if you are new to programming.

Programming is similar to playing sports: if you take time to practice the fundamentals, you'll have an easier time when you need to put all your skills together. For example, think about the things you need to learn in order to play basketball. A disciplined basketball player practices dribbling, passing, long-range shooting, layup shots, free throws, defense, and other skills. If you practice each of these fundamentals well individually, you'll be able to put them together when it's time to play a full game.

Learning a programming language is the same way. When faced with a problem, you'll be forced to draw on your fundamental skills to come up with a workable plan. You may need to include a loop in your program, store items in a list, or make the program do one of four different things based on certain user input. If you know how to do each of these things individually, you'll be able to fit the pieces together, even if the required task seems daunting.

Take time to make sure you understand what's happening in each line of the code examples, and if you run into a question, please post to the forums.

2.1.1 Lists

In Lesson 1, you learned about some common data types in Python, such as strings and integers. Sometimes you need a container type that can store multiple related values together. Python offers several ways of doing this, and the first one we'll learn about is the list.

Here's a simple example of a list. You can type this in the PyScripter Python Interpreter to follow along:

>>> suits = ['Spades', 'Clubs', 'Diamonds', 'Hearts']

This list named 'suits' stores four related string values representing the suits in a deck of cards. In many programming languages, storing a group of objects in sequence like this is done with arrays. While the Python list could be thought of as an array, it's a little more flexible than the typical array in other programming languages. This is because you're allowed to put multiple data types into one list.

For example, suppose we wanted to make a list for the card values you could draw. The list might look like this:

>>> values = ['Ace', 2, 3, 4, 5, 6, 7, 8, 9, 10, 'Jack', 'Queen', 'King']

Notice that you just mixed string and integer values in the list. Python doesn't care. However, each item in the list still has an index, meaning an integer that denotes each item's place in the list. The list starts with index 0 and for each item in the list, the index increments by one. Try this:

>>> print (suits[0])
Spades
>>> print (values[12])
King

In the above lines, you just requested the item with index 0 in the suits list and got 'Spades'. Similarly, you requested the item with index 12 in the values list and got 'King'.

It may take some practice initially to remember that your lists start with a 0 index. Testing your scripts can help you avoid off-by-one errors that might result from forgetting that lists are zero-indexed. For example, you might set up a script to draw 100 random cards and print the values. If none of them is an Ace, you've probably stacked the deck against yourself by making the indices begin at 1.

Remember you learned that everything is an object in Python? That applies to lists too. In fact, lists have a lot of useful methods that you can use to change the order of the items, insert items, sort the list, and so on. Try this:

>>> suits = ['Spades', 'Clubs', 'Diamonds', 'Hearts']
>>> suits.sort()
>>> print (suits)
['Clubs', 'Diamonds', 'Hearts', 'Spades']

Notice that the items in the list are now in alphabetical order. The sort() method allowed you to do something in one line of code that would have otherwise taken many lines. Another helpful method like this is reverse(), which reverses the current order of the items. Because the list was just sorted alphabetically, reversing it puts the items in reverse alphabetical order:

>>> suits.reverse()
>>> print (suits)
['Spades', 'Hearts', 'Diamonds', 'Clubs']
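
A minimal sketch (not from the course text) contrasting reverse() with sort(reverse=True). It starts from the original, unsorted suits list so the difference is visible: reverse() only flips the existing order, while sort(reverse=True) actually sorts in descending order.

```python
suits = ['Spades', 'Clubs', 'Diamonds', 'Hearts']

# reverse() only flips the current order; nothing is sorted
flipped = list(suits)        # work on a copy so suits stays unchanged
flipped.reverse()
print (flipped)

# sort(reverse=True) sorts the items in reverse alphabetical order
descending = list(suits)
descending.sort(reverse=True)
print (descending)
```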

Before you attempt to write list-manipulation code, check your textbook or the Python list reference documentation to see if there's an existing method that might simplify your work.

Inserting items and combining lists

What happens when you want to combine two lists? Type this in the PyScripter Interpreter:

>>> listOne = [101,102,103]
>>> listTwo = [104,105,106]
>>> listThree = listOne + listTwo
>>> print (listThree)
[101, 102, 103, 104, 105, 106]

Notice that you did not get [205,207,209]; rather, Python treats the addition as appending listTwo to listOne. Next, try these other ways of adding items to the list:

>>> listThree += [107]
>>> print (listThree)
[101, 102, 103, 104, 105, 106, 107]
>>> listThree.append(108)
>>> print (listThree)
[101, 102, 103, 104, 105, 106, 107, 108]

To put an item at the end of the list, you can either add a one-item list (how we added 107 to the list) or use the append() method on the list (how we added 108 to the list). Notice that listThree += [107] is a shortened form of saying listThree = listThree + [107].

If you need to insert some items in the middle of the list, you can use the insert() method:

>>> listThree.insert(4, 999)
>>> print (listThree)
[101, 102, 103, 104, 999, 105, 106, 107, 108]

Notice that the insert() method above took two parameters. You might have even noticed a tooltip that shows you what the parameters mean.

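The first parameter is the index position that the new item will take, and the second parameter is the item to insert. This method call inserts 999 between 104 and 105, so 999 is now at index 4 and the items that followed it have each shifted one position to the right.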
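
To see how the index argument behaves at the edges of a list, here is a small sketch. The list contents are made up for illustration. Index 0 places the new item at the front, and an index equal to the list's length places it at the end, just as append() would.

```python
stops = ['B', 'C', 'D']

stops.insert(0, 'A')             # index 0: the new item becomes the first item
stops.insert(len(stops), 'E')    # index equal to the length: same effect as append('E')
stops.insert(2, 'X')             # items from index 2 onward shift one place right
print (stops)
```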

Getting the length of a list

Sometimes you'll need to find out how many items are in a list, particularly when looping. Here's how you can get the length of a list:

>>> myList = [4,9,12,3,56,133,27,3]
>>> print (len(myList))
8

Notice that len() gives you the exact number of items in the list. To get the index of the final item, you would need to use len(myList) - 1. Again, this distinction can lead to off-by-one errors if you're not careful.
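
A quick sketch of this distinction, using the same myList values. The try/except block is there only to show, without stopping the script, what happens when the length is mistakenly used as an index.

```python
myList = [4, 9, 12, 3, 56, 133, 27, 3]

lastIndex = len(myList) - 1
print (myList[lastIndex])        # the final item in the list

# Using the length itself as an index points one past the end of the list
try:
    print (myList[len(myList)])
except IndexError:
    print ("Index " + str(len(myList)) + " is out of range")
```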

Other ways to store collections of data

Lists are not the only way to store collections of items in Python; you can also use tuples and dictionaries. Tuples are like lists, but they are immutable. This means that you can't add, remove, or replace items once the tuple has been created. In some cases, a tuple might actually be a better structure for storing values like the suits in a deck of cards because this is a fixed list that you wouldn't want your program to change by accident.

Dictionaries differ from lists in that items are not indexed; instead, each item is stored with a key and the value as a key:value pair. We'll use dictionaries later in the course, and your reading assignment for this lesson covers dictionary basics. The best way to understand how dictionaries work is to play with some of the textbook examples in the PyScripter Python Interpreter (see Zandbergen 4.17).
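
As a brief illustration, the sketch below shows a tuple rejecting a change and a dictionary being read by key rather than by index. The cardPoints dictionary and its values are invented for this example and are not taken from the textbook.

```python
# A tuple uses parentheses and cannot be modified after it is created
suits = ('Spades', 'Clubs', 'Diamonds', 'Hearts')
print (suits[0])                 # reading by index works just like a list

try:
    suits[0] = 'Stars'           # attempting to replace an item fails
except TypeError:
    print ("Tuples cannot be changed")

# A dictionary stores key:value pairs; you look up a value by its key
cardPoints = {'Ace': 11, 'King': 10, 'Queen': 10}
print (cardPoints['Ace'])
cardPoints['Jack'] = 10          # adds a new key:value pair
print (len(cardPoints))
```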

2.1.2 Loops

A loop is a section of code that repeats an action. Remember, the power of scripting (and computing in general) is the ability to quickly repeat a task that might be time-consuming or error-prone for a human. Looping is how you repeat tasks with code, whether it's reading a file, searching for a value, or performing the same action on each item in a list.

for Loop

A for loop does something with each item in a list. Type this in the PyScripter Python Interpreter to see how a simple for loop works:

>>> for name in ["Carter", "Reagan", "Bush"]:
 	print (name + " was a U.S. president.")

After typing this, you'll have to hit Enter twice in a row to tell PyScripter that you are done working on the loop and that the loop should be executed. You should see:

Carter was a U.S. president.
Reagan was a U.S. president.
Bush was a U.S. president.

Notice a couple of important things about the loop above. First, you declared a new variable, name, after the word for to represent each item in the list as you iterated through. This variable is set to the current item in the list on each iteration. In the example above, it is set to Carter on the first iteration. Once all of the code inside the loop has executed and the iteration for that item is complete, name is set to the next item in the list, which is Reagan. This continues until there are no more items in the list to iterate over, or until a special command is used to stop the iteration.

The second thing to notice is that after the condition, or the first line of the loop, you typed a colon (:), then started indenting subsequent lines. Some programming languages require you to type some kind of special line or character at the end of the loop (for example, "Next" in Visual Basic, or "}" in JavaScript), but Python just looks for the place where you stop indenting. By pressing Enter twice, you told Python to stop indenting and that you were ready to run the loop.
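
One such special command for stopping a loop early is break. As a minimal sketch, the loop below stops as soon as it finds a particular name, so the remaining items are never visited. The names list, the visited list, and the name being searched for are chosen just for illustration.

```python
names = ["Carter", "Reagan", "Bush", "Clinton"]
visited = []

for name in names:
    visited.append(name)         # record each name the loop reaches
    if name == "Reagan":
        print ("Found " + name + "; stopping the loop")
        break                    # leave the loop immediately

print (visited)
```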

for loops can also work with lists of numbers. Try this one in the PyScripter Python Interpreter:

>>> x = 2
>>> multipliers = [1,2,3,4]
>>> for num in multipliers:
 	print (x * num)

2
4
6
8

In the loop above, you multiplied each item in the list by 2. Notice that you can set up your list before you start coding the loop.

You could have also done the following with the same result:

>>> multipliers = [1,2,3,4]
>>> for num in multipliers:
 	x = 2
 	print (x * num)

The above code, however, is less efficient than what we did initially. Can you see why? This time, you are declaring and setting the variable x=2 inside the loop. The Python interpreter will now have to read and execute that line of code four times instead of one. You might think this is a trivial amount of work, but if your list contained thousands or millions of items, the difference in execution time would become noticeable. Declaring and setting variables outside a loop, whenever possible, is a best practice in programming.

While we're on the subject, what would you do if you wanted to multiply 2 by every number from 1 to 1000? It would definitely be too much typing to manually set up a multipliers list, as in the previous example. In this case, you can use Python's built-in range function. Try this:

>>> x = 2
>>> for num in range(1,1001):
 	print (x * num)

The range function is your way of telling Python, "Start here and stop there." We used 1001 because the loop stops one item before the function's second argument (the arguments are the values you put in parentheses to tell the function how to run). If you supply only one argument, range starts at 0, so if you want the loop to multiply by 0 at the beginning as well, you can get away with using one argument:

>>> x = 2
>>> for num in range(1001):
 	 print (x * num)

The range function has many interesting uses, which are detailed in this section's reading assignment in Zandbergen.
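
To see exactly which numbers range produces, you can convert its result to a list. This sketch uses small ranges so the output is easy to read. The variable names are illustrative, and the optional third argument (a step size) goes slightly beyond the examples above.

```python
startAtOne = list(range(1, 6))     # starts at 1, stops before 6
startAtZero = list(range(6))       # one argument: starts at 0, stops before 6
byFives = list(range(0, 11, 5))    # optional third argument is the step size

print (startAtOne)
print (startAtZero)
print (byFives)
```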

while Loops

A while loop keeps executing as long as some condition is true and stops once the condition becomes false. Here's how to code our example above using a while loop:

>>> x = 0
>>> while x < 1001:
        print (x * 2)
        x += 1

while loops often involve the use of some counter that keeps track of how many times the loop has run. Sometimes you'll perform operations with the counter. For example, in the above loop, x was the counter, and we also multiplied the counter by 2 each time during the loop. To increment the counter, we used x += 1 which is shorthand for x = x + 1, or "add one to x".
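
The following sketch uses a smaller limit than the example above and collects the results of a for loop and a while loop into lists, so you can confirm that both produce the same values. It also shows why the counter matters: if x += 1 were left out, the condition would never become false and the loop would run forever.

```python
# for loop version
forResults = []
for num in range(5):
    forResults.append(num * 2)

# while loop version with an explicit counter
whileResults = []
x = 0
while x < 5:
    whileResults.append(x * 2)
    x += 1                       # without this line, x < 5 would always be true

print (forResults)
print (whileResults)
```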

Nesting loops

Some situations call for putting one loop inside another, a practice called nesting. Nested loops could help you print every card in a deck (minus the Jokers):

>>> suits = ['Spades', 'Clubs', 'Diamonds', 'Hearts']
>>> values = ['Ace', 2, 3, 4, 5, 6, 7, 8, 9, 10, 'Jack', 'Queen', 'King']
>>> for suit in suits:
 	 for value in values:
 	     print (str(value) + " of " + str(suit))

In the above example, you start with a suit, then loop through each value in the suit, printing out the card name. When you've reached the end of the list of values, you jump out of the nested loop and go back to the first loop to get the next suit. Then you loop through all values in the second suit and print the card names. This process continues until all the suits and values have been looped through.
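
One way to check that nested loops visit every combination is to collect the card names in a list and count them. This sketch does the same looping as above; the cards list is an addition for illustration.

```python
suits = ['Spades', 'Clubs', 'Diamonds', 'Hearts']
values = ['Ace', 2, 3, 4, 5, 6, 7, 8, 9, 10, 'Jack', 'Queen', 'King']

cards = []
for suit in suits:               # outer loop runs once per suit
    for value in values:         # inner loop runs through all values for that suit
        cards.append(str(value) + " of " + suit)

print (len(cards))               # 4 suits times 13 values
print (cards[0])
print (cards[len(cards) - 1])
```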

Looping in GIS models

You will use looping repeatedly (makes sense!) as you write GIS scripts in Python. Often, you'll need to iterate through every row in a table, every field in a table, or every feature class in a folder or a geodatabase. You might even need to loop through the vertices of a geographic feature.

You saw above that loops work particularly well with lists. arcpy has some methods that can help you create lists. Here's an example you can try that uses arcpy.ListFeatureClasses(). First, manually create a new folder C:\PSU\Geog485\Lesson2\PracticeData. Then copy the code below into a new script in PyScripter and run the script. The script copies all the feature classes in your Lesson2 folder into the new Lesson2\PracticeData folder you just created.

# Copies all feature classes from one folder to another
import arcpy

try:
    arcpy.env.workspace = "C:/PSU/Geog485/Lesson2"

    # List the feature classes in the Lesson 2 folder
    fcList = arcpy.ListFeatureClasses()

    # Loop through the list and copy the feature classes to the Lesson 2 PracticeData folder
    for featureClass in fcList:
        arcpy.CopyFeatures_management(featureClass, "C:/PSU/Geog485/Lesson2/PracticeData/" + featureClass)

except:
    print ("Script failed to complete")
    print (arcpy.GetMessages(2))

Notice above that once you have a Python list of feature classes (fcList), it's very easy to set up the loop condition (for featureClass in fcList:).

Another common operation in GIS scripts is looping through table data. The arcpy module contains some special data-access objects called 'cursors' that help you do this. Here's a short script showing how a cursor can loop through each row in a feature class and print the value of the attribute field 'NAME'. We'll cover cursors in detail in the next lesson, so don't worry if some of this code looks confusing right now. The important thing is to notice how a loop is used to iterate through each record:

import arcpy
inTable = "C:/PSU/Geog485/Lesson2/CityBoundaries.shp"
inField = "NAME"

rows = arcpy.SearchCursor(inTable)

#This loop goes through each row in the table
#  and gets a requested field value

for row in rows:
    currentCity = row.getValue(inField)
    print (currentCity)

In the above example, a search cursor object that retrieves data from the table is assigned to the variable named rows. The for loop iterates over the returned data, assigning each record in turn to the variable row, which makes it possible to perform an action on that row's attributes.

ArcGIS Help reading

Read the following in the ArcGIS Pro Help:

2.1.3 Decision structures

Many scripts that you write will need to have conditional logic that executes a block of code given a condition and perhaps executes a different block of code given a different condition. The "if", "elif", and "else" statements in Python provide this conditional logic. Try typing this example in the Python Interpreter:

x = 3
if x > 2:
    print ("Greater than two")
Greater than two

In the above example, the keyword "if" denotes that some conditional test is about to follow. In this case, the condition of x being greater than two was met, so the script printed "Greater than two." Notice that you are required to put a colon (:) after the condition and indent any code executing because of the condition. For consistency in this class, all indentation is done using four spaces.

Using "else" is a way to run code if the condition isn't met. Try this:

x = 1
if x > 2:
    print ("Greater than two")
else:
    print ("Less than or equal to two")
Less than or equal to two

Notice that you don't have to put any condition after "else." It's a way of catching all other cases. Again, the conditional code is indented four spaces, which makes the code very easy for a human to scan. The indentation is required because Python doesn't require any type of "end if" statement (like many other languages) to denote the end of the code you want to execute.

If you want to run through multiple conditions, you can use "elif", which is Python's abbreviation for "else if":

x = 2
if x > 2:
    print ("Greater than two")
elif x == 2:
    print ("Equal to two")
else:
    print ("Less than two")
Equal to two

In the code above, elif x == 2: tests whether x is equal to two. The == is a way to test whether two values are equal. Using a single = in this case would result in an error because = is used to assign values to variables. In the code above, you're not trying to assign x the value of 2, you want to check if x is already equal to 2, hence you use ==.

Caution: Using = instead of == to check for equivalency is a very common Python mistake, especially if you've used other languages where = is allowed for equivalency checks.
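
The difference between the two operators is easy to see if you store the result of a comparison in a variable. This is a small sketch; the variable names isTwo and isThree are illustrative.

```python
x = 2                  # a single = assigns the value 2 to x

isTwo = (x == 2)       # a double == compares and produces True or False
isThree = (x == 3)

print (isTwo)
print (isThree)

# Writing "if x = 2:" instead would stop the script with a SyntaxError
if x == 2:
    print ("Equal to two")
```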

You can also use if, elif, and else to handle multiple possibilities in a set. The code below picks a random school from a list (notice we had to import the random module to do this and call a special method random.randrange()). After the school is selected and its name is printed, a series of if/elif/else statements appears that handles each possibility. Notice that the else statement is left in as an error handler; you should not run into that line if your code works properly, but you can leave the line in there to fail gracefully if something goes wrong.

import random

# Choose a random school from a list and print it
schools = ["Penn State", "Michigan", "Ohio State", "Indiana"]
randomSchoolIndex = random.randrange(0,4)
chosenSchool = schools[randomSchoolIndex]
print (chosenSchool)

# Depending on the school, print the mascot
if chosenSchool == "Penn State":
    print ("You're a Nittany Lion")
elif chosenSchool == "Michigan":
    print ("You're a Wolverine")
elif chosenSchool == "Ohio State":
    print ("You're a Buckeye")
elif chosenSchool == "Indiana":
    print ("You're a Hoosier")
else:
    print ("This program has an error")

Another way to handle the conditional logic above is to employ Python's match/case approach, which is available in Python 3.10 and later. To try this approach, replace the if block in the bottom half of the script with the snippet below:

# Depending on the school, print the mascot
match chosenSchool:
    case "Penn State":
        print ("You're a Nittany Lion")
    case "Michigan":
        print ("You're a Wolverine")
    case "Ohio State":
        print ("You're a Buckeye")
    case "Indiana":
        print ("You're a Hoosier")
    case _:
        print ("This program has an error")
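
A third option, different from both approaches above, is to store the school and mascot pairs in a dictionary and look up the chosen school by key. This is only a sketch: the mascots dictionary and the use of get() with a fallback value are choices made for this illustration and are not part of the original script.

```python
import random

schools = ["Penn State", "Michigan", "Ohio State", "Indiana"]
mascots = {
    "Penn State": "Nittany Lion",
    "Michigan": "Wolverine",
    "Ohio State": "Buckeye",
    "Indiana": "Hoosier",
}

chosenSchool = schools[random.randrange(0, 4)]
print (chosenSchool)

# get() returns its second argument if the key is not in the dictionary
mascot = mascots.get(chosenSchool, "unknown")
if mascot == "unknown":
    print ("This program has an error")
else:
    print ("You're a " + mascot)
```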

2.1.4 String manipulation

You've previously learned how the string variable can contain numbers and letters and represent almost anything. When using Python with ArcGIS, strings can be useful for storing paths to data and printing messages to the user. There are also some geoprocessing tool parameters that you'll need to supply with strings.

Python has some very useful string manipulation abilities. We won't get into all of them in this course, but the following are a few techniques that you need to know.

Concatenating strings

To concatenate two strings means to append or add one string on to the end of another. For example, you could concatenate the strings "Python is " and "a scripting language" to make the complete sentence "Python is a scripting language." Since you are adding one string to another, it's intuitive that in Python you can use the + sign to concatenate strings.

You may need to concatenate strings when working with path names. Sometimes it's helpful or required to store one string representing the folder or geodatabase from which you're pulling datasets and a second string representing the dataset itself. You put both together to make a full path.

The following example, modified from one in the ArcGIS Help, demonstrates this concept. Suppose you already have a list of strings representing feature classes that you want to clip. The list is represented by "featureClassList" in this script:

# This script clips all datasets in a folder
import arcpy

inFolder = "c:\\data\\inputShapefiles\\"
resultsFolder = "c:\\data\\results\\"
clipFeature = "c:\\data\\states\\Nebraska.shp"

# List feature classes
arcpy.env.workspace = inFolder
featureClassList = arcpy.ListFeatureClasses()

# Loop through each feature class and clip
for featureClass in featureClassList:
    
    # Make the output path by concatenating strings
    outputPath = resultsFolder + featureClass
    # Clip the feature class
    arcpy.Clip_analysis(featureClass, clipFeature, outputPath)

String concatenation is occurring in this line: outputPath = resultsFolder + featureClass. In longhand, the output folder "c:\\data\\results\\" is getting the feature class name added on the end. If the feature class name were "Roads.shp" the resulting output string would be "c:\\data\\results\\Roads.shp".

The above example shows that string concatenation can be useful in looping. Constructing the output path by using a set workspace or folder name followed by a feature class name from a list gives much more flexibility than trying to create output path strings for each dataset individually. You may not know how many feature classes are in the list or what their names are. You can get around that if you construct the output paths on the fly through string concatenation.
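
Plain concatenation works only if the folder string already ends with a path separator. As an alternative sketch, Python's standard os.path.join function inserts the separator for you. The folder and feature class names below are hypothetical, and no arcpy calls are made, so this can be run without any data.

```python
import os

resultsFolder = "c:/data/results"     # note: no trailing slash
featureClassList = ["Roads.shp", "Rivers.shp", "Cities.shp"]   # hypothetical names

outputPaths = []
for featureClass in featureClassList:
    # os.path.join adds the separator between the folder and the file name
    outputPath = os.path.join(resultsFolder, featureClass)
    outputPaths.append(outputPath)
    print (outputPath)
```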

Casting to a string

Sometimes in programming, you have a variable of one type that needs to be treated as another type. For example, 5 can be represented as a number or as a string. Python can only perform math on 5 if it is treated as a number, and it can only concatenate 5 onto an existing string if it is treated as a string.

Casting is a way of forcing your program to think of a variable as a different type. Create a new script in PyScripter, and type or paste the following code:

x = 0
while x < 10:
    print (x)
    x += 1

print ("You ran the loop " + x + " times.")

Now, try to run it. The script attempts to concatenate strings with the variable x to print how many times you ran a loop, but it results in an error such as "TypeError: can only concatenate str (not "int") to str" (older Python 3 versions word this as "TypeError: must be str, not int"). Python doesn't have a problem when you want to print the variable x on its own, but it cannot join a string and an integer with the + operator. To get the code to work, you have to cast the variable x to a string before concatenating it.

x = 0
while x < 10:
    print (x)
    x += 1

print ("You ran the loop " + str(x) + " times.")

You can force Python to think of x as a string by using str(x). Python has other casting functions such as int() and float() that you can use if you need to go from a string to a number. Use int() for integers and float() for decimals.
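
Going in the other direction, from a string to a number, looks like this. The values are arbitrary; they simply show that the same characters behave differently depending on their type.

```python
textValue = "5"

asText = textValue + textValue                 # string concatenation
asInteger = int(textValue) + int(textValue)    # integer addition
asFloat = float("2.5") * 2                     # decimal arithmetic

print (asText)
print (asInteger)
print (asFloat)
```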

Readings

It's time to take a break and do some readings from another source. If you are new to Python scripting, this will help you see the concepts from a second angle.

Finish reading Zandbergen chapters 4 - 6 as detailed below. This can take a few hours, but it will save you hours of time if you make sure you understand this material now.

ArcGIS Pro edition:

  • Chapter 4 covers the basics of Python syntax, loops, strings, and other things we just learned. Please read Chapter 4, noting the following:
    - You were already assigned 4.1-4.7 in Lesson 1, so you may skip those sections if you have a good handle on those topics.
    - You may also skip 4.17-4.18, and 4.26.
    - Section 4.25 demonstrates use of the input() function, using the IDLE and PyCharm IDEs. IDEs vary in their implementation of the input() function. In PyScripter, the user is expected to enter a value in the PyScripter Python Interpreter.
    - Another way of providing input to a script in PyScripter is by going to Run > Command Line Parameters, checking the Use Command line Parameters box, entering the arguments in the box that appears, then clicking Run.
  • You've already read most of Chapter 5 on geoprocessing with arcpy. Now please read 5.10 & 5.14.
  • Chapter 6 gives some specific instructions about working with ArcGIS datasets, which will be valuable during this week's assigned project. Please read all sections, 6.1-6.7.

If you still don't feel like you understand the material after reading the above chapters, don't re-read it just yet. Try some coding from the Lesson 2 practice exercises and assignments, then come back and re-read if necessary. If you are really struggling with a particular concept, type the examples in the console. Programming is like a sport in the sense that you cannot learn all about it by reading; at some point, you have to get up and do it.

2.1.5 Putting it all together

Throughout this lesson, you've learned the basic programming concepts of lists, loops, decision structures, and string manipulation. In this section, we'll practice putting them all together to address a scenario. This will give us an opportunity to talk about strategies for approaching programming problems in general and you might be surprised at what you can do with just these skills.

The scenario we'll tackle is to simulate a one-player game of Hasbro's children's game "Hi Ho! Cherry-O." In this simple game of chance, you begin with 10 cherries on a tree. You take a turn by spinning a random spinner, which tells you whether you get to add or remove cherries on the turn. The possible spinner results are:

  • Remove 1 cherry.
  • Remove 2 cherries.
  • Remove 3 cherries.
  • Remove 4 cherries.
  • Bird visits your cherry bucket (Add 2 cherries).
  • Dog visits your cherry bucket (Add 2 cherries).
  • Spilled bucket (Place all 10 cherries back on your tree).

You continue taking turns until you have 0 cherries left on your tree, at which point you have won the game. Your objective here is to write a script that simulates the game, printing the following:

  • The result of each spin.
  • The number of cherries on your tree after each turn. This must always be between 0 and 10.
  • The final number of turns needed to win the game.

Approaching a programming problem

Although this example may seem juvenile, it's an excellent way to practice everything you just learned. As a beginner, you may feel overwhelmed by the above problem. A common question is, "Where do I start?" The best approach is to break down the problem into smaller chunks of things you know how to do.

One of the most important programming skills you can acquire is the ability to verbalize a problem and translate it into a series of small programming steps. Here's a list of things you would need to do in this script. Programmers call this pseudocode because it's not written in code, but it follows the sequence their code will need to take.

  1. Spin the spinner.
  2. Print the spin result.
  3. Add or remove cherries based on the result.
  4. Make sure the number of cherries is between 0 and 10.
  5. Print the number of cherries on the tree.
  6. Take another turn, or print the number of turns it took to win the game.

It also helps to list the variables you'll need to keep track of:

  • Number of cherries currently on the tree (Starts at 10).
  • Number of turns taken (Starts at 0).
  • Value of the spinner (Random).

Let's try to address each of the pseudocode steps. Don't worry about the full flow of the script yet. Rather, try to understand how each step of the problem should be solved with code. Assembling the blocks of code at the end is relatively trivial.

Spin the spinner

How do you simulate a random spin? In one of our previous examples, we used the random module to generate a random number within a range of integers; however, the choices on this spinner do not form a simple sequence of consecutive integers. A good approach here is to store all spin possibilities in a list and use the random number generator to pick the index for one of the possibilities. On its own, the code would look like this:

import random
spinnerChoices = [-1, -2, -3, -4, 2, 2, 10]
spinIndex = random.randrange(0, 7)
spinResult = spinnerChoices[spinIndex]

The list spinnerChoices holds all possible mathematical results of a spin (remove 1 cherry, remove 2 cherries, etc.). The final value 10 represents the spilled bucket (putting all cherries back on the tree).

You need to pick one random value out of this list to simulate a spin. The variable spinIndex represents a random integer from 0 to 6 that is the index of the item you'll pull out of the list. For example, if spinIndex turns out to be 2, your spin is -3 (remove 3 cherries from the tree). The spin is held in the variable spinResult.

The random.randrange() method is used to pick the random numbers. At the beginning of your script, you have to import the random module in order to use this method.
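
As an aside, the random module also offers random.choice(), which picks an item from a list directly, so no index variable is needed. The sketch below shows both approaches side by side; the variable otherSpinResult is introduced only for this comparison, and either approach is fine for this game.

```python
import random

spinnerChoices = [-1, -2, -3, -4, 2, 2, 10]

# Approach used in this lesson: pick a random index, then look up the item
spinIndex = random.randrange(0, len(spinnerChoices))
spinResult = spinnerChoices[spinIndex]

# Alternative: let random.choice() pick the item in one step
otherSpinResult = random.choice(spinnerChoices)

print (spinResult)
print (otherSpinResult)
```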

Print the spin result

Once you have a spin result, it only takes one line of code to print it. You'll have to use the str() method to cast it to a string, though.

print ("You spun " + str(spinResult) + ".")

Add or remove cherries based on the result

As mentioned above, you need to have some variable to keep track of the number of cherries on your tree. This is one of those variables that it helps to name intuitively:

cherriesOnTree = 10

After you complete a spin, you need to modify this variable based on the result. Remember that the result is held in the variable spinResult, and that a negative spinResult removes cherries from your tree. So your code to modify the number of cherries on the tree would look like:

cherriesOnTree += spinResult

Remember, the above is shorthand for saying cherriesOnTree = cherriesOnTree + spinResult.

Make sure the number of cherries is between 0 and 10

If you win the game, you have 0 cherries. You don't have to reach 0 exactly, but it doesn't make sense to say that you have negative cherries. Similarly, you might spin the spilled bucket, which for simplicity we represented with positive 10 in the spinnerChoices. You are not allowed to have more than 10 cherries on the tree.

A simple if/elif decision structure can help you keep the cherriesOnTree within 0 and 10:

if cherriesOnTree > 10:
    cherriesOnTree = 10
elif cherriesOnTree < 0:
    cherriesOnTree = 0

This means, if you wound up with more than 10 cherries on the tree, set cherriesOnTree back to 10. If you wound up with fewer than 0 cherries, set cherriesOnTree to 0.
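
To convince yourself that the check works, you can run it against a few sample values. In this sketch, testValues and limited are illustrative names; the if/elif logic is the same as above.

```python
testValues = [13, 10, 4, 0, -3]
limited = []

for cherriesOnTree in testValues:
    # Keep the number of cherries between 0 and 10
    if cherriesOnTree > 10:
        cherriesOnTree = 10
    elif cherriesOnTree < 0:
        cherriesOnTree = 0
    limited.append(cherriesOnTree)

print (limited)
```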

Print the number of cherries on the tree

All you have to do for this step is to print your cherriesOnTree variable, casting it to a string, so it can legally be inserted into a sentence.

print ("You have " + str(cherriesOnTree) + " cherries on your tree.")

Take another turn or print the number of turns it took to win the game

You probably anticipated that you would have to figure out a way to take multiple turns. This is the perfect scenario for a loop.

What is the loop condition? There have to be some cherries left on the tree in order to start another turn, so you could begin the loop this way:

while cherriesOnTree > 0:

Much of the code we wrote above would go inside the loop to simulate a turn. Since we need to keep track of the number of turns taken, at the end of the loop we need to increment a counter:

turns += 1

This turns variable would have to be initialized at the beginning of the script, before the loop.

This code could print the number of turns at the end of the game:

print ("It took you " + str(turns) + " turns to win the game.")

Final code

Your only remaining task is to assemble the above pieces of code into a script. Below is an example of how the final script would look. Copy this into a new PyScripter script and try to run it:

# Simulates one game of Hi Ho! Cherry-O

import random

spinnerChoices = [-1, -2, -3, -4, 2, 2, 10]
turns = 0
cherriesOnTree = 10

# Take a turn as long as you have more than 0 cherries
while cherriesOnTree > 0:

    # Spin the spinner
    spinIndex = random.randrange(0, 7)
    spinResult = spinnerChoices[spinIndex]

    # Print the spin result    
    print ("You spun " + str(spinResult) + ".")

    # Add or remove cherries based on the result
    cherriesOnTree += spinResult

    # Make sure the number of cherries is between 0 and 10   
    if cherriesOnTree > 10:
        cherriesOnTree = 10
    elif cherriesOnTree < 0:
        cherriesOnTree = 0

    # Print the number of cherries on the tree       
    print ("You have " + str(cherriesOnTree) + " cherries on your tree.")

    turns += 1

# Print the number of turns it took to win the game
print ("It took you " + str(turns) + " turns to win the game.")
lastline = input(">")

Analysis of the final code

Review the final code closely and consider the following things.

The first thing you do is import whatever supporting modules you need; in this case, it's the random module.

Next, you declare the variables that you'll use throughout the script. Each variable has a scope, which determines how broadly it can be used throughout the script. The variables spinnerChoices, turns, and cherriesOnTree are needed through the entire script, so they are declared at the beginning, outside the loop. Variables available throughout your entire program like this have global scope. The variables spinIndex and spinResult, on the other hand, are used only inside the loop, and each time the loop runs they are assigned new values. (In Python, a loop does not create a separate scope, so these two variables technically remain available after the loop ends; it is how you use them, not the language, that keeps them local to the loop.)
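One Python detail is worth checking for yourself here: a variable that is first assigned inside a loop still exists once the loop has finished. The short sketch below borrows the spinner list from the game, but the three-turn loop is just an illustration and not part of the game script.

```python
import random

spinnerChoices = [-1, -2, -3, -4, 2, 2, 10]

for turn in range(3):
    # spinResult is first assigned here, inside the loop body
    spinResult = spinnerChoices[random.randrange(0, 7)]

# The name is still defined after the loop and holds the last spin
print("The last spin was " + str(spinResult) + ".")
```

Because the value survives the loop, it is a good habit to treat such variables as belonging to the loop and not to rely on them afterward.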

You could potentially declare the variable spinnerChoices inside the loop and get the same end result, but performance would be slower because the variable would have to be re-initialized every time you ran the loop. When possible, you should declare variables outside loops for this reason.

If you had declared the variables turns or cherriesOnTree inside the loop, your code would have logical errors. You would essentially be starting the game anew on every turn with 10 cherries on your tree, having taken 0 turns. In fact, you would create an infinite loop because there is no way to remove 10 cherries during one turn, and the loop condition would always evaluate to true. Again, be very careful about where you declare your variables when the script contains loops.
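To see this logical error without locking up your IDE, you can add a safety cap to the loop. In the sketch below, maxTurns is a hypothetical variable added only so the demonstration ends; everything else follows the game script, except that cherriesOnTree is (wrongly) reset inside the loop.

```python
import random

spinnerChoices = [-1, -2, -3, -4, 2, 2, 10]
turns = 0
cherriesOnTree = 10
maxTurns = 5   # hypothetical safety cap so this demonstration always ends

while cherriesOnTree > 0 and turns < maxTurns:
    cherriesOnTree = 10   # BUG: the tree is refilled at the start of every turn

    spinResult = spinnerChoices[random.randrange(0, 7)]
    cherriesOnTree += spinResult

    if cherriesOnTree > 10:
        cherriesOnTree = 10
    elif cherriesOnTree < 0:
        cherriesOnTree = 0

    turns += 1
    print("After turn " + str(turns) + " you have " + str(cherriesOnTree) + " cherries.")

print("Stopped after " + str(turns) + " turns without winning.")
```

The best possible spin removes 4 cherries, so the count never drops below 6 and only the safety cap stops the loop.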

Notice that the total number of turns is printed outside the loop once the game has ended. The final line lastline = input(">") gives you an empty cursor prompting for input and is just a trick to make sure the application doesn't disappear when it's finished (if you run the script from a command console).

Summary

In the above example, you saw how lists, loops, decision structures, and variable casting can work together to help you solve a programming challenge. You also learned how to approach a problem one piece at a time and assemble those pieces into a working script. You'll have a chance to practice these concepts on your own during this week's assignment. The next and final section of this lesson will provide you with some sources of help if you get stuck.

Challenge activity

If the above activity made you enthusiastic about writing some code yourself, take the above script and try to find the average number of turns it takes to win a game of Hi Ho! Cherry-O. To do this, add another loop that runs the game a large number of times, say 10000. You'll need to record the total number of turns required to win all the games, then divide by the number of games (use "/" for the division). Send me your final result, and I'll let you know if you've found the correct average.

2.2 Troubleshooting and getting help


If you find writing code to be a slow and sometimes confusing process, that’s a normal part of programming. A large portion of a programmer’s time is spent identifying and resolving bugs, and learning to work with new languages and technologies is an ongoing process that involves research, practice, and experimentation.

Success in software development is less about how many languages or acronyms you can list on a resume, and more about being self-sufficient: the ability to learn new tools and solve problems independently. This doesn’t mean programmers never ask for help. On the contrary, good programmers know when it’s appropriate to consult peers or supervisors. At the same time, many common challenges can be addressed using documentation, online examples, forums, existing code, programming books, or debugging tools. Relying on generative AI without checking its output can be detrimental, since generated code often contains errors that could damage company data.

For instance, if an interviewer asks, “What do you do when you run into a difficult programming problem? What resources do you turn to first?” a strong answer demonstrates that you can efficiently find solutions on your own. Many effective programmers start with well-phrased online searches. Resources like Stack Exchange or other forums often provide solutions to common questions quickly, sometimes faster than walking down the hall to ask a colleague. For more complex problems, collaboration is essential, but the ability to independently explore and solve routine challenges is a valuable skill that helps the team work efficiently.

In this section of the lesson, you'll learn about places where you can go for help when working with Python and when programming in general. You will have a much easier experience in this course if you remember these resources and use them as you complete your assignments.

2.2.1 Potential problems and quick diagnosis


The secret to successful programming is to run early, run often, and don't be afraid of things going wrong when you run your code the first time. Debugging, or finding mistakes in code, is a part of life for programmers. Here are some things that can happen:

  • Your code doesn't run at all, usually because of a syntax error (you typed some illegal Python code).
  • Your code runs, but the script doesn't complete and reports an error.
  • Your code runs, but the script never completes. Often this occurs when you've created an infinite loop.
  • Your code runs and the script completes, but it doesn't give you the expected result. This is called a logical error, and it is often the type of error that takes the most effort to debug.

Don't be afraid of errors

Errors happen. There are very few programmers who can sit down and, off the top of their heads, write dozens of lines of bug-free code. This means a couple of things for you:

  • Expect to spend some time dealing with errors during the script-writing process. Beginning programmers sometimes underestimate how much time this takes. To get an initial estimate, take the amount of time it takes to draft your lines of code, then double or triple it to allow for error handling and final polishing of your script and tool.
  • Don't be afraid to run your script and hit the errors. A good strategy is to write a small piece of functionality, run it to make sure it's working, then add on the next piece. For the sake of discussion, let's suppose that as a new programmer, you introduce a new bug once every 10 lines of code. It's much easier to find a single bug in 10 new lines of code than it is to find 5 bugs in 50 new lines of code. If you're building your script piece by piece and debugging often, you'll be able to make a quicker and more accurate assessment of where you introduced new errors.

Catching syntax errors

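Syntax errors occur when you type something that isn't legal Python, and your code refuses to run at all. Common syntax errors include forgetting a colon when setting up a loop or an if condition, leaving a parenthesis or quotation mark unclosed, or using single backslashes in a file name (for example, the \U in "C:\Users" is read as the start of an escape sequence). Some other common mistakes, such as providing the wrong number of arguments to a function or mixing variable types incorrectly (dividing a number by a string, for example), are not caught until the offending line actually runs; Python reports those as runtime errors, which are covered under "Dealing with crashes" below.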

When you try to run code with a syntax error in PyScripter, an error message will appear in the Python Interpreter, referring to the script file that was run, along with the line number that contained the error. For example, a developer transitioning from ArcMap to ArcGIS Pro might forget that the syntax of the print function is different, resulting in the error message, "SyntaxError: Missing parentheses in call to 'print'. Did you mean print(x)?"
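The difference between an error caught before the code runs and one caught while it runs can be demonstrated with Python's built-in compile() function, which checks syntax without executing anything. This is only a sketch for experimenting; the variable names and the two snippets of source text are made up for the illustration.

```python
# Source text with a deliberate mistake: no colon after the for statement
badSource = "for i in range(3)\n    print(i)\n"

try:
    # compile() parses the text without running it
    compile(badSource, "<example>", "exec")
    syntaxProblem = None
except SyntaxError as err:
    syntaxProblem = err
    print("SyntaxError on line " + str(err.lineno) + ": " + str(err.msg))

# This text is legal syntax, so it compiles, but it fails when it runs
compiledCode = compile("result = 10 / 'two'", "<example>", "exec")

try:
    exec(compiledCode)
    runtimeProblem = None
except TypeError as err:
    runtimeProblem = err
    print("TypeError at run time: " + str(err))
```

The first problem is reported before any line executes; the second appears only when Python reaches the division.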

Dealing with crashes

If your code crashes, you may see an error message in the Python Interpreter. Instead of allowing your eyes to glaze over or banging your head against the desk, you should rejoice at the fact that the software possibly reported to you exactly what went wrong! Scour the message for clues as to what line of code caused the error and what the problem was. Do this even if the message looks intimidating. For example, see if you can understand what caused this error message (as reported by the Spyder IDE):

  runfile('C:/Users/detwiler/Documents/geog485/Lesson2/syntax_error_practice.py', wdir='C:/Users/detwiler/Documents/geog485/Lesson2')
Traceback (most recent call last):

  File ~\AppData\Local\ESRI\conda\envs\arcgispro-py3-spyd\lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
    exec(code, globals, locals)

  File c:\users\detwiler\documents\geog485\lesson2\syntax_error_practice.py:5
    x = x / 0

ZeroDivisionError: division by zero

The message begins with a call to Python's runfile() function, which Spyder shows when the script executes successfully as well. Because there's an error in this example script, the runfile() bit is then followed by a "traceback," a report on the origin of the error. The first few lines refer to the internal Python modules that encountered the problem, which in most cases you can safely ignore. The part of the traceback you should focus on, the part that refers to your script file, is at the end; in this case, we're told that the error was caused in Line 5: x = x / 0. Dividing by 0 is not possible, and the computer won't try to do it. (PyScripter's traceback report for this same script lists only your script file, leaving out the internal Python modules, so you may find it an easier environment for locating errors.)
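If you want to pull the same key facts out of a traceback in code, Python's standard traceback module can help. The sketch below is an illustration only: risky_division is a made-up function that repeats the divide-by-zero mistake, and the except block reports the error type, the message, and the location where the error was raised.

```python
import traceback

def risky_division():
    """Hypothetical function that repeats the mistake from the example."""
    x = 10
    x = x / 0   # raises ZeroDivisionError
    return x

try:
    risky_division()
    report = None
except ZeroDivisionError as err:
    # The last traceback entry is the line where the error was raised
    lastFrame = traceback.extract_tb(err.__traceback__)[-1]
    report = type(err).__name__ + ": " + str(err)
    print(report)
    print("Raised in " + lastFrame.name + ", line " + str(lastFrame.lineno))
```

As with the full traceback, the last entry is the one to read first.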

Error messages are not always as easy to decipher as in this case, unfortunately. There are many online forums where you might go looking for help with a broken script (the Esri Community forums, formerly GeoNet, for ArcGIS-specific questions; Stack Overflow, Stack Exchange, or Quora for more generic questions). You can make it a lot easier for someone to help you if, rather than just saying something like, "I'm getting an error when I run my script. Can anyone see what I did wrong?" you include the line number flagged by PyScripter along with the exact text of the error message. The exact text is important because the people trying to help you are likely to plug it into a search engine and will get better results that way. Or better yet, you could search for the error message yourself! The ability to solve coding problems by reading documentation and searching online forums is one of the distinguishing characteristics of a good developer.

Ad-hoc debugging

Sometimes it's easy to sprinkle a few 'print' statements throughout your code to figure out how far it got before it crashed, or what's happening to certain values in your script as it runs. This can also be helpful to verify that your loops are doing what you expect and that you are avoiding off-by-one errors.

Suppose you are trying to find the mean (average) value of the items in a list with the code below.

#Find average of items in a list

list = [22,343,73,464,90]

for item in list:
    total = 0
    total += item    

average = total / len(list)
print ("Average is " + str(average))

The script reports "Average is 18.0," which doesn't look right. From a quick visual check of this list you could guess that the average would be over 100. The script isn't erroneously getting the number 18 from the list; it's not one of the values. So where is it coming from? You can place a few strategic print statements in the script to get a better report of what's going on:

#Find average of items in a list

list = [22,343,73,464,90]

for item in list:
    print ("Processing loop...")
    total = 0
    total += item    
    print (total)

print (len(list))
average = total / len(list)
print ("Performing division...")
print ("Average is " + str(average))

Now when you run the script, you see:

Processing loop...
22
Processing loop...
343
Processing loop...
73
Processing loop...
464
Processing loop...
90
5
Performing division...
Average is 18.0

The error now becomes clearer. The running total isn't being kept successfully; instead, it's resetting each time the loop runs. This causes the last value, 90, to be divided by 5, yielding an answer of 18. You need to initialize the variable for the total outside the loop to prevent this from happening. After fixing the code and removing the print statements, you get:

#Find average of items in a list

list = [22,343,73,464,90]
total = 0

for item in list:
    total += item    

average = total / len(list)
print ("Average is " + str(average))

The resulting "Average is 198.4" looks a lot better. You've fixed a logical error in your code: an error that doesn't make your script crash, but produces the wrong result.

Although debugging with print statements is quick and easy, you need to be careful with it. Once you've fixed your code, you need to remember to remove the statements in order to make your code faster and less cluttered. Also, adding print statements becomes impractical for long or complex scripts. You can pinpoint problems more quickly and keep track of many variables at a time using the PyScripter debugger, which is covered in the next section of this lesson.
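One middle ground, if you find yourself adding and removing the same print statements repeatedly, is to route them through a small helper that can be switched off in one place. The sketch below is an assumption about how you might do that; DEBUG and debug_print are hypothetical names, and the list is named values here only for the illustration.

```python
DEBUG = True   # hypothetical switch; set to False once the script works

def debug_print(message):
    """Print a diagnostic message only while DEBUG is switched on."""
    if DEBUG:
        print("[debug] " + str(message))

values = [22, 343, 73, 464, 90]
total = 0

for item in values:
    total += item
    debug_print(total)   # running total after each item

average = total / len(values)
print("Average is " + str(average))
```

This keeps the diagnostic lines out of your normal output, but it doesn't replace a debugger for long or complex scripts.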

2.2.2 Using the PyScripter debugger


Sometimes when other quick attempts at debugging fail, you need a way to take a deeper look into your script. Most integrated development environments (IDEs) like PyScripter include some debugging tools that allow you to step through your script line by line to attempt to find an error. These tools allow you to keep an eye on the value of all variables in your script to see how they react to each line of code. The Debug toolbar can be a good way to catch logical errors where an offending line of code is preventing your script from returning the correct outcome. The Debug toolbar can also help you find which line of code is causing a crash.

The best way to explain the aspects of debugging is to work through an example. This time, we'll look at some code that tries to calculate the factorial of an integer (the integer is hard-coded to 5 in this case). In mathematics, a factorial is the product of an integer and all positive integers below it. Thus, 5! (or "5 factorial") should be 5 * 4 * 3 * 2 * 1 = 120.

The code below attempts to calculate a factorial through a loop that increments the multiplier by 1 until it reaches the original integer. This is a valid approach since 1 * 2 * 3 * 4 * 5 would also yield 120.

# This script calculates the factorial of a given
#  integer, which is the product of the integer and
#  all positive integers below it.

number = 5
multiplier = 1

while multiplier < number:
    number *= multiplier
    multiplier += 1

print (number)

Even if you can spot the error, follow along with the steps below to get a feel for the debugging process and the PyScripter Debug toolbar.

Open PyScripter and copy the above code into a new script.

  1. Save your script as debugger_walkthrough.py. You can optionally run the script, but you won't get a result.
  2. Click View > Toolbars and ensure Debug is checked. You should see the Debug toolbar:

    [Figure: PyScripter Debugging Toolbar]

    Many IDEs have debugging toolbars like this, and the tools they contain are pretty standard: a way to run the code, a way to set breakpoints, a way to step through the code line by line, and a way to watch the value of variables while stepping through the code. We'll cover each of these in the steps below.
  3. Click in the margin to the left of line 5 (number = 5). If you are in the right area, you will see a red dot next to the line number, indicating that a breakpoint has been added. (F5 is the shortcut key for this command.) A breakpoint is a place where you want your code to stop running, so you can examine it line by line using the debugger. Often you'll set a breakpoint deep in the middle of your script, so you don't have to examine every single line of code. In this example, the script is very short, so we're putting the breakpoint right at the beginning. Marking a breakpoint with a circle next to the line of code is common in other debuggers too.

    [Figure: Breakpoint next to line 5]

  4. Click the Debug file button (a play icon with a bug under it, indicating start debugging). This runs your script up to the breakpoint. In the Python Interpreter console, note that the debugfile() function is run on your script rather than the normal runfile() function. Also, instead of the normal >>> prompt, you should now see a [Dbg]>>> prompt. The cursor will be on that same line in PyScripter's Editor pane, which causes that line to be highlighted.
  5. Click either the Step over next function call button (a dot with an arrow, indicating step over) or the Step into subroutine button (a dot with an arrow pointing to it, indicating step into). This executes the highlighted line of your code, in this case the number = 5 line. Both buttons execute the highlighted statement, but they behave differently when the statement includes a call to a function. The Step over next function call button will execute all the code within the function, return to your script, and then pause at the script's next line. You'd use this button when you're not interested in debugging the function code, just the code of your main script. The Step into subroutine button, on the other hand, is used when you do want to step through the function code one line at a time. The two buttons will produce the same behavior for this simple script. You'll want to experiment with them later in the course when we discuss writing our own functions and modules.
  6. Before going further, click the Variable window tab in PyScripter's lower pane. Here, you can track what happens to your variables as you execute the code line by line. The variables will be added automatically as they are encountered. At this point, you should see two entries: globals {}, which contains variables from the Python packages, and locals {}, which contains the variables created by your script. We will be looking at the locals dictionary, so you can disregard the globals. When you expand the locals dictionary, you'll see some built-in variables (named __<name>__), which we can ignore for now. The "number" variable should be listed, with a type of int. Clicking the + next to a variable exposes more of its properties.
  7. Click the Step button again. You should now see that the "multiplier" variable has been added in the Variable window, since you just executed the line that initializes that variable, as called out in the figure.

    [Figure: PyScripter variables are highlighted]

  8. Click the Step button a few more times to cycle through the loop. Go slowly, and use the Variable window to understand the effect that each line has on the two variables. (Note that the keyboard shortcut for the Step button is F8, which you may find easier to use than clicking on the GUI.) You can also set a watch on a variable by placing the cursor on the variable and then pressing Alt+W, or by right-clicking in the Watches pane and selecting Add Watch At Cursor. This isolates the variable in the Watches window, where you can see its value update as the code executes.

    [Figure: PyScripter Watches window with menu open]

  9. Step through the loop until "multiplier" reaches a value of 10. It should be obvious at this point that the loop has not exited at the desired point. Our intent was for it to quit when "number" reached 120.

    Can you spot the error now? The fact that the loop has failed to exit should draw your attention to the loop condition. The loop will only exit when "multiplier" is greater than or equal to "number." That is obviously never going to happen as "number" keeps getting bigger and bigger as it is multiplied each time through the loop.

    In this example, the code contained a logical error. It re-used the variable for which we wanted to find the factorial (5) as a variable in the loop condition, without considering that the number would be repeatedly increased within the loop. Changing the loop condition to the following would cause the script to work:

    while multiplier < 5:

    Even better than hard-coding the value 5 in this line would be to initialize a variable early and set it equal to the number whose factorial we want to find. The number could then get multiplied independent of the loop condition variable.

  10. Click the Stop button in the Debug toolbar to end the debugging session. We're now going to step through a corrected version of the factorial script, but you may notice that the Variable window still displays a list of the variables and their values from the point at which you stopped executing. That's not necessarily a problem, but it is good to keep in mind.
  11. Open a new script, paste in the code below, and save the script as debugger_walkthrough2.py

    # This script calculates the factorial of a given
    # integer, which is the product of the integer and
    # all positive integers below it.
    number = 5
    loopStop = number
    multiplier = 1
    while multiplier < loopStop: 
        number *= multiplier
        multiplier += 1
            
    print (number)
  12. Step through the loop a few times as you did above. Watch the values of the "number" and "multiplier" variables, but also the new "loopStop" variable. This variable allows the loop condition to remain constant while "number" is multiplied. Indeed, you should see "loopStop" remain fixed at 5 while "number" increases to 120.
  13. Keep stepping until you've finished the entire script. Note that the usual >>> prompt returns to indicate you've left debugging mode.

In the above example, you used the Debug toolbar to find a logical error that had caused an endless loop in your code. Debugging tools are often your best resource for hunting down subtle errors in your code.
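Once a script like this appears to work, it can be reassuring to check it against a trusted reference. The sketch below wraps a variant of the corrected loop in a function (factorial_by_loop is a made-up name, and this version starts its running product at 1 instead of at the number itself) and compares its results with math.factorial() from the Python standard library.

```python
import math

def factorial_by_loop(number):
    """Multiply 1 * 2 * ... * number using a loop with a fixed stop value."""
    loopStop = number   # stays constant while the product grows
    result = 1
    multiplier = 1
    while multiplier <= loopStop:
        result *= multiplier
        multiplier += 1
    return result

for n in [1, 5, 10]:
    print(str(n) + "! = " + str(factorial_by_loop(n)))
    # Stop with an AssertionError if the loop ever disagrees with the library
    assert factorial_by_loop(n) == math.factorial(n)
```

Keeping the loop's stop value separate from the running product is the same fix that the loopStop variable provided in the walkthrough.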

You can and should practice using the Debug toolbar in the script-writing assignments that you receive in this course. You may save a lot of time this way. As a teaching assistant in a university programming lab years ago, the author of this course saw many students wait a long time to get one-on-one help, when a simple walk through their code using the debugger would have revealed the problem.

Readings

Read Zandbergen 7.1 - 7.5 to get his tips for debugging. Then read 7.11 and dog-ear the section on debugging as a checklist for you to review any time you hit a problem in your code during the next few weeks. The text doesn't focus solely on PyScripter's debugging tools, but you should be able to follow along and compare the tools you're reading about to what you encountered in PyScripter during the short exercise above. It will also be good for you to see how this important aspect of script development is handled in other IDEs.

2.2.3 Printing messages from the Esri geoprocessing framework


When you work with geoprocessing tools in Python, sometimes a script will fail because something went wrong with the tool. It could be that you wrote flawless Python syntax, but your script doesn't work as expected because Esri's geoprocessing tools cannot find a dataset or otherwise digest a tool parameter. You won't be able to catch these errors with the debugger, but you can get a view into them by printing the messages returned from the Esri geoprocessing framework.

Esri has configured its geoprocessing tools to frequently report what they're doing. When you run a geoprocessing tool from ArcGIS Pro, you see a box with these messages, sometimes accompanied by a progress bar. You learned in Lesson 1 that you can use arcpy.GetMessages() to access these messages from your script. If you only want to view the messages when something goes wrong, you can include them in an except block of code, like this.

try:
    . . .
except:
    print (arcpy.GetMessages())

Remember that when using try/except, in the normal case, Python will execute everything in the try-block (= everything following the "try:" that is indented relative to it) and then continue after the except-block (= everything following the "except:" that is indented relative to it). However, if some command in the try-block fails, the program execution directly jumps to the beginning of the except-block and, in this case, prints out the messages we get from arcpy.GetMessages(). After the except-block has been executed, Python will continue with the next statement after the except-block.
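The order of execution described above can be traced with a few lines of plain Python. This sketch does not use arcpy; it forces a failure with int("not a number") and records each step in a list (stepsRun is a made-up name) so you can see which lines ran.

```python
stepsRun = []

try:
    stepsRun.append("try: before the failure")
    value = int("not a number")                  # raises ValueError
    stepsRun.append("try: after the failure")    # never reached
except ValueError:
    stepsRun.append("except: handling the failure")

# Execution continues here once the except-block has finished
stepsRun.append("after the try/except")

for step in stepsRun:
    print(step)
```

Note that the line after the failure inside the try-block is skipped entirely, while the statement after the except-block still runs.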

Geoprocessing messages have three levels of severity: Message, Warning, and Error. You can pass an index to the arcpy.GetMessages() method to filter through only the messages that reach a certain level of severity. For example, arcpy.GetMessages(2) would return only the messages with a severity of "Error". Error and warning messages sometimes include a unique code that you can use to look up more information about the message. The ArcGIS Pro Help contains topics that list the message codes and provide details on each. Some of the entries have tips for fixing the problem.
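In arcpy, the severity values are 0 for informative messages, 1 for warnings, and 2 for errors. The sketch below shows one way you might report warnings and errors separately. Because arcpy is only available inside an ArcGIS Python environment, the example uses FakeGeoprocessor, a hypothetical stand-in class with placeholder message text; in a real script you would pass the arcpy module itself to report_problems (also a made-up name).

```python
class FakeGeoprocessor:
    """Hypothetical stand-in for arcpy so this sketch runs anywhere."""
    _messages = [
        (0, "Start Time: (placeholder)"),
        (1, "WARNING: (placeholder warning text)"),
        (2, "ERROR: (placeholder error text)"),
    ]

    def GetMessages(self, severity=None):
        # No argument returns everything; otherwise only the messages
        # with the requested severity are returned
        selected = [text for level, text in self._messages
                    if severity is None or level == severity]
        return "\n".join(selected)

def report_problems(gp):
    """Print warnings and errors separately; gp is arcpy or a stand-in."""
    warnings = gp.GetMessages(1)
    errors = gp.GetMessages(2)
    if warnings:
        print("Warnings:\n" + warnings)
    if errors:
        print("Errors:\n" + errors)
    return warnings, errors

warnings, errors = report_problems(FakeGeoprocessor())
```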

Try/except can be used in many ways and at different indentation levels in a script. For instance, you can have a single try/except construct as in the example above, where the try-block contains the entire functionality of your program. If something goes wrong, the script will output some error message and then terminate. In other situations, you may want to split the main code into different parts, each contained by its own try/except construct to deal with the particular problems that may occur in this part of the code. For instance, the following structure may be used in a batch script that performs two different geoprocessing operations on a set of shapefiles when an attempt should be made to perform the second operation, even if the first one fails.

for featureClass in fcList: 
    try:
       . . .   # perform geoprocessing operation 1
    except:
       . . .   # deal with failure of geoprocessing operation 1

    try:
       . . .   # perform geoprocessing operation 2
    except:
       . . .   # deal with failure of geoprocessing operation 2

Let us assume the first geoprocessing operation fails for the first shapefile in fcList. As a result, the program execution will jump to the first except-block, which contains the code for dealing with this error situation. In the simplest case, it will just produce some output messages. After the except-block has been executed, the program execution will continue as normal by moving on to the second try/except statement and attempting to perform the second geoprocessing operation. Either this one succeeds or it fails too, in which case the second except-block will be executed. The last thing to note is that since both try/except statements are contained in the body of the for-loop going through the different shapefiles, even if both of the operations fail for one of the files, the script will always jump back to the beginning of the loop body and continue with the next shapefile in the list, which is often the desired behavior of a batch script.
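The flow just described can be tried out without any GIS data. In the sketch below, operation_one and operation_two are hypothetical stand-ins for geoprocessing operations, the file names are made up, and the first operation is rigged to fail for one input so you can see that the second operation and the remaining inputs are still processed.

```python
def operation_one(name):
    """Hypothetical first operation; fails for one input to show the flow."""
    if name == "roads.shp":
        raise RuntimeError("operation 1 failed for " + name)
    return "op1 done: " + name

def operation_two(name):
    """Hypothetical second operation that always succeeds."""
    return "op2 done: " + name

fcList = ["roads.shp", "rivers.shp"]   # made-up file names
log = []

for featureClass in fcList:
    try:
        log.append(operation_one(featureClass))
    except Exception as ex:
        log.append("op1 problem: " + str(ex))

    try:
        log.append(operation_two(featureClass))
    except Exception as ex:
        log.append("op2 problem: " + str(ex))

for entry in log:
    print(entry)
```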

It is important to note that arcpy.GetMessages() only returns messages from arcpy geoprocessing tools, not exceptions raised by other packages such as os or pandas. A general catch-all is to catch the base Exception class, which also covers the exceptions raised by arcpy.

try:
    . . .
except Exception as ex:
    print (ex)
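As a small illustration of an error that does not come from arcpy, the sketch below asks the os module to list a folder that presumably does not exist (the folder name is made up for this example). Printing the exception's type name along with its message can make the report easier to search for online.

```python
import os

# Hypothetical path that is assumed not to exist on your machine
missingFolder = os.path.join("no_such_folder_for_this_example", "data")

try:
    fileNames = os.listdir(missingFolder)
    caught = None
except Exception as ex:
    caught = ex
    # The type name tells you what kind of error occurred
    print(type(ex).__name__ + ": " + str(ex))
```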

Further reading

Please take a look at the official ArcGIS Pro documentation for more detail about geoprocessing messages. Be sure to read these topics:

  • Understanding message types and severity
  • Understanding geoprocessing tool errors and warnings - This is the gateway into the error and warning reference section of the help that explains all the error codes. Sometimes you'll see these codes in the messages you get back, and the specific help topic for the code can help you understand what went wrong. The article also talks about how you can trap for certain conditions in your own scripts and cause specific error codes to appear. This type of thing is optional, for over and above credit, in this course.

2.2.4 Other sources of help


Besides the above approaches, there are many other places you can get help. A few of them are described below. If you're new to programming, just knowing that these resources exist and how to use them can help you feel more confident. Find the ones that you prefer and return to them often. This habit will help you become a self-sufficient programmer and will improve your potential to learn any new programming language or technology.

Drawing on the resources below takes time and effort. Many people don't like combing through computer documentation, and this is understandable. However, you may ultimately save time if you look up the answer for yourself instead of waiting for someone to help you. Even better, you will have learned something new from your own experience, and things you learn this way are much easier to remember in the future.

Sources of help

Search engines

Search engines are useful for both quick answers and obscure problems. Did you forget the syntax for a loop? The quickest remedy may be to Google "for loop python" or "while loop python" and examine one of the many code examples returned. Search engines are extremely useful for diagnosing error messages. Google the error message in quotes, and you can read experiences from others who have had the same issue. If you don't get enough hits, remove the quotes to broaden the search.

One risk you run from online searches is finding irrelevant information. Even more dangerous is using irrelevant information. Research any sample code to make sure it is applicable to the version of Python you're using. Some syntax in Python 3.x, used for scripting in ArcGIS Pro, is different from the Python 2.x used for scripting in ArcMap, for example.
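Two of the differences you are most likely to meet in older sample code are the print statement and the behavior of the division operator. The sketch below, meant to be run under Python 3, checks the running version with the standard sys module and shows both points; the variable names are made up for the illustration.

```python
import sys

# Confirm which version of Python is running this script
print("Running Python " + str(sys.version_info.major) + "." + str(sys.version_info.minor))

# In Python 3, print is a function and needs parentheses
print("Hello from Python 3")

# In Python 3, / always returns a float; // performs floor division
trueQuotient = 7 / 2
floorQuotient = 7 // 2
print(trueQuotient)
print(floorQuotient)
```

If a sample you find online uses print without parentheses, that is a strong hint it was written for Python 2.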

Esri online help

Esri maintains their entire help system online, and you'll find most of their scripting topics in the arcpy section.

Another section, which you should visit repeatedly, is the Tool Reference, which describes every tool in the toolbox and contains Python scripting examples for each. If you're having trouble understanding what parameters go in or out of a tool, or if you're getting an error back from the geoprocessing framework itself, try the Tool Reference before you do a random Internet search. You will have to visit the Tool Reference in order to be successful in some of the course projects and quizzes.

Python online help

The official Python documentation is available online. Some of it gets very detailed and reads as though written by programmers for programmers. The part you'll probably find most helpful is the Python Standard Library reference, which is a good place to learn about Python's modules such as "os", "csv", "math", or "random".
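To give a feel for what the Standard Library offers, here is a minimal sketch that touches three of these modules. The file path is a made-up example and does not need to exist on disk.

```python
import math
import os
import random

# os.path helps take file paths apart without manual string slicing.
# The path below is a made-up example.
example_path = os.path.join("Data", "Lesson2", "Ferries.shp")
folder, filename = os.path.split(example_path)
base, extension = os.path.splitext(filename)

# math provides common numeric functions and constants.
hypotenuse = math.sqrt(3 ** 2 + 4 ** 2)

# random can pick an arbitrary item from a list.
choice = random.choice(["A", "B", "C"])

print(folder, base, extension, hypotenuse, choice)
```

Each function used here has its own entry in the Standard Library reference, which is a good place to check parameters and return values.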

Generative AI

Use caution with generative AI tools. Their responses may draw on outdated versions of documentation, and because the tools often lack context about your situation, you may get stuck in an unhelpful loop. If you do not fully understand the code a tool generates, you could introduce malware or cause irreparable damage to your company's data and hardware.

Printed books, including your textbook

Programming books can be very hit or miss. Many books are written for people who have already programmed in other languages. Others proclaim they're aimed at beginners, but the writing or design of the book may not be intuitive or is difficult to digest. Before you drop $40 on a book, try to skim through it yourself to see if the writing generally makes sense to you (don't worry about not understanding the code--that will come along as you work through the book).

The course text Python Scripting for ArcGIS is a generally well-written introduction to just what the title says: working with ArcGIS using Python. There are a few other Python+ArcGIS books as well. If you've struggled with the material, or if you want to do a lot of scripting in the future, I recommend picking up one of these. Your textbook can come in handy if you need to look at a very basic code example, or if you're going to use a certain type of code construct for the first time, and you want to review the basics before you write anything.

A good general Python reference is Learning Python by Mark Lutz. We previously used this text in Geog 485 before there was a book about scripting with ArcGIS. It covers beginning to advanced topics, so don't worry if some parts of it look intimidating.

Esri forums and other online forums

The Esri forums are a place where you can pose your question to other Esri software users, or read about issues other users have encountered that may be similar to yours. There is a Python Esri forum that relates to scripting with ArcGIS, and also a more general Geoprocessing Esri forum you might find useful.

Before you post a question on the Esri forums, do a little research to make sure the question hasn't been answered already, at least recently. I also suggest that you post the question to our class forums first, since your peers are working on the same problems, and you are more likely to find someone who's familiar with your situation and has found a solution.

There are many other online forums that address GIS or programming questions. You'll see them all over the Internet if you perform a Google search on how to do something in Python. Some of these sites are laden with annoying banner ads or require logins, while others are more immediately helpful. Stack Exchange is an example of a well-traveled technical forum, light on ads, that allows readers to promote or demote answers depending on their helpfulness. One of its child sites, GIS Stack Exchange, specifically addresses GIS and cartography issues.

If you do post to online forums, be sure to provide detailed information on the problem and list what you've tried already. Avoid posts such as "Here's some code that's broken, and I don't know why" followed by dozens of lines of pasted-in code. State the problem in a general sense and focus on the problem code. Include exact error messages when possible.

People on online forums are generally helpful, but expect a hostile reception if you make them feel like they are doing your academic homework for you. Also, be aware that posting or copying extensive sections of Geog 485 assignment code on the internet is a violation of academic integrity and may result in a penalty applied to your grade (see section on Academic Integrity in the course syllabus).

Class forums

Our course has discussion boards that we recommend you use to consult your peers and instructor about any Python problem that you encounter. I encourage you to check them often and to participate by both asking and answering questions. I request that you make your questions focused and avoid pasting large blocks of code that would rob someone of the benefit of completing the assignment on their own. Short, focused blocks of code that solve a specific question are definitely okay. Code blocks that are not copied directly from your assignment are also okay.

I monitor all discussion boards closely; however, sometimes I may not respond immediately because I want to give you a chance to help each other and work through problems together. If you post a question and wind up solving your own problem, please post again to let us know and include how you managed to solve the problem in case other students run into the same issue.

Lesson 2 Practice Exercises

Before trying to tackle Project 2, you may want to try some simple practice exercises, particularly if the concepts in this lesson are new to you. Remember to choose File > New in PyScripter to create a new script (or click the empty page icon). You can name the scripts something like Practice1, Practice2, etc.

Find the spaces in a list of names

Python String objects have an index method that enables you to find a substring within the larger string. For example, if I had a variable defined as name = "James Franklin" and followed that up with the expression name.index("Fr"), it would return the value 6 because the substring "Fr" begins at character 6 in the string held in name. (The first character in a string is at position 0.)
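Here is a quick sketch of the index method in action, using the same name as above. It also shows what happens when the substring is not present; the search string "xyz" is just an arbitrary example.

```python
name = "James Franklin"

# index returns the position where the substring starts (counting from 0).
position = name.index("Fr")
print(position)

# index raises a ValueError when the substring is not found.
try:
    name.index("xyz")
    found = True
except ValueError:
    found = False
print(found)
```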

For this practice exercise, start by creating a list of names like the following:

beatles = ["John Lennon", "Paul McCartney", "Ringo Starr", "George Harrison"]

Then write code that will loop through all the items in the list, printing a message like the following:

"There is a space in blank's name at character (blank character)" where the first blank is filled in with the name currently being processed by the loop, and the second blank is filled in with the position of the first space in the name as returned by the index method. (You should obtain values of 4, 4, 5 and 6, respectively, for the items in the list above.)

This is a good example in which it is smart to write and test versions of the script that incrementally build toward the desired result rather than trying to write the final version in one fell swoop. For example, you might start by setting up a loop and simply printing each name. If you get that to work, give yourself a little pat on the back and then see if you can simply print the positions of the space. Once you get that working, then try plugging the name and space positions into the larger message.

Practice 1 Solution

Solution:

beatles = ["John Lennon", "Paul McCartney", "Ringo Starr", "George Harrison"]

for member in beatles:
    space = member.index(" ")
    print ("There is a space in " + member + "'s name at character " + str(space) + ".")

Explanation:

Given the info on the index method in the instructions, the trick to this exercise is applying the method to each string in the list. To loop through all the items in a list, you use a for statement that ends with 'in <list variable>', which in this case was called beatles. Between the words 'for' and 'in', you insert a variable with a name of your choosing. That variable will store a different list item on each iteration through the loop.

In my solution, I called this loop variable member. The member variable takes on the value of "John Lennon" on the first pass through the loop, "Paul McCartney" on the second pass through the loop, etc. The position of the first space in the member variable is returned by the index method and stored in a variable called space.

The last step is to plug the group member's name and the space position into the output message. This is done using the string concatenation character (+). Note that the space variable must be converted to a string before it can be concatenated with the literal text surrounding it.
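As an aside, concatenation is not the only way to build the message. The sketch below compares it with an f-string, a formatting feature available in Python 3.6 and later that converts the values for you. This is an alternative for comparison, not part of the solution above.

```python
member = "Ringo Starr"
space = member.index(" ")

# Concatenation requires converting the integer with str().
concatenated = "There is a space in " + member + "'s name at character " + str(space) + "."

# An f-string inserts the values directly and converts them automatically.
formatted = f"There is a space in {member}'s name at character {space}."

print(concatenated)
print(formatted)
```

Both lines print the same message, so which one you use is mostly a matter of readability.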

Convert the names to a "Last, First" format

Build on Exercise 1 by printing each name in the list in the following format:

Last, First

To do this, you'll need to find the position of the space just as before. To extract part of a string, you can specify the start character and the end character in brackets after the string's name, as in the following:

name = "James Franklin"
print (name[6:14])  # prints Franklin

One quirky thing about this syntax is that you need to specify the end character as 1 beyond the one you really want. The final "n" in "Franklin" is really at position 13, but I needed to specify a value of 14.

One handy feature of the syntax is that you may omit the end character index if you want everything after the start character. Thus, name[6:] will return the same string as name[6:14] in this example. Likewise, the start character may be omitted to obtain everything from the beginning of the string to the specified end character.
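The slicing rules described above can be tried out with a few lines like these. The variable names are only for illustration.

```python
name = "James Franklin"

# Start and end given: characters 6 through 13 (the end index is excluded).
last = name[6:14]

# End omitted: everything from position 6 onward.
same_last = name[6:]

# Start omitted: everything before position 5.
first = name[:5]

print(last, same_last, first)
```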

Practice 2 Solution

Solution:

beatles = ["John Lennon", "Paul McCartney", "Ringo Starr", "George Harrison"]

for member in beatles:
    space = member.index(" ")
    last = member[space+1:]
    first = member[:space]
    print (last + ", " + first)
    # or doing away with the last and first variables
    # print (member[space+1:] + ", " + member[:space])

Explanation:

Building on the first exercise, this script also uses a for loop to process each item in the beatles list. And the index method is again used to determine the position of spaces in the names. The script then obtains all characters after the space (member[space+1:]) and stores them in a variable called last. Likewise, it obtains all characters before the space (member[:space]) and stores them in a variable called first. Again, string concatenation is used to append the first name to the last name with a comma in between.
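For comparison, here is a sketch of a different approach that uses the string split method instead of index and slicing. The helper name to_last_first is my own invention for this example, and the sketch assumes every name contains at least one space.

```python
beatles = ["John Lennon", "Paul McCartney", "Ringo Starr", "George Harrison"]

def to_last_first(full_name):
    # Split on the first space only, giving two pieces: first and last.
    first, last = full_name.split(" ", 1)
    return last + ", " + first

converted = [to_last_first(member) for member in beatles]
for entry in converted:
    print(entry)
```

The index-and-slice version is still worth practicing, since the same technique applies when you need part of a string that is not separated by a single delimiter.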

Convert scores to letter grades

Write a script that accepts a score from 1-100 as an input parameter, then reports the letter grade for that score. Assign letter grades as follows:

A: 90-100
B: 80-89
C: 70-79
D: 60-69
F: <60

Practice 3 Solution

Solution:

import arcpy

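# Read the score as text (the name inputScore avoids hiding Python's built-in input function)
inputScore = arcpy.GetParameterAsText(0)
# Convert the text to an integer so it compares numerically
score = int(inputScore)

if score > 89:
    print ("A")
elif score > 79:
    print ("B")
elif score > 69:
    print ("C")
elif score > 59:
    print ("D")
else:
    print ("F")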

Explanation:

This exercise requires accepting an input parameter, so in my solution, I imported arcpy so that I could make use of the GetParameterAsText method. If you run this script directly in PyScripter, you can provide the input value for GetParameterAsText by going to Run > Configuration per file, checking the Command line options box, and entering a score in the Command line options text box. An alternative to using GetParameterAsText would be to use the Python function input() to read from the keyboard. Because the input parameter is being stored as a text string, I include an additional step that converts the value to an integer. While not critical in this instance, it's generally a good idea to avoid treating numeric values as strings when possible. For example, the number 10 treated as a string ("10") is considered less than 2 treated as a string ("2") because the first character in "10" is less than the first character in "2".
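The string comparison pitfall mentioned above is easy to see for yourself with a couple of lines. The variable names here are just for illustration.

```python
# Compared as strings, characters are examined left to right,
# so "10" sorts before "2" because "1" comes before "2".
as_strings = "10" < "2"

# Compared as integers, the values behave as expected.
as_numbers = int("10") < int("2")

print(as_strings, as_numbers)
```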

An 'if' block is used to print the appropriate grade for the inputted score. You may be wondering why all of the expressions handle only the lower end of the grade range and not the upper end. While it would certainly work to specify both ends of the range (e.g., 'elif score > 79 and score < 90:'), it's not really necessary if you evaluate the code as the interpreter would. If the user enters a value greater than 89, then the 'print ("A")' statement will be executed and all of the other conditions associated with the if block will be ignored. Code execution would pick up with the first line after the if block. (In this case, there are no lines after the if block, so the script ends.)

What this means is that the 'elif score > 79' condition will only ever be reached if the score is not greater than 89. That makes it unnecessary to specify the upper end of the grade range. The same logic applies to the rest of the conditions in the block.
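One way to convince yourself of this is to wrap the same logic in a function and try the boundary values. This is a sketch for experimentation rather than a replacement for the solution; the function name letter_grade is made up for this example, and it skips arcpy so it can be run anywhere.

```python
def letter_grade(score):
    # Conditions are checked top to bottom and the first true one wins,
    # so each elif only needs the lower bound of its range.
    if score > 89:
        return "A"
    elif score > 79:
        return "B"
    elif score > 69:
        return "C"
    elif score > 59:
        return "D"
    else:
        return "F"

# Try values on both sides of each boundary.
for test_score in (95, 90, 89, 60, 59):
    print(test_score, letter_grade(test_score))
```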

Create copies of a template shapefile

Imagine that you're again working with the Nebraska precipitation data from Lesson 1 and that you want to create copies of the Precip2008Readings shapefile for the next 4 years after 2008 (e.g., Precip2009Readings, Precip2010Readings, etc.). Essentially, you want to copy the attribute schema of the 2008 shapefile, but not the data points themselves. Those will be added later. The tool for automating this kind of operation is the Create Feature Class tool in the Data Management toolbox. Look up this tool in the Help system and examine its syntax and the example script. Note the optional template parameter, which allows you to specify a feature class whose attribute schema you want to copy. Also note that Esri uses some inconsistent casing with this tool, and you will have to call arcpy.CreateFeatureclass_management() using a lower-case "c" on "class." If you follow the examples in the Geoprocessing Tool Reference help, you will be fine.

To complete this exercise, you should invoke the Create Feature Class tool inside a loop that will cause the tool to be run once for each desired year. The range(...) function can be used to produce the list of years for your loop.

Practice 4 Solution

Solution:

import arcpy

try:
    arcpy.env.workspace = "C:\\Data\\"

    template = "Precip2008Readings.shp"

    for year in range(2009,2013):
        newfile = "Precip" + str(year) + "Readings.shp"
        arcpy.CreateFeatureclass_management(arcpy.env.workspace, newfile, "POINT", template,
                                            "DISABLED", "DISABLED", template)

except:
    print (arcpy.GetMessages())

Explanation:

This script begins by importing arcpy and setting the workspace. A reference to the shapefile to be copied is then stored in a variable. Because the idea is to create a copy of that shapefile for each year from 2009-2012, a logical approach is to set up a for loop defined with bounds of 'in range(2009,2013)'. (Recall that the upper end of the range should be specified as 1 higher than the actual desired end point.)

Within the loop, the name of the new shapefile is put together by concatenating the number held in the loop variable with the parts before and after that are always the same. The CreateFeatureClass tool can then be called upon, supplying the desired workspace for the output feature class, its name, its geometry type, the feature class to use as the schema template, whether the feature class should be configured to store m and z values, and finally a spatial reference argument.  Note from the documentation that the output feature class's spatial reference can be specified in a number of different ways and will be undefined if a spatial reference argument is not supplied.  (It does not automatically take on the spatial reference of the schema template feature class.)

Note also that the inconsistent casing CreateFeatureclass looks like a mistake on the part of Esri and is unfortunately required in this script.
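If you want to check the loop bounds and the file names without running any geoprocessing tools, a sketch like the following can help. It only builds the names; it does not call arcpy.

```python
# range stops one short of its second argument, so this yields 2009 through 2012.
years = list(range(2009, 2013))

# Build the output shapefile name for each year.
new_files = ["Precip" + str(year) + "Readings.shp" for year in years]

print(years)
print(new_files)
```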

Clip all feature classes in a geodatabase

The data for this practice exercise consists of two file geodatabases: one for the USA and one for just the state of Iowa. The USA dataset contains miscellaneous feature classes. The Iowa file geodatabase is empty except for an Iowa state boundary feature class.

Download the data

Your task is to write a script that programmatically clips all the feature classes in the USA geodatabase to the Iowa state boundary. The clipped feature classes should be written to the Iowa geodatabase. Append "Iowa" to the beginning of all the clipped feature class names.

Your script should be flexible enough that it could handle any number of feature classes in the USA geodatabase. For example, if there were 15 feature classes in the USA geodatabase instead of three, your final code should not need to change in any way.

Practice 5 Solution

Here's one way you could approach Lesson 2 Practice Exercise 5.

If you're new to programming, I encourage you to try figuring out a solution yourself before studying this solution. Then once you've given it your best shot, study the answer carefully. The video below (6:10) explains the script in more depth. Please note that the video uses the old Python 2 syntax with no parentheses (...) for the print statements. That is the only real difference from the Python 3 solution code you see below, though.

#This script clips all feature classes in a file geodatabase

import arcpy

# Create path variables
sourceWorkspace = "C:\\Data\\Lesson2PracticeExercise\\USA.gdb"
targetWorkspace = "C:\\Data\\Lesson2PracticeExercise\\Iowa.gdb"
clipFeature = "C:\\Data\\Lesson2PracticeExercise\\Iowa.gdb\\Iowa"

# Get a list of all feature classes in the USA geodatabase
arcpy.env.workspace = sourceWorkspace
featureClassList = arcpy.ListFeatureClasses()

try:

    # Loop through all USA feature classes
    for featureClass in featureClassList:

        # Construct the output path
        outClipFeatureClass = targetWorkspace + "\\Iowa" + featureClass

        # Perform the clip and report what happened
        arcpy.Clip_analysis(featureClass, clipFeature, outClipFeatureClass)
        arcpy.AddMessage("Wrote clipped file " + outClipFeatureClass + ". ")
        print ("Wrote clipped file " + outClipFeatureClass + ". ")

except:

    # Report if there was an error
    arcpy.AddError("Could not clip feature classes")
    print ("Could not clip feature classes")
    print (arcpy.GetMessages())
    

PRESENTER: This video explains the solution to the Lesson 2 practice exercise in which we clip a bunch of feature classes from the USA geodatabase to the Iowa state boundary.

In line 3, we import the arcpy site package. This is required whenever we work with Esri tools or datasets.

In line 6, 7, & 8, we set up a series of string variables representing the locations on disk that we will work with during this script. Now, if you were going to go ahead and make a script tool out of this, you would eventually want to change these to arcpy.GetParameterAsText as described in the lesson text. However, while you're working on your script in PyScripter, you'll want to actually write out the full path so you can test and make sure that your script works. Once you're absolutely certain that your script works, then you can go ahead and insert arcpy.GetParameterAsText and then you can start making the script tool and test out the tool. So I recommend you keep those two parts separate, testing in PyScripter first and then making the tool.

In lines 6 and 7, we set up the paths to the geodatabases we'll be working with. Line 6 is the USA geodatabase that contains the feature classes that we will clip. Line 7 is the Iowa geodatabase where we will put the output feature classes. Line 8 is the actual clip feature; this is our cookie cutter feature that is the state boundary of Iowa.

Now, we've used the term workspace in the line six and seven variable names just to mean a folder or a geodatabase in which we're working. But there is an official workspace for arcpy which is the current folder or geodatabase in which arcpy is looking for items; and so, in line 11, we actually set the arcpy workspace to be the same path that we set up in line 6 which is USA, and when we start listing feature classes, that's going to be the workspace that we look in.

Line 12 is where we called the method to list feature classes and the question is, list feature classes where? And the answer is the workspace that we just set up in line 11, so it's going to default to look in arcpy's workspace for the feature classes it should list. And so, what we get out of this is a variable that we call featureClassList and it is a Python list of all the feature classes in the USA geodatabase. This is great now, because we know how to loop through a list, and we're going to do that here in just a minute.

In line 14, we begin a try block, so we'll try to run all this code from line 14 to 25. If for some reason there's a crash, the code will go down to line 27 and run the except block.

In line 17, we begin a for loop to try to go through each feature class in the list. And we create a new variable in line 17 called featureClass. So we say for featureClass in featureClassList. featureClass is just a name that we come up with for that variable. We could name it anything. In this case, it's pretty intuitive to call it featureClass.

Line 20 is setting up the output path. Now, one of our requirements was to append Iowa at the beginning of the output feature class name, and so we're doing a little bit of string manipulation using the plus sign to make a path. This particular string outClipFeatureClass is going to be created by concatenating the target workspace so— that's what you created up in line 7. So that will always be part of the output path. Then, we're going to explicitly add slash Iowa and then, on to that, we'll tack the name of the feature class that we're currently working with. This type of string concatenation to make a path is something that you will also need to do during your Lesson 2 project.

In line 23, we actually perform the clip by calling arcpy.Clip, and we need to put underscore analysis because we want to specify that we're using the Clip tool that's in the Analysis toolbox. Now, Clip, in this case, has three required parameters. The first is the input feature class to be clipped, so that's the current one that we're looping on right now; it was defined up in line 17. So that's the first parameter, featureClass. The second parameter there is the clipFeature, and we defined that up in line 8. That's our cookie cutter feature that we'll be clipping all the rest of the feature classes to. And then, the third is the output path, and that's the path we just created back in line 20, and once we have all three of those things, we can run the Clip tool.

In lines 24 and 25, we report what happened, and we use a little bit of string concatenation there as well to make a custom message giving the exact path of the clipped file that we created. This will be useful in your project, as well. Now, this is an either/or scenario: you would need to call either AddMessage or do a print statement in line 25. I just put both of them in here so you can see how they were both used. If you're going to go ahead and make a script tool out of this, one that would go in your toolbox in ArcGIS, then you would use the approach in line 24, AddMessage. If you're just going to run this inside of PyScripter, then you would use line 25. If for some reason there were to be a crash in the above code, it would jump down to line 27 and run the except statement, and again we're using the approach of doing either an AddError, which is what you would use if you were making a script tool, so that's line 30; or if you're just running in PyScripter, you can do line 31. And then, line 32 is another print statement you could use in PyScripter. It actually gets the messages from the Clip tool. The Esri geoprocessing tools report their own error messages, so if you were to have a problem that happened inside the Clip, you might be able to get it back this way by printing arcpy.GetMessages().

Source: Sterling Quinn

Notes

  • The code above both prints messages and uses arcpy.AddMessage and arcpy.AddError to add geoprocessing messages. Normally, you would only put one or the other in your script. When you make a script tool, take out the print statements and keep the arcpy.AddMessage and arcpy.AddError calls, so that your messages will appear in the ArcGIS tool results window.
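The solution above builds its output path with the + operator and an explicit "\\". As a side note, Python's os.path.join is a common alternative that inserts the separator for you. The sketch below shows the idea without arcpy; the feature class names "Roads" and "Cities" are hypothetical stand-ins for whatever arcpy.ListFeatureClasses() would return.

```python
import os

targetWorkspace = "C:\\Data\\Lesson2PracticeExercise\\Iowa.gdb"

# Hypothetical feature class names standing in for arcpy.ListFeatureClasses().
featureClassList = ["Roads", "Cities"]

outPaths = []
for featureClass in featureClassList:
    # os.path.join inserts the path separator used by the current operating system.
    outPaths.append(os.path.join(targetWorkspace, "Iowa" + featureClass))

print(outPaths)
```

Either approach works for this exercise; the main advantage of os.path.join is that you don't have to remember where the separators go.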

Project 2: Batch reprojection tool for vector datasets

Some GIS departments have determined a single, standard projection in which to maintain their source data. The raw datasets, however, can be obtained from third parties in other projections. These datasets then need to be reprojected into the department's standard projection. Batch reprojection, or the reprojection of many datasets at once, is a task well suited to scripting.

In this project, you'll practice Python fundamentals by writing a script that re-projects the vector datasets in a folder. From this script, you will then create a script tool that can easily be shared with others.

The tool you will write should look like the image below. It has two input parameters and no output parameters. The two input parameters are:

  1. A folder on disk containing vector datasets to be re-projected.
  2. The path to a vector dataset whose spatial reference will be used in the re-projection. For example, if you want to re-project into NAD 1983 UTM Zone 10, you would browse to some vector dataset already in NAD 1983 UTM Zone 10. This could be one of the datasets in the folder you supplied in the first parameter, or it could exist elsewhere on disk.
    Screen capture showing the project 2 tool
    Figure 2.1 The Project 2 tool with two input parameters and no output parameters.

Running the tool causes re-projected datasets to be placed on disk in the target folder.

Requirements

To receive full credit, your script:

  • must re-project shapefile vector datasets in the folder to match the target dataset's projection;
  • must append "_projected" to the end of each projected dataset name. For example: CityBoundaries_projected.shp;
  • must skip projecting any datasets that are already in the target projection;
  • must report a geoprocessing message telling which datasets were projected. In this message, the dataset names can be separated by spaces. In the message, do not include datasets that were skipped because they were already in the target projection. This must be a single message, not one message per projected dataset. Notice an example of this type of custom message below in the line "Projected . . . :"
    Screen capture showing the project 2 tool after running
    Figure 2.2 Your script must report a geoprocessing message telling which datasets were projected.
  • must not contain any hard-coded values such as dataset names, path names, or projection names;
  • must be made available as a script tool that can be easily run from ArcGIS Pro by someone with no knowledge of scripting.

Successful completion of the above requirements is sufficient to earn 95% of the credit on this project. The remaining 5% is reserved for "over and above" efforts which could include, but are not limited to, the following:

  • Your geoprocessing message of projected datasets contains commas between the dataset names, with no extra "trailing" comma at the end.
  • User help is provided for your script tool. This means that when you open the tool dialog and hover the mouse over the "i" icon next to each parameter, help appears in a popup box. The ArcGIS Pro Help can teach you how to do this.

You are not required to handle datum transformations in this script. It is assumed that each dataset in the folder uses the same datum, although the datasets may be in different projections. Handling transformations would cause you to have to add an additional parameter in the Project tool and would make your script more complicated than you would probably like for this assignment.

Sample data

The Lesson 2 data folder contains a set of vector shapefiles for you to work with when completing this project (delete any subfolders in your Lesson 2 data folder—you may have one called PracticeData—before beginning this project). These shapefiles were obtained from the Washington State Department of Transportation GeoData Distribution Catalog, and they represent various geographic features around Washington state. For the purpose of this project, I have put these datasets in various projections. These projections share the same datum (NAD 83) so that you do not have to deal with datum transformations.

The datasets and their original projections are:

  • CityBoundaries and StateRoutes - NAD_1983_StatePlane_Washington_South_FIPS_4602
  • CountyLines - NAD_1983_UTM_Zone_10N
  • Ferries - USA_Contiguous_Lambert_Conformal_Conic
  • PopulatedPlaces - GCS_NorthAmerican_1983

Deliverables

Deliverables for this project are as follows:

  • the source .py file containing your script;
  • a detailed written narrative that explains in plain English what the various lines of your script do.  In this document, we will be looking for you to demonstrate that you understand how the code you're submitting works. You may find it useful to emulate the practice exercise solution explanations.  Alternatively, you may record a video in which you walk the grader through your solution beginning to end.
  • the .atbx file containing your script tool;
  • a short writeup (about 300 words) describing how you approached the project, how you successfully dealt with any roadblocks, and what you learned along the way. You should include which requirements you met, or failed to meet. If you added some of the "over and above" efforts, please point these out, so the grader can look for them.

Tips

The following tips can help improve your possibility of success with this project:

  • Do not use the Esri Batch Project tool in this project. In essence, you're required to make your own variation of a batch project tool in this project by running the Project tool inside a loop. Your tool will be easier to use because it's customized to the task at hand.
  • There are a lot of ways to insert "_projected" in the name of a dataset, but you might find it useful to start by temporarily removing ".shp" and adding it back on later. To make your code work for both a shapefile (which has the extension .shp) and a feature class in a geodatabase (which does not have the extension .shp), you can use the following:

    rootName = fc
    if rootName.endswith(".shp"):
        rootName = rootName.replace(".shp","")

    In the above code, fc is your feature class name. If it is the name of a shapefile, it will include the .shp extension. The replace function searches for the string ".shp" (the first parameter) in the file name and replaces it with nothing (symbolized in the second parameter by empty quotes ""). So after running this code, the variable rootName will contain the name of the feature class without the ".shp". Since replace(...) does not change anything if the string given as the first parameter does not occur in fc, the code above can be replaced by just a single line:

    rootName = fc.replace(".shp","")

    You could also potentially chop off the last four characters using something like

    rootName = fc[:-4]

    but hard-coding numbers other than 0 or 1 in your script can make the code less readable for someone else. It would also remove four characters from a geodatabase feature class name that has no extension. Seeing a function like replace is a lot easier for someone to interpret than seeing -4 and trying to figure out why that number was chosen. You should therefore use replace(...) in your solution instead.

  • To check if a dataset is already in the target projection, you will need to obtain a Spatial Reference object for each dataset (the dataset to be projected and the target dataset). You will then need to compare the spatial reference names of these two datasets. Be sure to compare the Name property of the spatial references; do not compare the spatial reference objects themselves. This is because you can have two spatial reference objects that are different entities (and are thus "not equal"), but have the same name property.

    You should end up with a line similar to this:

    if fcSR.Name != targetSR.Name:

    where fcSR is the spatial reference of the feature class to be projected and targetSR is the target spatial reference obtained from the target projection shapefile.
  • If you want to show all the messages from each run of the Project tool, add the line: arcpy.AddMessage(arcpy.GetMessages()) immediately after the line where you run the Project tool. Each time the loop runs, it will add the messages from the current run of the Project tool into the results window. It's been my experience that if you wait to add this line until the end of your script, you only get the messages from the last run of the tool, so it's important to put the line inside the loop. Remember that while you are first writing your script, you can use print statements to debug, then switch to arcpy.AddMessage() when you have verified that your script works, and you are ready to make a script tool.
  • If, after all your best efforts, you ran out of time and could not meet one of the requirements, comment out the code that is not working (using a # sign at the beginning of each line) and send the code anyway. Then explain in your brief write-up which section is not working and what troubles you encountered. If your commented code shows that you were heading down the right track, you may be awarded partial credit.
  • Remember to remove any debugging code from your script once you're satisfied that it's working properly.
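Returning to the first tip in the list above, here is a minimal sketch of one way to insert "_projected" into a dataset name so that it works for both shapefiles and geodatabase feature classes. The function name projected_name and the sample dataset names are made up for illustration, and this is only one of several reasonable approaches.

```python
def projected_name(fc):
    """Return the dataset name with "_projected" added before any .shp extension."""
    # Strip the extension (if any), as suggested in the tip above
    root_name = fc.replace(".shp", "")
    # Only shapefiles get the extension added back on
    extension = ".shp" if fc.endswith(".shp") else ""
    return root_name + "_projected" + extension


print(projected_name("Roads.shp"))  # Roads_projected.shp
print(projected_name("Roads"))      # Roads_projected
```

Because the extension is only added back when the input ended with ".shp", the same function can be used inside a loop that mixes both kinds of datasets.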

Lesson 3: GIS data access and manipulation with Python


The links below provide an outline of the material for this lesson. Be sure to carefully read through the entire lesson before returning to Canvas to submit your assignments.

Lesson 3 overview


An essential part of a GIS is that data can represent both the geometry (locations) of geographic features and the attributes of those features. This combination of features and attributes is what makes GIS go beyond just "mapping".

Much of your work as a GIS analyst may involve adding, modifying, and deleting features and their attribute data in the GIS. To do this, you need to know how to query and select the data that is most important to your projects. Sometimes you'll want to query a dataset to filter features that match certain criteria (for example, single-family homes constructed before 1980) and calculate some statistics based on only the selected records (for example, the percentage of those homes that experienced termite infestation).

ArcGIS Pro contains several data selection methods that an analyst can use to Create, Read, Update, and/or Delete (CRUD) feature data. Performing these operations manually can become tedious and error-prone, so arcpy lets you perform many of them programmatically. Scripting is often a faster and more accurate way to read and write large amounts of data.

Using a script to work with your data introduces some other subtle advantages over manual data entry. For example, in a script, you can add checks to ensure that the data entered conforms to a certain format. You can also chain together multiple steps of selection logic that would be time-consuming to perform in ArcGIS Pro.

This lesson explains ways to read and write GIS data using Python. We'll start off by looking at how you can create and open datasets within a script. Then, we'll practice reading and writing data using both geoprocessing tools and cursor objects. We mentioned in the previous lesson that arcpy contains special data-access objects, called cursors, that you can use to perform CRUD operations. You'll quickly see how the looping logic that you learned in Lesson 2 becomes useful when you are cycling through table data using cursors. Although this is most applicable to vector datasets, we'll also look at some ways you can manipulate rasters with Python. Once you're familiar with these concepts, Project 3 will give you a chance to practice what you've learned.

Lesson 3 checklist


Lesson 3 explains how to read and manipulate both vector and raster data with Python. To complete Lesson 3, you are required to do the following:

  1. Work through the course lesson materials.
  2. Read Zandbergen chapter 8.1 - 8.3 and all of chapter 10. In the online lesson pages, I have inserted instructions about when it is most appropriate to read each of these chapters.
  3. Complete Project 3, and submit your zipped deliverables to the Project 3 drop box in Canvas.
  4. Complete the Lesson 3 Quiz in Canvas.
  5. Read the Final Project proposal assignment, and begin working on your proposal, which is due after the first week of Lesson 4.

Do items 1 - 2 (including any of the practice exercises you want to attempt) during the first week of the lesson. You will need the second week of the lesson to concentrate on the project, the quiz, and the proposal assignment.

Lesson Objectives

By the end of this lesson, you should:

  • understand the use of different types of workspaces (e.g., geodatabases) in Python;
  • be able to use arcpy cursors to read and manipulate vector attribute data (search, update, and delete records);
  • know how to construct SQL query strings to realize selection by attribute with cursors or MakeFeatureLayer;
  • understand the concept of layers, how layers can be created, and how to realize selection by location with layers;
  • be able to write ArcGIS scripts that solve common tasks with vector data by combining selection by attribute, selection by location, and manipulation of attribute tables via cursors.

3.1 Data storage and retrieval in ArcGIS


Before getting into the details of how to read and modify these attributes, it's helpful to review how geographic datasets are stored in ArcGIS. You need to know this so you can open datasets in your scripts, and on occasion, create new datasets.

Geodatabases

Esri developed various ways of storing vector and raster spatial data. It encourages you to put your data in geodatabases, which are organizational structures for storing datasets and defining relationships between those datasets. Put simply, a single vector dataset within a geodatabase is called a feature class (sometimes written featureclass). Feature classes that share a commonality can be grouped or organized in feature datasets (featuredatasets). Geodatabases can also store raster datasets.

  • File geodatabases store data on the local file system in a proprietary format developed by Esri. They offer more functionality than shapefiles and are best suited for personal use or small organizations.
  • ArcSDE geodatabases, or enterprise geodatabases, store data on a central server in a relational database management system (RDBMS) such as SQL Server, Oracle, or PostgreSQL. These are large databases designed for serving data not just to one computer, but to an entire enterprise. Esri developed an extension (ArcSDE) that provides spatial functionality and spatial data management on top of the RDBMS. It acts as seamless "middleware" for the ArcGIS suite of products.

Geodatabases provide some additional functionality over the shapefile format. Notably, they allow longer field names, more field types, field domains, and more SQL query functionality.

Standalone datasets

Although geodatabases are essential for long-term data storage and organization, it's sometimes convenient to access datasets in a "standalone" format on the local file system. Esri's shapefile is probably the most ubiquitous standalone vector data format (it even has its own Wikipedia article). A shapefile actually consists of several files that work together to store vector geometries and attributes. The files all have the same root name, but use different extensions. You can zip the participating files together and easily email them or post them in a folder for download. In the Esri file browsers in ArcGIS Pro, the shapefiles just appear as one file.

Raster datasets are also often stored in standalone format instead of being loaded into a geodatabase. A raster dataset can be a single file, such as a JPEG or a TIFF, or, like a shapefile, it can consist of multiple files that work together.

Workspaces

The Esri geoprocessing framework often uses the notion of a workspace to denote the folder or geodatabase where you're currently working. When you specify a workspace in your script, you don't have to list the full path to every dataset. When you run a tool, the geoprocessor sees the feature class name and assumes that it resides in the workspace you specified.

Workspaces are especially useful for batch processing, when you perform the same action on many datasets in the workspace. For example, you may want to clip all the feature classes in a folder to the boundary of your county. The workflow for this is:

  1. Define a workspace.
  2. Create a list of feature classes in the workspace.
  3. Define a clip feature.
  4. Configure a loop to run on each feature class in the list.
  5. Inside the loop, run the Clip tool.

Here's some code that clips each feature class in a file geodatabase to the Alabama state boundary, then places the output in a different file geodatabase. Note how the five lines of code after import arcpy correspond to the five steps listed above.

import arcpy

arcpy.env.workspace = "C:\\Data\\USA\\USA.gdb"
featureClassList = arcpy.ListFeatureClasses()
clipFeature = "C:\\Data\\Alabama\\Alabama.gdb\\StateBoundary"

for featureClass in featureClassList:
    arcpy.Clip_analysis(featureClass, clipFeature, "C:\\Data\\Alabama\\Alabama.gdb\\" + featureClass)

In the above example, the method arcpy.ListFeatureClasses() was the key to making the list. This method looks through a workspace and makes a Python list of each feature class in that workspace. Once you have this list, you can easily configure a for loop to act on each item.

Notice that you designated the path to the workspace using the location of the file geodatabase "C:\\Data\\USA\\USA.gdb". If you were working with shapefiles, you would just use the path to the containing folder as the workspace. You can download the USA.gdb here and the Alabama.gdb here.
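The clip example builds each output path by concatenating strings. As a hedged alternative sketch, the standard library can join the pieces for you. The example below uses ntpath (the Windows flavor of os.path, which is what os.path refers to on a Windows machine) so that the result is the same on any operating system. The feature class names in the list are stand-ins for whatever arcpy.ListFeatureClasses() would return, not the actual contents of USA.gdb.

```python
import ntpath  # on Windows, os.path is this same module

outputWorkspace = "C:\\Data\\Alabama\\Alabama.gdb"
# Stand-in for the list returned by arcpy.ListFeatureClasses()
featureClassList = ["Cities", "Roads"]

outputPaths = []
for featureClass in featureClassList:
    # Joins the workspace and the name with a single backslash
    outputPaths.append(ntpath.join(outputWorkspace, featureClass))

for path in outputPaths:
    print(path)
```

In a script that runs on Windows alongside ArcGIS Pro, you would normally write os.path.join(...) and get the same result.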

If you were working with ArcSDE, you would use the path to the .sde connection file when creating your workspace. This is a file that is created when you connect to ArcSDE in Catalog View, and is placed in your local profile directory. We won't be accessing ArcSDE data in this course, but if you do this at work, remember that you can use the location box as outlined above to help you understand the paths to datasets in ArcSDE.

Providing paths in Python scripts

Often in a script, you'll need to provide the path to a dataset. Knowing the syntax for specifying the path is sometimes a challenge because of the many different ways of storing data listed above. For example, below is what a file geodatabase looks like if you browse the file system in Windows Explorer. How do you specify the path to the dataset you need? This same challenge could occur with a shapefile, which, although more intuitively named, actually has three or more participating files.

Screen capture showing the windows path to a file geodatabase. Folder USA.gdb
Figure 3.1 A file geodatabase as viewed via the file system of Windows Explorer.

The safest way to get the paths you need is to open Pro's Catalog View (which displays in the middle of the application window, unlike the Catalog Pane, which displays on the right side of the application window) and browse to the dataset. The location box along the top indicates the folder or geodatabase whose contents are being viewed. Clicking on the drop down arrow within that box displays the location as a network path. That's the path you want. Here's what the same file geodatabase would look like in Pro's Catalog view. The circled path shows how you would refer to a feature class's geodatabase then add the feature class name. (Alternatively, you could right-click any feature class from either the Catalog View or Catalog Pane, go to Properties, then click the Source tab to access its path.)

Screen capture to show viewing a path in ArcCatalog. Location: C:\Data\USA.gdb\Cities
Figure 3.2 The same file geodatabase, shown in ArcGIS Pro.

Below is an example of how you could access the feature class in a Python script using this path. This is similar to one of the examples in Lesson 1.

import arcpy
featureClass = "C:\\Data\\USA\\USA.gdb\\Cities"
desc = arcpy.Describe(featureClass)
spatialRef = desc.SpatialReference
print (spatialRef.Name)

Remember that the backslash (\) is the escape character in Python strings, so you'll need to use either the double backslash (\\) or forward slash (/) in the path. Another technique you can use for paths is the raw string, which lets you put backslashes in your string without doubling them, as long as you put "r" before the opening quotation mark.

featureClass = r"C:\Data\USA\USA.gdb\Cities"
. . .
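To see why the choice of path syntax matters, here is a small sketch in plain Python (no arcpy needed) that compares the three spellings described above and shows what happens when a single backslash is left unescaped. The "C:\new" path is a made-up example chosen only because it contains the \n escape sequence.

```python
escaped = "C:\\Data\\USA\\USA.gdb\\Cities"
forward = "C:/Data/USA/USA.gdb/Cities"
raw = r"C:\Data\USA\USA.gdb\Cities"

print(escaped == raw)   # True: both hold the same characters
print(raw)
print(forward)

# A single backslash followed by "n" is read as a newline character
broken = "C:\new"
print(len(broken))      # 5, not 6
```

The forward-slash version is a different string from the other two, but it refers to the same location on disk.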

3.2 Reading vector attribute data


This section of the lesson discusses how to search and read feature data that is stored in the feature class (or raster) table. You can think of this as programmatically viewing the dataset's attributes in the Pro GUI. The next section will cover how to write data to tables. At the end of the lesson, we'll look at rasters.

As we work with the data, it will be helpful for you to follow along, copying and pasting the example code into practice scripts. Throughout the lesson, you'll encounter exercises that you can do to practice what you just learned. You're not required to turn in these exercises; but if you complete them, you will have a greater familiarity with the code that will be helpful when you begin working on this lesson's project. It's impossible to read a book or a lesson, then sit down and write perfect code. Much of what you learn comes through trial and error and learning from mistakes. Thus, it's wise to write code often as you complete the lesson.

3.2.1 Accessing data fields


Before we get too deep into vector data access, it's going to be helpful to quickly review how the vector data is stored in the software. Vector features in ArcGIS feature classes (remember, including shapefiles) are stored in a table. The table has rows (records) and columns (fields).

Fields in the table

Fields in the table store attribute information and the geometry for the features.

For feature classes, there are two required fields that you cannot delete. One of the fields (usually called Shape) contains the geometry information for the features. This includes the coordinates of each vertex in the feature and allows the feature to be drawn on the screen. The geometry is stored in binary format; if you were to see it printed on the screen, it wouldn't make any sense to you. However, you can read and work with geometries using objects that are provided with arcpy.

The other field included in every feature class is an object ID field (OBJECTID or FID). This contains a unique number, or identifier, for each record that is used by ArcGIS to keep track of features. The object ID helps avoid confusion when working with data. Sometimes records have the same attributes. For example, both Los Angeles and San Francisco could have a STATE attribute of 'California', or a USA cities dataset could contain multiple cities with the NAME attribute of 'Portland'; however, the OBJECTID field can never have the same value for two records.

The rest of the fields contain attribute information that describe the feature. These attributes are usually stored as numbers or text.

Discovering field names

When you write a script, you'll need to provide the names of the particular fields you want to read and write. You can get a Python list of field names using arcpy.ListFields().

# Reads the fields in a feature class

import arcpy

featureClass = "C:\\Data\\USA\\USA.gdb\\Cities"
fieldList = arcpy.ListFields(featureClass)
# Loop through each field in the list and print the name
for field in fieldList:
    print (field.name)

The above would yield a list of the fields in the Cities feature class in a file geodatabase named USA.gdb. If you ran this script in PyScripter (try it with one of your own feature classes!) you would see something like the following in the IPython Console.

OBJECTID
Shape
UIDENT
POPCLASS
NAME
CAPITAL
STATEABB
COUNTRY

Notice the two special fields we already talked about: OBJECTID, which holds the unique identifying number for each record, and Shape, which holds the geometry for the record. Additionally, this feature class has fields that hold the name (NAME), the state (STATEABB), whether or not the city is a capital (CAPITAL), and so on.

arcpy treats the field as an object. Therefore the field has properties that describe it. That's why you can print field.name. The help reference topic Using fields and indexes lists all the properties that you can read from a field. These include aliasName, length, type, scale, precision, and others.

Properties of a field are read-only, meaning that you can find out what the field properties are, but you cannot change those properties in a script using the Field object. If you wanted to change the scale and precision of a field, for instance, you would have to programmatically add a new field.
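Reading field properties becomes useful when you want to pick out certain fields automatically. The sketch below is a hedged illustration that runs without arcpy: it uses simple stand-in objects carrying the name, type, and required properties that arcpy Field objects expose. The function name attribute_field_names and the property values shown are invented for the example; in a real script, the list would come from arcpy.ListFields().

```python
from types import SimpleNamespace


def attribute_field_names(fields):
    """Return the names of the fields that are not required by the software."""
    return [field.name for field in fields if not field.required]


# Stand-ins for the Field objects that arcpy.ListFields() would return
fieldList = [
    SimpleNamespace(name="OBJECTID", type="OID", required=True),
    SimpleNamespace(name="Shape", type="Geometry", required=True),
    SimpleNamespace(name="NAME", type="String", required=False),
    SimpleNamespace(name="POPCLASS", type="Integer", required=False),
]

print(attribute_field_names(fieldList))  # ['NAME', 'POPCLASS']
```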

3.2.2 Reading through records


Now that you know how to read the fields that are available, let's examine how to read through the table records.

The arcpy data access module

At version 10.1 of ArcMap, Esri released a new data access module, which offered faster performance along with more robust behavior when crashes or errors were encountered with the cursor. The module contains different cursor classes for the different operations a developer might want to perform on table records -- one for selecting, or reading, existing records; one for making changes to existing records; and one for adding new records. We'll discuss the editing cursors later, focusing our discussion now on the cursor used for reading tabular data, the search cursor.

As with all geoprocessing classes, the SearchCursor class is documented in the Help system. (Be sure when searching the Help system that you choose the SearchCursor class found in the Data Access module. An older, pre-10.1 class is available through the arcpy module and appears in the Help system as an "ArcPy Function.")

The common workflow for reading data with a search cursor is as follows:

  1. Create the search cursor. This is done through the method arcpy.da.SearchCursor(). This method takes several parameters in which you specify which dataset and, optionally, which specific rows you want to read.
  2. Set up a loop that will iterate through the rows until there are no more available to read.
  3. Within the loop, do something with the values in the current row.

Here's a very simple example of a search cursor that reads through a point dataset of cities and prints the name of each.

# Prints the name of each city in a feature class

import arcpy

featureClass = "C:\\Data\\USA\\USA.gdb\\Cities"

with arcpy.da.SearchCursor(featureClass, ("NAME",)) as cursor:
  for row in cursor:
    print (row[0])

Important points to note in this example:

  • The cursor is created using a "with" statement. Although the explanation of "with" is somewhat technical, the key thing to understand is that it allows your cursor to exit the dataset gracefully, whether it crashes or completes its work successfully. This is a big issue with cursors, which can sometimes maintain locks on data if they are not exited properly.
  • The "with" statement requires that you indent all the code beneath it. After you create the cursor in your "with" statement, you'll initiate a for loop to run through all the rows in the table. This requires additional indentation.
  • Creating a SearchCursor object requires specifying not just the desired table/feature class, but also the desired fields within it, as a tuple. Remember that a tuple is a Python data structure much like a list, except it is enclosed in parentheses and its contents cannot be modified. Supplying this tuple speeds up the work of the cursor because it does not have to deal with the potentially dozens of fields included in your dataset. In the example above, only one field, "NAME", is requested. (A one-item tuple is written with a trailing comma, as in ("NAME",); plain ("NAME") is just a string in parentheses, which the cursor also accepts when you want a single field.)
  • The "with" statement creates a SearchCursor object, and declares that it will be named "cursor" in any subsequent code.
  • A for loop is used to iterate through the SearchCursor (using the "cursor" name assigned to it). Just as in a loop through a list, the iterator variable can be assigned a name of your choice (here, it's called "row").
  • Retrieving values out of the rows is done using the index position of the field name in the tuple you submitted when you created the object. Since the above example submits only one item in the tuple, the index position of "NAME" within that tuple is 0 (remember that we start counting from 0 in Python).
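When a cursor requests several fields, remembering which index position goes with which field gets harder. One optional convenience, sketched below, is to pair the field names with each row's values so you can look values up by name. The rows here are an invented stand-in for what a cursor would yield; with a real arcpy.da.SearchCursor you would loop over the cursor itself, and the same tuple of names is available from its fields property.

```python
fieldNames = ("NAME", "STATEABB")
# Invented sample rows; a real cursor yields one tuple like these per record
rows = [("Harrisburg", "PA"), ("Albany", "NY")]

for row in rows:
    # zip pairs each field name with the value at the same index position
    record = dict(zip(fieldNames, row))
    print(record["NAME"] + " is in " + record["STATEABB"])
```

Index positions are still the quickest option for short field lists, so treat this as a readability aid rather than a requirement.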

Here's another example where something more complex is done with the row values. This script finds the average population for counties in a dataset. To find the average, you need to divide the total population by the number of counties. The code below loops through each record and keeps a running total of the population and the number of records counted. Once all the records have been read, only one line of division is necessary to find the average. You can get the sample data for this script here.

# Finds the average population in a counties dataset
import arcpy

featureClass = "C:\\Data\\Pennsylvania\\Counties.shp"
populationField = "POP1990"
nameField = "NAME"

average = 0
totalPopulation = 0
recordsCounted = 0

print("County populations:")

with arcpy.da.SearchCursor(featureClass, (nameField, populationField)) as countiesCursor:
  for row in countiesCursor:
    print (row[0] + ": " + str(row[1]))
    totalPopulation += row[1]
    recordsCounted += 1

average = totalPopulation / recordsCounted

print ("Average population for a county is " + str(average))

Differences between this example and the previous one:

  • The tuple includes multiple fields, with their names having been stored in variables near the top of the script.
  • The SearchCursor object goes by the variable name countiesCursor rather than cursor.

Before moving on, you should note that cursor objects have a couple of methods that you may find helpful in traversing the returned records. To understand what these methods do, and to better understand cursors in general, it may help to visualize the attribute table (or an Excel table) with an arrow pointing at the "current row". When a cursor is first created, that arrow is pointing just above the first row in the table. When a cursor is included in a for loop, as in the above examples, each execution of the for statement moves the arrow down one row and assigns that row's values (a tuple) to the row variable. If the for statement is executed when the arrow is pointing at the last row, there is not another row to advance to and the loop will terminate. (The row variable will be left holding the last row's values.)

Imagine that you wanted to iterate through the rows of the cursor a second time. If you were to modify the Cities example above, adding a second loop immediately after the first, you'd see that the second loop would never "get off the ground" because the cursor's internal pointer is still left pointing at the last row. To deal with this problem, you could just re-create the cursor object. However, a simpler solution would be to call on the cursor's reset() method. For example:

cursor.reset()

This will cause the internal pointer (the arrow) to move just above the first row again, enabling you to loop through its rows again.
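If the arrow metaphor is hard to picture, the toy class below mimics it in plain Python. PointerCursor is not an arcpy class; it is a made-up stand-in, with invented city names, that only imitates the pointer behavior described above so that you can experiment without any data.

```python
class PointerCursor:
    """Toy stand-in that mimics a cursor's internal pointer (not an arcpy class)."""

    def __init__(self, rows):
        self._rows = rows
        self._position = 0  # index of the next row to hand out

    def __iter__(self):
        return self

    def __next__(self):
        if self._position >= len(self._rows):
            raise StopIteration  # nothing left below the arrow
        row = self._rows[self._position]
        self._position += 1
        return row

    def reset(self):
        self._position = 0  # move the arrow back above the first row


cursor = PointerCursor([("Boston",), ("Denver",)])
firstPass = [row[0] for row in cursor]
secondPass = [row[0] for row in cursor]  # pointer is already past the last row
cursor.reset()
thirdPass = [row[0] for row in cursor]

print(firstPass)   # ['Boston', 'Denver']
print(secondPass)  # []
print(thirdPass)   # ['Boston', 'Denver']
```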

The other method supported by cursor objects is the next() method, which allows you to retrieve rows without using a for loop. Returning to the internal pointer concept, a call to the next() method moves the pointer down one row and returns the row's values (again, as a tuple). For example:

row = cursor.next()

An alternative means of iterating through all rows in a cursor is to use the next() method together with a while loop. Here is the original Cities example modified to iterate using next() and while:

# Prints the name of each city in a feature class (using next() and while)

import arcpy

featureClass = "C:\\Data\\USA\\USA.gdb\\Cities"

with arcpy.da.SearchCursor(featureClass, ("NAME",)) as cursor:
  try:
    row = cursor.next()
    while row:
      print (row[0])
      row = cursor.next()
  except StopIteration:
    pass

Points to note in this script:

  • This approach requires invoking next() both prior to the loop and within the loop.
  • Calling the next() method when the internal pointer is pointing at the last row will raise a StopIteration exception. The try/except block in the example catches that exception so that it does not display an ugly error.

You should find that using a for loop is usually the better approach, and in fact, you won't see the next() method even listed in the documentation of arcpy.da.SearchCursor. We're pointing out the existence of this method because a) older ArcGIS versions have cursors that can/must be traversed in this way, so you may encounter this coding pattern if looking at older scripts, and b) you may want to use the next() method if you're in a situation where you know the cursor will contain exactly one row (or you're interested in only the first row).
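For the "only the first row" situation, one hedged option is Python's built-in next() function, which accepts a default value to return when there is nothing left to read. The sketch below demonstrates it on plain Python iterators with invented rows. It assumes that a search cursor can be handed to the built-in next() like any other iterator, so verify that against your own data before relying on it. The function name first_row is made up for the example.

```python
def first_row(rows):
    """Return the first row from an iterator, or None if it is empty."""
    # The second argument to next() is returned instead of raising StopIteration
    return next(rows, None)


# iter(...) over invented rows stands in for an open search cursor
print(first_row(iter([("Boston",), ("Denver",)])))  # ('Boston',)
print(first_row(iter([])))                           # None
```

Supplying the default avoids the need for a try/except block around the call.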

In each of the cursor examples discussed thus far, the cursors retrieved all rows from the specified feature class. Continue reading to see how to retrieve just a subset of rows meeting certain criteria.

3.2.3 Retrieving records using an attribute query


(Note: In this section, we are only using short code sections to read through, rather than full code examples that can be run directly. However, as an exercise, you can adapt the code to, for instance, work with the Pennsylvania data set from the previous section by using Counties.shp and the POP1990 field.)

The previous examples used the SearchCursor object to read through each record in a dataset. You can get more specific with the search cursor by instructing it to retrieve just the subset of records whose attributes comply with some criteria, for example, "only records with a population greater than 10000" or "all records beginning with the letters P – Z."

For review, this is how you construct a search cursor to operate on every record in a dataset using the arcpy.da module:

with arcpy.da.SearchCursor(featureClass, (populationField,)) as cursor:

If you want the search cursor to retrieve only a subset of the records based on some criteria, you can supply a SQL expression (a where clause) as the third argument in the constructor (the constructor is the method that creates the SearchCursor). For example:

with arcpy.da.SearchCursor(featureClass, (populationField,), "POP2018 > 100000") as cursor:

The above example uses the SQL expression POP2018 > 100000 to retrieve only the records whose population is greater than 100000. SQL stands for "Structured Query Language" and is a special syntax used for querying datasets. If you've ever used a Definition Query to filter a layer's data in Pro, then you've had some exposure to these sorts of SQL queries. If SQL is new to you, please take a few minutes right now to read Write a query in the Query Builder in the ArcGIS Pro Help. This topic is a simple introduction to SQL in the context of ArcGIS.

SQL expressions can contain a combination of criteria, allowing you to pinpoint a very focused subset of records. The complexity of your query is limited only by your available data. For example, you could use a SQL expression to find only states with a population density over 100 people per square mile that begin with the letter M and were settled after 1850.
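As a rough sketch of how such a combined expression could be assembled in a script, the code below joins several criteria with AND. The field names POP_DENSITY, STATE_NAME, and YEAR_SETTLED are hypothetical and would need to be replaced with fields that actually exist in your dataset.

```python
# Hypothetical field names; substitute the ones in your own dataset
criteria = [
    "POP_DENSITY > 100",
    "STATE_NAME LIKE 'M%'",
    "YEAR_SETTLED > 1850",
]

# Wrap each criterion in parentheses and join them with AND
whereClause = " AND ".join("(" + criterion + ")" for criterion in criteria)
print(whereClause)
```

The resulting string could then be supplied as the where clause when creating the search cursor.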

Note that the SQL expression you supply for a search cursor is for attribute queries, not spatial queries. You could not use a SQL expression to select records that fall "west of the Mississippi River," or "inside the boundary of Canada" unless you had previously added and populated some attribute stating whether that condition is true (for example, REGION = 'Western' or CANADIAN = True). Later in this lesson, we'll talk about how to make spatial queries using the Select By Location geoprocessing tool.

Once you retrieve the subset of records, you can follow the same pattern of iterating through them using a for loop.

with arcpy.da.SearchCursor(featureClass, (populationField,), "POP2018 > 100000") as cursor:
  for row in cursor:
    print (str(row[0]))

Handling quotation marks

When you include a SQL expression in your SearchCursor constructor, you must supply it as a string. This is where things can get tricky with quotation marks since parts of the expression may also need to be quoted (specifically string values, such as a state abbreviation). The rule for writing queries in ArcGIS Pro is that string values must be enclosed in single quotes. Given that, you should enclose the overall expression in double quotes.

For example, suppose your script allows the user to enter the ID of a parcel, and you need to find it with a search cursor. Your SQL expression might look like this: " PARCEL = 'A2003KSW' ".
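Here is a minimal sketch of how that expression might be built from a variable rather than typed out by hand. The function name equals_clause is invented for this example. It also doubles any single quote inside the value, which is the standard SQL way of including a quote character in a string value; the city name used to show this is just sample text.

```python
def equals_clause(fieldName, value):
    """Build a where clause that tests a text field for equality with value."""
    # In SQL, a single quote inside a string value is escaped by doubling it
    escaped = value.replace("'", "''")
    return fieldName + " = '" + escaped + "'"


parcelId = "A2003KSW"  # imagine this came from the user
print(equals_clause("PARCEL", parcelId))       # PARCEL = 'A2003KSW'
print(equals_clause("NAME", "Coeur d'Alene"))  # NAME = 'Coeur d''Alene'
```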

Handling quotation marks is simplified greatly in Pro as compared to ArcMap. In ArcMap, certain data formats require field names to be enclosed in double quotes, which when combined with string values being present in the expression, can make constructing the expression correctly quite a headache. If you find yourself needing to write ArcMap scripts that query data, check out the parallel page in the ArcMap version of this lesson (see course navigation links to the right).

Selecting records by attribute

As an ArcGIS Pro user, you've probably clicked the Select By Attributes button, located under the Map tab, to perform attribute queries. What we've been talking about in this part of the lesson probably reminded you of doing that sort of query, but it's important to note that opening a search cursor with a SQL expression (or where clause) as described above is not quite the same sort of operation. A search cursor is used in situations where you want to do some sort of processing of the records one by one. For example, as we saw, printing the names of cities or calculating their average population.

Some situations instead call for creating a selection on the feature class (i.e., treating the features identified by the query as a single unit). This can be done in Python scripts by invoking the Select Layer By Attribute tool (which is actually the tool that opens when you click the Select By Attributes button in Pro). The output from this tool -- referred to as a Feature Layer -- can then be used as the input to many other tools (Calculate Field, Copy Features, Delete Features, to name a few). This second tool's processing will be limited to the selection held in the input.

For an example, let's pick up on the population query used above. Let's say we wanted to do an analysis in which high-population cities were excluded. We might select those cities and then delete them:

popQuery = 'POP2018 > 100000'
bigCities = arcpy.SelectLayerByAttribute_management('Cities', 'NEW_SELECTION', popQuery)
arcpy.DeleteFeatures_management(bigCities)

In this snippet of code, note that the same where clause is implemented, this time to create a selection on the Cities feature class as opposed to producing a cursor of records to iterate through. The SelectLayerByAttribute tool returns a Feature Layer, which is an in-memory object that references the selected features. The object is stored in a variable, bigCities, which is then used as input to the DeleteFeatures tool. This will delete the high-population cities from the underlying Cities feature class.

Now, let's have a look at how to handle queries with spatial constraints.

3.2.4 Retrieving records using a spatial query

Applying a SQL expression to the search cursor is only useful for attribute queries, not spatial queries. For example, you can easily open a search cursor on all counties named "Lincoln" using a SQL expression, but finding all counties that touch or include the Mississippi River requires a different approach. To get a subset of records based on a spatial criterion, you need to use the geoprocessing tool Select Layer By Location first, or use the spatial_filter parameter in the cursor.

Suppose you want to generate a list of all states whose boundaries touch Wyoming. As we saw in the previous section with the Select Layer By Attribute tool, the Select Layer By Location tool will return a Feature Layer containing the features that meet the query criteria. One thing we didn't mention in the previous section is that a search cursor can be opened not only on feature classes, but also on feature layers. With that in mind, here is a set of steps one might take to produce a list of Wyoming's neighbors:

  1. Use Select Layer By Attribute on the states feature class to create a feature layer of just Wyoming. Let's call this the Selection State layer.
  2. Use Select Layer By Location on the states feature class to create a feature layer of just those states that touch the Selection State layer. Let's call this the Neighbors layer.
  3. Open a search cursor on the Neighbors layer. The cursor will include only Wyoming and the states that touch it because the Neighbors layer is the selection applied in step 2 above. Remember that the feature layer is just a set of records held in memory.

Below is some code that applies the above steps.

# Selects all states whose boundaries touch 
# a user-supplied state
import arcpy

# Get the US States layer, state, and state name field
usaFC = r"C:\Data\USA\USA.gdb\Boundaries"
state = "Wyoming"
nameField = "NAME"

try:
  whereClause = nameField + " = '" + state + "'"
  selectionStateLayer = arcpy.SelectLayerByAttribute_management(usaFC, 'NEW_SELECTION', whereClause)

  # Apply a selection to the US States layer
  neighborsLayer = arcpy.SelectLayerByLocation_management(usaFC, 'BOUNDARY_TOUCHES', selectionStateLayer)

  # Open a search cursor on the US States layer
  with arcpy.da.SearchCursor(neighborsLayer, (nameField,)) as cursor:
    for row in cursor:
      # Print the name of all the states in the selection
      print (row[0])
   
except:
  print (arcpy.GetMessages())

finally:
  # Clean up feature layers and cursor
  arcpy.Delete_management(neighborsLayer)
  arcpy.Delete_management(selectionStateLayer)
  del cursor

You can choose from many spatial operators when running SelectLayerByLocation. The code above uses "BOUNDARY_TOUCHES". Other available relationships include "INTERSECT", "WITHIN_A_DISTANCE" (which may save you a buffering step), "CONTAINS", and "WITHIN".

Note that the Row object "row" returns only one field ("NAME"), which is accessed using its index position in the list of fields. Since there's only one field, that index is 0, and the syntax looks like this: row[0]. Once you open the search cursor on your selected records, you can perform whatever action you want on them. The code above just prints the state name, but more likely you'll want to summarize or update attribute values. You'll learn how to write attribute values later in this lesson.

Cleaning up feature layers and cursors

Notice that the feature layers are deleted using the Delete tool. This is because feature layers can maintain locks on your data, preventing other applications from using the data until your script is done. arcpy is supposed to clean up feature layers at the end of the script, but it's a good idea to delete them yourself in case this doesn't happen or in case there is a crash. In the examples above, the except block will catch a crash, then the script will continue and Delete the two feature layers.

Cursors can also maintain locks on data. As mentioned earlier, the "with" statement should clean up the cursor for you automatically. However, we've found that it doesn't always, an observation that appears to be backed up by this blurb from Esri's documentation of the arcpy.da.SearchCursor class:

Search cursors also support with statements to reset iteration and aid in removal of locks. However, using a del statement to delete the object or wrapping the cursor in a function to have the cursor object go out of scope should be considered to guard against all locking cases.

One last point to note about this code that cleans up the feature layers and cursor is that it is embedded within a finally block. This is a construct that is used occasionally with try and except to define code that should be executed regardless of whether the statements in the try block run successfully. To understand the usefulness of finally, imagine if you had instead placed these cleanup statements at the end of the try block. If an error were to occur somewhere above that point in the try block -- not hard to imagine, right? -- the remainder of the try block would not be executed, leaving the feature layers and cursor in memory. A subsequent run of the script, after fixing the error, would encounter a new problem: the script would be unable to create the feature layer stored in selectionStateLayer because it already exists. In other words, the cleanup statements would only run if the rest of the script ran successfully.

This situation is where the finally statement is especially helpful. Code in a finally block will be run regardless of whether something in the try block triggers a crash. (In the event that an error is encountered, the finally code will be executed after the except code.) Thus, as you develop your own code utilizing feature layers and/or cursors, it's a good idea to include these cleanup statements in a finally block.
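
The order in which try, except, and finally run can be checked without any GIS data. The sketch below is only an illustration: it uses plain strings as stand-ins for the feature layer and cursor, and the function name run_with_cleanup is hypothetical. It also assigns None to the names up front, which is an added precaution (not part of the lesson's script) so that the finally block does not fail on a name that was never created when an error occurs early.

```python
# Stand-in for a script that creates resources (a feature layer, a cursor)
# and must release them even when a step fails. No arcpy calls are made;
# the "resources" are plain strings so the control flow can run anywhere.

def run_with_cleanup(fail_early):
    events = []
    layer = None      # assigned up front so the finally block can test them
    cursor = None
    try:
        if fail_early:
            raise RuntimeError("simulated tool failure")
        layer = "neighborsLayer"
        cursor = "cursor"
        events.append("processed rows")
    except RuntimeError as err:
        events.append("except: " + str(err))
    finally:
        # Runs on success and on failure; skip anything never created
        if layer is not None:
            events.append("deleted " + layer)
        if cursor is not None:
            events.append("deleted " + cursor)
        events.append("finally done")
    return events

print(run_with_cleanup(fail_early=False))
print(run_with_cleanup(fail_early=True))
```

In the failing run, the except code is recorded first and the finally code afterwards, and no cleanup is attempted for objects that were never created.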

Required reading

Before you move on, examine the following tool reference pages. Pay particular attention to the Usage and the Code Sample sections.

3.3 Writing vector attribute data

In the same way that you use cursors to read vector attribute data, you use cursors to write data as well. Two types of cursors are supplied for writing data:

  • Update cursor - This cursor edits values in existing records or deletes records
  • Insert cursor - This cursor adds new records

In the following sections, you'll learn about both of these cursors and get some tips for using them.

Required reading

The ArcGIS Pro Help has some explanation of cursors. Get familiar with this help now, as it will prepare you for the next sections of the lesson. You'll also find it helpful to return to the code examples while working on Project 3:

Accessing data using cursors

Also follow the three links in the table at the beginning of the above topic. These briefly explain the InsertCursor, SearchCursor, and UpdateCursor and provide a code example for each. You've already worked with SearchCursor, but closely examine the code examples for all three cursor types and see if you can determine what is happening in each.

3.3.1 Updating existing records

Use the update cursor to modify existing records in a dataset. Here are the general steps for using the update cursor:

  1. Create the update cursor by calling arcpy.da.UpdateCursor(). You can optionally pass in an SQL expression as an argument to this method. This is a good way to narrow down the rows you want to edit if you are not interested in modifying every row in the table.
  2. Use a for loop to iterate through the rows and for each row...
  3. Modify the field values in the row that need updating (see below).
  4. Call UpdateCursor.updateRow() to finalize the edit.

Modifying field values

When you create an UpdateCursor and iterate through the rows using a variable called row, you can modify the field values by making assignments using the syntax row[<index of the field you want to change>] = <the new value>. For example:

row[0] = "Trisha Stevens"

It is important to note that the index occurring in the [...] to determine which field will be changed is given with respect to the tuple of fields provided when the UpdateCursor is created. For instance, if we create the cursor using the following command

with arcpy.da.UpdateCursor(featureClass, ("CompanyName", "Owner")) as cursor:

row[0] would refer to the field called "CompanyName" and row[1] would refer to the field called "Owner".
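
The relationship between the field tuple and the row indexes can be tried out in plain Python. In this small sketch, the field names come from the example above, while the row values are made up. It also shows a Python detail that is easy to miss when only one field is involved: a single name wrapped in parentheses is still a plain string, and it takes a trailing comma to make a one-item tuple. (As far as we know, the arcpy.da cursors accept either form for a single field, so both work in practice.)

```python
# Field names from the example above; the row values are invented.
fields = ("CompanyName", "Owner")
row = ["First Bank", "Trisha Stevens"]

# Index positions follow the order of the field tuple
for index, name in enumerate(fields):
    print(index, name, "->", row[index])

# Parentheses alone do not make a one-item tuple; the trailing comma does
not_a_tuple = ("CompanyName")
one_item_tuple = ("CompanyName",)
print(type(not_a_tuple).__name__)     # str
print(type(one_item_tuple).__name__)  # tuple
```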

Example

The script below performs a "search and replace" operation on an attribute table. For example, suppose you have a dataset representing local businesses, including banks. One of the banks was recently bought out by another bank. You need to find every instance of the old bank name and replace it with the new name. This script could perform that task automatically.

#Simple search and replace script
import arcpy

# Retrieve input parameters: the feature class, the field affected by
#  the search and replace, the search term, and the replace term.
fc = arcpy.GetParameterAsText(0)
affectedField = arcpy.GetParameterAsText(1)
oldValue = arcpy.GetParameterAsText(2)
newValue = arcpy.GetParameterAsText(3)

# Create the SQL expression for the update cursor. Here this is
#  done on a separate line for readability.
queryString = affectedField + " = '" + oldValue + "'"

# Create the update cursor and update each row returned by the SQL expression
with arcpy.da.UpdateCursor(fc, (affectedField,), queryString) as cursor:
    for row in cursor:
        row[0] = newValue
        cursor.updateRow(row)

del row, cursor

Notice that this script is relatively flexible because it gets all the parameters as text. However, this script can only be run on text (string) fields because of the way the query string is set up. Notice that the old value is put in single quotes, like this: "'" + oldValue + "'". Handling other field types, such as integers, would have made the example longer.
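
To give a sense of what handling other types might involve, here is a minimal sketch of a helper that builds the query string differently for numbers and for text. The function name build_where_clause is hypothetical, and the sketch assumes you already have the value as a Python number or string (script tool parameters arrive as text, so you would need to convert them yourself). It also doubles any single quote inside a text value, which is how standard SQL expects an apostrophe to be written.

```python
def build_where_clause(field_name, value):
    """Return a simple equality where clause for a text or numeric value."""
    if isinstance(value, (int, float)):
        # Numeric values are compared without quotes
        return "{} = {}".format(field_name, value)
    # Text values go in single quotes; an embedded single quote is doubled
    escaped = str(value).replace("'", "''")
    return "{} = '{}'".format(field_name, escaped)

print(build_where_clause("NAME", "Wyoming"))        # NAME = 'Wyoming'
print(build_where_clause("POP2018", 100000))        # POP2018 = 100000
print(build_where_clause("NAME", "Coeur d'Alene"))  # NAME = 'Coeur d''Alene'
```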

Again, it is critical to understand the tuple of affected fields that you pass in when you create the update cursor. In this example, there is only one affected field (which we named affectedField), so its index position is 0 in the tuple. Therefore, you set that field value using row[0] = newValue.

The call to updateRow(...) at the end of the loop is needed to make sure that the modified row is actually written back to the attribute table. Please note that the variable row needs to be passed as a parameter to updateRow(...).

Dataset locking again

As we mentioned, ArcGIS sometimes places locks on datasets to avoid the possibility of editing conflicts between two users. If you think for any reason that a lock from your script is affecting your dataset (by preventing you from viewing it, making it look like all rows have been deleted, and so on), you must close PyScripter to remove the lock. If you think that Pro has a lock on your data, check to see if there is an open edit session on the data, the data is being displayed in the Catalog View/Pane, or if a layer based on the data is part of an open map.

As we stated, cursor cleanup should happen through the creation of the cursor inside a "with" statement, but adding lines to delete the row and cursor objects will make extra sure locks are released.

For the Esri explanation of how locking works, you can review the section "Cursors and locking" in the topic Accessing data using cursors in the ArcGIS Pro Help.

3.3.2 Inserting new records

When adding a new record to a table, you must use the insert cursor. Here's the workflow for insert cursors:

  • Create the insert cursor using arcpy.da.InsertCursor().
  • Call InsertCursor.insertRow() to add a new row to the dataset.

As with the search and update cursor, you can use an insert cursor together with the "with" statement to avoid locking problems.

Insert cursors differ from search and update cursors in that you cannot provide an SQL expression when you create the insert cursor. This makes sense because an insert cursor is only concerned with adding records to the table. It does not need to "know" about the existing records or any subset thereof.

When you insert a row using InsertCursor.insertRow(), you provide a comma-delimited tuple of values for the fields of the new row. The order of these values must match the order of the fields in the tuple of affected fields you provided when you created the cursor. For example, if you create the cursor using

with arcpy.da.InsertCursor(featureClass, ("FirstName","LastName")) as cursor:

you would add a new row with values "Sam" for "FirstName" and "Fisher" for "LastName" by the following command:

cursor.insertRow(("Sam","Fisher"))

Please note that the inner parentheses are needed to turn the values into a tuple that is passed to insertRow(). Writing cursor.insertRow("Sam","Fisher") would have resulted in an error.

Example

The example below uses an insert cursor to create one new point in the dataset and assign it one attribute: a string description. This script could potentially be used behind a public-facing 311 application, in which members of the public can click a point on a Web map and type a description of an incident that needs to be resolved by the municipality, such as a broken streetlight.

# Adds a point and an accompanying description
import arcpy 
# Retrieve input parameters
inX = arcpy.GetParameterAsText(0)
inY = arcpy.GetParameterAsText(1)
inDescription = arcpy.GetParameterAsText(2)

# These parameters are hard-coded. User can't change them.
incidentsFC = "C:/Data/Yakima/Incidents.shp"
descriptionField = "DESCR"

# Make a tuple of fields to update
fieldsToUpdate = ("SHAPE@XY", descriptionField)
# Create the insert cursor
with arcpy.da.InsertCursor(incidentsFC, fieldsToUpdate) as cursor:
    # Insert the row providing a tuple of affected attributes
    cursor.insertRow(((float(inX),float(inY)), inDescription))

del cursor

Take a moment to ensure that you know exactly how the following is done in the code:

  • The creation of the insert cursor with a tuple of affected fields stored in variable fieldsToUpdate
  • The insertion of the row through the insertRow() method using variables that contain the field values of the new row

If this script really were powering an interactive 311 application, the X and Y values could be derived from a point a user clicked on the Web map rather than supplied as parameters, as is done in lines 4 and 5.

One thing you might have noticed is that the string "SHAPE@XY" is used to specify the Shape field. You might expect that this would just be "Shape," but arcpy.da provides a list of "tokens" that you can use if the field will be specified in a certain way. In our case, it would be very convenient just to provide the X and Y values of the points using a tuple of coordinates. It turns out that the token "SHAPE@XY" allows you to do just that. See the documentation of the InsertCursor's field_names parameter to learn about other tokens you can use.

Putting this all together, the example creates a tuple of affected fields: ("SHAPE@XY", "DESCR"). Notice that in line 13, we actually use the variable descriptionField which contains the name of the second column "DESCR" for the second element of the tuple. Using a variable to store the name of the column we are interested in allows us to easily adapt the script later, for instance to a data set where the column has a different name. When the row is inserted, the values for these items are provided in the same order: cursor.insertRow(((float(inX), float(inY)), inDescription)). The argument passed to insertRow() is a tuple that contains another tuple (inX,inY), namely the coordinates of the point cast to floating point numbers, as the first element and the text for the "DESCR" field as the second element.
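
If the nested tuple is hard to picture, it can help to build it separately from the cursor call. The sketch below does just that with no arcpy involved. The function name build_incident_row and the coordinate values are made up for illustration; only the structure of the returned tuple mirrors the example above.

```python
def build_incident_row(x_text, y_text, description):
    """Return a row matching the field tuple ("SHAPE@XY", "DESCR")."""
    # Script tool parameters arrive as text, so cast the coordinates first
    point = (float(x_text), float(y_text))
    # Outer tuple: one item per field, in the same order as the field tuple
    return (point, description)

row = build_incident_row("-120.5", "46.6", "Broken streetlight")
print(row)
print(len(row))      # 2 items: the coordinate pair and the description
print(row[0][1])     # the Y value, reached through the inner tuple
```

A tuple built this way is what would be handed to insertRow() in the script above.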

Readings

Take a few minutes to read Zandbergen 8.1 - 8.3 to reinforce your learning about cursors.

3.4 Working with rasters

So far in this lesson, your scripts have only read and edited vector datasets. This work largely consists of cycling through tables of records and reading and writing values to certain fields. Raster data is very different, and consists only of a series of cells, each with its own value. So, how do you access and manipulate raster data using Python?

It's unlikely that you will ever need to cycle through a raster cell by cell on your own using Python, and that technique is outside the scope of this course. Instead, you'll most often use predefined tools to read and manipulate rasters. These tools have been designed to operate on various types of rasters and perform the cell-by-cell computations so that you don't have to.

In ArcGIS, most of the tools you'll use when working with rasters are in either the Data Management > Raster toolset or the Spatial Analyst toolbox. These tools can reproject, clip, mosaic, and reclassify rasters. They can calculate slope, hillshade, and aspect rasters from DEMs.

The Spatial Analyst toolbox also contains tools for performing map algebra on rasters. Multiplying or adding many rasters together using map algebra is important for GIS site selection scenarios. For example, you may be trying to find the best location for a new restaurant and you have seven criteria that must be met. If you can create a boolean raster (containing 1 for suitable, 0 for unsuitable) for each criterion, you can use map algebra to multiply the rasters and determine which cells receive a score of 1, meeting all the criteria. (Alternatively, you could add the rasters together and determine which areas received a value of 7.) Other courses in the Penn State GIS certificate program walk through these types of scenarios in more detail.
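
The multiply-versus-add idea can be tried on a toy scale with ordinary Python lists. This is only a sketch of the arithmetic, not of how Spatial Analyst works internally: the three tiny grids below are invented, and each nested list stands in for a boolean raster for one criterion.

```python
# Three tiny boolean "rasters" (1 = suitable, 0 = unsuitable), one per criterion.
# The values are invented for illustration.
criteria = [
    [[1, 1, 0],
     [1, 0, 1]],
    [[1, 0, 0],
     [1, 1, 1]],
    [[1, 1, 1],
     [1, 0, 1]],
]
rows, cols = 2, 3

product = [[1] * cols for _ in range(rows)]   # result of multiplying the grids
total = [[0] * cols for _ in range(rows)]     # result of adding the grids
for grid in criteria:
    for r in range(rows):
        for c in range(cols):
            product[r][c] *= grid[r][c]
            total[r][c] += grid[r][c]

# Cells whose sum equals the number of criteria meet every criterion
meets_all_by_sum = [[1 if total[r][c] == len(criteria) else 0
                     for c in range(cols)] for r in range(rows)]

print(product)            # [[1, 0, 0], [1, 0, 1]]
print(total)              # [[3, 2, 1], [3, 1, 3]]
print(meets_all_by_sum)   # [[1, 0, 0], [1, 0, 1]]
```

Both approaches identify the same cells. The sum has the side benefit of showing how close the other cells came to meeting all the criteria.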

The tricky part of map algebra is constructing the expression, which is a string stating what the map algebra operation is supposed to do. ArcGIS Pro contains interfaces for constructing an expression for one-time runs of the tool. But what if you want to run the analysis several times, or with different datasets? It's challenging even in ModelBuilder to build a flexible expression into the map algebra tools. With Python, you can manipulate the expression as much as you need.

Example

Examine the following example, which takes in a minimum and maximum elevation value as parameters, then does some map algebra with those values. The expression isolates areas where the elevation is greater than the minimum parameter and less than the maximum parameter. Cells that satisfy the expression are given a value of 1 by the software, and cells that do not satisfy the expression are given a value of 0.

But what if you don't want those 0 values cluttering your raster? This script gets rid of the 0's by running the Reclassify tool with a very simple remap table stating that input raster values of 1 should remain 1. Because 0 is left out of the remap table, it gets reclassified as NoData:

# This script takes a DEM, a minimum elevation,
#  and a maximum elevation. It outputs a new
#  raster showing only areas that fall between
#  the min and the max

import arcpy
from arcpy.sa import *
arcpy.env.overwriteOutput = True
arcpy.env.workspace = "C:/Data/Elevation"

# Get parameters of min and max elevations
inMin = arcpy.GetParameterAsText(0)
inMax = arcpy.GetParameterAsText(1)

arcpy.CheckOutExtension("Spatial")

# Perform the map algebra and make a temporary raster
inDem = Raster("foxlake")
tempRaster = (inDem > int(inMin)) & (inDem < int(inMax))

# Set up remap table and call Reclassify, leaving all values not 1 as NODATA
remap = RemapValue([[1,1]])
remappedRaster = Reclassify(tempRaster, "Value", remap, "NODATA")

# Save the reclassified raster to disk
remappedRaster.save("foxlake_recl")

arcpy.CheckInExtension("Spatial")

Read the example above carefully, as many times as necessary, for you to understand what is occurring in each line. Notice the following things:

  • Notice the expression contains > and < signs, as well as the & operator. You have to enclose each comparison in parentheses because Python evaluates & before > and <; without the parentheses, the expression would be grouped differently than you intend.
  • Because you used arcpy.GetParameterAsText() to get the input parameters, you have to cast the input to an integer before you can do map algebra with it. If we just used inMin, the software would see "3000," for example, and would try to interpret that as a string. To do the numerical comparison, we have to use int(inMin). Then the software sees the number 3000 instead of the string "3000."
  • Map algebra can perform many types of math and operations on rasters, not limited to "greater than" or "less than." For example, you can use map algebra to find the cosine of a raster.
  • If you're working at a site where a license manager restricts the Spatial Analyst extension to a certain number of users, you must check out the extension in your script, then check it back in. Notice the calls to arcpy.CheckOutExtension() and arcpy.CheckInExtension(). You can pass in other extensions besides "Spatial" as arguments to these methods.
  • Notice that the script doesn't call these Spatial Analyst functions using arcpy. Instead, it imports functions from the Spatial Analyst module (from arcpy.sa import *) and calls the functions directly. For example, we don't see arcpy.Reclassify(); instead, we just call Reclassify() directly. This can be confusing for beginners (and old pros) so be sure to check the Esri samples closely for each tool you plan to run. Follow the syntax in the samples and you'll usually be safe.
  • See the Remap classes topic in the help to understand how the remap table in this example was created. Whenever you run Reclassify, you have to create a remap table stating how the old values should be reclassified to new values. This example has about the simplest remap table possible, but if you want a more complex remap table, you'll need to study the documentation.
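
Two of the points above, casting the text parameters and parenthesizing the comparisons, can be tested with plain numbers before you try them on rasters. The sketch below is a stand-in only: a list of elevation values plays the role of the raster cells, None plays the role of NoData, and the function name classify_elevations is hypothetical.

```python
def classify_elevations(cells, min_text, max_text):
    """Return 1 for cells strictly between min and max, None (NoData) otherwise."""
    low = int(min_text)     # parameters arrive as text, so cast them
    high = int(max_text)
    result = []
    for value in cells:
        in_range = (value > low) & (value < high)
        result.append(1 if in_range else None)
    return result

print(classify_elevations([2500, 3200, 3999, 4000], "3000", "4000"))
# [None, 1, 1, None]

# Without parentheses, & is evaluated before > and <
elev, low, high = 2500, 3000, 4000
print((elev > low) & (elev < high))   # False, as intended
print(elev > low & elev < high)       # True: read as elev > (low & elev) < high
```

The last two lines give different answers for the same values, which is why the parentheses in the map algebra expression matter.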

Rasters and file extensions

The above example script doesn't use any file extensions for the rasters. This is because the rasters use the Esri GRID format, which doesn't use extensions. If you have rasters in another format, such as .jpg, you will need to add the correct file extension. If you're unsure of the syntax to use when providing a raster file name, double click the raster in Catalog View and note its Location in the pop-out box.

If you look at rasters such as an Esri GRID in Windows Explorer, you may see that they actually consist of several supporting files with different extensions, sometimes even contained in a series of folders. Don't try to guess one of the files to reference; instead, use Catalog View to get the path to the raster. When you use this path, the supporting files and folders will work together automatically.

Screen capture to show getting the path to a raster in ArcPro. Location: C:\Data\foxlake
Figure 3.4 When using the Catalog View for the path to a raster, the supporting files and folders work together automatically.

Readings

Zandbergen chapter 10 covers a lot of additional functions you can perform with rasters and has some good code examples. You don't have to understand everything in this chapter, but it might give you some good ideas for your final project.

Lesson 3 Practice Exercises

Lessons 3 and 4 contain practice exercises that are longer than the previous practice exercises and are designed to prepare you specifically for the projects. You should make your best attempt at each practice exercise before looking at the solution. If you get stuck, study the solution until you understand it.

Don't spend so much time on the practice exercises that you neglect Project 3. However, successfully completing the practice exercises will make Project 3 much easier and quicker for you.

Data for Lesson 3 practice exercises

The data for the Lesson 3 practice exercises is very simple and, like some of the Project 2 practice exercise data, was derived from Washington State Department of Transportation datasets. Download the data here.

Using the discussion forums

Using the discussion forums is a great way to work towards figuring out the practice exercises. You are welcome to post blocks of code on the forums relating to these exercises.

When completing the actual Project 3, avoid posting blocks of code longer than a few lines. If you have a question about your Project 3 code, please email the instructor, or you can post general questions to the forums that don't contain more than a few lines of code.

Getting ready

If the practice exercises look daunting to you, you might start by practicing with your cursors a little bit using the sample data:

  • Try to loop through the CityBoundaries and print the name of each city.
  • Try using an SQL expression with a search cursor to print the OBJECTIDs of all the park and rides in Chelan county (notice there is a County field that you could put in your SQL expression).
  • Use an update cursor to find the park and ride with OBJECTID 336. Assign it a ZIP code of 98512.

You can post thoughts on the above challenges on the forums.

Lesson 3 Practice Exercise A

In this practice exercise, you will programmatically select features by location and update a field for the selected features. You'll also use your selection to perform a calculation.

In your Lesson3PracticeExerciseA folder, you have a Washington geodatabase with two feature classes:

  • CityBoundaries - Contains polygon boundaries of cities in Washington over 10 square kilometers in size.
  • ParkAndRide - Contains point features representing park and ride facilities where commuters can leave their cars.

The objective

You want to find out which cities contain park and ride facilities and what percentage of cities have at least one facility.

  • The CityBoundaries feature class has a field "HasParkAndRide," which is set to "False" by default. Your job is to mark this field "True" for every city containing at least one park and ride facility within its boundaries.
  • Your script should also calculate the percentage of cities that have a park and ride facility and print this figure for the user.

You do not have to make a script tool for this assignment. You can hard-code the variable values. Try to group the hard-coded string variables at the beginning of the script.

For the purposes of these practice exercises, assume that each point in the ParkAndRide dataset represents one valid park and ride (ignore the value in the TYPE field).

Tips

You can jump into the assignment at this point, or read the following tips to give you some guidance.

  • Make two feature layers: "CitiesLayer" and "ParkAndRideLayer."
  • Use SelectLayerByLocation with a relationship type of "CONTAINS" to narrow down your cities feature layer list to only the cities that contain park and rides.
  • Create an update cursor for your now narrowed-down "CitiesLayer" and loop through each record, setting the HasParkAndRide field to "True."
  • To calculate the percentage of cities with park and rides, you'll need to know the total number of cities. You can use the GetCount tool to get a total without writing a loop. Beware that this tool has an unintuitive return value that isn't well documented. The return value is a Result object, and the count value itself can be retrieved through that object by appending [0] (see the second code example from the Help). This will return the count as a string though, so you'll want to cast to an integer before trying to use it in an arithmetic expression.
  • Similarly, you may have to play around with your Python math a little to get a nice percentage figure. Don't get too hung up on this part.
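
The last two tips come down to a string-to-integer cast followed by a division. Here is one possible way to sketch that arithmetic on its own, without arcpy. The function name percent_of_total is hypothetical, the counts passed in are placeholders, and the check for a zero total is an added precaution rather than something the exercise requires.

```python
def percent_of_total(selected_count, total_count_text):
    """Return selected/total as a percentage rounded to one decimal place."""
    total = int(total_count_text)   # the count arrives as a string
    if total == 0:
        return 0.0                  # avoid dividing by zero on an empty dataset
    # In Python 3, / is true division, so no extra float conversion is needed
    return round(selected_count / total * 100, 1)

print(percent_of_total(74, "108"))
print(percent_of_total(1, "3"))
```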

Lesson 3 Practice Exercise A Solution

Below is one possible solution to Practice Exercise A with comments to explain what is going on. If you find a more efficient way to code a solution, please share it through the discussion forums. Please note that in order to make the changes to citiesLayer permanent, you have to write the layer back to disk using the arcpy.CopyFeatures_management(...) function. This is not shown in the solution here.

# This script determines the percentage of cities in the
# state with park and ride facilities

import arcpy

arcpy.env.overwriteOutput = True
arcpy.env.workspace = r"C:\PSU\geog485\L3\PracticeExerciseA\Washington.gdb"
cityBoundariesFC = "CityBoundaries"
parkAndRideFC = "ParkAndRide"
parkAndRideField = "HasParkAndRide"   # Name of column with Park & Ride information
citiesWithParkAndRide = 0             # Used for counting cities with Park & Ride 

try:      
    # Narrow down the cities layer to only the cities that contain a park and ride
    citiesLayer = arcpy.SelectLayerByLocation_management(cityBoundariesFC, "CONTAINS", parkAndRideFC)
 
    # Create an update cursor and loop through the selected records
    with arcpy.da.UpdateCursor(citiesLayer, (parkAndRideField,)) as cursor:
        for row in cursor:
            # Set the park and ride field to TRUE and keep a tally
            row[0] = "True"
            cursor.updateRow(row)
            citiesWithParkAndRide += 1
except:
    print ("There was a problem performing the spatial selection or updating the cities feature class")

# Delete the feature layers even if there is an exception (error) raised
finally: 
    arcpy.Delete_management(citiesLayer)
    del row, cursor
     
# Count the total number of cities (this tool saves you a loop)
numCitiesCount = arcpy.GetCount_management(cityBoundariesFC)
numCities = int(numCitiesCount[0])

# Get the number of cities in the feature layer
#citiesWithParkAndRide = int(citiesLayer[2])

# Calculate the percentage and print it for the user
percentCitiesWithParkAndRide = (citiesWithParkAndRide / numCities) * 100
 
print (str(round(percentCitiesWithParkAndRide,1)) + " percent of cities have a park and ride.")

Below is a video offering some line-by-line commentary on the structure of this solution:

Video: One Solution to Lesson 3, Practice Exercise A (9:37)

Lesson 3, Practice Exercise A

This video shows one solution to Lesson 3, Practice Exercise A, which determines the percentage of cities in the State of Washington that have park and ride facilities.

This solution uses the syntax currently shown in the ArcGIS Pro documentation for Select Layer By Location, in which the FeatureLayer returned by the tool is stored in a variable.

Looking at this solution, I start by importing the arcpy site package. Then I added this line 6 to set the overwriteOutput property to True, which is useful when you're writing scripts like these where you'll be doing some testing, and you'll often want to overwrite the output files. Otherwise, ArcGIS will return an error message saying that it cannot overwrite an output that already exists. So, that's a tip to take from this code.

On line 7, I set the workspace to the Washington file geodatabase that stores the data used in the exercise. That enables me to refer to the input feature classes using only their names, not their full paths. This script doesn’t produce any new output feature classes, but if it did, I could likewise specify them using only their names and not their full paths.

So, in lines 8 & 9, I define variables to refer to the CityBoundaries and ParkAndRide feature classes.

The purpose of this script is to update a field in the CityBoundaries feature class called HasParkAndRide with either a True or False value. So on line 10 I define a parkAndRideField variable to store the name of that field I’m going to be updating.

I also want to report the percentage of cities containing a park and ride at the end of the script through a print statement, so on line 11 I define a variable to store a count of those cities and I initialize that variable to 0.

Now, before I talk about the meat of the script, it’s often helpful to walk through your processing steps manually in Pro. So here I have the ParkAndRides, symbolized by black dots, and the city boundaries that are the orange polygons.

So, I can use the Select By Location tool and specify that I want to select the CityBoundaries features that contain ParkAndRide features. You can see all the city features selected, and I can open the attribute table to see the selected tabular records. I've got 74 of 108 selected. And then over here is the HasParkAndRide field that I would want to update to True for these selected records.

Getting back to my script, I’m going to implement the same Select By Location tool. It’s going to return to me a Feature Layer of cities that contain a ParkAndRide, which I store in my citiesLayer variable. I then open up an update cursor on that Feature Layer. This update cursor will only be able to update the fields that I specify in the second parameter. I could include any number of fields here, as a tuple, but for this scenario I really only need to include one. And the name of that field is stored in my parkAndRideField variable, which I had defined up on line 10.

And so what I have is a cursor that I can iterate through row by row. And I do that using a for loop, which starts on line 19. And on line 21, I take that HasParkAndRide field, and I set it to “True”.

Now, why do I say 0 there in square brackets? 0 is the index position of the field that I passed in that tuple in line 18. The first, and in this case the only, field in the tuple is at position 0.

If I was updating three or four fields, I would have row and then the field index position inside brackets. It could be 1, 2, 3, and so on.
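As a rough illustration of those index positions, the sketch below uses plain Python (no arcpy) to show how each value in a row lines up with the position of its field name. The field list and row values are made up for the example. It also shows that a one-item tuple needs a trailing comma; arcpy's cursors accept a single field name as a plain string as well, so either form works there.

```python
# Plain-Python sketch (no arcpy needed); the field list and row values are hypothetical.
parkAndRideField = "HasParkAndRide"

# Parentheses alone do not make a tuple; the trailing comma does.
notATuple = (parkAndRideField)
oneFieldTuple = (parkAndRideField,)
print(type(notATuple).__name__)      # str
print(type(oneFieldTuple).__name__)  # tuple

# With several fields, each row value sits at the same index as its field name.
fields = ("CI_FIPS", "NAME", "HasParkAndRide")   # hypothetical field list
row = ["5300000", "Example City", "False"]       # hypothetical row values

row[fields.index("HasParkAndRide")] = "True"     # same as row[2] = "True"
print(row)
```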

After I've made my changes, I need to call updateRow() in order for them to be applied. A mistake that a lot of beginners make is they forget to do this step. So in line 22, I'm calling updateRow().

And in line 23, I'm incrementing the counter that I'm keeping of the number of cities with a park and ride facility found. So I just add 1 to the counter. The plus-equals-1 syntax is a shortcut for saying take the number in this variable and add 1.

I'm using try/except here to provide a custom error message. An important part of this script that goes along with this try/except block is this finally block on lines 28-30. This contains code that I want to run whether or not the try code failed. I don’t want to leave a lock in place on the cities feature class, so I delete the Feature Layer based on it on line 29. I also use a Python del statement to clear the row and cursor objects out of memory, for the same reason I used arcpy’s Delete tool: to make sure no locks are left on the feature classes.

Now, I could have put these statements in the try block, say after the update cursor, and assuming there were no problems in running my code, those clean-up statements would get executed too. However, let’s say there was a problem in my update cursor. If the script crashes there, then my clean-up statements won’t get executed, and if I try to run the script again, I could run into a problem with the cities feature class having a lock on it. Putting the clean-up statements in the finally block ensures they’ll get executed in all situations.
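The behavior of the finally block can be tried out without any GIS data. The following is a minimal sketch in plain Python, in which a list stands in for the locks held on a feature class and a hypothetical update_cities function simulates either a clean run or a crash inside the cursor loop. In both cases the clean-up code runs.

```python
# Plain-Python sketch: a list stands in for locks held on a feature class.
locks = []

def update_cities(fail):
    """Pretend to update rows; 'fail' simulates a crash inside the cursor loop."""
    locks.append("CityBoundaries")          # acquire a (pretend) lock
    try:
        if fail:
            raise RuntimeError("simulated problem in the update cursor")
        return "updated"
    except RuntimeError as err:
        print("There was a problem:", err)
        return "failed"
    finally:
        # Runs on both paths, so the lock is always released.
        locks.remove("CityBoundaries")

print(update_cities(fail=False), locks)
print(update_cities(fail=True), locks)
```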

Now that I've done the important stuff, it's time to do some math to figure out the percentage of cities that have a park and ride facility. So in line 33, I'm running the GetCount tool in ArcGIS. When you use the GetCount tool, it’s a bit unintuitive because what it returns isn’t a number, but a Result object. What you can do with that object isn’t well documented, so you can just follow the example here: use [0] to get at the count in string form, and use the int() function to convert that string to an integer.

So what we have in the end, on line 34, is a variable called numCities that is the total number of cities in the entire data set. And then, we've already been keeping the counter of cities that have a park and ride. So to find the percentage, all we need to do is divide one by the other. And that's what's going on in line 40.

Now, you may have noticed line 37 refers to the citiesWithParkAndRide variable, but is commented out. It turns out that the object returned by SelectLayerByLocation (and SelectLayerByAttribute) isn’t just the Feature Layer of selected cities. Like the GetCount tool we just saw, the Select tools actually return a Result object. This object can be treated kind of like a list. If you want the Feature Layer, you can specify [0] or simply not include a number in brackets at all. But you can also get a count of the selected features by specifying [2]. You can see this in the documentation of the tool, in the Derived Output section.

So, it turns out we don’t really need to implement our own counter inside the loop through the cursor rows. We can get the count through the Result object, and thus we could uncomment line 37 and comment out lines 23 and 11. We decided to include the counter approach in this solution because as a programmer, there are bound to be times where you really do need to maintain your own counter, and we wanted you to see an example of how to do that.

Finally, we can print out the result at the end of the script in line 42. The percentage is a number with several digits after the decimal point, so the print statement first rounds the number to one digit after the decimal and then converts that value to a string so that it can be concatenated with the literal text.
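To make the arithmetic concrete, here is a short sketch that plugs in the counts seen earlier in the walkthrough (74 of 108 cities selected). It assumes Python 3, as used by ArcGIS Pro, where / performs true division. The f-string line is an alternative way to format the number, not part of the original solution.

```python
# Counts taken from the walkthrough above: 74 of 108 cities were selected.
citiesWithParkAndRide = 74
numCities = 108

percentCitiesWithParkAndRide = (citiesWithParkAndRide / numCities) * 100

# Original approach: round, convert to a string, then concatenate.
message = str(round(percentCitiesWithParkAndRide, 1)) + " percent of cities have a park and ride."

# Alternative: the .1f format spec rounds and converts in one step.
message_f = f"{percentCitiesWithParkAndRide:.1f} percent of cities have a park and ride."

print(message)
```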

Credit: S. Quinn and J. Detwiler © Penn State is licensed under CC BY-NC-SA 4.0.

Here is a different solution to Practice Exercise A, which uses the alternate syntax discussed in the lesson:

# This script determines the percentage of cities in the
#  state with park and ride facilities
 
import arcpy

arcpy.env.overwriteOutput = True
arcpy.env.workspace = r"C:\PSU\geog485\L3\PracticeExerciseA\Washington.gdb"
cityBoundariesFC = "CityBoundaries"
parkAndRideFC = "ParkAndRide"
parkAndRideField = "HasParkAndRide"   # Name of column with Park & Ride information
citiesWithParkAndRide = 0             # Used for counting cities with Park & Ride
 
try:
    # Make a feature layer of all the park and ride facilities
    arcpy.MakeFeatureLayer_management(parkAndRideFC, "ParkAndRideLayer")
 
    # Make a feature layer of all the cities polygons   
    arcpy.MakeFeatureLayer_management(cityBoundariesFC, "CitiesLayer")
 
except:
    print ("Could not create feature layers")
 
try:      
    # Narrow down the cities layer to only the cities that contain a park and ride
    arcpy.SelectLayerByLocation_management("CitiesLayer", "CONTAINS", "ParkAndRideLayer")
 
    # Create an update cursor and loop through the selected records
    with arcpy.da.UpdateCursor("CitiesLayer", (parkAndRideField,)) as cursor:
        for row in cursor:
            # Set the park and ride field to TRUE and keep a tally
            row[0] = "True"
            cursor.updateRow(row)
            citiesWithParkAndRide +=1
except:
    print ("There was a problem performing the spatial selection or updating the cities feature class")
    
# Delete the feature layers even if there is an exception (error) raised
finally: 
    arcpy.Delete_management("ParkAndRideLayer")
    arcpy.Delete_management("CitiesLayer")
    del row, cursor
     
# Count the total number of cities (this tool saves you a loop)
numCitiesCount = arcpy.GetCount_management(cityBoundariesFC)
numCities = int(numCitiesCount[0])
 
# Calculate the percentage and print it for the user
percentCitiesWithParkAndRide = (citiesWithParkAndRide / numCities) * 100
 
print (str(round(percentCitiesWithParkAndRide,1)) + " percent of cities have a park and ride.")

Below is a video offering some line-by-line commentary on the structure of this solution:

Video: Alternate Solution to Lesson 3, Practice Exercise A. (2:47)

This video shows an alternate solution to Lesson 3, Practice Exercise A, which uses the SelectLayerByLocation tool’s older syntax.

This solution starts out much like the first one in terms of the variables defined at the beginning.

Where it differs is that instead of specifying feature classes as inputs to SelectLayerByLocation, it specifies FeatureLayer objects. These objects are created on lines 15 and 18 using the MakeFeatureLayer tool.

We make a feature layer that has all the park and rides, and a feature layer that has all of the city boundaries. And for both of these, the MakeFeatureLayer tool has two parameters that we supply, first the name of the feature class that's going to act as the source of the features and then the second parameter is a name that we will use to refer to this FeatureLayer throughout the rest of the script. That’s one aspect of this that takes some getting used to: that the object you’re creating gets referred to using a string. In this case, ParkAndRideLayer and CitiesLayer.

Another thing to remember is that you’re not creating a new dataset on disk with this tool. A FeatureLayer is an object that exists only temporarily in memory while the script runs.

So we've got these two feature layers that we can now perform selections on. And in line 25, we're going to use SelectLayerByLocation to select all cities that contain features from the ParkAndRideLayer.

It’s important to note that when using the tool with this approach, the CitiesLayer comes into line 25 referring to all 108 city features and after having the selection applied to it in line 25, it refers to just the 74 features that contain a ParkAndRide.

I then go on to open an update cursor on that FeatureLayer so that I can iterate over those 74 features.

The rest of the script is mostly the same, though there are a couple of differences:
1. I’m not storing the Result object returned by SelectLayerByLocation in this version, so I’m not able to retrieve the count of selected features from that object as we saw in the first solution. I really do need to implement my counter variable in this version.
2. This version creates 2 FeatureLayers, so when doing the clean-up, I want to delete both of these layers.

Credit: S. Quinn and J. Detwiler © Penn State is licensed under CC BY-NC-SA 4.0.

Lesson 3 Practice Exercise B

If you look in your Lesson3PracticeExerciseB folder, you'll notice the data is exactly the same as for Practice Exercise A...except, this time the field is "HasTwoParkAndRides."

The objective

In Practice Exercise B, your assignment is to find which cities have at least two park and rides within their boundaries.

  • Mark the "HasTwoParkAndRides" field as "True" for all cities that have at least two park and rides within their boundaries.
  • Calculate the percentage of cities that have at least two park and rides within their boundaries and print this for the user.

Tips

This simple modification in requirements is a game changer. The following is one way you can approach the task. Notice that it is very different from what you did in Practice Exercise A:

  • Create an update cursor for the cities and start a loop that will examine each city.
  • Make a feature layer with all the park and ride facilities.
  • Make a feature layer for just the current city. You'll have to make an SQL query expression in order to do this. Remember that an UpdateCursor can get values, so you can use it to get the name of the current city.
  • Use SelectLayerByLocation to find all the park and rides CONTAINED_BY the current city. Your result will be a narrowed-down park and ride feature layer. This is different from Practice Exercise A where you narrowed down the cities feature layer.
  • One approach to determine whether there are at least two park and rides is to run the GetCount tool to find out how many features were selected, then check if the result is 2 or greater. Another approach is to create a search cursor on the selected park and rides and see if you can count at least two cursor advancements.
  • Be sure to delete your feature layers before you loop on to the next city. For example: arcpy.Delete_management("ParkAndRideLayer")
  • Keep a tally for every row you mark "True" and calculate the percentage as you did in Practice Exercise A.

Lesson 3 Practice Exercise B Solution

Below is one possible solution to Practice Exercise B with comments to explain what is going on. If you find a more efficient way to code a solution, please share it through the discussion forums.

# This script determines the percentage of cities with two park
#  and ride facilities

import arcpy

arcpy.env.overwriteOutput = True
arcpy.env.workspace = r"C:\PSU\geog485\L3\PracticeExerciseB\Washington.gdb"

cityBoundariesFC = "CityBoundaries"
parkAndRideFC = "ParkAndRide"
parkAndRideField = "HasTwoParkAndRides"   # Name of column for storing the Park & Ride information
cityIDStringField = "CI_FIPS"             # Name of column with city IDs
citiesWithTwoParkAndRides = 0             # Used for counting cities with at least two P & R facilities
numCities = 0                             # Used for counting cities in total

# Make an update cursor and loop through each city
with arcpy.da.UpdateCursor(cityBoundariesFC, (cityIDStringField, parkAndRideField)) as cityRows:
    for city in cityRows:
        # Create a query string for the current city    
        cityIDString = city[0]
        whereClause = cityIDStringField + " = '" + cityIDString + "'"
        print("Processing city " + cityIDString)
 
        # Make a feature layer of just the current city polygon    
        currentCityLayer = arcpy.SelectLayerByAttribute_management(cityBoundariesFC, "NEW_SELECTION", whereClause)
 
        try:
            # Narrow down the park and ride layer by selecting only the park and rides in the current city
            selectedParkAndRideLayer = arcpy.SelectLayerByLocation_management(parkAndRideFC, "CONTAINED_BY", currentCityLayer)
 
            # Count the number of park and ride facilities selected
            numSelectedParkAndRide = int(selectedParkAndRideLayer[2])
 
            # If at least two park and ride facilities found, update the row to TRUE
            if numSelectedParkAndRide >= 2:
                city[1] = "TRUE"
 
                # Don't forget to call updateRow
                cityRows.updateRow(city)
 
                # Add 1 to your tally of cities with two park and rides                
                citiesWithTwoParkAndRides += 1
            
            numCities += 1
        except:
            print("Problem determining number of ParkAndRides in " + cityIDString)
        finally:
            # Clean up feature layers 
            arcpy.Delete_management(selectedParkAndRideLayer) 
            arcpy.Delete_management(currentCityLayer)

del city, cityRows
 
# Calculate and report the percentage of cities with at least two park and rides
if numCities != 0:
    percentCitiesWithParkAndRide = (citiesWithTwoParkAndRides / numCities) * 100
    print (str(round(percentCitiesWithParkAndRide,1)) + " percent of cities have two park and rides.")
else:
    print ("Error with input dataset. No cities found.")
 

The video below offers some line-by-line commentary on the structure of the above solution:

Video: Solution to Lesson 3, Practice Exercise B (8:57)

This video shows one possible solution to Lesson 3, Practice Exercise B, which calculates the number of cities in the data set with at least two park and ride facilities using selections in ArcGIS. It also uses an update cursor to update a field indicating whether the city has at least two park and rides. The beginning of this script is very similar to Practice Exercise A, so I won't spend a lot of time on the beginning. Lines 4 through 10 are pretty familiar from that exercise.

In lines 11 and 12, I'm setting up variables to reference some fields that I'm going to use. Part of the exercise is to update a field called HasTwoParkAndRides to be either True or False. And so, I'm setting a variable to refer to that field.

And in line 12, I set up a variable for the city FIPS code. We're going to be querying each city one by one, and we need some field with unique values for each city. We might use the city name, but that could be dangerous in the event that you have a dataset where two cities happen to have the same name. So we'll use the FIPS code instead.

In lines 13 and 14, I'm setting up some counters that will be used to count the number of cities we run into that have at least two park and rides, and then the number of cities total respectively.

On line 17, I open an update cursor on the cities feature class. In specifying the fields tuple, I’m going to need the FIPS code, so I can create a Feature Layer for the city in each iteration of the loop through the cursor and I need the HasTwoParkAndRides field since that’s the field I want to set to True or False.

So I'm going to loop through each city here and select the number of facilities that occur in each city and count those up. And I'm going to be working with two fields using this cursor. And I pass those in a tuple. That's cityIDStringField and parkAndRideField. Those are the variables that I set up earlier in the script in lines 11 and 12.

It might be helpful at this point to take a look at what we're doing conceptually in ArcGIS Pro. So, the first thing we're going to do for each city is make an attribute query for that city's FIPS code. And so, for example, if I were to do this for the city of Seattle, its FIPS code is this long number here.

If I were to run a query for the cities with that FIPS code, I’d end up with a selection containing just one city. The next step then is to select the ParkRide features that intersect the selected features in the cities layer. If that step selects 2 or more features, then I want to record True in my True/False field. If it’s 0 or 1, then I want to record False.

And I want to repeat this for each of the 108 cities. So, this is definitely a good candidate for automation.

So getting back to the script, I’ve called my cursor cityRows and set up a loop in which I called the iterator variable, or loop variable, city.

In line 20, we get the FIPS code for the city that's currently being looked at on the current iteration of the loop. We use 0 in square brackets there because that's the index position of the CityIDStringField that we put in the tuple in line 17. It was the first thing in the tuple, so it has index position 0.

And in line 22, we're setting up a string to store a query for that city FIPS. And so, we need the field name equals the city ID. The city ID has to be in single quotes so we use string concatenation to plug the field name and city ID into a string expression. Note that because this script takes a while to run, I also have a print statement here that includes the city ID to help track the script’s progress as it runs.
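The query string can be built and inspected on its own, outside of ArcGIS. In this small sketch the FIPS value is a made-up placeholder rather than a value from the dataset, and the f-string version is simply an alternative spelling of the same expression.

```python
# Plain-Python sketch; the FIPS value is a made-up example, not taken from the data.
cityIDStringField = "CI_FIPS"
cityIDString = "5300000"

# Concatenation, as in the solution script
whereClause = cityIDStringField + " = '" + cityIDString + "'"

# The same expression written as an f-string
whereClauseF = f"{cityIDStringField} = '{cityIDString}'"

print(whereClause)   # CI_FIPS = '5300000'
```

Printing the expression before passing it to a tool is a quick way to confirm that the single quotes ended up where the SQL syntax needs them.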

On line 25, I use SelectLayerByAttribute to create a Feature Layer representing just the current city. I store a reference to that layer in a variable called currentCityLayer.

I then use that layer in a call to the SelectLayerByLocation tool in which I select the ParkAndRide features that are contained by the feature in currentCityLayer. The result of that selection is stored in the selectedParkAndRideLayer variable.

As discussed in Exercise A, the third item in the Result object is the count of features in the Feature Layer. So on line 32, I get that count using the expression selectedParkAndRideLayer[2].

And then in line 35, I do a check to see if that number is greater than or equal to 2. If it is, then I’m going to assign a value of True to the ParkAndRide field. So that's where I look back at how I defined the cursor and see that the parkAndRideField was the second of the two fields in the fields tuple, so I need to use the expression city[1] to refer to that field. I set that field equal to True and then call updateRow() so that my change gets applied to the data set.

As in exercise A, I want to report the percentage of cities meeting the selection criteria, so I increment my citiesWithTwoParkAndRides variable by 1 when numSelectedParkAndRide is >= 2.

On line 44, I increment the numCities variable by 1. I want to increment this variable on each pass through the loop, not just for cities with 2 or more ParkAndRides, so note that line 44 is indented one less level than line 42 (in other words, so it’s not part of the if block).
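The effect of that indentation can be seen in a stripped-down version of the loop. The sketch below replaces the geoprocessing with a made-up list of per-city counts, so only the tallying logic remains.

```python
# Plain-Python sketch; the per-city counts below are made up for illustration.
parkAndRideCounts = [0, 3, 2, 1]   # hypothetical number of park and rides in each city

citiesWithTwoParkAndRides = 0
numCities = 0

for numSelectedParkAndRide in parkAndRideCounts:
    if numSelectedParkAndRide >= 2:
        # Inside the if block: runs only for cities with two or more
        citiesWithTwoParkAndRides += 1

    # One level out: runs on every pass through the loop
    numCities += 1

print(citiesWithTwoParkAndRides, "of", numCities)   # 2 of 4
```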

Next, note the finally block on lines 47-50. Here we are disposing of the objects held in the currentCityLayer and selectedParkAndRideLayer variables on each iteration of the loop since we no longer need those objects and would like to free up the memory that they take up.

After the loop, we use the Python del statement to kill the objects held in the city and cityRows variables so we avoid retaining unnecessary locks on the data after the script completes.

So, after doing the cleanup, now we're going to calculate the percentage of cities that have at least two park and rides. Now, you can't divide by zero, so we're doing a check in line 55 for that. But if everything checks out OK, which it would unless we had a major problem with our data set, we go ahead in line 56 and divide the number of cities with two park and rides by the total number of cities. And then, we multiply by 100 to get a nice percentage figure, which we then report in the print statement on line 57.

Credit: S. Quinn, J. Detwiler © Penn State is licensed under CC BY-NC-SA 4.0.

Here is a different solution to Practice Exercise B, which uses the alternate syntax discussed in the lesson.

# This script determines the percentage of cities with two park
#  and ride facilities

import arcpy

arcpy.env.overwriteOutput = True
arcpy.env.workspace = r"C:\PSU\geog485\L3\PracticeExerciseB\Washington.gdb"

cityBoundariesFC = "CityBoundaries"
parkAndRideFC = "ParkAndRide"
parkAndRideField = "HasTwoParkAndRides"   # Name of column for storing the Park & Ride information
cityIDStringField = "CI_FIPS"             # Name of column with city IDs
citiesWithTwoParkAndRides = 0             # Used for counting cities with at least two P & R facilities
numCities = 0                             # Used for counting cities in total
  
# Make a feature layer of all the park and ride facilities
arcpy.MakeFeatureLayer_management(parkAndRideFC, "ParkAndRideLayer")
  
# Make an update cursor and loop through each city
with arcpy.da.UpdateCursor(cityBoundariesFC, (cityIDStringField, parkAndRideField)) as cityRows:
    for city in cityRows:
        # Create a query string for the current city    
        cityIDString = city[0]
        whereClause = cityIDStringField + " = '" + cityIDString + "'"
        print("Processing city " + cityIDString)
  
        # Make a feature layer of just the current city polygon    
        arcpy.MakeFeatureLayer_management(cityBoundariesFC, "CurrentCityLayer", whereClause)
  
        try:
            # Narrow down the park and ride layer by selecting only the park and rides
            #  in the current city
            arcpy.SelectLayerByLocation_management("ParkAndRideLayer", "CONTAINED_BY", "CurrentCityLayer")
  
            # Count the number of park and ride facilities selected
            selectedParkAndRideCount = arcpy.GetCount_management("ParkAndRideLayer")
            numSelectedParkAndRide = int(selectedParkAndRideCount[0])
  
            # If at least two park and ride facilities found, update the row to TRUE
            if numSelectedParkAndRide >= 2:
                city[1] = "TRUE"
  
                # Don't forget to call updateRow
                cityRows.updateRow(city)
  
                # Add 1 to your tally of cities with two park and rides                
                citiesWithTwoParkAndRides += 1
            
            numCities += 1
        except:
            print("Problem determining number of ParkAndRides in " + cityIDString)           
        finally:
            # Clean up feature layer
            arcpy.Delete_management("CurrentCityLayer") 
 
# Clean up feature layer 
arcpy.Delete_management("ParkAndRideLayer")
del city, cityRows
  
# Calculate and report the percentage of cities with at least two park and rides
if numCities != 0:
    percentCitiesWithParkAndRide = (citiesWithTwoParkAndRides / numCities) * 100
    print (str(round(percentCitiesWithParkAndRide,1)) + " percent of cities have two park and rides.")
else:
    print ("Error with input dataset. No cities found.")

The video below offers some line-by-line commentary on the structure of the above solution:

Video: Alternate Solution to Lesson 3, Practice Exercise B (2:10)

This video shows an alternate solution to Lesson 3, Practice Exercise B. The general outline of this script is much like the first solution. As we saw with Exercise A, this version of the script uses the MakeFeatureLayer tool to create feature layers used in the call to the SelectLayerByLocation tool rather than feature classes.

Line 17 creates a feature layer called ParkAndRideLayer which references all of the ParkAndRide points in the state.

Then I create an update cursor on the city boundaries feature class, just like in the earlier solution. In this solution, though, note that line 28 creates a feature layer (“CurrentCityLayer”) by passing whereClause as an argument to the MakeFeatureLayer tool. This means the feature layer will refer to a different city on each pass through the loop.

The two feature layers are then passed as inputs to the SelectLayerByLocation tool on line 33, finding the ParkAndRides within the current city and putting that selection into the ParkAndRideLayer.

Then, unlike the first solution, which got the number of selected ParkAndRides out of the Result object returned by the SelectLayerByLocation tool, in this solution I use the Get Count tool to get the number of ParkAndRides.

Also, note that in the finally block, I use the Delete tool to clean up just "CurrentCityLayer", not "ParkAndRideLayer". That's because the object held in "ParkAndRideLayer" is the same and is needed throughout all iterations of the loop, whereas the object in "CurrentCityLayer" is different on each iteration and is only needed in that iteration. Thus, it's a good idea to dispose of it as part of the loop.

The rest of the script is then pretty much the same.

Credit: S. Quinn and J. Detwiler © Penn State is licensed under CC BY-NC-SA 4.0.

Testing these scripts on a ~2-year-old Dell laptop running Windows 10 with 16GB of RAM yielded very different run times for the two Exercise B solutions. In 5 trials of both scripts, the first one needed an average of 240 seconds to complete. The second one (using the older syntax) needed only 83 seconds. This may seem counterintuitive, because the newer syntax was the faster-performing version of the scripts for the other 3 exercises (times shown in seconds):

Exercise   Old      New
A          0.99     0.45
B          83.67    240.33
C          1.78     0.85
D          1.46     0.92

You can check the timing of the scripts on your machine by adding the following lines just after the import arcpy statement:

import time 
process_start_time = time.time() 

and this line at the end of the script:

print ("--- %s seconds ---" % (time.time() - process_start_time))

If you're interested in learning more about testing your script's performance along with methods for determining how long it takes to execute various parts of your script (e.g., to explore why there is such a difference in performance between the two Exercise B solutions), you should consider enrolling in our GEOG 489 class.
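As a small taste of timing individual parts of a script, the sketch below accumulates the time spent in labeled sections using only the standard library. The timed helper and the section labels are hypothetical names made up for this example, and time.sleep stands in for the geoprocessing calls, so treat it as one possible approach rather than a recommended method.

```python
import time
from contextlib import contextmanager

timings = {}   # label -> total seconds spent in that section

@contextmanager
def timed(label):
    """Add the time spent inside the with block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + (time.perf_counter() - start)

for _ in range(3):
    with timed("select"):
        time.sleep(0.01)    # stand-in for a selection tool call
    with timed("update"):
        time.sleep(0.002)   # stand-in for the cursor update

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} seconds")
```

Wrapping the selection and the update steps separately inside the city loop would show which of the two accounts for most of the run time.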

Lesson 3 Practice Exercise C

This practice exercise uses the same starting data as Lesson 3 Practice Exercise A. It is designed to give you practice with extracting data based on an attribute query.

The objective

Select all park and ride facilities with a capacity of more than 500 parking spaces and put them into their own feature class. The capacity of each park and ride is stored in the "Approx_Par" field.

Tips

Use the SelectLayerByAttribute_management tool to perform the selection. You will set up a SQL expression and pass it in as the third parameter for this tool.

Once you make the selection, use the Copy Features tool to create the new feature class. You can pass a feature layer directly into the Copy Features tool, and it will make a new feature class from all features in the selection.

Remember to delete your feature layers when you are done.

Lesson 3 Practice Exercise C Solution

Below is one approach to Lesson 3 Practice Exercise C. The number of spaces to query is stored in a variable at the top of the script, allowing for easy testing with other values.

# Selects park and ride facilities with over a certain number of parking spots
#  and exports them to a new feature class using CopyFeatures
 
import arcpy

parkingSpaces = 500
arcpy.env.workspace = r"C:\PSU\geog485\L3\PracticeExerciseC\Washington.gdb"
arcpy.env.overwriteOutput = True
 
# Set up the SQL expression to query the parking capacity
parkingQuery = "Approx_Par > " + str(parkingSpaces)

# Select the park and rides that satisfy the SQL expression
parkAndRideLayer = arcpy.SelectLayerByAttribute_management("ParkAndRide", "NEW_SELECTION", parkingQuery)

# Copy the features to a new feature class and clean up
arcpy.CopyFeatures_management(parkAndRideLayer, "BigParkAndRideFacilities")
arcpy.Delete_management(parkAndRideLayer)

The video below offers some line-by-line commentary on the structure of the above solution:

Video: Solution to Lesson 3, Practice Exercise C (3:32)

This video describes a solution to Lesson 3, Practice Exercise C, which requires selecting park and ride facilities that meet a certain minimum number of parking spaces, and then, copying them to their own new feature class.

After importing the arcpy site package in line 4, we set up a variable representing that parking threshold. So in this case, we set that equal to 500 parking spaces.

Putting the variable at the top of the script is helpful in case we want to adjust the value and test with other values. It’s easy to find and not buried down later in our code.

In line 7, I'm setting up the arcpy workspace to be equal to my file geodatabase location.

Line 11 is probably the most critical line of this script. It's setting up the SQL query expression to get park and ride facilities that have greater than the number of parking spaces that was specified above on line 6. So if you were to look at this in Pro, where the park and ride facilities are the black dots, we would do a Select By Attributes. And we query on that attribute Approx_Par is greater than 500. And that would give us those large park and rides that are primarily found in the Seattle and Tacoma areas.

Notice that you need to convert that integer value of 500 into a string so that you can concatenate it with the rest of the query expression. However, you don't need to enclose that value in quotes because when you build queries based on numeric values, the number is not put in quotes.
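The need for str() is easy to check in plain Python. This short sketch first tries the concatenation without the conversion, which raises a TypeError, and then builds the query the way the solution does.

```python
# Plain-Python sketch of the numeric query from this exercise.
parkingSpaces = 500

# Joining text and an integer directly fails, which is why str() is needed.
try:
    parkingQuery = "Approx_Par > " + parkingSpaces
except TypeError as err:
    print("Concatenation failed:", err)

# Numeric value: converted to text, but not wrapped in quotes.
parkingQuery = "Approx_Par > " + str(parkingSpaces)
print(parkingQuery)   # Approx_Par > 500
```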

Once you have that query string all set up, you can run SelectLayerByAttribute to get a ParkAndRide Feature Layer with just those selected park and rides that meet the criterion.

So, there are three parameters to pass in here. The first one is the name of the feature class that we're operating on. Because we set up the workspace, we can just put the name of the feature class rather than its full path. We also could have stored the name of the feature class in a variable and plugged that variable in here.

The second parameter is the selection method (whether we want to create a new selection, add to the existing selection, select from the current selection, etc.). Here we want to create a new selection.

And finally, the third parameter is that SQL query expression.

The tool returns to us a Feature Layer containing the selected ParkAndRide features, which we store in the parkAndRideLayer variable.

Once we have that feature layer, we can run the Copy Features tool to make a brand new feature class with just these selected features. So the two parameters here-- the first one is the variable that holds the feature layer that’s the source of the features we want to copy. The second parameter, BigParkAndRideFacilities, is the name of the new feature class that we want to create. And so, once we've done that, we have a new feature class, and we can run the Delete tool to clean up our feature layer. And that's all we need to do in this script.

Credit: S. Quinn, J. Detwiler © Penn State is licensed under CC BY-NC-SA 4.0.
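
The point made in the commentary about converting the number before concatenating can be tried outside ArcGIS. This is a minimal sketch that reuses the Approx_Par field name from the script; the f-string variant is an alternative for illustration and is not what the video uses.

```python
parkingSpaces = 500

# Concatenation requires a string, so the integer is converted with str().
# Numeric values are not wrapped in quotes inside the SQL expression.
parkingQuery = "Approx_Par > " + str(parkingSpaces)

# An f-string builds the same expression and handles the conversion itself.
parkingQueryAlt = f"Approx_Par > {parkingSpaces}"

print(parkingQuery)
print(parkingQueryAlt)
```

Leaving out str() in the first version raises a TypeError, because Python will not concatenate a string and an integer.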

Below is an alternate approach to the exercise.

# Selects park and ride facilities with over a certain number of parking spots
#  and exports them to a new feature class using CopyFeatures
 
import arcpy

parkingSpaces = 500
arcpy.env.workspace = r"C:\PSU\geog485\L3\PracticeExerciseC\Washington.gdb"
arcpy.env.overwriteOutput = True

# Set up the SQL expression to query the parking capacity
parkingQuery = "Approx_Par > " + str(parkingSpaces)
 
# Make a feature layer of park and rides that applies the SQL expression
arcpy.MakeFeatureLayer_management("ParkAndRide", "ParkAndRideLayer", parkingQuery)
 
# Copy the features to a new feature class and clean up
arcpy.CopyFeatures_management("ParkAndRideLayer", "BigParkAndRideFacilities")
arcpy.Delete_management("ParkAndRideLayer")

The video below offers some line-by-line commentary on the structure of the above solution:

Video: Alternate Solution to Lesson 3, Practice Exercise C (1:18)

This video describes an alternate solution to Lesson 3, Practice Exercise C. This solution differs from the first one starting on line 14. Instead of using SelectLayerByAttribute and storing its returned Feature Layer in a variable, this version of the script uses the MakeFeatureLayer tool to create the Feature Layer that can be referred to later using the string “ParkAndRideLayer”.

Looking closely at the parameters supplied in line 14, it’s important to note that the first parameter, “ParkAndRide” is specifying the name of a feature class. Found where? Found in the workspace, which we set to the Washington geodatabase on line 7.

The second parameter is also a string, but it’s just the name that we’re giving to the temporary, in-memory feature layer we’re creating.

So in this version of the script, when we want to write the selected features to disk, we plug that named feature layer into the CopyFeatures statement rather than a variable.

And likewise, when we want to clean up the feature layer, we again specify that same name, as a string, rather than a variable.

Credit: S. Quinn, J. Detwiler © Penn State is licensed under CC BY-NC-SA 4.0.

Lesson 3 Practice Exercise D

This practice exercise requires applying both an attribute selection and a spatial selection. It is directly applicable to Project 3 in many ways. The data is the same as what you used in exercises A and C.

The objective

Write a script that selects all the park and ride facilities in a given city and saves them out to a new feature class. You can test with the city of 'Federal Way'.

Tips

Start by making a feature layer from the CityBoundaries feature class that contains just the city in question. You'll then need to make a feature layer from the park and ride feature class and perform a spatial selection on it using the "WITHIN" operation. Then use the Copy Features tool, as in the previous exercise, to move the selected park and ride facilities into their own feature class.

Lesson 3 Practice Exercise D Solution

Below is one possible approach to Lesson 3 Practice Exercise D. Notice that the city name is stored near the top of the script in a variable so that it can be tested with other values.

# Selects park and ride facilities in a given target city and
#  exports them to a new feature class
 
import arcpy

targetCity = "Federal Way"     # Name of target city
arcpy.env.workspace = r"C:\PSU\geog485\L3\PracticeExerciseD\Washington.gdb"
arcpy.env.overwriteOutput = True
parkAndRideFC = "ParkAndRide"    # Name of P & R feature class
citiesFC = "CityBoundaries"      # Name of city feature class
 
# Set up the SQL expression of the query for the target city
cityQuery = "NAME = '" + targetCity + "'"

# Select just the target city
cityLayer = arcpy.SelectLayerByAttribute_management(citiesFC, "NEW_SELECTION", cityQuery)
 
# Select all park and rides in the target city
parkAndRideLayer = arcpy.SelectLayerByLocation_management(parkAndRideFC, "CONTAINED_BY", cityLayer)
 
# Copy the features to a new feature class and clean up
arcpy.CopyFeatures_management(parkAndRideLayer, "TargetParkAndRideFacilities")
 
arcpy.Delete_management(parkAndRideLayer)
arcpy.Delete_management(cityLayer)

See the video below for some line-by-line commentary on the above solution:

Video: Solution to Lesson 3, Practice Exercise D (4:06)

In this video, I'll be explaining one solution to Lesson 3 Practice Exercise D, in which we need to select park and ride facilities in a given target city and export them to a new feature class.

This solution will have a lot of elements that were used in Exercises B and C. These patterns should start to look familiar to you.

In line 4, I import the arcpy site package, and then in line 6, I set up a variable to represent the city on which I want to query for park and rides. I'm going to be doing an attribute query for this city, and then I'm going to follow it up with a spatial query to just get the park and rides that fall within that selected city. And putting line 6 at the top like this allows me to change the city, so I can test with different values without hunting through my code and finding where that is.

In line 7, I set up the workspace, which is going to be my file geodatabase. I'll be working with 3 different feature classes, all in the same geodatabase, so setting the workspace like this makes it easier for me to refer to those feature classes.

In lines 9 and 10, I set up variables to store the names of the two input feature classes.

Some of the remaining code could go within try, except, and finally blocks. For simplicity, I've left those out here, although for code quality, you're asked to include those in appropriate places in your project submissions.

What I do next on line 13 is set up the SQL query string to get just the city of Federal Way on its own. So, I'm doing this attribute query here, similar to what was done in Exercise B. I'm querying on the NAME field for the city. The expression as a whole being a string needs to be in quotes. And I use string concatenation to plug the name of the target city inside single quotes.

Once I have my query string, on line 16 I can perform a selection in which I end up with a feature layer of just that target city.

And by now, the use of SelectLayerByAttribute should be familiar to you. The first parameter is the name of the feature class that I'm querying. And remember, in line 10, I set up a variable for that. The second parameter is the selection method I want to use. And then the third parameter is the SQL query expression, and it's what will narrow down the cities to, in this case, just the city of Federal Way.

Now that I have a feature layer representing Federal Way, I can now do a Select Layer by Location so that I can get just the park and rides that fall within that city. So, I pass in the ParkAndRide feature class, the type of spatial relationship which I'm using, which is Contained By, which you've seen in previous practice exercises.

And then it’s the city layer that I want to use to do the selection. So, that's the third parameter. Once I have the selection made, then I can copy those selected features into a new feature class. And just like you saw in Practice Exercise C, I'm using the Copy Features tool, and I specify my ParkAndRide feature layer as the source of the features that I want to copy.

And then the second parameter is the name of the new feature class that will be created. Because I only specified a name and not a path, the new feature class is going to be saved in the workspace, which is the Washington file geodatabase. And I'm going to call that feature class TargetParkAndRideFacilities.

Now at the end of your code, preferably within a Finally block or somewhere at the end, you're going to delete the feature layers to clean them up. And in this case, we have two feature layers to clean up.

And that's all it takes to complete this exercise.

Credit: S. Quinn, J. Detwiler © Penn State is licensed under CC BY-NC-SA 4.0.
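
The quoting in the city query is easy to get wrong, so here is a small sketch of it written as a helper. The function name buildTextQuery is hypothetical, and doubling an embedded single quote is standard SQL escaping added here for illustration; it is not part of the solution above. The second test value is made up simply because it contains an apostrophe.

```python
def buildTextQuery(fieldName, value):
    # Double any single quote inside the value so it cannot end the SQL string early
    safeValue = value.replace("'", "''")
    # Text values are wrapped in single quotes inside the expression
    return fieldName + " = '" + safeValue + "'"

print(buildTextQuery("NAME", "Federal Way"))
print(buildTextQuery("NAME", "Coeur d'Alene"))
```

The first call produces the same expression as the cityQuery line in the solution.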

Here is an alternate solution for this exercise:

# Selects park and ride facilities in a given target city and
#  exports them to a new feature class
 
import arcpy

targetCity = "Federal Way"     # Name of target city
arcpy.env.workspace = r"C:\PSU\geog485\L3\PracticeExerciseD\Washington.gdb"
arcpy.env.overwriteOutput = True
parkAndRideFC = "ParkAndRide"    # Name of P & R feature class
citiesFC = "CityBoundaries"      # Name of city feature class
 
# Set up the SQL expression of the query for the target city
cityQuery = "NAME = '" + targetCity + "'"

# Make feature layers for the target city and park and rides
arcpy.MakeFeatureLayer_management(citiesFC, "CityLayer", cityQuery)
arcpy.MakeFeatureLayer_management(parkAndRideFC, "ParkAndRideLayer")
 
# Select all park and rides in the target city
arcpy.SelectLayerByLocation_management("ParkAndRideLayer", "CONTAINED_BY", "CityLayer")
 
# Copy the features to a new feature class and clean up
arcpy.CopyFeatures_management("ParkAndRideLayer", "TargetParkAndRideFacilities")
 
arcpy.Delete_management("ParkAndRideLayer")
arcpy.Delete_management("CityLayer")

See the video below for some line-by-line commentary on the above solution:

Video: Alternate Solution to Lesson 3, Practice Exercise D (1:36)

This video shows an alternate solution to Lesson 3 Practice Exercise D.

Everything is the same in this version down until line 16. On that line, I use MakeFeatureLayer to create a feature layer containing just the target city. I give that feature layer object a name of “CityLayer”.

Then on line 17, I create another FeatureLayer, this one of all the park and ride facilities. In this line, I only pass in two parameters because I want all of the park and rides. So, I'm not going to pass in any type of query string here.

With those two layers created, I can now do a Select Layer by Location so that I can get just the park and rides that fall within the selected city. So in this use of the tool, the inputs I’m supplying are strings, that are the names that I assigned to the Feature Layer objects. That’s instead of a feature layer held in a variable and the name of a feature class, as I did it in the first solution.

So coming into line 20, “ParkAndRideLayer” refers to all 365 ParkAndRide features. After line 20, it refers to just the 8 features that fall within the boundaries of Federal Way.

I then insert the name of that layer into a call to the CopyFeatures tool and then finish up by cleaning up the two feature layers.

Credit: S. Quinn, J. Detwiler © Penn State is licensed under CC BY-NC-SA 4.0.
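
Both solutions leave out try, except, and finally for simplicity. As a rough sketch of the pattern you're asked to use in project submissions, the example below relies on plain Python stand-ins so that it runs without arcpy. The names runWithCleanup, failingWork, and events are hypothetical; in a real script the geoprocessing calls would go in the try block and the Delete calls in the finally block.

```python
def runWithCleanup(doWork, cleanUp):
    """Run doWork, report any error, and always run cleanUp afterward."""
    try:
        doWork()
        succeeded = True
    except Exception as ex:
        # Report the problem instead of letting the script crash
        print("Something went wrong: " + str(ex))
        succeeded = False
    finally:
        # Runs whether or not an exception occurred
        cleanUp()
    return succeeded

events = []

def failingWork():
    # Stand-in for a geoprocessing call that fails
    raise ValueError("simulated tool failure")

result = runWithCleanup(failingWork, lambda: events.append("cleaned up"))
print(result, events)
```

Even though the stand-in work fails, the cleanup step still runs, which is why feature layer deletion belongs in the finally block.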

Project 3: Data extraction for a pro hockey team's scouting department

In this project, you'll use your new skills working with selections and cursors to process some data from a "raw" format into a more specialized dataset for a specific mapping purpose. The data for this exercise was retrieved from the National Hockey League's undocumented statistics API.

Download the data for this project

Background

In this exercise, suppose you are a data analyst for an NHL franchise. In preparation for the league draft, the team general manager has asked you to make it possible for him to retrieve all current players born in a particular country (say, Sweden) broken down by position. Your predecessor passed along to you a rosters geodatabase, but unfortunately the player birth countries are not included in the attributes. You do have a feature class of world country boundaries, though...

Task

Write a script that makes a separate feature class for each of the three forward positions (center, right wing, and left wing) within the boundary of Sweden. Write this script so that the user can change the country or list of positions simply by editing a couple of lines of code at the top of the script.

In browsing the attribute table, you'll note that player heights and weights are stored in imperial units (feet & inches and pounds). As part of this extraction task, to simplify comparisons against scouting reports written using metric units, you should also add two new numeric fields to the attribute table -- to the new feature classes only, not the original  -- to hold height and weight values in centimeters and kilograms. For every record, populate these fields with values based on the following formulas:

height_cm = height (in inches) * 2.54

weight_kg = weight (in pounds) * 0.453592

Your result should look something like the figure below if viewed in Pro. The custom symbolization and labels are not required to be part of your script.

Map of Project 3 results
Figure 3.5 Example output from Project 3, viewed in Pro.

The above requirements are sufficient for receiving 95% of the credit on this assignment. The remaining 5% is reserved for "Over and above" efforts, such as making a script tool, or extending the script to handle multiple target countries, other combinations of fields and queries, etc. For these over and above efforts, we prefer that you submit two copies of the script: one with the basic functionality and one with the extended functionality. This will make it more likely that you'll receive the base credit if something fails with your over and above coding.

Deliverables

Deliverables for this project are as follows:

  • the source .py file containing your script
  • a detailed written narrative that explains in plain English what the various lines of your script do.  In this document, we will be looking for you to demonstrate that you understand how the code you're submitting works. You may find it useful to emulate the practice exercise solution explanations.  Alternatively, you may record a video in which you walk the grader through your solution beginning to end.
  • a short writeup (about 300 words) describing how you approached the project, how you successfully dealt with any roadblocks, and what you learned along the way. You should include which requirements you met, or failed to meet. If you added some of the "over and above" efforts, please point these out, so the grader can look for them.

You do not have to create a script tool for this assignment; you can hard-code the initial parameters. Nevertheless, consider what might be done to make the script easier to run with different parameter values.

Once you get everything working, creating a script tool is a good way to achieve the "over and above" credit for this assignment. If you do this, then please zip all supporting files before placing them in the drop box.

Notes about the data

Take a look at the provided datasets in Pro, particularly the attribute tables. The geodatabase includes rosters for multiple years; please utilize the data for the most recent season. The rosters feature class contains a field called "position" that provides the values you need to examine (RW = Right Wing, LW = Left Wing, C = Center, D = Defenseman, G = Goaltender). As mentioned, you've been asked to extract the forward positions (RW, LW, and C) to new feature classes, but you want to write a script that's capable of handling some other combination of positions, too. There are several other fields that offer the potential for interesting queries as well, if you're looking for over and above ideas.

The Countries_WGS84 feature class has a field called "CNTRY_NAME". You can make an attribute selection on this field to select Sweden, then follow that up with a spatial selection to grab all the players that fall within this country. Finally, narrow down those players to just the ones that play the desired position.

Tips

Once you've selected Swedish players at the desired position, use the Copy Features tool to save the selected features into a new feature class, in the same fashion as in the practice exercises.

Take this project one step at a time. It's probably easiest to tackle the extraction-into-feature class portion first. Once you have all the new position feature classes created, go through them one by one, use the "Add Field" tool to add the "height_cm" and "weight_kg" fields, followed by an UpdateCursor to loop through all rows and populate these new fields with appropriate values.

It might be easiest to get the whole process working with a single position, then add the loop for all the positions later after you have finalized all the other script logic.

The height field is of type Text to allow for the ' and " characters. The string slicing notation covered earlier in the course can be used to obtain the feet and inches components of the player height. Use the inches to centimeters formula shown above to compute the height in metric units.
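
As a hedged illustration of the slicing idea, the sketch below assumes height strings formatted like 6' 2" (feet, an apostrophe, then inches and a double quote). That format is an assumption; check the actual values in the attribute table before relying on it. The function names heightToCm and weightToKg are hypothetical.

```python
def heightToCm(heightText):
    # Positions of the foot mark (') and the inch mark (")
    footMark = heightText.index("'")
    inchMark = heightText.index('"')
    # Slice out the two numbers; int() tolerates surrounding spaces
    feet = int(heightText[:footMark])
    inches = int(heightText[footMark + 1:inchMark])
    # Convert everything to inches, then apply the formula from the task
    return (feet * 12 + inches) * 2.54

def weightToKg(pounds):
    return pounds * 0.453592

print(round(heightToCm("6' 2\""), 2))
print(round(weightToKg(200), 2))
```

Inside an UpdateCursor loop, functions like these would be called once per row to compute the values for the new fields.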

For the purposes of this exercise, don't worry about capturing points that fall barely outside the edge of the boundary (e.g., points in coastal cities that appear in the ocean). To capture all these in real life, you would just need to obtain a higher resolution boundary file. The code would be the same.

You should be able to complete this exercise in about 50 lines of code (including whitespace and comments). If your code gets much longer than this, you are probably missing an easier way.

Final Project proposal assignment

At some point during this course, you've hopefully felt "the lightbulb go on" regarding how you might apply the lesson material to your own tasks in the GIS workplace. To conclude this course, you will be expected to complete an individual project that uses Python automation to make some GIS task easier, faster, or more accurate.

The project goal is up to you, but it is preferably one that relates to your current field of work or a field in which you have a personal interest. Since you're defining the requirements from the beginning, there is no "over and above" credit factored into this project grade. The number of lines of code you write is not as important as the problem you solved. However, we encourage you to propose a project that meets or even slightly exceeds your relative level of experience with programming.

You will have two weeks at the end of the term to dedicate completely toward the project and the Review Quiz. This is your chance to apply what you've learned about Python to a problem that really interests you.

One week into Lesson 4, you are required to submit a project proposal to the Final Project Proposal Drop Box in Canvas. This proposal must clearly explain:

  1. the task you intend to accomplish using Python;
  2. how your proposed solution will make the task easier, faster, and/or more accurate; also explain why your task could not simply be accomplished using the "out-of-the-box" tools from Esri, or why your script gives a particular advantage over those tools;
  3. the deliverables you will submit for the project; a well-documented script tool is highly encouraged; if the script requires data, describe how the instructors will be able to evaluate your script; possible solutions are to zip a sample dataset for the instructors, demonstrate your script during a Zoom session, or make the script flexible enough that it could be used with any dataset.

The proposal will contribute toward 10% of your Final Project grade, and will be used to help grade the rest of your project. Your proposal must be approved by the instructors before you move forward with coding the project. We may also offer some guidance on how to approach your particular task, and we'll provide thoughts on whether you are taking on too much or too little work to be successful.

As you work on your project, you're encouraged to seek help from all resources discussed in this class, including existing code samples and scripts on the Internet. If you re-use any long sections of code that you found on the Internet, please thoroughly explain in your project write up how you found it, tested it, and extracted only the parts you needed.

Project ideas

If you're having trouble thinking up a project, you can derive a proposal from one of the suggestions here. You may have to spend a little bit of time acquiring or making up some test datasets to fit these project ideas. I also suggest that you read through the Lesson 4 material before selecting a project, just so you have a better idea of what types of things are possible with Python.

  • Compare dataset statistics: Make a tool or script that takes two feature classes as input, along with a field name. The tool should check whether the field is numeric and exists in both feature classes. If both these conditions are met, the tool should calculate statistics for that field for both feature classes and report the difference. Statistics could be sum, average, standard deviation (if you are feeling brave), etc.
  • Compare existence of features in two datasets: Make a tool or script that reads two feature classes based on a key field (such as OBJECTID). The tool should figure out which features only appear in one of the feature classes and write them to a third feature class. As a variation on this, the tool could figure out which features appear in both feature classes and write them to a third feature class. You could even allow the tool user to set a parameter to determine this.
  • Calculate and compare areas: Make a tool or script that tallies the areas of all geometries in a feature class, or subsets of geometries based on a query and reports the difference. For example, this tool might compare "Acres of privately owned wetlands in 2018" and "Acres of privately owned wetlands in 2019."
  • Find and replace: Make a tool flexible enough to search for any term in any field in a feature class and replace it with another user-provided term. Ensure in your code that users cannot modify critical fields such as OBJECTID or SHAPE. Also ensure that partial strings are supported, such that if the search term is found anywhere within a string, it will be replaced while leaving the rest of the string intact.
  • Parse KML, XML, or JSON and write to a feature class: Make a tool or script that reads a KML file, or an XML or JSON response from a Web service, and writes the geometries to a feature class. (You'll get some exposure to reading text-based files in Lesson 4.)
  • Concatenate name fields: Write a tool or script that takes a feature class as input, as well as "First name," "Middle name," and "Last name" parameters that represent fields in the feature class. Your tool should add a new field for each record that contains the first, middle, and last names separated by one space. Your tool should intuitively handle blank records and records that have no middle name.
  • Process rasters to meet an organizational need: Write a tool or script that takes a raw elevation dataset (such as a DEM), clips it to your study area, creates both hillshade and slope rasters, and projects them into your organization's most commonly used projection. Expose the study area feature class to the end user as a parameter.
  • Parse raw textual data and write to feature class: Find some data available on the Internet that has lat/lon locations but is in text-based format with no Esri feature class available (for example, weather station readings or GPS tracks). If you need to, you can copy the HTML out of the Web page and paste it in a .txt file to help you get started. Read the .txt file and write the data to a new feature class. (You'll get some exposure to reading text-based files in Lesson 4.)
  • Make an ArcGIS Pro Project repair tool: Make a tool that takes an old and new workspace path as inputs and then repairs all the broken data links in an ArcGIS Pro Project. (You can do this using the arcpy.mp module described in Lesson 4.)
  • Make a "map book": Make a tool that opens a series of maps or layouts and constructs a multi-page PDF from them. (You can do this using the arcpy.mp module described in Lesson 4.)
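
To give a feel for the scale of logic these ideas involve, here is a sketch of the core of the name-concatenation idea: joining name parts while skipping blanks. The function name joinNameParts and the decision to treat None as blank are assumptions for illustration; reading and writing the fields with cursors is left out.

```python
def joinNameParts(first, middle, last):
    parts = []
    for part in (first, middle, last):
        # Treat None and whitespace-only values as missing
        if part is not None and part.strip() != "":
            parts.append(part.strip())
    # Exactly one space between whatever parts remain
    return " ".join(parts)

print(joinNameParts("Jane", "Q", "Public"))
print(joinNameParts("Jane", None, "Public"))
print(joinNameParts("Jane", "  ", "Public"))
```

The second and third calls give the same result as each other, with no doubled space where the middle name is missing.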

Lesson 4: Practical Python for the GIS analyst

The links below provide an outline of the material for this lesson. Be sure to carefully read through the entire lesson before returning to Canvas to submit your assignments.

Lesson 4 Overview

Lesson 4 contains a variety of subjects to help you use Python more effectively as a GIS analyst. The sections of this lesson will reinforce what you've learned already, while introducing some new concepts that will help take your automation to the next level.

You'll learn how to modularize a section of code to make it usable in multiple places. You'll learn how to use new Python modules, such as os, to open and read files; then you'll transfer the information in those files into geographic datasets that can be read by ArcGIS. Finally, you'll learn how to use your operating system to automatically run Python scripts at any time of day.

Lesson 4 Checklist

Lesson 4 explores some more advanced Python concepts, including reading and parsing text. To complete Lesson 4, do the following:

  1. One week into the lesson, submit your Final Project proposal to the instructors using the Final Project Proposal Drop Box in the Final Project lesson under the Modules tab in Canvas. For the exact due date, see the Calendar tab in Canvas.
  2. Work through the course lesson materials.
  3. Read Zandbergen chapters 4.17 and 9.1 - 9.7. In the online lesson pages, I have inserted instructions about when it is most appropriate to read each of these chapters.
  4. Complete Project 4 and submit your zipped deliverables to the Project 4 drop box in Canvas.
  5. Complete the Lesson 4 Quiz in Canvas.

Do items 1 - 3 (including any of the practice exercises you want to attempt) during the first week of the lesson. You will need time during the second week of the lesson to concentrate on the project and the quiz.

Lesson objectives

By the end of this lesson, you should:

  • understand how to create and use functions and modules;
  • be able to read and parse text in Python (e.g., using the csv module);
  • be able to create new geometries and insert them into feature classes using insert cursors;
  • understand how to automate tasks with scheduling and batch files;
  • know the basics of dealing with Pro projects in arcpy;
  • understand Python dictionaries;
  • be able to write ArcGIS scripts that create new features and feature classes, e.g., from information in text files.

4.1 Functions and modules

To start this lesson, we'll talk about functions and how you can use them to your benefit as you begin writing longer scripts.

A function contains one focused piece of functionality in a reusable section of code. The idea is that you write the function once, then use, or call, it throughout your code whenever you need to. We also introduced modules as a collection of classes in an earlier lesson. A module can also be a group of related functions, so you can use them in many different scripts. When used appropriately, functions eliminate code repetition and make the main body of your script shorter and more readable.

Functions exist in many programming languages, and each has its own way of defining a function. In Python, you define a function using the def statement. Each line in the function that follows the def is indented. Here's a simple function that reads the radius of a circle and reports the circle's approximate area. (Remember that the area is equal to pi [3.14159...] multiplied by the square [** 2] of the radius.)

>>> def findArea(radius):
... 	area = 3.14159 * radius ** 2
... 	return area
... 
>>> findArea(3)
28.27431

Notice from the above example that functions can take parameters, or arguments. When you call the above function, you supply the radius of the circle in parentheses. The function returns the area (notice the return statement, which is new to you).

Thus, to find the area of a circle with a radius of 3 inches, you could make the function call findArea(3) and get the return value 28.27431 (square inches).

It's common to assign the returned value to a variable and use it later in your code. For example, you could add these lines in the Python Interpreter:

In [1]: aLargerCircle = findArea(4)
In [2]: print (aLargerCircle)
50.26544

Please click this link to take a close look at what happens when the findArea(...) function is called and executed in this example using the code execution visualization feature of pythontutor.com. In the browser window that opens, you will see the code in the top left. Clicking the "Forward" and "Back" buttons allows you to step through the code, while seeing what Python stores in memory at any given moment in the window in the top right.

  • After clicking the "Forward" button the first time, the definition of the function is read in and a function object is created that is globally accessible under the name findArea.
  • In the next step, the call of findArea(4) in line 5 is executed. This results in a new variable with the name of the function's only parameter, so radius, being created and assigned the value 4. This variable is a local variable that is only accessible from the code inside the body of the function though.
  • Next, the program execution jumps to the code in the function definition starting in line 1.
  • In step 5, line 2 is executed and another local variable with the name area is created and assigned the result of the computation in line 2 using the current value of the local variable radius, which is 4.
  • In the next step, the return statement in line 3 sets the return value of the function to the current value of local variable area (50.2654).
  • When pressing "Forward" again, the program execution jumps back to line 5 from where the function was called. Since we are now leaving the execution of the function body, all local variables of the function (radius and area) are discarded. The return value is used in place of the function call in line 5, in this case meaning that it is assigned to a new global variable called aLargerCircle now appearing in memory.
  • In the final step, the value assigned to this variable (50.2654) is printed out.

It is important to understand the mechanisms of (a) jumping from the call of the function (line 5) to the code of the function definition and back, and of (b) creating local variables for the parameter(s) and all new variables defined in the function body and how they are discarded again when the end of the function body has been reached. The return value is the only piece of information that remains and is given back from the execution of the function.
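
You can confirm the second mechanism yourself. The short sketch below repeats the findArea example and then tries to use the local variable area from outside the function; the areaIsVisible flag is just a hypothetical helper for showing the outcome.

```python
def findArea(radius):
    area = 3.14159 * radius ** 2
    return area

aLargerCircle = findArea(4)

# The local variable "area" was discarded when the function returned,
# so referring to it out here raises a NameError.
try:
    print(area)
    areaIsVisible = True
except NameError as ex:
    print("Not accessible outside the function: " + str(ex))
    areaIsVisible = False

# The return value, however, lives on in the global variable
print(aLargerCircle)
```

Only the value handed back by the return statement survives the call; that is why it has to be assigned to a variable if you want to use it later.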

A function is not required to return any value. For example, you may have a function that takes the path of a text file as a parameter, reads the first line of the file, and prints that line to the Console. Since all the printing logic is performed inside the function, there is really no return value.
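
Here is a minimal sketch of such a function. The name printFirstLine and the throwaway sample file are assumptions made so the example can run on its own.

```python
import os
import tempfile

def printFirstLine(filePath):
    # Open the file, read its first line, and print it without the trailing newline
    with open(filePath) as textFile:
        firstLine = textFile.readline()
    print(firstLine.rstrip("\n"))
    # There is no return statement, so the caller gets None back

# Create a small throwaway file to test with
handle, samplePath = tempfile.mkstemp(suffix=".txt")
with os.fdopen(handle, "w") as sampleFile:
    sampleFile.write("first line\nsecond line\n")

result = printFirstLine(samplePath)
print(result)

# Remove the throwaway file
os.remove(samplePath)
```

Assigning the result of a function like this to a variable is legal, but the variable simply holds None.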

Neither is a function required to take a parameter. For example, you might write a function that retrieves or calculates some static value. Try this in the Console:

In [1]: def getCurrentPresident():
...: 	return "Joseph R. Biden Jr"
...: 
In [2]: president = getCurrentPresident()
In [3]: print (president)
Joseph R. Biden Jr

The function getCurrentPresident() doesn't take any user-supplied parameters. Its only "purpose in life" is to return the name of the current president. It cannot be asked to do anything else.

Modules

You may be wondering what advantage you gain by putting the above getCurrentPresident() logic in a function. Why couldn't you just define a string currentPresident and set it equal to "Joseph R. Biden Jr"? The big reason is re-usability.

Suppose you maintain 20 different scripts, each of which works with the name of the current President in some way. You know that the name of the current President will eventually change. Therefore, you could put this function in what's known as a module file and reference that file inside your 20 different scripts. When the name of the President changes, you don't have to open 20 scripts and change them. Instead, you just open the module file and make the change once.

You may remember that you've already worked with some of Python's built-in modules. The Hi Ho! Cherry O example in Lesson 2 imported the random module so that the script could generate a random number for the spinner result. This spared you the effort of writing or pasting any random number generating code into your script.

You've also probably gotten used to the pattern of importing the arcpy site package at the beginning of your scripts. A site package can contain numerous modules. In the case of arcpy, these modules include Esri functions for geoprocessing.

As you use Python in your GIS work, you'll probably write functions that are useful in many types of scripts. These functions might convert a coordinate from one projection to another, or create a polygon from a list of coordinates. These functions are perfect candidates for modules. If you ever want to improve on your code, you can make the change once in your module instead of finding each script where you duplicated the code.

Creating a module

To create a module, create a new script in PyScripter and save it with the standard .py extension; but instead of writing start-to-finish scripting logic, just write some functions. Here's what a simple module file might look like. This module only contains one function, which adds a set of points to a feature class given a Python list of coordinates.

# This module is saved as practiceModule1.py

# The function below creates points from a list of coordinates
#  Example list: [[-113,23],[-120,36],[-116,-2]]

def createPoints(coordinateList, featureClass):

    # Import arcpy and create an insert cursor  
    import arcpy

    with arcpy.da.InsertCursor(featureClass, ("SHAPE@XY",)) as rowInserter:

        # Loop through each coordinate in the list and make a point    
        for coordinatePair in coordinateList:
            rowInserter.insertRow([coordinatePair])

The above function createPoints could be useful in various scripts, so it's very appropriate for putting in a module. Notice that this function has to work with an insert cursor, so it requires arcpy. It's legal to import a site package or module within a module.

Also notice that arcpy is imported within the function, not at the very top of the module like you are accustomed to seeing. This is done for performance reasons. You may add more functions to this module later that do not require arcpy. You should only do the work of importing arcpy when necessary, that is, if a function is called that requires it.

The arcpy site package is only available inside the scope of this function. If other functions in your practice module were called, arcpy would not be available to those functions unless they imported it themselves. Scope also applies to variables that you create in this function, such as rowInserter and coordinatePair: they exist only while the function is running, and if you tried to use them elsewhere, they would be out of scope and unavailable. Note that, unlike in some other programming languages, a loop in Python does not create a narrower scope of its own; coordinatePair remains accessible after the for loop ends, for as long as the function is still executing.
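
Because arcpy is only available on machines with ArcGIS installed, the following sketch uses the standard library module statistics as a stand-in to show the same scoping behavior. The function names and elevation values are made up for the illustration.

```python
def meanElevation(elevations):
    # The import happens only when this function is called,
    # and the name "statistics" is local to this function
    import statistics
    return statistics.mean(elevations)

def medianElevation(elevations):
    # No import here, so "statistics" is not defined in this function
    return statistics.median(elevations)

print(meanElevation([350, 354, 358]))

try:
    medianElevation([350, 354, 358])
except NameError:
    print("statistics is not available inside medianElevation")
```

To make the second function work, you would either add its own import statement or move the import to the top of the module so that all functions can use it.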

Using a module

So how could you use the above module in a script? Imagine that the module above is saved on its own as practiceModule1.py. Below is an example of a separate script that imports practiceModule1.

# This script is saved as add_my_points.py

# Import the module containing a function we want to call
import practiceModule1

# Define point list and shapefile to edit
myWorldLocations = [[-123.9,47.0],[-118.2,34.1],[-112.7,40.2],[-63.2,-38.7]]
myWorldFeatureClass = "c:\\Data\\WorldPoints.shp"

# Call the createPoints function from practiceModule1
practiceModule1.createPoints(myWorldLocations, myWorldFeatureClass)

The above script is simple and easy to read because you didn't have to include all the logic for creating the points. That is taken care of by the createPoints function in the module you imported, practiceModule1. Notice that to call a function from a module, you need to use the syntax module.function().

Readings

To reinforce the material in this section, we'd like you to read Zandbergen's chapter on Creating Python functions and classes. However, you won't find that chapter in his Python Scripting for ArcGIS Pro. When revising his original ArcMap edition of the book, he decided to write a companion Advanced Python Scripting for ArcGIS Pro and he moved this chapter to the advanced book. You do not have to buy the advanced book just for this chapter. You can access the content (Chapter 2) through the e-book made available through the Penn State Library.

Practice

Before moving ahead, get some practice in PyScripter by trying to write the following functions. These functions are not graded, but the experience of writing them will help you in Project 4. Use the course forums to help each other.

  • A function that returns the perimeter of a square given the length of one side.
  • A function that takes a path to a feature class as a parameter and returns a Python list of the fields in that feature class. Practice calling the function and printing the list. However, do not print the list within the function.
  • A function that returns the Euclidean distance between any two coordinates. The coordinates can be supplied as parameters in the form (x1, y1, x2, y2). For example, if your coordinates were (312088, 60271) and (312606, 59468), your function call might look like this: findDistance(312088, 60271, 312606, 59468). Use the Pythagorean formula A ** 2 + B ** 2 = C ** 2. For an extra challenge, see if you can handle negative coordinates.

The best practice is to put your functions inside a module and see if you can successfully call them from a separate script. If you try to step through your code using the debugger, you'll notice that the debugger helpfully moves back and forth between the script and the module whenever you call a function in the module.

4.2 Python Dictionaries

In programming, we often want to store larger amounts of data that somehow belongs together inside a single variable. In Lesson 2, you already learned about lists, which provide one option to do so. As long as available memory permits, you can store as many elements in a list as you wish and the append(...) method allows you to add more elements to an existing list.

Dictionaries are another data structure that allows for storing complex information in a single variable. While lists store elements in a simple sequence and the elements are then accessed based on their index in the sequence, the elements stored in a dictionary consist of key-value pairs and one always uses the key to retrieve the corresponding values from the dictionary. It works like in a real dictionary, where you look up information (the stored value) under a particular keyword (the key).

Dictionaries can be useful to realize a mapping, for instance from English words to the corresponding words in Spanish. Here is how you can create such a dictionary for just the numbers from one to four:

In [1]: englishToSpanishDic = { "one": "uno", "two": "dos", "three": "tres", "four": "cuatro"  }

The curly brackets { } delimit the dictionary, similarly to how square brackets [ ] do for lists. Inside the dictionary, we have four key-value pairs separated by commas. The key and value for each pair are separated by a colon. The key appears on the left of the colon, while the value stored under the key appears on the right side of the colon.

We can now use the dictionary stored in variable englishToSpanishDic to look up the Spanish word for an English number, e.g.

In [2]: print (englishToSpanishDic["two"])
dos

To retrieve a value stored in the dictionary, we use the name of the variable followed by square brackets containing the key under which the value is stored in the dictionary. If we use the same notation but on the left side of an assignment operator (=), we can add a new key-value pair to an existing dictionary:

In [3]: englishToSpanishDic["five"] = "cinco"    
In [4]: print (englishToSpanishDic)
{'one': 'uno', 'two': 'dos', 'three': 'tres', 'four': 'cuatro', 'five': 'cinco'}

We here added the value "cinco" appearing on the right side of the equal sign under the key "five" to the dictionary. If something had already been stored under the key "five" in the dictionary, the stored value would have been overwritten. The new pair appears at the end of the output because dictionaries in current Python versions (3.7 and later) keep their elements in the order in which they were inserted; in older versions, the order in the output could appear arbitrary. Either way, the order doesn’t matter here since we always access the elements in a dictionary via their key. If our dictionary contained many more word pairs, we could use it to realize a very primitive translator that would go through an English text word-by-word and replace each word by the corresponding Spanish word retrieved from the dictionary. Admittedly, using this simple approach would probably result in pretty hilarious translations.
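
Just for fun, here is a minimal sketch of what such a primitive translator could look like. The function name translate is hypothetical, and the check with the "in" operator, which tests whether a key exists in the dictionary, is explained later in this section.

```python
englishToSpanishDic = {"one": "uno", "two": "dos", "three": "tres", "four": "cuatro"}

def translate(text, dictionary):
    translatedWords = []
    # Go through the text word-by-word
    for word in text.split(" "):
        if word in dictionary:
            # Replace the word by the value stored under it
            translatedWords.append(dictionary[word])
        else:
            # Keep words the dictionary doesn't know
            translatedWords.append(word)
    return " ".join(translatedWords)

print(translate("one plus three is four", englishToSpanishDic))
```

Running this prints "uno plus tres is cuatro", which nicely shows the limits of word-by-word translation.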

Now let’s use Python dictionaries to do something a bit more complex. Let’s simulate the process of creating a book index that lists the page numbers on which certain keywords occur. We want to start with an empty dictionary and then go through the book page-by-page. Whenever we encounter a word that we think is important enough to be listed in the index, we add it and the page number to the dictionary.

To create an empty dictionary in a variable called bookIndex, we use the notation with the curly brackets but nothing in between:

In [5]: bookIndex = {}
In [6]: print (bookIndex)
{} 

Now, let’s say the first keyword we encounter in the imaginary programming book we are going through is the word "function" on page 2. We now want to store the page number 2 (value) under the keyword "function" (key) in the dictionary. But since keywords can appear on many pages, what we want to store as values in the dictionary are not individual numbers but lists of page numbers. Therefore, what we put into our dictionary is a list with the number 2 as its only element:

In [7]: bookIndex["function"] =  [2]
In [8]: print (bookIndex)
{'function': [2]} 

Next, we encounter the keyword "module" on page 3. So, we add it to the dictionary in the same way:

In [9]: bookIndex["module"] =  [3]
In [10]: print (bookIndex)
{'function': [2], 'module': [3]}

So now our dictionary contains two key-value pairs, and for each key it stores a list with just a single page number. Let’s say we next encounter the keyword “function” a second time, this time on page 5. Our code to add the additional page number to the list stored under the key “function” now needs to look a bit different because we already have something stored for it in the dictionary, and we do not want to overwrite that information. Instead, we retrieve the currently stored list of page numbers and add the new number to it with append(…):

In [11]: pages = bookIndex["function"] 
In [12]: pages.append(5)
In [13]: print (bookIndex)
{'function': [2, 5], 'module': [3]}
In [14]: print (bookIndex["function"])
[2, 5]

Please note that we didn’t have to put the list of page numbers stored in variable pages back into the dictionary after adding the new page number. Both the variable pages and the dictionary refer to the same list, so appending the number changes what we see through either of them. Our dictionary now contains a list of two page numbers for the key “function” and still a list with just one page number for the key “module”. Surely, you can imagine how we would build up a large dictionary for the entire book by continuing this process. Dictionaries can be used in concert with a for loop to go through the keys of the elements in the dictionary. This can be used to print out the content of an entire dictionary:

In [15]: for k in bookIndex:  # loop through keys of the dictionary
...:    print ("keyword: " + k)                # print the key
...:    print ("pages: " + str(bookIndex[k]))  # print the value
...:
keyword: function
pages: [2, 5]
keyword: module
pages: [3]
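
The earlier observation that the variable pages and the dictionary refer to the same list can be checked with a short experiment. This is only a sketch; the variable pagesCopy is a made-up name used to contrast the behavior with that of an independent copy.

```python
bookIndex = {"function": [2], "module": [3]}

# pages refers to the very same list that is stored in the dictionary
pages = bookIndex["function"]
pages.append(5)
sameList = pages is bookIndex["function"]
print(sameList)
print(bookIndex["function"])

# list(...) creates an independent copy, so the dictionary is unaffected
pagesCopy = list(bookIndex["module"])
pagesCopy.append(9)
print(bookIndex["module"])
```

The first print shows True and the dictionary entry for "function" has grown to [2, 5], while the entry for "module" is still [3] because only the copy was changed.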

When adding the second page number for “function”, we ourselves decided that this needs to be handled differently than when adding the first page number. But how could this be realized in code? We can check whether something is already stored under a key in a dictionary using an if-statement together with the “in” operator:

In [16]: keyword = "function"
In [17]: if keyword in bookIndex: 
...:    print ("entry exists")
...: else:
...:    print ("entry does not exist")
...: 
entry exists

So assuming we have the current keyword stored in variable word and the corresponding page number stored in variable pageNo, the following piece of code would decide by itself how to add the new page number to the dictionary:

word = "module"
pageNo = 7

if word in bookIndex:
	# entry for word already exists, so we just add page
	pages = bookIndex[word]	
	pages.append(pageNo)
else:
	# no entry for word exists, so we add new entry
	bookIndex[word] = [pageNo] 

A more sophisticated version of this code would also check whether the list of page numbers retrieved in the if-block already contains the new page number to deal with the case that a keyword occurs more than once on the same page. Feel free to think about how this could be included.
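
If you want to compare notes after thinking about it, here is one possible sketch. It wraps the logic in a function, whose name addToIndex is an assumption made for this example, and uses "not in" to skip page numbers that are already listed.

```python
def addToIndex(bookIndex, word, pageNo):
    if word in bookIndex:
        # Entry exists: add the page only if it is not listed yet
        pages = bookIndex[word]
        if pageNo not in pages:
            pages.append(pageNo)
    else:
        # No entry for word exists, so we add a new entry
        bookIndex[word] = [pageNo]

bookIndex = {}
addToIndex(bookIndex, "function", 2)
addToIndex(bookIndex, "module", 3)
addToIndex(bookIndex, "function", 5)
addToIndex(bookIndex, "function", 5)   # same page again, ignored
print(bookIndex)
```

Because the function changes the dictionary that is passed in, it doesn't need to return anything.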

Readings

Read Zandbergen section 4.17 on using Python dictionaries.

4.3 Reading and parsing text using the Python csv module

One of the best ways to increase your effectiveness as a GIS programmer is to learn how to manipulate text-based information. In Lesson 3, we talked about how to read data in ArcGIS's native formats, such as feature classes. But often GIS data is collected and shared in more "raw" formats such as a spreadsheet in CSV (comma-separated value) format, a list of coordinates in a text file, or an XML response received through a Web service.

When faced with these files, you should first understand if your GIS software already comes with a tool or script that can read or convert the data to a format it can use. If no tool or script exists, you'll need to do some programmatic work to read the file and separate out the pieces of text that you really need. This is called parsing the text.

For example, a Web service may return many lines of XML describing all the readings at a weather station, when all you're really interested in are the coordinates of the weather station and the annual average temperature. Parsing the response involves writing some code to read through the lines and tags in the XML and isolating only those three values.

There are several different approaches to parsing. Usually, the wisest is to see if some Python module exists that will examine the text for you and turn it into an object that you can then work with. In this lesson, you will work with the Python "csv" module that can read comma-delimited values and turn them into a Python list. Other helpful libraries of this kind include lxml and xml.dom for parsing XML, and BeautifulSoup for parsing HTML.

If a module or library doesn't exist that fits your parsing needs, then you'll have to extract the information from the text yourself using Python's string manipulation methods. One of the most helpful ones is string.split(), which turns a big string into a list of smaller strings based on some delimiting character, such as a space or comma. For instance, the following example shows how to split a string in variable text at each occurrence of a comma and produce a list of strings of the different parts:

>>> text = "green,red,blue"
>>> text.split(",")
['green', 'red', 'blue']

When you write your own parser, however, it's hard to anticipate all the exceptional cases you might run across. For example, sometimes a comma-separated value file might have substrings that naturally contain commas, such as dates or addresses. In these cases, splitting the string using a simple comma as the delimiter is not sufficient, and you need to add extra logic.
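
The following sketch illustrates the problem with a made-up line of text in which one value contains a comma and is therefore wrapped in quotes. It compares split() with the csv module, which is introduced in more detail below; the place name and coordinates are sample values only.

```python
import csv
import io

# A made-up reading whose second value contains a comma
line = 'Old Main,"University Park, PA",40.7965,-77.8627\n'

# split() cuts at every comma, including the one inside the quotes
splitResult = line.strip().split(",")
print(splitResult)

# The csv module understands the quotes and keeps the value together
csvReader = csv.reader(io.StringIO(line))
csvResult = next(csvReader)
print(csvResult)
```

The split() approach produces five pieces, while the csv reader correctly returns four values with 'University Park, PA' as a single item.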

Another pitfall when parsing is the use of "magic numbers" to slice off a particular number of characters in a string, to refer to a specific column number in a spreadsheet, and so on. If the structure of the data changes, or if the script is applied to data with a slightly different structure, the code could be rendered inoperable and would require some precision surgery to fix. People who read your code and see a number other than 0 (to begin a series) or 1 (to increment a counter) will often be left wondering how the number was derived and what it refers to. In programming, numbers other than 0 or 1 are magic numbers that should typically be avoided, or at least accompanied by a comment explaining what the number refers to.
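
As a small illustration, compare the two ways of picking values out of a reading below. The constant names LAT_COLUMN and LON_COLUMN are hypothetical; the point is simply that a named value tells the reader what the number means.

```python
reading = "TRACK,ACTIVE LOG,40.78966141,-77.85948515"
values = reading.split(",")

# Magic number: a reader has to guess what column 2 contains
lat = values[2]

# Named values: the meaning is stated once, in one place
LAT_COLUMN = 2   # position of the latitude in each reading
LON_COLUMN = 3   # position of the longitude in each reading
lat = values[LAT_COLUMN]
lon = values[LON_COLUMN]
print(lon, lat)
```

If the structure of the data changes, only the two definitions need to be updated rather than every place where the numbers are used.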

There are an infinite number of parsing scenarios that you can encounter. This lesson will attempt to teach you the general approach by walking through just one module and example. In your final project for this course, you may choose to explore parsing other types of files.

The Python csv module

A common text-based data interchange format is the comma-separated value (CSV) file. This is often used when transferring spreadsheets or other tabular data. Each line in the file represents a row of the dataset, and the columns in the data are separated by commas. The file often begins with a header line containing all the field names.

Spreadsheet programs like Microsoft Excel can understand the CSV structure and display all the values in a row-column grid. A CSV file may look a little messier when you open it in a text editor, but it can be helpful to always continue thinking of it as a grid structure. If you had a Python list of rows and a Python list of column values for each row, you could use looping logic to pull out any value you needed. This is exactly what the Python csv module gives you.

It's easiest to learn about the csv module by looking at a real example. The scenario below shows how the csv module can be used to parse information out of a GPS track file.

Introducing the GPS track parsing example

This example reads a text file collected from a GPS unit. The lines in the file represent readings taken from the GPS unit as the user traveled along a path. In this section of the lesson, you'll learn one way to parse out the coordinates from each reading. The next section of the lesson uses a variation of this example to show how you could write the user's track to a polyline feature class.

The file for this example is called gps_track.txt, and it looks something like the text string shown below. (Please note, line breaks have been added to the file shown below to ensure that the text fits within the page margins. Click on this link to the gps_track.txt file to see what the text file actually looks like.)

type,ident,lat,long,y_proj,x_proj,new_seg,display,color,altitude,depth,temp,time,model,filename,ltime
TRACK,ACTIVE LOG,40.78966141,-77.85948515,4627251.76270444,1779451.21349775,True,False,
    255,358.228393554688,0,0,2008/06/11-14:08:30,eTrex Venture, ,2008/06/11 09:08:30
TRACK,ACTIVE LOG,40.78963995,-77.85954952,4627248.40489401,1779446.18060893,False,False,
    255,358.228393554688,0,0,2008/06/11-14:09:43,eTrex Venture, ,2008/06/11 09:09:43
TRACK,ACTIVE LOG,40.78961849,-77.85957098,4627245.69008772,1779444.78476531,False,False,
    255,357.747802734375,0,0,2008/06/11-14:09:44,eTrex Venture, ,2008/06/11 09:09:44
TRACK,ACTIVE LOG,40.78953266,-77.85965681,4627234.83213242,1779439.20202706,False,False,
    255,353.421875,0,0,2008/06/11-14:10:18,eTrex Venture, ,2008/06/11 09:10:18
TRACK,ACTIVE LOG,40.78957558,-77.85972118,4627238.65402635,1779432.89982442,False,False,
    255,356.786376953125,0,0,2008/06/11-14:11:57,eTrex Venture, ,2008/06/11 09:11:57
TRACK,ACTIVE LOG,40.78968287,-77.85976410,4627249.97592111,1779427.14663093,False,False,
    255,354.383178710938,0,0,2008/06/11-14:12:18,eTrex Venture, ,2008/06/11 09:12:18
TRACK,ACTIVE LOG,40.78979015,-77.85961390,4627264.19055204,1779437.76243578,False,False,
    255,351.499145507813,0,0,2008/06/11-14:12:50,eTrex Venture, ,2008/06/11 09:12:50
etc. ...

Notice that the file starts with a header line, explaining the meaning of the values contained in the readings from the GPS unit. Each subsequent line contains one reading. The goal for this example is to create a Python list containing the X,Y coordinates from each reading. Specifically, the script should be able to read the above file and print a text string like the one shown below.

[['-77.85948515', '40.78966141'], ['-77.85954952', '40.78963995'], ['-77.85957098', '40.78961849'], etc.]

Approach for parsing the GPS track

Before you start parsing a file, it's helpful to outline what you're going to do and break up the task into manageable chunks. Here's some pseudocode for the approach we'll take in this example:

  1. Open the file.
  2. Read the header line.
  3. Loop through the header line to find the index positions of the "lat" and "long" values.
  4. Read the rest of the lines one by one.
  5. Find the values in the list that correspond to the lat and long coordinates and write them to a new list.

Importing the module

When you work with the csv module, you need to explicitly import it at the top of your script, just like you do with arcpy.

import csv

You don't have to install anything special to get the csv module; it just comes with the base Python installation.

Opening the file and creating the CSV reader

The first thing the script needs to do is open the file. Python contains a built-in open() function for doing this. The parameters for this function are the path to the file and the mode in which you want to open the file (read, write, etc.). In this example, "r" stands for read-only mode. If you wanted to write items to the file, you would use "w" as the mode (which overwrites any existing content). The open() function is commonly used within a "with" statement, much as cursors were instantiated in the previous lesson, and for much the same reason: it simplifies "cleanup." In the case of opening a file, using "with" ensures that the file is closed automatically when execution of the "with" block is completed. A close() method does exist, but need not be called explicitly.

with open("C:\\data\\Geog485\\gps_track.txt", "r") as gpsTrack:
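
If you want to convince yourself that the "with" statement really closes the file, you can try a small experiment like the one below. It is only a sketch that works on a throwaway file in your temporary folder (the file name is made up) and checks the file object's closed attribute.

```python
import os
import tempfile

demoPath = os.path.join(tempfile.gettempdir(), "with_demo.txt")

# Open in write mode and check the state inside the block
with open(demoPath, "w") as demoFile:
    demoFile.write("type,ident,lat,long\n")
    print(demoFile.closed)   # still open inside the block

# After the block, the file has been closed automatically
print(demoFile.closed)

# Open again in read mode to get the line back
with open(demoPath, "r") as demoFile:
    firstLine = demoFile.readline()
print(firstLine)

# Clean up the throwaway file
os.remove(demoPath)
```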

Notice that your file does not need to have the extension .csv in order to be read by the CSV module. It can be suffixed .txt as long as the text in the file conforms to the CSV pattern where commas separate the columns and carriage returns separate the rows. Once the file is open, you create a CSV reader object, in this manner:

csvReader = csv.reader(gpsTrack)

This object is kind of like a cursor. You can use Python's built-in next() function to advance it to the next line, but you can also use the reader in a for loop to iterate through all the lines of the file. Note that this and the following lines concerned with parsing the CSV file must be indented to be considered part of the "with" block.

Reading the header line

The header line of a CSV file is different from the other lines. It gives you the names of all the fields. Therefore, you will examine this line a little differently than the other lines. First, you advance the CSV reader to the header line by using the next() function, like this:

header = next(csvReader)

This gives you back a Python list of each item in the header. Remember that the header was a pretty long string beginning with: "type,ident,lat,long...". The CSV reader breaks the header up into a list of parts that can be referenced by an index number. The default delimiter, or separating character, for these parts is the comma. Therefore, header[0] would have the value "type", header[1] would have the value "ident", and so on.

We are most interested in pulling latitude and longitude values out of this file, so we need to take note of the position of the "lat" and "long" columns. Using the logic above, you would use header[2] to get "lat" and header[3] to get "long". However, what if you got some other file where these field names were all in a different order? You could not be sure that the column with index 2 represented "lat" and so on.

A safer way to parse is to use the list.index() method and ask the list to give you the index position corresponding to a particular field name, like this:

latIndex = header.index("lat")
lonIndex = header.index("long")

In our case, latIndex would have a value of 2 and lonIndex would have a value of 3, but our code is now flexible enough to handle those columns in other positions.
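
To see this flexibility in action without needing a file on disk, the sketch below feeds two short pieces of text to the CSV reader via io.StringIO. Both contain the same reading, but with the columns in a different order. The function name firstCoordinate and the shortened headers are assumptions made for this example.

```python
import csv
import io

# Same reading, two different column orders
fileA = "type,ident,lat,long\nTRACK,ACTIVE LOG,40.78966141,-77.85948515\n"
fileB = "long,lat,type,ident\n-77.85948515,40.78966141,TRACK,ACTIVE LOG\n"

def firstCoordinate(text):
    csvReader = csv.reader(io.StringIO(text))
    # Read the header and look up the column positions by name
    header = next(csvReader)
    latIndex = header.index("lat")
    lonIndex = header.index("long")
    # Read the first reading and pick out the coordinate
    row = next(csvReader)
    return [row[lonIndex], row[latIndex]]

print(firstCoordinate(fileA))
print(firstCoordinate(fileB))
```

Both calls give back ['-77.85948515', '40.78966141'], even though "lat" sits at index 2 in the first text and at index 1 in the second.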

Processing the rest of the lines in the file

The rest of the file can be read using a loop. In this case, you treat the csvReader as an iterable list of the remaining lines in the file. Each run of the loop takes a row and breaks it into a Python list of values. If we get the value with index 2 (represented by the variable latIndex), then we have the latitude. If we get the value with index 3 (represented by the variable lonIndex), then we get the longitude. Once we get these values, we can add them to a list we made, called coordList:

# Make an empty list
coordList = []

# Loop through the lines in the file and get each coordinate
for row in csvReader:
    lat = row[latIndex]
    lon = row[lonIndex]
    coordList.append([lon,lat])

# Print the coordinate list
print (coordList)

Note a few important things about the above code:

  • coordList actually contains a bunch of small lists within a big list. Each small list is a coordinate pair representing the x (longitude) and y (latitude) location of one GPS reading.
  • The list.append() method is used to add items to coordList. Notice again that you can append a list itself (representing the coordinate pair) using this method.

Full code for the example

Here's the full code for the example. Feel free to download the text file and try it out on your computer.

# This script reads a GPS track in CSV format and
#  prints a list of coordinate pairs
import csv

# Open the input file
with open("C:\\Users\\jed124\\Documents\\geog485\\Lesson4\\gps_track.txt", "r") as gpsTrack:
    #Set up CSV reader and process the header
    csvReader = csv.reader(gpsTrack)
    header = next(csvReader)
    latIndex = header.index("lat")
    lonIndex = header.index("long")
 
    # Make an empty list
    coordList = []
 
    # Loop through the lines in the file and get each coordinate
    for row in csvReader:
        lat = row[latIndex]
        lon = row[lonIndex]
        coordList.append([lon,lat])
 
    # Print the coordinate list
    print (coordList)
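
As a possible variation, the csv module also offers a DictReader class, which reads the header for you and returns each row as a dictionary keyed by field name. The sketch below is not the approach used in this lesson; it works on a shortened, in-memory sample of the GPS file and additionally converts the coordinate strings to numbers with float().

```python
import csv
import io

# Shortened sample of the GPS track with only four of the columns
sampleText = (
    "type,ident,lat,long\n"
    "TRACK,ACTIVE LOG,40.78966141,-77.85948515\n"
    "TRACK,ACTIVE LOG,40.78963995,-77.85954952\n"
)

coordList = []

# Each row is a dictionary, so values are looked up by field name
for row in csv.DictReader(io.StringIO(sampleText)):
    lon = float(row["long"])
    lat = float(row["lat"])
    coordList.append([lon, lat])

print(coordList)
```

With this approach, there is no need to work out index positions at all, which ties in nicely with the dictionaries from the previous section.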

Applications of this script

You might be asking at this point, "What good does this list of coordinates do for me?" Admittedly, the data is still very "raw." It cannot be read directly in this state by a GIS. However, having the coordinates in a Python list makes them easy to get into other formats that can be visualized. For example, these coordinates could be written to points in a feature class, or vertices in a polyline or polygon feature class. The list of points could also be sent to a Web service for reverse geocoding, or finding the address associated with each point. The points could also be plotted on top of a Web map using programming tools like the ArcGIS JavaScript API. Or, if you were feeling really ambitious, you might use Python to write a new file in KML format, which could be viewed in 3D in Google Earth.

Summary

Parsing any piece of text requires you to be familiar with file opening and reading methods, the structure of the text you're going to parse, the available parsing modules that fit your text structure, and string manipulation methods. In the preceding example, we parsed a simple text file, extracting coordinates collected by a handheld GPS unit. We used the csv module to break up each GPS reading and find the latitude and longitude values. In the next section of the lesson, you'll learn how you could do more with this information by writing the coordinates to a polyline dataset.

As you use Python in your GIS work, you could encounter a variety of parsing tasks. As you approach these, don't be afraid to seek help from Internet examples, code reference topics such as the ones linked to in this lesson, and your textbook.

4.4 Writing geometries

As you parse out geographic information from "raw" sources such as text files, you may want to convert it to a format that is native to your GIS. This section of the lesson discusses how to write vector geometries to ArcGIS feature classes. We'll read through the same GPS-produced text file from the previous section, but this time we'll add the extra step of writing each coordinate to a polyline shapefile.

You've already had some experience writing point geometries when we learned about insert cursors. To review, if you put the X and Y coordinates in a tuple or list, you can plug it in the tuple given to insertRow() for the geometry field referred to using the "SHAPE@XY" token (see page 4.1).

# Create coordinate tuple
inPoint = (-121.34, 47.1)
...

# Create new row
cursor.insertRow((inPoint,))

Esri made it possible to create polylines and polygons by putting together a list or tuple of coordinate pairs (vertices) like the one above in sequence. When you pass that list or tuple to the insertRow() method, arcpy will "connect the dots" to create a polyline or a polygon (depending on the geometry type of the feature class you opened the insert cursor on). Multi-part and multi-ring geometries are a bit more complicated than that, but that's the basic idea for single-part geometries.

The code below creates an empty list and adds three points using the list.append() method. Then that list is plugged into an insertRow() statement, where it will result in the creation of a Polyline object.

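vertices = []   # empty list; the coordinates below are only illustrative
vertices.append((-121.34, 47.1))
vertices.append((-121.29, 47.32))
vertices.append((-121.31, 47.02))
with arcpy.da.InsertCursor(polylineFC, ("SHAPE@",)) as cursor:
    cursor.insertRow((vertices,))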

In addition to the requirement that the geometry be single-part, you should also note that this list-of-coordinate-pairs approach to geometry creation requires that the spatial reference of your coordinate data matches the spatial reference of the feature class. If the coordinates are in a different spatial reference, you can still create the geometry, but you'll need to use the alternate approach covered at the bottom of this page.

Of course, you usually won't create points manually in your code like this with hard-coded coordinates. It's more likely that you'll parse out the coordinates from a file or capture them from some external source, such as a series of mouse clicks on the screen.

Creating a polyline from a GPS track

Here's how you could parse out coordinates from a GPS-created text file like the one in the previous section of the lesson. This code reads all the points captured by the GPS and adds them to one long polyline. The polyline is then written to an empty, pre-existing polyline shapefile named trackLines.shp, which uses a geographic coordinate system. If you didn't have a shapefile already on disk, you could use the Create Feature Class tool to create one with your script.

# This script reads a GPS track in CSV format and
#  writes geometries from the list of coordinate pairs
import csv
import arcpy

polylineFC = r"C:\PSU\geog485\L4\trackLines.shp"
 
# Open the input file 
with open(r"C:\PSU\geog485\L4\gps_track.txt", "r") as gpsTrack:
    # Set up CSV reader and process the header
    csvReader = csv.reader(gpsTrack)
    header = next(csvReader)
    latIndex = header.index("lat")  
    lonIndex = header.index("long")
 
    # Create an empty list
    vertices = []
 
    # Loop through the lines in the file and get each coordinate
    for row in csvReader:
        lat = float(row[latIndex])
        lon = float(row[lonIndex])
 
        # Put the coords into a tuple and add it to the list
        vertex = (lon,lat)
        vertices.append(vertex)
 
    # Write the coordinate list to the feature class as a polyline feature
    with arcpy.da.InsertCursor(polylineFC, ("SHAPE@",)) as cursor:
        cursor.insertRow((vertices,))

The above script starts out the same as the one in the previous section of the lesson. First, it parses the header line of the file to determine the position of the latitude and longitude coordinates in each reading. After that, a loop is initiated that reads each line and creates a tuple containing the longitude and latitude values. At the end of each pass through the loop, the tuple is added to the list.

Once all the lines have been read, the loop exits and an insert cursor is created using "SHAPE@" as the only element in the tuple of affected fields. Then the insertRow() method is called, passing it the list of coordinate tuples within a tuple. It's very important to note that this statement is cursor.insertRow((vertices,)), not cursor.insertRow(vertices,). Just as the fields supplied when opening the cursor must be in the form of a tuple, even if it's only one field, the values in the insertRow() statement must be a tuple.

Remember that the cursor places a lock on your dataset, so this script doesn't create the cursor until absolutely necessary (in other words, after the loop). Finally, note that the tuple plugged into the insertRow() statement includes a trailing comma. This odd syntax is needed only in the case where the tuple contains just a single item. For tuples containing two or more items, the trailing comma is not needed. Alternatively, the coordinate list could be supplied as a list within a list ([vertices]) rather than within a tuple, in which case a trailing comma is also not needed.
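
If the trailing comma still looks odd, you can check what each form produces in plain Python, without arcpy. This is only a minimal sketch, and the coordinate values are made up for illustration:

```python
# Illustrative coordinate pairs; any list of (x, y) tuples would do
vertices = [(-121.34, 47.1), (-121.29, 47.32), (-121.31, 47.02)]

no_comma = (vertices)      # parentheses alone do nothing: still the list itself
with_comma = (vertices,)   # the trailing comma makes a one-item tuple
as_list = [vertices]       # a one-item list needs no trailing comma

print(type(no_comma).__name__)         # list
print(type(with_comma).__name__)       # tuple
print(len(with_comma), len(as_list))   # 1 1
```

In other words, the parentheses by themselves never make a tuple; the comma does.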

Extending the example for multiple polylines

Just for fun, suppose your GPS allows you to mark the start and stop of different tracks. How would you handle this in the code? You can download this modified text file with multiple tracks if you want to try out the following example.

Notice that in the GPS text file, there is an entry new_seg:

type,ident,lat,long,y_proj,x_proj,new_seg,display,color,altitude,depth,temp,time,model,filename,ltime

new_seg is a boolean property that determines whether the reading begins a new track. If new_seg = true, you need to write the existing polyline to the shapefile and start creating a new one. Take a close look at this code example and notice how it differs from the previous one in order to handle multiple polylines:

# This script reads a GPS track in CSV format and
#  writes geometries from the list of coordinate pairs
#  Handles multiple polylines

# Function to add a polyline
def addPolyline(cursor, coords):
    cursor.insertRow((coords,))
    del coords[:] # clear list for next segment

# Main script body
import csv
import arcpy

polylineFC = "C:\\data\\Geog485\\tracklines.shp" 

# Open the input file 
with open(r"C:\PSU\geog485\L4\gps_track_multiple.txt", "r") as gpsTrack:
    # Set up CSV reader and process the header
    csvReader = csv.reader(gpsTrack)
    header = next(csvReader)
    latIndex = header.index("lat")  
    lonIndex = header.index("long")
    newIndex = header.index("new_seg")

    # Write the coordinates to the feature class as a polyline feature
    with arcpy.da.InsertCursor(polylineFC, ("SHAPE@",)) as cursor:
        # Create an empty vertex list
        vertices = []

        # Loop through the lines in the file and get each coordinate
        for row in csvReader:
            isNew = row[newIndex].upper()
            # If about to start a new line, add the completed line to the
            #  feature class
            if isNew == "TRUE":
                if len(vertices) > 0:
                    addPolyline(cursor, vertices)
            # Get the lat/lon values of the current GPS reading
            lat = float(row[latIndex])
            lon = float(row[lonIndex])

            # Add coordinate pair tuple to vertex list
            vertices.append((lon, lat))

        # Add the final polyline to the shapefile
        addPolyline(cursor, vertices)

The first thing you should notice is that this script uses a function. The addPolyline() function adds a polyline to a feature class, given two parameters: (1) an existing insert cursor, and (2) a list of coordinate pairs. This function cuts down on repeated code and makes the script more readable.

Here's a look at the addPolyline function:

# Function to add a polyline
def addPolyline(cursor, coords):
    cursor.insertRow((coords,))
    del coords[:]

The addPolyline function is called twice in the script: once within the loop, which we would expect, and once at the end to make sure the final polyline is added to the shapefile. This is where writing a function cuts down on repeated code.
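
One detail worth a second look is that addPolyline() empties the list with del coords[:] rather than with coords = []. The sketch below uses plain Python and two hypothetical helper functions (they are not part of the lesson script) to suggest why: only the in-place form affects the list that the caller is holding.

```python
def clear_in_place(coords):
    del coords[:]      # empties the same list object the caller holds

def rebind_only(coords):
    coords = []        # points the local name at a new list; caller's list is untouched

track = [(1.0, 2.0), (3.0, 4.0)]

rebind_only(track)
print(len(track))      # 2

clear_in_place(track)
print(len(track))      # 0
```

Because the main script keeps appending to the same vertices list after each call, the in-place version is the one that lets the next track start from an empty list.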

As you read each line of the text file, how do you determine whether it begins a new track? First of all, notice that we've added one more value to look for in this script:

  newIndex = header.index("new_seg")

The variable newIndex shows us which position in the line is held by the boolean new_seg property that tells us whether a new polyline is beginning. If you have sharp eyes, you'll notice we check for this later in the code:

  isNew = row[newIndex].upper()

  # If about to start a new line, add the completed line to the
  #  feature class
  if isNew == "TRUE":

In the above code, the upper() method converts the string into all upper-case, so we don't have to worry about whether the line says "true," "True," or "TRUE." But there's another situation we have to handle: What about the first line of the file? This line should read "true," but we can't add the existing polyline to the file at that time because there isn't one yet. Notice that a second check is performed to make sure there are more than zero points in the list before attempting to add a new polyline:

        
  if len(vertices) > 0:
      addPolyline(cursor, vertices)

Only if there's at least one point in the list does the addPolyline() function get called, passing in the cursor and the list.
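
If you'd like to experiment with the track-splitting logic without touching a shapefile, here is a minimal sketch that applies the same new_seg checks to a small in-memory sample. The sample data, the function name split_tracks, and the idea of returning a list of tracks are all assumptions made for illustration; the lesson script writes each track with an insert cursor instead.

```python
import csv
import io

# Made-up sample data with only the columns this sketch needs
SAMPLE = """lat,long,new_seg
40.1,-77.1,true
40.2,-77.2,false
40.3,-77.3,True
40.4,-77.4,false
40.5,-77.5,false
"""

def split_tracks(file_obj):
    """Return a list of tracks, each a list of (lon, lat) tuples."""
    reader = csv.reader(file_obj)
    header = next(reader)
    lat_i = header.index("lat")
    lon_i = header.index("long")
    new_i = header.index("new_seg")

    tracks = []
    vertices = []
    for row in reader:
        # A "true" reading closes the track in progress, if there is one
        if row[new_i].upper() == "TRUE" and len(vertices) > 0:
            tracks.append(vertices)
            vertices = []
        vertices.append((float(row[lon_i]), float(row[lat_i])))

    # The last track has no following "true" row, so add it after the loop
    if vertices:
        tracks.append(vertices)
    return tracks

tracks = split_tracks(io.StringIO(SAMPLE))
print([len(t) for t in tracks])   # [2, 3]
```

The first "true" row does not produce an empty track, and the final track is picked up after the loop, which mirrors the two situations discussed above.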

Alternate method for creating Polylines and Polygons

Prior to ArcGIS Desktop v10.6/ArcGIS Pro v2.1, the list-of-coordinate-pairs approach to creating geometries described above was not available. The only way to create Polylines and Polygons was to create an arcpy Point object from each set of coordinates and add that Point to an Array object. Then a Polyline or Polygon object could be constructed from the Array.

The code below creates an empty array and adds three points using the Array.add() method. Then the array is used to create a Polyline object.

# Make a new empty array
array = arcpy.Array()

# Make some points
point1 = arcpy.Point(-121.34,47.1)
point2 = arcpy.Point(-121.29,47.32)
point3 = arcpy.Point(-121.31,47.02)

# Put the points in the array
array.add(point1)
array.add(point2)
array.add(point3)

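# Make a polyline out of the now-complete array
# (spatialRef is assumed to be a SpatialReference object created earlier,
#  describing the coordinate system of the points)
polyline = arcpy.Polyline(array, spatialRef)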

The first parameter you pass in when creating a polyline is the array containing the points for the polyline. The second parameter is the spatial reference of the coordinates. Recall that we didn't have any spatial reference objects in our earlier list-of-coordinate-pairs examples. That's because you can only use that method when the spatial reference of the coordinates is the same as that of the feature class. But when creating a Polyline (or Polygon) object from an Array, you have the option of specifying the spatial reference of the coordinates. If that spatial reference doesn't match that of the feature class, then arcpy will re-project the geometry into the feature class spatial reference. If the two spatial references are the same, then no re-projection is needed. It can't hurt to include the spatial reference, so it's not a bad idea to get in the habit of including it if you find yourself creating geometries with this alternate syntax.

Here is a version of the GPS track script that uses the Array-of-Points approach:

# This script reads a GPS track in CSV format and
#  writes geometries built from an Array of Point objects
#  Handles multiple polylines

# Function to add a polyline
def addPolyline(cursor, array, sr):
    polyline = arcpy.Polyline(array, sr)
    cursor.insertRow((polyline,))
    array.removeAll()

# Main script body
import csv
import arcpy

polylineFC = "C:\\data\\Geog485\\tracklines_sept25.shp" 
spatialRef = arcpy.Describe(polylineFC).spatialReference

# Open the input file 
with open("C:\\data\\Geog485\\gps_track_multiple.txt", "r") as gpsTrack:
    # Set up CSV reader and process the header
    csvReader = csv.reader(gpsTrack)
    header = next(csvReader)
    latIndex = header.index("lat")  
    lonIndex = header.index("long")
    newIndex = header.index("new_seg")
 
    # Write the array to the feature class as a polyline feature
    with arcpy.da.InsertCursor(polylineFC, ("SHAPE@",)) as cursor:
     
        # Create an empty array object
        vertexArray = arcpy.Array()
     
        # Loop through the lines in the file and get each coordinate
        for row in csvReader:
             
            isNew = row[newIndex].upper()
     
            # If about to start a new line, add the completed line to the
            #  feature class
            if isNew == "TRUE":
                if vertexArray.count > 0:
                    addPolyline(cursor, vertexArray, spatialRef)
     
            # Get the lat/lon values of the current GPS reading
            lat = float(row[latIndex])
            lon = float(row[lonIndex])
     
            # Make a point from the coordinate and add it to the array
            vertex = arcpy.Point(lon,lat)
            vertexArray.add(vertex)
     
        # Add the final polyline to the shapefile
        addPolyline(cursor, vertexArray, spatialRef)

Summary

The ArcGIS Pro documentation does a nice job of summarizing the benefits and limitations of creating geometries from lists of coordinates:

Geometry can also be created from a list of coordinates. This approach can provide performance gains, as it avoids the overhead of creating geometry objects. However, it is limited to only features that are singlepart, and in the case of polygons, without interior rings. All coordinates should be in the units of the feature class's spatial reference.

If you need to create a multi-part feature (such as the state of Hawaii containing multiple islands), or a polygon with a "hole" in it, then you'll need to work with Point and Array objects as described in the Alternate method section of this page. You would also use this method if your coordinates are in a different spatial reference than the feature class.

Readings

Read the Writing geometries page in the ArcGIS Pro documentation, and pay particular attention to the multipart polygon example if you deal with these sorts of geometries in your work.

Read Zandbergen 9.1 - 9.7, which contains a good summary of how to read and write Esri geometries using the Points-in-an-Array method.

4.5 Automation with batch files and scheduled tasks

In this course, we've talked about the benefits of automating your work through Python scripts. It's nice to be able to run several geoprocessing tools in a row without manually traversing the Esri toolboxes, but what's so automatic about launching PyScripter, opening your script, and clicking the Run button? In this section of the lesson, we'll take automation one step further by discussing how you can make your scripts run automatically.

Scripts and your operating system

Most of the time we've run scripts in this course, it's been through PyScripter. Your operating system (Windows) can run scripts directly. Maybe you've tried to double-click a .py file to run a script. As long as Windows understands that .py files represent a Python script and that it should use the Python interpreter to run the script, the script will launch immediately.

When you try to launch a script automatically by double-clicking it, it's possible you'll get a message saying Windows doesn't know which program to use to open your file. If this happens to you, use the Browse button on the error dialog box to browse to the Python executable, most likely located in C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\python.exe. Make sure "Always use the selected program to open this kind of file" is checked, and click OK. Windows now understands that .py files should be run using Python.

Double-clicking a .py file gives your operating system the simple command to run that Python script. You can alternatively tell your operating system to run a script using the Windows command line interface. This environment just gives you a blank window with a blinking cursor and allows you to type the path to a script or program, followed by a list of parameters. It's a clean, minimalist way to run a script. In Windows, you can open the command line by clicking Start > Windows System > Command Prompt or by searching for Command Prompt in the Search box.

The command line

Advanced use of the command line is outside the scope of this course. For now, it's sufficient to say that you can run a script from the command line by typing the path of the Python executable, followed by the full path to the script, like this:

"C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\python.exe" C:\PSU\Geog485\Lesson1\Project1.py

Note that:

  • The path to the Python executable is enclosed in quotes, since the space in Program Files would otherwise cause the command to be misinterpreted. You would need quotes around other paths in the command if they likewise contained spaces (which was not the case here).
  • While text wrapping will probably cause this command to be spread across two lines in your browser, it is important that your commands have all the components -- Python path, script path, and parameters -- on the same line. If you copy/paste the example above, you should see those command components all on the same line.
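
The quoting rule in the first note can be sketched in a few lines of Python. The helper name build_command is hypothetical, and the quoting here is deliberately simple: it only wraps parts that contain a space, and it does not try to handle paths that already contain quotes or that end in a backslash.

```python
def build_command(python_exe, script, args=()):
    """Join the parts of a command, quoting any part that contains a space."""
    parts = [python_exe, script, *args]
    return " ".join(f'"{p}"' if " " in p else p for p in parts)

cmd = build_command(
    r"C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\python.exe",
    r"C:\PSU\Geog485\Lesson1\Project1.py",
)
print(cmd)
```

The printed result is a single line in which only the Python path is quoted, matching the command shown earlier.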

If the script takes parameters, you must also type each argument separated by a space. Remember that arguments are the values you supply for the script's parameters. Here's an example of a command that runs a script with two arguments, both strings that represent pathnames. Notice that you should use the regular \ in your paths when providing arguments from the command line (not / or \\ as you would use in PyScripter).

"C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\python.exe" C:\PSU\Geog485\Lesson2\Project2.py C:\PSU\Geog485\Lesson2\ C:\PSU\Geog485\Lesson2\CityBoundaries.shp

If the script executes successfully, you often won't see anything except a new command prompt (remember, this is minimalist!). If your script is designed to print a message, you should see the message. If your script is designed to modify files or data, you can check those files or data (perhaps using the Catalog pane in Pro) to make sure the script ran correctly.
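
On the receiving end, the values typed after the script path arrive in the script as a list of strings, which standard Python exposes as sys.argv. The sketch below simulates the two-argument command above rather than reading the real sys.argv, so it can be run anywhere. The function name read_arguments and the usage message are assumptions for illustration and are not taken from Project 2.

```python
def read_arguments(argv):
    """Return the two expected arguments from a sys.argv-style list."""
    # argv[0] is the script path; the arguments typed after it follow in order
    if len(argv) != 3:
        raise SystemExit("Usage: Project2.py <folder> <feature class>")
    return argv[1], argv[2]

# Simulated argument list; in a real script you would import sys
# and call read_arguments(sys.argv)
example_argv = [
    r"C:\PSU\Geog485\Lesson2\Project2.py",
    "C:\\PSU\\Geog485\\Lesson2\\",   # a raw string cannot end in a backslash
    r"C:\PSU\Geog485\Lesson2\CityBoundaries.shp",
]
folder, feature_class = read_arguments(example_argv)
print(folder)
print(feature_class)
```

Checking the number of arguments up front gives a readable message on the command line when one is forgotten.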

You'll also see messages if your script fails. Sometimes these are the same messages you would see in the PyScripter Python Interpreter Console. At other times, the messages are more helpful than what you would see in PyScripter, making the command line another useful tool for debugging. Unfortunately, there are also times when the messages are less helpful.

Batch files

Why is the command line so important in a discussion about automation? After all, it still takes work to open the command line and type the commands. The beautiful thing about commands is that they, too, can be scripted. You can list multiple commands in a simple text-based file, called a batch file. Running the batch file runs all the commands in it.

Here's an example of a simple batch file that runs the two scripts above. To make this batch file, you could put the text below inside an empty Notepad file and save it with a .bat extension. Remember that this is not Python; it's command syntax:

@ECHO OFF 
REM Runs both my project scripts

"C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\python.exe" C:\PSU\Geog485\Lesson1\Project1.py
ECHO Ran project 1
"C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\python.exe" C:\PSU\Geog485\Lesson2\Project2.py C:\PSU\Geog485\Lesson2\ C:\PSU\Geog485\Lesson2\CityBoundaries.shp
ECHO Ran project 2
PAUSE

Here are some notes about the above batch file, starting from the top:

  • @ECHO OFF prevents all the lines in your batch file from being printed to the command line window, or console, when you run the file. It's standard procedure to use this as the first line of your batch file, unless you really want to see which line of the file is executing (perhaps for debugging purposes).
  • REM is how you put a comment in your batch file, the same way # denotes a comment in Python.
  • You put commands in your batch file using the same syntax you used from the command line.
  • ECHO prints something to the console. This can be useful for debugging, especially when you've used @ECHO OFF because you don't want to see every line of your batch file printed to the console.
  • PAUSE gives a "Press any key to continue..." prompt. If you don't put this at the end of your batch file, the console will immediately close after the file is done executing. When you're writing and debugging the batch file, it's useful to put PAUSE at the end so you can see any error messages that were printed when running the file. Once your batch file is tested and working correctly, you may decide to remove the PAUSE.
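
For comparison, the same "run these commands in order" idea can also be sketched in Python with the standard subprocess module. This is an alternative to a batch file, not something the lesson requires. The function name run_in_order is hypothetical, and the commands are harmless stand-ins so the sketch runs on any machine; in practice each list would hold the Python executable, a script path, and any arguments.

```python
import subprocess
import sys

def run_in_order(commands):
    """Run each command in turn; stop at the first one that fails."""
    exit_codes = []
    for command in commands:
        result = subprocess.run(command)   # waits for the command to finish
        exit_codes.append(result.returncode)
        if result.returncode != 0:
            print(f"Stopped: command exited with code {result.returncode}")
            break
    return exit_codes

# Stand-in commands that just print a message using the current interpreter
commands = [
    [sys.executable, "-c", "print('Ran project 1')"],
    [sys.executable, "-c", "print('Ran project 2')"],
]
print(run_in_order(commands))
```

One difference from the simple batch file above is that this sketch checks each exit code and stops as soon as a command fails.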

Batch files can contain variables, loops, comments, and conditional logic, all of which are beyond the scope of this lesson. However, if you'll be writing and running many scripts for your organization, it's worthwhile to spend some time learning more about batch files. Fortunately, batch files have been around for a long time (they are older than Windows itself), so there's an abundance of good information available on the Internet to help you.

Scheduling tasks

At this point, we've come pretty close to reaching true automation, but there's still that need to launch the Python script or the batch file, either by double-clicking it, invoking it from the command line, or otherwise telling the operating system to run it. To truly automate the running of scripts and batch files, you can use an operating system utility such as Windows Task Scheduler.

Task Scheduler is one of those items hidden in Windows Administrative Tools that you may not have paid any attention to before. It's a relatively simple program that allows you to schedule your scripts and batch files to run on a regular basis. This is helpful if the task needs to run often enough that it would be burdensome to launch the batch file manually, but it's even more helpful if the task takes some of your computing resources, and you want to run it during the night or weekend to minimize impact on others who may be using the computer.

Here's a real-world scenario where Task Scheduler (or a comparable utility if you're running on a Mac, Linux, or UNIX) is very important: Fast Web maps tend to use a server-side cache of pregenerated map images, or tiles, so that the server doesn't have to draw the map each time someone navigates to an area. A Web map administrator who has ArcGIS Server can run the tool Manage Map Server Cache Tiles to make the tiles before he or she deploys the Web map. After deployment, the server quickly sends the appropriate tiles to people as they navigate the Web map. So far, so good.

As the source GIS data for the map changes, however, the cache tiles become out of date. They are just images and do not know how to update themselves automatically. The cache needs to be updated periodically, but cache tile creation is a time consuming and CPU-intensive operation. For this reason, many server administrators use Task Scheduler to update the cache. This usually involves writing a script or batch file that runs Manage Map Server Cache Tiles and other caching tools, then scheduling that script to run on nights or weekends when it would be least disruptive to users of the Web map.

Inside Windows Task Scheduler

Let's take a quick look inside Windows Task Scheduler. The instructions below are for Windows Vista (and probably Windows 7). Other versions of Windows have a very similar Task Scheduler, and with some adaptation, you can also use the instructions below to understand how to schedule a task.

  1. Open Task Scheduler by navigating the Windows Start menu to Windows Administrative Tools > Task Scheduler.
  2. In the Actions pane on the right side of the window, click Create Basic Task. This walks you through a simple wizard to set up the task. You can configure advanced options on the task later.
  3. Give your task a Name that will be easily remembered and optionally, a Description. Then click Next.
  4. Choose how often you want the task to run. For this example, choose Daily. Then click Next.
  5. Choose a Start time and a recurrence frequency. If you want, choose a time a few minutes ahead of the current time, so you can see what it looks like when a task runs. Then click Next.
  6. Choose Start a program, then click Next.
  7. Here's the moment of truth, where you specify which script or batch file you want to run. Click Browse and navigate to one of the Python scripts you've written during this course. It's going to be easiest here if you pick a script that doesn't take any arguments, such as your project 1 script that makes contour lines from hard-coded datasets, but if you are feeling brave, you can also add arguments in this panel of the wizard. Then click Next.
  8. Review the information about your task, then click Finish.
  9. Notice that your task now appears in the list in Task Scheduler. You can highlight the task to see its properties, or right-click the task and click Properties to actually set those properties. You can use the advanced properties to get your script to run even more frequently than daily, for example, every 15 minutes.
  10. Wait for your scheduled time to occur, or if you don't want to wait, right-click the task and click Run. Either way, you'll see a console window appear when the script begins and disappear once the script has finished. (If you're running a Python script, and you don't want the console window to disappear at the end, you can put a line at the end of the script such as lastline = input(">"). This stops the script until the user presses Enter. Once you're comfortable with the script running on a regular basis, you'll probably want to remove this line to keep open console windows from cluttering your screen. After all, the idea of a scheduled task is that it happens in the background without bothering you.)

    Figure 4.1 The Windows Task Scheduler.
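
Because a scheduled task runs without anyone watching the console, one option (not part of the steps above) is to have the script write its progress to a log file that you can review later. Here is a minimal sketch using Python's standard logging module. The logger name, the log file name, and the use of the temp folder are assumptions chosen so the example runs anywhere.

```python
import logging
import os
import tempfile

def make_logger(log_path):
    """Return a logger that appends timestamped messages to log_path."""
    logger = logging.getLogger("scheduled_task")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger

# Hypothetical log location; a temp folder keeps the sketch runnable anywhere
log_path = os.path.join(tempfile.gettempdir(), "geog485_task.log")
logger = make_logger(log_path)

try:
    logger.info("Script started")
    # ... the real work of the script would go here ...
    logger.info("Script finished")
except Exception:
    logger.exception("Script failed")   # records the traceback in the log
```

With something like this in place, you can check the log the next morning instead of keeping a console window open.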

Summary

To make your scripts run automatically, you use Windows Task Scheduler to create a task that the operating system runs at regular intervals. The task can point at either a .py file (for a single script), or a .bat file (for multiple scripts). Using scheduled tasks, you can achieve full automation of your GIS processes.

4.6 Running any tool in the box

Sooner or later, you're going to have to include a geoprocessing tool in your script that you have never run before. It's possible that you've never even heard of the tool or run it from its GUI, let alone a script.

In other cases, you may know the tool very well, but your Python may be rusty, or you may not be sure how to construct all the necessary parameters.

The approach for both of these situations is the same. Here are some suggested steps for running any tool in the ArcGIS toolboxes using Python:

  1. Find the tool reference documentation. We've seen this already during the course. Each tool has its own topic in the ArcGIS Pro tool reference section of the Help. Open that topic and read it before you do anything else. Read the "Usage" section at the beginning to make sure that it's the right tool for you and that you are about to employ it correctly.
  2. Examine the parameters. Scroll down to the "Syntax" section of the topic and read which parameters the tool accepts. Note which parameters are required and which are optional, and decide which parameters your script is going to supply.
  3. In your Python script, create variables for each parameter. Note that each parameter in the "Syntax" section of the topic has a data type listed. If the data type for a certain parameter is listed as "String," you need to create a Python string variable for that parameter.

    Sometimes the translation from data type to Python variable is not direct. For example, sometimes the tool reference will say that the required variable is a "Feature Class." What this really means for your Python script is that you need to create a string variable containing the path to a feature class.

    Another example is if the tool reference says that the required data type is a "Long." What this means in Python is that you need to create a numerical variable (as opposed to a string) for that particular parameter.

    If you have doubts about how to create your variable to match the required data type, scroll down to the "Code Sample" in the tool reference topic, paying particular attention to the stand-alone script example. Try to find the place where the example script defines the variable you're having trouble with. Copy the patterns that you see in the example script, and usually, you'll be okay.

    Most of the commonly used tools have excellent example scripts, but others are hit or miss. If your tool of interest doesn't have a good example script, you may be able to find something on the Esri forums or a well-phrased Google search.

  4. Run the tool...with error handling. You can first run your script without try/except blocks so that any basic errors show up in the Python Interpreter. If the messages you get that way aren't helpful, the next resort is to add try/except blocks and put print(arcpy.GetMessages()) in the except block.
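
The try/except pattern from step 4 might look something like the sketch below. To keep it runnable without arcpy, the tool is passed in as a function and a made-up stand-in (fake_tool) plays the role of a geoprocessing tool; the helper name run_tool and its get_messages parameter are also hypothetical.

```python
def run_tool(tool, *args, get_messages=None):
    """Call a tool function; on failure, print the messages and return None."""
    try:
        return tool(*args)
    except Exception as err:
        print(f"Tool failed: {err}")
        if get_messages is not None:
            print(get_messages())   # with arcpy, this could be arcpy.GetMessages
        return None

# Stand-in for a geoprocessing tool so the sketch runs without arcpy
def fake_tool(in_path, out_path):
    if not in_path:
        raise RuntimeError("an input path is required")
    return out_path

print(run_tool(fake_tool, "roads.shp", "roads_buffer.shp"))
print(run_tool(fake_tool, "", "roads_buffer.shp",
               get_messages=lambda: "(tool messages would appear here)"))
```

In a real script, you might call it as run_tool(arcpy.analysis.Buffer, in_fc, out_fc, "100 Meters", get_messages=arcpy.GetMessages), where in_fc and out_fc are string variables holding feature class paths.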

In Project 4, you'll get a chance to practice these skills to run a tool you previously haven't worked with in a script.

4.7 Working with Pro projects

To this point, we've talked about automating geoprocessing tools, updating GIS data, and reading text files. However, we've not covered anything about working with a Pro project file. There are many tasks that can be performed on a project file that are well-suited for automation. These include:

  • finding and replacing text in a layout or series of layouts; for example, a copyright notice for 2018 becomes 2019;
  • repairing layers that are referencing data sources using the wrong paths; for example, your map was sitting on a computer where all the data was in C:\data, and now it is on a computer where all the data is in D:\myfolder\mydata;
  • printing a series of maps or data frames;
  • exporting a series of maps to PDF and joining them to create a "map book";
  • making a series of maps available to others on ArcGIS Server.

Pro projects are binary files, meaning they can't be easily read and parsed using the techniques we covered earlier in this lesson. Prior to the release of ArcGIS Desktop 10.0, the only way to automate anything with a map document (Desktop's analog to the Pro project) was to use ArcObjects, which was challenging for beginners and required using a language other than Python. ArcGIS Desktop 10.0 introduced the arcpy.mapping module for automating common tasks with map documents. ArcGIS Pro has a very similar module, but it differs enough from arcpy.mapping that it was given a new name: arcpy.mp.

The arcpy.mp module

arcpy.mp is a module you can use in your scripts to work with Pro projects. Please take a detour at this point to read the Esri Introduction to arcpy.mp.

The most important object in this module is ArcGISProject. This tells your script which Pro project you'll be working with. You can obtain an ArcGISProject object by referencing a path, like this:

project = arcpy.mp.ArcGISProject(r"C:\data\Alabama\UtilityNetwork.aprx")

Notice the use of r in the line above to denote a raw string literal. In other words, if you include r right before you begin your string, Python treats each backslash \ as an ordinary character rather than as the start of an escape sequence, so it's safe to write Windows paths with single backslashes. I've done it here because you'll see it in a lot of the Esri examples with arcpy.mp.
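
You can see the effect of the r prefix in plain Python. In this small sketch, the second path (C:\temp\new.aprx) is made up to show what can go wrong without it:

```python
raw_path = r"C:\data\Alabama\UtilityNetwork.aprx"
escaped_path = "C:\\data\\Alabama\\UtilityNetwork.aprx"
print(raw_path == escaped_path)   # True

# Without r, "\t" becomes a tab and "\n" a newline, silently changing the path
risky = "C:\temp\new.aprx"
print(len(r"C:\temp\new.aprx"), len(risky))   # 16 14
```

The two missing characters in the second string are the backslashes that Python consumed as part of the escape sequences.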

Instead of directly using a string path, you could alternatively put a variable holding the path. This would be useful if you were iterating through all the project files in a folder using a loop, or if you previously obtained the path in your script using something like arcpy.GetParameterAsText().
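
As a rough sketch of the "loop through a folder" idea, the standard pathlib module can collect the .aprx paths for you. The function name find_projects and the file names are made up, and the files are empty stand-ins created in a throwaway folder, so the arcpy.mp call appears only as a comment.

```python
from pathlib import Path
import tempfile

def find_projects(folder):
    """Return the paths of all .aprx files directly inside folder, sorted by name."""
    return sorted(str(p) for p in Path(folder).glob("*.aprx"))

# Build a throwaway folder with empty stand-in files so the sketch runs anywhere
with tempfile.TemporaryDirectory() as folder:
    for name in ("Alabama.aprx", "Alaska.aprx", "notes.txt"):
        (Path(folder) / name).touch()

    for project_path in find_projects(folder):
        # In a real script: project = arcpy.mp.ArcGISProject(project_path)
        print(Path(project_path).name)
```

Only the two .aprx files are listed; the text file is skipped by the pattern.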

It can be convenient to work with arcpy.mp in the Python window in Pro. In this case, you do not have to put the path to the project. There's a special keyword "CURRENT" that you can use to get a reference to the currently-open project.

project = arcpy.mp.ArcGISProject("CURRENT")

Once you get a project, then you do something with it. Let's look at this example script, which updates the year in a layout text element, then exports the layout to a PDF, and scrutinize what is going on. I've added comments to each line.

import arcpy

# Create an ArcGISProject object referencing the project you want to update
project = arcpy.mp.ArcGISProject(r"C:\GIS\TownCenter_2015.aprx")

# Get layout
lyt = project.listLayouts()[0] # only 1 layout in project

# Loop through each text element in the layout
for textElement in lyt.listElements("TEXT_ELEMENT"):

    # Check if the text element contains the out of date text
    if textElement.text == "GIS Services Division 2018":

        # If out of date text is found, replace it with the new text
        textElement.text = "GIS Services Division 2019"

# Export the updated layout to a PDF
lyt.exportToPDF(r"C:\GIS\TownCenterUpdate_2016.pdf")

# Clean up the ArcGISProject object by deleting it
del project

The example begins by getting an ArcGISProject object referencing C:\GIS\TownCenter_2015.aprx. It then uses the ArcGISProject object's listLayouts() method to retrieve a Python list of the layouts saved in the project. This project happens to have just one layout. Getting a reference to that layout still requires using listLayouts(), but using [0] to get the first (and only) object from the list. The Layout object in turn has a listElements() method that can be used to retrieve a list of its elements. Notice that the script asks for a specific type of element, "TEXT_ELEMENT". (Examine the documentation for the Layout class to understand the other types of elements you can get back using the listElements() method.)

The method returns a Python list of TextElement objects representing all the text elements in the layout. You know what to do if you want to manipulate every item in a Python list. In this case, the example uses a for loop to check the TextElement.text property of each element. This property is readable and writeable, meaning if you want to set some new text, you can do so by simply using the equals sign assignment operator as in textElement.text = "GIS Services Division 2019".
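
The example only changes an element whose text matches the old string exactly. If the year were embedded in a longer sentence, a variation using Python's in operator and str.replace() might be more forgiving. The sketch below is one possible approach, not the Esri example: replace_in_elements is a hypothetical helper, and SimpleNamespace objects stand in for TextElement objects by imitating only their text property.

```python
from types import SimpleNamespace

def replace_in_elements(elements, old, new):
    """Replace old with new in each element's text; return how many changed."""
    changed = 0
    for element in elements:
        if old in element.text:
            element.text = element.text.replace(old, new)
            changed += 1
    return changed

# Stand-ins for TextElement objects; only the .text property is imitated
elements = [
    SimpleNamespace(text="GIS Services Division 2018"),
    SimpleNamespace(text="Scale 1:24,000"),
]
print(replace_in_elements(elements, "2018", "2019"))   # 1
print(elements[0].text)                                # GIS Services Division 2019
```

With a real layout, the first argument could be the list returned by lyt.listElements("TEXT_ELEMENT").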

The exportToPDF() call is very simple in this script. It takes the path of the desired output PDF as its only parameter. If you look again at the Layout class's page in the documentation, you'll see that the exportToPDF() method has a lot of other optional parameters, such as whether to embed fonts, that are just left as defaults in this example.

Learning arcpy.mp

The best way to learn arcpy.mp is to try to use it. Because of its simple, "one-line-fix" nature, it's a good place to practice your Python. It's also a good way to get used to the Python window in Pro because you can immediately see the results of your actions.

Although there is no arcpy.mp component to this lesson's project, you're welcome to use it in your final project. If you've already submitted your final project proposal, you can amend it to use arcpy.mp by emailing and obtaining approval from the instructor/grading assistant. If you use arcpy.mp in your final project, you should attempt to incorporate several of the functions or mix it with other Python functionality you've learned, making something more complex than the "one line fix" type of script I mentioned above.

By now, you'll probably have experienced the reality that your code does not always run as expected on the first try. Before you start running arcpy.mp commands on your production projects, I suggest making backup copies.

Here are a few additional places where you can find excellent help on learning arcpy.mp:

4.8 Limitations of Python scripting with ArcGIS


In this course, you've learned the basics of programming and have seen how Python can automate any GIS function that can be performed with the ArcGIS toolboxes. There's a lot of power available to you through scripting, and hopefully, you're starting to get ideas about how you can apply that in your work outside this course.

To conclude this lesson, however, it's important to talk about what's not available through Python scripting in ArcGIS.

Limits with fine-grained access to the "guts" of ArcGIS

Python interaction with ArcGIS is mainly limited to reading and writing data, editing the properties of project files, and running the tools that are included with ArcGIS. Although the ArcGIS tools are useful, they are somewhat black box, meaning you put things in and get things out without knowing or being concerned about what is happening inside. In ArcGIS Desktop, the ArcObjects SDK could be used to gain access to "the building blocks" of the software, providing a greater degree of control over the tools being developed. And a product called ArcGIS Engine made it possible to develop stand-alone applications that provide narrowly tailored functionality. Working with ArcObjects/ArcGIS Engine required coding in a non-Python language, such as Visual Basic .NET or C# .NET.

In the transition to ArcGIS Pro, the same access to the fine-grained building blocks does not (yet?) exist. However, developers can extend/customize the Pro user interface using the Pro SDK and can develop stand-alone apps using the ArcGIS Runtime SDKs. As with the Desktop ArcObjects and ArcGIS Engine products, the Pro SDK and Runtime SDKs require coding with a language other than Python.

Limits with user interface customization

In this course, we have done nothing with customizing Pro to add special buttons, toolbars, and so on that trigger our programs. Our foray into user interface design has been limited to making a script tool and toolbox. Although script tools are useful, there are times when you want to take the functionality out of the toolbox and put it directly into Pro as a button on a toolbar. You may want that button to launch a new window with text boxes, labels, and buttons that you design yourself.

This is one place that capabilities have taken a step backward (so far?) in the development of Pro. In ArcGIS Desktop (starting at 10.1), Python developers could create such UI customizations relatively easily by developing what Esri called "Python add-ins." Unfortunately, such easy-to-develop add-ins for Pro cannot be developed with Python. Development of custom Pro interfaces can be done with Python GUI toolkits, a topic covered in our Advanced Python class (GEOG 489). Such customization can also be accomplished with the Pro SDK. Both of these pathways are recommended for folks who've done very well in this class and/or already have strong programming experience.

Lesson 4 Practice Exercises


These practice exercises will give you some more experience applying the Lesson 4 concepts. They are designed to prepare you for some of the techniques you'll need to use in your Project 4 script.

Download the data for the practice exercises

Both exercises involve opening a file and parsing text. In Practice Exercise A, you'll read some coordinate points and make a polygon from those points. In Practice Exercise B, you'll work with dictionaries to manage information that you parse from the text file.

Example solutions are provided for both practice exercises. You'll get the most value out of the exercises if you make your best attempt to complete them on your own before looking at the solutions. In any case, the patterns shown in the solution code can help you approach Project 4.

Lesson 4 Practice Exercise A


This practice exercise is designed to give you some experience writing geometries to a shapefile. You have been provided two things:

  • A text file MysteryStatePoints.txt containing the coordinates of a state boundary.
  • An empty polygon shapefile that uses a geographic coordinate system.

The objective

Your job is to write a script that reads the text file and creates a state boundary polygon out of the coordinates. When you successfully complete this exercise, you should be able to preview the shapefile in Pro and see the state boundary.

Tips

If you're up for the challenge of this script, go ahead and start coding. But if you're not sure how to get started, here are some tips:

  • In this script, there is no header line to read. You are allowed to use the values of 0 and 1 directly in your code to refer to the longitude and latitude, respectively. This should actually make the file easier to process.
  • Before you start looping through the coordinates, create an empty list (or Array object) to hold all the (x,y) tuples (or Point objects) of your polygon.
  • Loop through the coordinates and create a (x,y) tuple (or Point object) from each coordinate pair. Then add the tuple (or Point object) to your list (or Array object).
  • Once you're done looping, create an insert cursor on your shapefile. Insert a new row, assigning your list (or Array) to the SHAPE field.

Lesson 4 Practice Exercise A Solution


Here's one way you could approach Lesson 4 Practice Exercise A with comments to explain what is going on. If you find a more efficient way to code a solution, please share it through the discussion forums.

# Reads coordinates from a text file and writes a polygon

import arcpy
import csv

arcpy.env.workspace = r"C:\PSU\geog485\L4\Lesson4PracticeExerciseA"

shapefile = "MysteryState.shp"
pointFilePath = arcpy.env.workspace + r"\MysteryStatePoints.txt"

# Open the file of coordinates (one coordinate pair per line)
with open(pointFilePath, "r") as pointFile:
    csvReader = csv.reader(pointFile)

    # This list will hold a clockwise "ring" of coordinate pairs
    #  that will form a polygon 
    ptList = []

    # Loop through each coordinate pair 
    for coords in csvReader:
        # Append coords to list as a tuple
        ptList.append((float(coords[0]),float(coords[1])))

    # Create an insert cursor and apply the Polygon to a new row
    with arcpy.da.InsertCursor(shapefile, ("SHAPE@")) as cursor:
        cursor.insertRow((ptList,))
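
One detail in the last line of the script above is easy to miss: insertRow() is given (ptList,), with a trailing comma. The minimal sketch below uses plain Python (no arcpy needed, and the coordinates are made up) to suggest why that comma matters. Parentheses alone only group an expression; it's the comma that builds a one-item tuple.

```python
# Made-up coordinate pairs, purely for illustration
ptList = [(-109.05, 41.0), (-104.05, 41.0), (-104.05, 45.0)]

# Parentheses alone do not make a tuple; this is still the same list
notATuple = (ptList)

# The trailing comma makes a tuple holding one value (the list)
oneValueRow = (ptList,)

print(type(notATuple).__name__)
print(type(oneValueRow).__name__, len(oneValueRow))
```

With two or more values, the commas between them already make the tuple, so no trailing comma is needed.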

Alternatively, an arcpy Array containing a sequence of Point objects could be passed to the insertRow() method rather than a list of coordinate pairs. Here is how that approach might look:

# Reads coordinates from a text file and writes a polygon

import arcpy
import csv

arcpy.env.workspace = r"C:\PSU\geog485\L4\Lesson4PracticeExerciseA"

shapefile = "MysteryState.shp"
pointFilePath = arcpy.env.workspace + r"\MysteryStatePoints.txt"
spatialRef = arcpy.Describe(shapefile).spatialReference

# Open the file 
with open(pointFilePath, "r") as pointFile:
    csvReader = csv.reader(pointFile)

    # This Array object will hold a clockwise "ring" of Point
    #  objects, thereby making a polygon.
    polygonArray = arcpy.Array()

    # Loop through each coordinate pair and make a Point object
    for coords in csvReader:

        # Create a point, assigning the X and Y values from your list    
        currentPoint = arcpy.Point(float(coords[0]),float(coords[1]))

        # Add the newly-created Point to your Array    
        polygonArray.add(currentPoint)

    # Create a Polygon from your Array
    polygon = arcpy.Polygon(polygonArray, spatialRef)

    # Create an insert cursor and apply the Polygon to a new row
    with arcpy.da.InsertCursor(shapefile, ("SHAPE@")) as cursor:
        cursor.insertRow((polygon,))
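
Both solutions wrap each coordinate in float() because Python's csv.reader hands back every value as a string. The sketch below illustrates this without arcpy. It uses io.StringIO with two made-up coordinate lines as a stand-in for MysteryStatePoints.txt, so treat the sample values as assumptions rather than the real data.

```python
import csv
import io

# Two made-up lines standing in for the contents of the text file
sampleFile = io.StringIO("-109.05,41.0\n-104.05,41.0\n")

rawRows = []   # the values exactly as csv.reader returns them
ptList = []    # the same values converted to numeric (x, y) tuples

for coords in csv.reader(sampleFile):
    rawRows.append(coords)
    ptList.append((float(coords[0]), float(coords[1])))

print(rawRows[0])   # ['-109.05', '41.0']  (strings)
print(ptList[0])    # (-109.05, 41.0)      (floats)
```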

Below is a video offering some line-by-line commentary on the structure of these solutions. Note that the scripts shown in the video don't use the float() function as shown in the solutions above. It's likely you can run the script successfully without float(), but we've found that on some systems the coordinates are incorrectly treated as strings unless explicitly cast as floats.

Video: Solution to Lesson 4 Practice Exercise A (11:02)

This video walks through the solution to Lesson 4, practice exercise A where you have a text file with some coordinates that looks like this. And you need to loop through them, parse them out, and then create a polygon out of it to create the shape of a US state.

This is a very basic list of coordinates. There's no header here. So, we're just going to blast right through this as if it were a csv file with two columns and no header.

In order to do this, we're going to use arcpy, which we import in line 3, and the csv module for Python, which we import in line 4. In lines 8 and 9, we set up some variables referencing the files that we're going to work with. So, we're working with the mystery state shapefile that was provided for you.

This shapefile is in the WGS84 geographic coordinate system. However, there is no shape or polygon within this shapefile yet. You will add that. And then, line 9 references the path to the text file containing all the coordinate points.

In line 12, we open up that text file that has the points. The first parameter in the open method is pointFilePath, which we created in line 9. And then the second parameter is r in quotes. And that means we're opening it in read mode. We're just going to read this file. We're not going to write anything into it.

And in line 13, we then pass that file into the reader() method so that we can create this csvReader object that will allow us to iterate through all the rows in the file in an easy fashion. Before we start doing that though, in line 17, we create an empty list which we’ll use to store the coordinate pairs we read in from the text file.

We know, in this case, we're just going to create one geometry. It's going to be one polygon created from one list of coordinates. So, it's OK to create that list before we start looping. And we, actually, don't want to create it inside the loop because we would be creating multiple lists, and we just need one of them in this case.

In line 20, we actually start looping through the rows in the file. Each row is going to be represented here by a variable called coords. Inside the loop, we’re going to get the X coordinate and the Y coordinate and append them to the list as a tuple. It’s very important that the first value in this tuple be the X and the second value the Y.

Now, I think a lot of us typically say “latitude/longitude” when talking about coordinates, with the word latitude coming first. It’s important to remember that latitude is the Y value and longitude is the X. So in defining this tuple, we want to make sure we’re supplying the longitude first, then the latitude.

Looking at our text file, we see that the first column is the longitude and the second column is the latitude. (We know that because latitudes don’t go over 90.) So the coordinates are in the correct X/Y order, we just need to keep them that way.

Back to our script, as we iterate through the CSV Reader object, remember that on each iteration, the Reader object simply gives us a list of the values from the current line of the file, which we’re storing in the coords variable. So if we want the longitude value, that’ll be the item from the list at position 0. If we want the latitude, that’ll be the item at position 1. So, we use coords[0] to get the longitude and coords[1] to get the latitude.

Now, going back to our file, you'll see that in the first column or column 0, that's the longitude. So, we have coords[0] in our code. And then, the next piece is the latitude. So, we use coords[1] for that.

So we’re only doing one thing in this loop: appending a new X/Y coordinate pair to the list on each iteration.

Now, line 25 is the first statement after the loop. We know that because that line of code is not indented. At this point, we’ve got the coordinates of the mystery state in a list, so we’re ready to add a polygon to the shapefile. To add new features to a feature class, we need to use an InsertCursor. And that’s what we’re doing there on line 25: opening up a new insert cursor.

That cursor needs a couple of parameters in order to get started. It needs the feature class that we're going to modify, which is referenced by the variable shapefile. Remember, we created that back in line 8.

And then, the second parameter we need to supply a tuple of fields that we're going to modify. In our case, we're only going to modify the geometry. We're not going to modify any attributes, so the way that we supply this tuple is it just has one item, or token, SHAPE with the @ sign.

And then, our final line 26 actually inserts the row. And it inserts just the list of points we populated in our loop. Because the shapefile was created to hold polygons, a polygon geometry will be created from our list of points. When this is done, we should be able to go into ArcGIS Pro and verify that we indeed have a polygon inside of that shapefile.

One little quirky thing to note about this code is in the last line, where I supplied the tuple of values corresponding to the tuple of fields that was associated with the cursor. When you’re supplying just a single value as we were here, it’s necessary to follow that value up with a comma. If my cursor was defined with more than one field, say 2 fields, then my values tuple would not need to end in a comma like this. It’s only when specifying just a single value that the values tuple needs this trailing comma.

This is a good place to remind you that you’re not limited to setting the value of a single field. For example, if there were a Name field in this shapefile, and we happened to know the name of the state we were adding, we could add Name to the fields tuple and then when doing insertRow() we could supply the name of the state we’re adding in addition to the point list.

Now, the script we just walked through uses the approach to geometry creation that’s recommended to use whenever possible because it offers the best performance. That approach works when you’re dealing with simple, single-part features and the coordinates are in the same spatial reference as the feature class you’re putting the geometry into. And when you’re working with ArcGIS Desktop 10.6 or later or ArcGIS Pro 2.1 or later because that’s when support for this approach was added.

So for example, this approach would *not* work for adding the state of Hawaii as a single feature since that would require inserting a multi-part polygon. It also wouldn’t work if our empty shapefile was in, say, the State Plane Coordinate System rather than WGS84 geographic coordinates.

In those cases, you’d have no choice but to use this alternate solution that I'm going to talk through now.

In this version of the script, note that in line 10, we get the spatial reference of the MysteryState shapefile using the Describe method. We’ll go on to use this SpatialReference object later in the script when we create the Polygon object. We didn’t bother to get the spatial reference of the shapefile in the other version of the script because that method of creating geometries doesn’t support supplying a spatial reference.

The next difference in this second script comes on line 18. Instead of creating a new empty Python list, we create an arcpy Array object.

Then inside the loop, instead of putting the coordinates into a tuple and appending the tuple to the list, we create an arcpy Point object, again in X/Y order. Then, we use the Array’s add() method to add the Point object to the Array.

After the loop is complete and we’ve populated the Array, we’re ready to create an object of the Polygon class. The Polygon constructor requires that we specify an Array of Points, which is in our polygonArray variable. Optionally, we can supply a SpatialReference object, which we’re doing here with our spatialRef variable.

Supplying that spatial reference doesn’t really have any effect in this instance because our coordinates and feature class are in the same spatial reference, but you could see the difference between these two if you created an empty feature class that was defined to use a different spatial reference, say one of the State Plane coordinate systems for New Mexico. Running this version of the script would result in the data being properly aligned because we’re telling arcpy how our data is projected, so it can re-project it behind the scenes into the State Plane coordinates. With the other version of the script, the data can't be properly aligned because that version doesn’t specify a coordinate system.

In any case, this version of the script finishes up by passing the Polygon object to the insertRow() method rather than a list of coordinate pairs.

So, those are two ways you can solve this exercise.

Credit: S. Quinn and J. Detwiler © Penn State is licensed under CC BY-NC-SA 4.0.

Lesson 4 Practice Exercise B


This practice exercise does not do any geoprocessing or GIS, but it will help you get some experience working with functions and dictionaries. The latter will be especially helpful as you work on Project 4.

The objective

You've been given a text file of (completely fabricated) soccer scores from some of the most popular teams in Buenos Aires. Write a script that reads through the scores and prints each team name, followed by the maximum number of goals that team scored in a game, for example:

River: 5
Racing: 4
etc.

Keep in mind that the maximum number of goals scored might have come during a loss.

You are encouraged to use dictionaries to complete this exercise. This is probably the most efficient way to solve the problem. You'll also be able to write at least one function that will cut down on repeated code.

I have purposefully kept this text file short to make things simple to debug. This is an excellent exercise in using the debugger, especially to watch your dictionary as you step through each line of code.

This file is space-delimited, so you must explicitly set up the CSV reader to use a space as the delimiter instead of the default comma. The syntax is as follows:

csvReader = csv.reader(scoresFile, delimiter=" ")
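
To see that syntax in action without the real file, here is a small sketch that reads space-delimited text from an io.StringIO stand-in. The column names (Winner, WG, Loser, LG) are the ones used in the solution further down; the second data row is made up for illustration. Looking up column positions with header.index() is one way to avoid hard-coding them.

```python
import csv
import io

# Stand-in for the scores file; the second data row is invented
sampleFile = io.StringIO(
    "Winner WG Loser LG\n"
    "Boca 2 Independiente 1\n"
    "River 0 Racing 0\n"
)

csvReader = csv.reader(sampleFile, delimiter=" ")

# The first row is the header; use it to find each column's position
header = next(csvReader)
winnerIndex = header.index("Winner")
winnerGoalsIndex = header.index("WG")

# Collect (team, goals) for the first team listed in each remaining row
firstTeams = [(row[winnerIndex], int(row[winnerGoalsIndex])) for row in csvReader]

print(header)
print(firstTeams)
```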

Tips

If you want a challenge, go ahead and start coding. Otherwise, here are some tips that can help you get started:

  • Create variables for all your items of interest, including winner, winnerGoals, loser, and loserGoals and assign them appropriate values based on what you parsed out of the line of text. Because there are often ties in soccer, the term "winner" in this context means the first score given and the term "loser" means the second score given.
  • Review section 4.17 on dictionaries in the Zandbergen text. You want to make a dictionary that has a key for each team, and an associated value that represents the team's maximum number of goals. If you looked at the dictionary in the debugger, it would look like {'River': 5, 'Racing': 4, etc.}
  • You can write a function that takes in three things: the key (team name), the number of goals, and the dictionary name. This function should then check if the key has an entry in the dictionary. If not, a key should be added and its value set to the current number of goals. If a key is found, you should perform a check to see if the current number of goals is higher than the value associated with that key. If so, you should set a new value. Notice how many "ifs" appear in the preceding sentences.

Lesson 4 Practice Exercise B Solution


This practice exercise is a little trickier than previous exercises. If you were not able to code a solution, study the following solution carefully and make sure you know the purpose of each line of code.

The code below refers to the "winner" and "loser" of each game. Because some games end in a tie, these terms really mean the first score given and the second score given.

# Reads through a text file of soccer (football)
#  scores and reports the highest number of goals
#  in one game for each team

# ***** DEFINE FUNCTIONS *****

# This function checks if the number of goals scored
#  is higher than the team's previous max.
def checkGoals(team, goals, dictionary):
    #Check if the team has a key in the dictionary
    if team in dictionary:
        # If a key was found, check goals against team's current max
        if goals > dictionary[team]:
            dictionary[team] = goals
        else:
            pass
    # If no key found, add one with current number of goals
    else:
        dictionary[team] = goals

# ***** BEGIN SCRIPT BODY *****

import csv

# Open the text file of scores
scoresFilePath = "C:\\Users\\jed124\\Documents\\geog485\\Lesson4\\Lesson4PracticeExercises\\Lesson4PracticeExerciseB\\Scores.txt"
with open(scoresFilePath) as scoresFile:
    # Read the header line and get the important field indices
    csvReader = csv.reader(scoresFile, delimiter=" ")
    header = next(csvReader)
    
    winnerIndex = header.index("Winner")
    winnerGoalsIndex = header.index("WG")
    loserIndex = header.index("Loser")
    loserGoalsIndex = header.index("LG")
    
    # Create an empty dictionary. Each key will be a team name.
    #  Each value will be the maximum number of goals for that team.
    maxGoalsDictionary = {}
    
    for row in csvReader:
    
        # Create variables for all items of interest in the line of text    
        winner = row[winnerIndex]
        winnerGoals = int(row[winnerGoalsIndex])
        loser = row[loserIndex]
        loserGoals = int(row[loserGoalsIndex])
        
        # Check the winning number of goals against the team's max
        checkGoals(winner, winnerGoals, maxGoalsDictionary)
        
        # Also check the losing number of goals against the team's max    
        checkGoals(loser, loserGoals, maxGoalsDictionary)

    # Print the results
    for key in maxGoalsDictionary:
        print (key + ": " + str(maxGoalsDictionary[key]))
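
If you're curious, the nested if/else logic in checkGoals can be collapsed into a single statement. The sketch below is only one possible variation, not the course's official solution: the name checkGoalsCompact is my own, and the scores in the loop are made up. It relies on the dictionary's get() method, which returns a default value (here, the current goals) when the team has no key yet.

```python
def checkGoalsCompact(team, goals, dictionary):
    """Store goals as the team's maximum if it beats the stored value.

    If the team has no key yet, get() falls back to goals, so the
    team is simply added with the current number of goals.
    """
    dictionary[team] = max(goals, dictionary.get(team, goals))

maxGoalsDictionary = {}

# Made-up (team, goals) pairs, purely for illustration
for team, goals in [("River", 0), ("Boca", 2), ("River", 2), ("Boca", 1)]:
    checkGoalsCompact(team, goals, maxGoalsDictionary)

print(maxGoalsDictionary)
```

The longer version in the solution is arguably easier to step through in the debugger, which is why it's a good one to study first.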

Below is a video offering some line-by-line commentary on the structure of this solution.

Video: Solution to Lesson 4 Practice Exercise B (13:03)

This video describes one possible solution for Lesson 4 practice exercise B where you are reading the names of soccer teams in Buenos Aires, looking at their scores, and then compiling a report about the top number of goals scored by each team over the course of the games covered by the file.

In order to maintain all this information, it's helpful to use a dictionary, which is a way of storing information in the computer's memory based on key-value pairs. So, the key in this case will be the name of the team, and the value will be the maximum number of goals found for that team as we read the file line by line. Now, this file is sort of like a comma separated value file, though the delimiter in this case is a space. The file does have a header, which we'll use to pull out information.

The header is organized in terms of winner, winner goals, loser, and loser goals. Although, really, there are some ties in here. So, we might say first score and second score rather than winner and loser. For our purposes, it doesn't matter who won or lost because the maximum number of goals might have come during a loss or a tie.

So, this solution is a little more complex than some of the other practice exercises. It involves a function which you can see beginning in line 9, but I'm not going to describe the function just yet. I'll wait until we get to the point where we need the logic that's in that function. So, I'll start explaining this solution by going to line 23, where we import the Python CSV module.

Now, there's nothing in this script that uses ArcGIS or arcpy geometries or anything like that, so I don't import arcpy at all. But you will do that in Project 4, where you'll use a combination of the techniques used here along with ArcGIS geometries and really put everything together from both the practice exercises. In line 26, we set up a variable representing the path to the scores text file. And in line 27, we actually open the file. By default, I'm opening it here in read mode. The opening mode parameter is not specifically supplied here.

In line 29, we create the CSV reader object. You should be familiar with this from the other examples in the lesson and the other practice exercise. One thing that's different here is, as a second parameter, we can specify the delimiter using this type of syntax-- delimiter equals, and then the space character.

Again, this file does have a header. So in line 30, we'll read the header, and then we figure out the index positions of all of the columns in the file. That's what's going on in lines 32 through 35.

Now, we know that these columns are in a particular order. But writing it in this way where we use the header.index method makes the script a little more flexible in case the column order had been shifted around by somebody, which could easily happen if somebody had previously opened this file in a spreadsheet program and moved things around.

In line 39, we're going to create a blank dictionary to keep track of each team and the maximum number of goals they've scored. We'll refer to that dictionary frequently as we read through the file.

In line 41, we begin a loop that actually starts reading data in the file below the header. And so, lines 44 through 47 are pulling out those four pieces of information-- basically, the two team names and the number of goals that each scored. Note that the int() function is used to convert the number of goals in string format to integers. Now, when we get a team name and a number of goals, we need to check it against our dictionary to see if the number of goals scored is greater than that team's maximum that we’ve encountered so far. And we need to do this check for both the winner and the loser-- or in other words, the first team and the second team listed in the row.

To avoid repeating code, this is a good case for a function. Because we're going to use the same logic for both pieces so why not write the code just once in a function? So, in lines 50 and 53, you'll see that I'm invoking a function called checkGoals. And I pass in three things. I pass in the team name, I pass in the number of goals, and the dictionary.

This function is defined up here in line 9. Line 9 defines a function called checkGoals, and I create variables here for those three things that the function needs to do its job -- the team, the number of goals, and the dictionary. Line 11 performs a check to see if the team already has a key in the dictionary. If it does, then we need to look at the number of goals that have been stored for the team and check it against the score that was passed into the function to see if that maximum number of goals needs to be updated.

So in line 13, that check is occurring. And if indeed the number of goals passed into the function is greater than the maximum that we’ve run into so far, then in line 14, we update the team’s entry in the dictionary, setting it equal to what was passed into the goals variable. If the number of goals passed into the function is not greater than our maximum, then we don't want to do anything. So that's what's in line 16 where it says pass. The pass is just a keyword that means don't do anything here. And really, we could eliminate the else clause altogether if we wanted to.

Now, if the team has never been read before, and it doesn't have an entry in the dictionary, then we're going to jump down to line 18. We're going to add the team to the dictionary, and we're going to set its max goals to the number passed into the goals variable. No need to check against another number, since there’s nothing in the dictionary yet for that team.

Now, what we can do to get a better feel for how this script works is to run it with the debugging tools. So I’ve inserted a breakpoint on the first line inside the checkGoals function, and I'm going to run the script up until the first time this function gets called. And I can click on the Variables window here to follow along with what’s happening to my variables.

I can see that there are global and local variables. The local variables are the ones defined within the checkGoals function since that's where the script execution is currently paused. If I look at the globals list, I can find the row variable, which holds the list of values from the first line of the file. The variables I want to focus on though are the ones defined as part of the function, the locals -- the dictionary, goals, and team variables.

So on this first call to the function we've passed in Boca as the team and 2 as the number of goals. And right now, there's nothing in our dictionary. So when evaluating line 11, we’d expect PyScripter to jump down to line 19 because the team does not have a key in the dictionary yet. So I'll see what happens by clicking on the Step button here. And we see that we do in fact jump down to line 19. And if I step again, I will be adding that team to the dictionary. I can check on that; note that it's jumped out of the function because that was the last line of the function. It's jumped back to the main body of the script, which is another call to the checkGoals function, this time for the losing team. While we're paused here, if I scroll down in the list of locals I can find the maxGoalsDictionary and I can see that it now has an entry for Boca with a value of 2.

If I hit Resume again, it will run to the breakpoint again. So now it's dealing with data for the losing team from that first match -- Independiente with 1 goal. Again, because Independiente doesn't have a key in the dictionary, we would expect the script to jump down to line 19, and indeed that is what happens. So when I hit Step again, it's going to add an entry to the dictionary for that second team.

I’m going to hit the Resume button a couple more times for that second game, adding two more teams to the dictionary. Now I’m going to pause as we’ve hit a line where we have teams that we have encountered before. On this current call to the function, the team is River, and they scored 2 goals in this particular match. Looking at the dictionary, we can see that River's current maximum is 0. So, in this case we would expect on line 11 to find that yes, the team is already in the dictionary and so we'd expect it to jump down to line 13 instead of line 19. And that's what happens. So now we're going to check -- Is the value in the goals variable greater than their current maximum, which is 0. And indeed it is, so we’ll execute line 14, updating the dictionary with a new maximum goals for that team.

And we could continue stepping through the script, but hopefully, you get the feel for how the logic plays out now.

So as you're working on Project 4, and you're working with dictionaries in this manner, it's a good idea to keep the debugger open and watch what's happening, and you should be able to tell if your dictionary is being updated in the way that you expect.

So to finish out this script, once we have our dictionary all built, after this for loop is finished, then we're going to loop through the dictionary and print each key and each value. And that can be done using a simple for loop, like in line 56. In this case, the variable key represents a key in the dictionary. And in line 57, we print out that key, we print a colon and a space, and then we print the associated value with that key. If you want to pull a value out of a dictionary, you use square brackets, and you pass in the key name. And so, that's what we're doing there. So running this all the way through should produce a printout in the Python Interpreter of the different teams, as well as the maximum number of goals found for each.

Credit: S. Quinn and J. Detwiler © Penn State is licensed under CC BY-NC-SA 4.0.
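
The dictionary logic traced in the video can be condensed into a short sketch. The names checkGoals and maxGoalsDictionary come from the walkthrough. The match list is made-up sample data (only the Boca 2, Independiente 1 result is taken from the transcript), and the file reading done by the original script is left out, so treat this as an illustration rather than the script shown in the video.

```python
# Track the highest number of goals each team has scored in any one match.
maxGoalsDictionary = {}

def checkGoals(team, goals):
    """Store goals for team if it is the team's first or best result so far."""
    if team in maxGoalsDictionary:
        # Team seen before: update only if this match beats the stored maximum
        if goals > maxGoalsDictionary[team]:
            maxGoalsDictionary[team] = goals
    else:
        # First time we meet this team: add a new key
        maxGoalsDictionary[team] = goals

# Hypothetical results as (winner, winner goals, loser, loser goals)
matches = [
    ("Boca", 2, "Independiente", 1),
    ("Racing", 1, "River", 0),
    ("River", 2, "Boca", 1),
]

for winner, winnerGoals, loser, loserGoals in matches:
    checkGoals(winner, winnerGoals)
    checkGoals(loser, loserGoals)

# Loop through the dictionary and print each key with its value
for key in maxGoalsDictionary:
    print(key + ": " + str(maxGoalsDictionary[key]))
```

River first enters the dictionary with 0 goals and is later updated to 2, while Boca's later 1-goal match leaves its stored maximum of 2 untouched.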

Project 4: Reconstructing a car's path

In this project, you're working as a geospatial consultant to a company that offers auto racing experiences to the public at Wakefield Park Raceway near Goulburn, New South Wales, Australia. The company's cars are equipped with a GPS device that records lots of interesting data on the car's movements and they'd like to make their customers' ride data, including a map, available through a web app. The GPS units export the track data in CSV format.

Your task is to write a script that will turn the readings in the CSV file into a vector dataset that you can place on a map. This will be a polyline dataset showing the path the car followed over the time the data was collected. You are required to use the Python csv module to parse the text and arcpy geometries to write the polylines.

The data for this project were made possible by faculty member and Aussie native James O'Brien, who likes to visit Wakefield Park to indulge his love of racing.

Please carefully read all the following instructions before beginning the project. You are not required to use functions in this project but you can gain over & above points by breaking out repetitive code into functions.

Deliverables

This project has the following deliverables:

  1. Your plan of attack for this programming problem, written in pseudocode in any text editor but as a separate document. This should consist only of short, focused steps describing what you are going to do to solve the problem. This is a separate deliverable from your customary project writeup and the well-documented script.
  2. A Python script that reads the data from the file and creates, from scratch, a polyline shapefile with n polylines, n being the number of laps recorded in the file. Each polyline should represent a single lap around the circuit. Each polyline should also have a Short field that stores the corresponding lap number from the CSV file. The shapefile should use the WGS 1984 geographic coordinate system.
  3. A detailed written narrative that explains in plain English what the various lines of your script do.  In this document, we will be looking for you to demonstrate that you understand how the code you're submitting works. You may find it useful to emulate the practice exercise solution explanations.  Alternatively, you may record a video in which you walk the grader through your solution beginning to end.
  4. A short writeup (~300 words) explaining what you learned during this project and which requirements you met, or failed to meet. Also describe any "over and above" efforts here so that the graders can look for them.

Successful delivery of the above requirements is sufficient to earn 95% on the project. The remaining 5% is reserved for efforts that go "over and above" the minimum requirements. This could include (but is not limited to) a batch file that could be used to automate the script, creation of the feature class in a file geodatabase instead of a shapefile, or the breaking out of repetitive code into functions and/or modules. Other over and above opportunities are described below.

Challenges

You may already see some immediate challenges in this task:

  • You have not previously created a feature class programmatically. You must find and run ArcGIS geoprocessing tools that will create an empty polyline shapefile with a Short Integer field for storing the lap number. You must also assign the WGS 1984 geographic coordinate system as the spatial reference for this shapefile.
  • Almost all of the lines in the file contain a set of 13 values, corresponding to the column names found in the header. However, at the end of each lap you will find a line that records the time required to complete the lap. For example:
    # Lap 1: 00:01:24.259
    Your script will need to skip over these lap time lines without choking. Similarly, the file ends with a line -- # Session End -- that should not break the script.
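
One possible way to handle those special lines is sketched below with the csv module. The column names and coordinate values are invented for illustration (the real export has 13 columns), and the text is read from an in-memory string so the example runs without the project data. This is only one approach under those assumptions, not the required solution.

```python
import csv
import io

# Hypothetical excerpt; the real export has 13 columns with different names
sample = io.StringIO(
    "Time,Lap,Latitude,Longitude\n"
    "0.00,1,-34.8401,149.6851\n"
    "0.05,1,-34.8402,149.6853\n"
    "# Lap 1: 00:01:24.259\n"
    "84.30,2,-34.8401,149.6852\n"
    "# Session End\n"
)

reader = csv.reader(sample)
header = next(reader)                 # first line holds the column names
lapIndex = header.index("Lap")
latIndex = header.index("Latitude")
lonIndex = header.index("Longitude")

readings = []
for row in reader:
    # Skip blank lines and the "# ..." lap-time / session-end lines
    if not row or row[0].startswith("#"):
        continue
    readings.append((int(row[lapIndex]), float(row[lonIndex]), float(row[latIndex])))

print(readings)
```

With a real file you would pass an opened file object to csv.reader instead of the StringIO object. Looking up column positions from the header keeps the code working even if the column order changes.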

Hints

  • Before you start writing code, write a plan of attack describing the logic your script will use to accomplish this task. Break up the original task into small, focused chunks. You can write this in Word or even Notepad. Your objective is not to write fancy prose, but rather short, terse statements of what your code will do: in other words, pseudocode.
  • You will have a much easier time with this assignment if you approach it incrementally. Start by opening the CSV file, reading through the lines, properly handling the various types of lines, and printing the values of interest to the Console. If you can get that to work, see if you can create a single polyline from the full set of points, ignoring the laps. If you can get that to work, try to tackle the requirement of creating a separate polyline for each lap, etc.
  • A Python dictionary is an excellent structure for storing a lap number coupled with the lap's list of points. A dictionary is similar to a list, but it stores items in key-value pairs. In this scenario, we recommend using the lap numbers as the dictionary keys and lists of the points associated with the laps as the values associated with the keys. You can retrieve any value based on its key, and you can also check whether a key exists using a simple if key in dictionary: check.
  • To create your shapefile programmatically, use the Create Feature Class tool. The ArcGIS Pro Help has several examples of how to use this tool. Note that the feature class's schema (i.e., its fields) isn't specified as part of the Create Feature Class tool. You'll need to use a different tool, which can be found through a documentation search or a web search, to add the Lap field. If you can't figure out the setup of the new feature class programmatically, I suggest you create it manually and work on writing the rest of the script. You can then return to this part at the end, if you have time.
  • In order to get the shapefile in WGS 1984, you'll need to create a spatial reference object that you can assign to the shapefile at the time you create it. I recommend creating this object with the arcpy.SpatialReference() constructor. Be warned that if you do not correctly apply the spatial reference, your polyline precision could be diluted.
  • Remember that polylines can be produced from either an arcpy Array of Point objects or a list of x, y coordinate pairs.
  • If you do things right, your polylines should look like the image below (the line feature class has been added and symbolized manually showing each lap in a different color):
Laps using polylines
The line feature class symbolized manually showing each lap in a different color
Credit: Image developed by J. Detwiler © Penn State with ArcGIS Pro, 2021 and licensed under CC BY-NC-SA 4.0
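
The dictionary hint above can be sketched in plain Python. The readings list and its coordinate values are made up, and lapPoints is just an illustrative name. The sketch stops at building the dictionary and leaves the arcpy geometry work to you.

```python
# Hypothetical (lap, x, y) readings, in the order they appear in the file
readings = [
    (1, 149.6851, -34.8401),
    (1, 149.6853, -34.8402),
    (2, 149.6852, -34.8401),
    (2, 149.6854, -34.8403),
    (2, 149.6856, -34.8404),
]

# Keys are lap numbers; values are lists of (x, y) pairs for that lap
lapPoints = {}
for lap, x, y in readings:
    if lap not in lapPoints:
        lapPoints[lap] = []       # first point of a new lap
    lapPoints[lap].append((x, y))

for lap in lapPoints:
    print("Lap " + str(lap) + ": " + str(len(lapPoints[lap])) + " points")
```

Each value ends up as a list of x, y coordinate pairs, which is one of the two input forms mentioned in the polyline hint.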

Over and above opportunities

There are numerous opportunities for meeting the project's over and above requirement. Here is a "package" of ideas that you might consider implementing together to do a better job of meeting the original scenario requirements:

  • Omit the first and last laps from the data you insert into the new shapefile. The driver is pulling out of and back into the pits during these laps and is not interested in the times associated with them.
  • Record the time needed to drive each lap. There are a couple of ways to approach this problem: a) working with the lap time strings found after the last point in each lap, parse the string into its constituent parts using string manipulation functions and convert to just seconds, or b) compute the difference between each lap's last point and first point Time values. (The value to record for the lap noted above would be 84.259, in a Float field.)
  • The car position is recorded many times per second, resulting in thousands of "breadcrumbs" in the export file. The company would like to avoid using all of the position data so as to minimize their disk usage with the cloud provider they're using for the web app. In developing your script, include every nth point from the input CSV file, where n is defined at the top of the script or through a script tool parameter. The only exception to this rule is that they'd also like you to include the first and last point of every lap.
  • Run your script with a Pro tool and programmatically add the new feature class to the current Pro project with symbology like that shown in the figure above.
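
Two of these ideas, parsing the lap time string and keeping every nth point, can be explored in plain Python before you bring arcpy into the picture. The function names lapTimeToSeconds and thinPoints are hypothetical, and the sketch assumes the lap time lines always follow the "# Lap 1: 00:01:24.259" pattern shown earlier.

```python
def lapTimeToSeconds(line):
    """Convert a line such as '# Lap 1: 00:01:24.259' to seconds."""
    # Everything after the first ": " is the hh:mm:ss.sss time string
    timeString = line.split(": ", 1)[1]
    hours, minutes, seconds = timeString.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def thinPoints(points, n):
    """Keep every nth point, always including the first and last point."""
    kept = points[::n]            # the slice starts at index 0, so the first point is kept
    # Add the last point if the slice did not already land on it
    if points and (len(points) - 1) % n != 0:
        kept.append(points[-1])
    return kept

print(lapTimeToSeconds("# Lap 1: 00:01:24.259"))
print(thinPoints(list(range(10)), 4))
```

Thinning ten points with n = 4 keeps the points at indexes 0, 4, and 8, plus the final point at index 9.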

Moving beyond these ideas, there is a lot of potentially interesting information hidden in the time values associated with the points (which gets lost when constructing lines from the points). One fairly easy step toward analyzing the original data is to create a script tool that includes not only the option to create the polyline feature class described above, but also a point feature class (including the time, lap, speed, and heading values for each point). Or if you want a really big challenge, you could divide the track into segments and analyze the path data to find the lap in which the fastest time was recorded within each segment.

Term project and final review quiz

By this time, you should have submitted your Final Project proposal and received a response from one of your instructors. You have the final two weeks of the course to work on your individual projects. Please submit your agreed-upon deliverables to the Final Project Drop Box by the course end date on the calendar.

There are two parts to the term project submission:

  1. the documented code and data needed to run it as agreed-upon in the proposal feedback process.
  2. a write-up describing the project purpose and approach and reflecting on the development and lessons learned from it, as well as providing instructions on how to run and test the code.

More information on these two parts of your term project submission and how they should be submitted can be found below. Please see the project grading rubric on Canvas to understand exactly how these requirements will be evaluated.

In addition to the term project, please don't forget to take the final review quiz linked in the Final Project section on Canvas. The quiz is worth 10% of your final grade.

Code and data

Make sure that the code you submit is clean, well documented and of high quality. Graders should be able to test run your code with only minimal adaptations. If you cannot avoid hard-coded paths, make sure that these are cleanly defined at the beginning of your script and your grader only needs to change the path in one place in the code.

Deliverable

Submit a single .zip file to the corresponding drop box in the Final Project section on Canvas; the zip file should contain:

  • Your code and all other files making up your project including the data needed to run and test it
  • Your project write-up (see below)

When you turn in your final project, make sure you include sample data so that your grader can evaluate how well it works. If your sample data is large (greater than 10-20 MB), please keep in touch with your grader to ensure that he or she can successfully get the data. You can use your Penn State OneDrive storage to deliver your data to your grader or alternatively a public service like DropBox, Google Drive, etc., and include the link to your data in your submission.

Project write-up

Your write-up should provide enough information for the grader to test and evaluate the project, preferably as a set of numbered steps that graders can follow. If the graders cannot figure out how to run your project, they may deduct functionality points. The write-up should also describe how you approached the project, whether you ran into any issues and how you solved them, and something you learned during the project. The write-up should be well written and structured, reflective, and it should not seem “rushed”.

This course has been a pleasure and I wish you the best in your future Python and programming endeavors!

KEEP - OLD - Lessons - Pre-ArcGIS Pro Revisions

KEEP - OLD - Lesson 1: Introduction to GIS modeling and Python

KEEP--1.6.2 Example: Printing the spatial reference of a feature class

This first example script reports the spatial reference (coordinate system) of a feature class stored in a geodatabase:

# Opens a feature class from a geodatabase and prints the spatial reference

import arcpy

featureClass = "C:/Data/USA/USA.gdb/StateBoundaries"

# Describe the feature class and get its spatial reference   
desc = arcpy.Describe(featureClass)
spatialRef = desc.spatialReference

# Print the spatial reference name
print(spatialRef.name)

This may look intimidating at first, so let’s go through what’s happening in this script, line by line. Watch this video (4:42) to get a visual walkthrough of the code. You'll notice a typo in the video and screenshot below: desc.SpatialReference should be desc.spatialReference, as per the documentation.

PRESENTER: The purpose of this video is to walk through your first Python script, and this will be especially helpful if you're beginning Python.

I have PythonWin open here. I've displayed the line numbers and I've made the text a little bit larger than normal just so you can see it on the video.

It also makes it so it's not quite as intimidating that way.

The first few lines of this script start out with a comment, and they say actually what we're going to do with this script which is to look at a feature class inside of a geodatabase and then print the spatial reference of that feature class.

In line 4, we import the arcpy site package. Now, if you work with ESRI geoprocessing in your script or looking at ESRI data sets, this is what you're going to put at the top of your script most of the time. So you can get in the habit of using import arcpy at the top of your scripts.

In line 6, we create a string variable. We call this variable feature class, but we could really call it anything. What to call the variable is up to us. What to assign the variable is the path of the feature class we want to look at, and we put that in quotes, and that makes it a string in the eyes of Python. And when you make a string, PythonWin helps you out by putting that text in yellow so it's easy to spot in your code. In the path, we use forward slashes. This is different from the common backslash that you would see usually when you use a path. The backslash in Python is a reserved character, so you have to either use two backslashes or a forward slash. You'll see both in this course, just so you can get in the habit of knowing that you can use both. In this case, the path is pointing at a file geodatabase. That's why you see USA.gdb. And inside that file geodatabase is a feature class called StateBoundaries. This is probably a polygon or a line feature class. If you didn't know the exact path to get, you could open ArcCatalog and highlight the feature class itself, and then you would see in the upper location bar the exact type of path to use.

In line 9, we actually call a method on arcpy the arcpy describe method, and this gives us back a describe object. We've mentioned in the course material that everything in Python is an object. Objects have properties that describe them and methods that are things that they can do.

If you wanted to learn more about the describe object, you could look in the ESRI help.

In our case, the describe object is going to have a spatial reference. Now, before we move off of line 9, I'd like to point out the parentheses there. So when you call the method arcpy.Describe, you need to put something inside those parentheses, which is the path of the feature class that you want to describe. Now we don't have to type the full path again, because we made a variable earlier on to represent that. So this is where you can just put in the feature class variable name, and Python will read that as the actual path, C:/Data, etc.

So after line 9, we have a describe object. That describe object has a property which is the spatial reference. So, in line 10, that's what we're getting. "desc" is the name of our describe object and .SpatialReference gets us a spatial reference object.

Now, we can't print out an object. If you try to print an object to the interactive window, you'll get a bunch of text that's hard to understand, so we actually need to get a property off of this spatial reference object, and this property is the name of the spatial reference. So that's why, in line 13, we do spatialRef.name.

By this time, we've gone down far enough that we've gotten to a string; spatialRef.name just gives you back a string with the spatial reference and so that's printable to the interactive window.

And now, I'm going to run this script. I'll just move this out of the way, and behind there you can see the interactive window. It's ready to go.

To run a script, I just highlight the window that I want to run and I click this running man icon. The script file name is populated for me. It just comes from this window that's open. And if my script had any arguments that it needed in order to run, I could type them in here separated by spaces. Sometimes you have to supply a path name or a number or something, especially if you're gonna go ahead later on and make a tool out of this in ArcGIS. But for now, we're just going to run the script right out of PythonWin so there's no need to do any arguments.

And I'll just click OK and wait for this to run and the interactive window prints out geographic, meaning that this particular feature class has a geographic coordinate system.

Source: Sterling Quinn

Again, notice that:

  • a comment begins the script to explain what’s going to happen.
  • case sensitivity is applied in the code. "import" is all lower-case. So is "print". The module name "arcpy" is always referred to as "arcpy," not "ARCPY" or "Arcpy". Similarly, "Describe" is capitalized in arcpy.Describe.
  • The variable names featureClass, desc, and spatialRef that the programmer assigned are short, but intuitive. By looking at the variable name, you can quickly guess what it represents.
  • The script creates objects and uses a combination of properties and methods on those objects to get the job done. That’s how object-oriented programming works.

Trying the example for yourself

The best way to get familiar with a new programming language is to look at example code and practice with it yourself. See if you can modify the script above to report the spatial reference of a feature class on your computer. In my example, the feature class is in a file geodatabase; you’ll need to modify the structure of the featureClass path if you are using a shapefile (for example, you'll put .shp at the end of the file name, and you won't have .gdb in your path).
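
When you change the path, keep in mind the rule about slashes from the video. The sketch below compares the ways of writing a path. Raw strings (the r prefix) are a standard Python feature not mentioned in the walkthrough, and the shapefile path is a hypothetical example, so substitute the location of your own data.

```python
# Three equivalent ways to write the same geodatabase path
forwardSlashes = "C:/Data/USA/USA.gdb/StateBoundaries"
doubleBackslashes = "C:\\Data\\USA\\USA.gdb\\StateBoundaries"
rawString = r"C:\Data\USA\USA.gdb\StateBoundaries"

# The doubled backslashes and the raw string produce identical text
print(doubleBackslashes == rawString)

# A single backslash can silently become an escape character:
# "\t" is read as a tab and "\n" as a newline
broken = "C:\temp\new"
print(len(broken), len(r"C:\temp\new"))

# Hypothetical shapefile path: .shp at the end of the name, no .gdb in the path
shapefilePath = "C:/Data/USA/StateBoundaries.shp"
print(shapefilePath)
```

The broken string is shorter than its raw counterpart because each two-character escape sequence collapses into a single character, which is why a path written with single backslashes may fail to point at your data.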

Follow this pattern to try the example:

  1. Open PythonWin and click File > New.
  2. Choose to make a Python script and click OK.
  3. Paste in the code above and modify it to fit your data (change the path).
  4. Save your script as a .py file.
  5. Click the Run button to run the script. Make sure the Interactive Window is visible when you do this, because this is where you’ll see the output from the print keyword. The print keyword does not actually cause a hard copy to be printed!

Readings

We'll take a short break and do some reading from another source. If you are new to Python scripting, it can be helpful to see the concepts from another point of view.

Read parts of Zandbergen chapters 4 & 5. This will be a valuable introduction to Python in ArcGIS, on how to work with tools and toolboxes (very useful for Project 1), and also on some concepts which we'll revisit later in Lesson 2 (don't worry if the bits we skip over seem daunting - we'll explain those in Lesson 2).
  • Chapter 4 deals with the fundamentals of Python. We will need a few of these to get started, and we'll revisit this chapter in Lesson 2. For now, read sections 4.1-4.7, reviewing 4.5, which you read earlier.
  • Chapter 5 talks about working with arcpy and functions - read sections 5.1-5.6.

KEEP--1.6.3 Example: Performing map algebra on a raster

Here’s another simple script that finds all cells over 3500 meters in an elevation raster and makes a new raster that codes all those cells as 1. Remaining values in the new raster are coded as 0. This type of “map algebra” operation is common in site selection and other GIS scenarios.

Something you may not recognize below is the expression Raster(inRaster). This function just tells ArcGIS that it needs to treat your inRaster variable as a raster dataset so that you can perform map algebra on it. If you didn't do this, the script would treat inRaster as just a literal string of characters (the path) instead of a raster dataset.

# This script uses map algebra to find values in an
# elevation raster greater than 3500 (meters).

import arcpy
from arcpy.sa import *

# Specify the input raster
inRaster = "C:/Data/Elevation/foxlake"
cutoffElevation = 3500 

# Check out the Spatial Analyst extension
arcpy.CheckOutExtension("Spatial")

# Make a map algebra expression and save the resulting raster
outRaster = Raster(inRaster) > cutoffElevation
outRaster.save("C:/Data/Elevation/foxlake_hi_10")
      
# Check in the Spatial Analyst extension now that you're done
arcpy.CheckInExtension("Spatial")

Begin by examining this script and trying to figure out as much as you can based on what you remember from the previous scripts you’ve seen.

The main points to remember on this script are:

  • Notice the lines of code that check out the Spatial Analyst extension before doing any map algebra and check it back in after finishing. Because each line of code takes some time to run, avoid putting unnecessary code between checkout and checkin. This allows others in your organization to use the extension if licenses are limited. The extension automatically gets checked back in when your script ends, thus some of the Esri code examples you will see do not check it in. However, it is a good practice to explicitly check it in, just in case you have some long code that needs to execute afterward, or in case your script crashes and against your intentions "hangs onto" the license.
  • inRaster begins as a string, but is then used to create a Raster object once you run Raster(inRaster). A Raster object is a special object used for working with raster datasets in ArcGIS. It's not available in just any Python script: you can use it only if you import the arcpy module at the top of your script.
  • cutoffElevation is a number variable that you declare early in your script and then use later on when you build the map algebra expression for your outRaster.
  • Your expression outRaster = Raster(inRaster) > cutoffElevation is saying, in plain terms, "Make a new raster and call it outRaster. Do this by taking all the cells of the raster dataset at the path of inRaster that are greater than the number I assigned to the variable cutoffElevation."
  • outRaster is also a Raster object, but you have to call the method outRaster.save() in order to make it permanent on disk. The save() method takes one argument, which is the path to which you want to save.
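
If the greater-than expression still feels abstract, it may help to see the same comparison applied cell by cell in plain Python. The tiny grid below is made up, and nested lists are only a stand-in for a raster. This is an analogy for what the output values mean, not a description of how Spatial Analyst does the work internally.

```python
# Made-up 3 x 3 elevation grid in meters (each inner list is a row of cells)
elevation = [
    [3400, 3510, 3620],
    [3300, 3500, 3555],
    [3200, 3350, 3490],
]
cutoffElevation = 3500

# Code each cell as 1 if it is above the cutoff, otherwise 0
highCells = [[1 if cell > cutoffElevation else 0 for cell in row]
             for row in elevation]

for row in highCells:
    print(row)
```

Note that the cell with a value of exactly 3500 is coded 0, because the comparison is strictly greater than.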

Now try to run the script yourself using the FoxLake digital elevation model (DEM) in your Lesson 1 data folder. If it doesn’t work the first time, verify that:

  • you have supplied the correct input and output paths;
  • your path name contains forward slashes (/) or double backslashes (\\), not single backslashes (\);
  • you have the Spatial Analyst Extension installed and enabled. To check this, open ArcMap, click Customize > Extensions and ensure Spatial Analyst is checked;
  • you do not have any of the datasets open in ArcMap;
  • the output data does not exist yet. If you want to be able to overwrite the output, you need to add the line arcpy.env.overwriteOutput = True. This line can be placed immediately after import arcpy.

You can experiment with this script using different values in the map algebra expression (try 3000 for example).

Readings

Read the sections of Chapter 5 that talk about environment variables and licenses (5.9 & 5.11), which we covered in this part of the lesson.

KEEP--1.6.4 Example: Creating buffers

Think about the previous example where you ran some map algebra on an elevation raster. If you wanted to change the value of your cutoff elevation to 2500 instead of 3500, you had to open the script itself and change the value of the cutoffElevation variable in the code.

This third example is a little different. Instead of hard-coding the values needed for the tool (in other words, literally including the values in the script) we’ll use some user input variables, or parameters. This allows people to try different values in the script without altering the code itself. Just like in ModelBuilder, parameters make your script available to a wider audience.

The simple example below just runs the Buffer tool, but it allows the user to enter the path of the input and output datasets as well as the distance of the buffer. The user-supplied parameters make their way into the script with the arcpy.GetParameterAsText() method.

Examine the script below carefully, but don't try to run it yet. You'll do that in the next part of the lesson.

# This script runs the Buffer tool. The user supplies the input
# and output paths, and the buffer distance.

import arcpy
arcpy.env.overwriteOutput = True

try:
     # Get the input parameters for the Buffer tool
     inPath = arcpy.GetParameterAsText(0)
     outPath = arcpy.GetParameterAsText(1)
     bufferDistance = arcpy.GetParameterAsText(2)

     # Run the Buffer tool
     arcpy.Buffer_analysis(inPath, outPath, bufferDistance)

     # Report a success message 
     arcpy.AddMessage("All done!")

except:
     # Report an error message
     arcpy.AddError("Could not complete the buffer")

     # Report any error messages that the Buffer tool might have generated 
     arcpy.AddMessage(arcpy.GetMessages())

Again, examine the above code line by line and figure out as much as you can about what the code does. If necessary, print the code and write notes next to each line. Here are some of the main points to understand:

  • GetParameterAsText() is a function in the arcpy module. Notice that it takes a zero-based integer (0, 1, 2, 3, etc.) as an argument. If you’re going to go ahead and make a tool out of this script, as we are going to do in the next page of this lesson, then it’s important you define the parameters in the same order you want them to appear in the tool’s dialog.
  • When we called the Buffer tool in this script, we supplied only three parameters. By not supplying any more, we accepted the default values for the rest of the tool’s parameters (Side Type, End Type, etc.).
  • The try and except blocks of code are a way that you can prevent your script from crashing if there is an error. Your script attempts to run all of the code in the try block. If the script cannot continue for some reason, it jumps down and runs the code in the except block. Inserting try/except blocks like this is a good practice to do once you think you've gotten all the errors out of your script, or when you want to make sure your code will run a certain line at the end no matter what happens.

    When you are first writing and debugging your script, sometimes it's more useful to leave out try/except and let the code crash, because the (red) error messages reported in the Interactive Window sometimes give you better clues on how to diagnose the problem in your code. Suppose you put a print statement in your except block saying "There was an error. Please try again." For the end user of your script, this is nicer than seeing a nasty (red) error message; however, as a programmer debugging the script, you want to see the (red) error message to get any insight you can about what went wrong.

    Projects that you submit in this course require error handling using try/except in order to receive full credit.
  • The arcpy.AddMessage() and arcpy.AddError() methods are ways of adding additional messages to the user of the tool. Whenever you run a tool, the geoprocessor prints messages, which you have probably seen before (for example, “Executed (Buffer) successfully. End time: Sat Oct 03 07:37:31 2009”). You have the power to add more messages through these methods. The messages have differing levels of severity, hence different methods for AddMessage and AddError. Sometimes people choose to view only the errors instead of all the messages, or they do a quick visual scan of the messages for errors.

    When you use arcpy.GetMessages(), you get all the messages generated by the tool itself. These will tell you things such as whether the user entered invalid parameters. Notice in this script the somewhat complex syntax you have to use to first get the messages, then add them: arcpy.AddMessage(arcpy.GetMessages()). If this line of code is confusing to understand, remember that the order of functions works like math operations: you start by working inside the parentheses first to get the messages, then you add them.

    The AddError and AddMessage methods are only used when making script tools (which you'll learn about in the very next section). When you are just running a script in PythonWin (not making a script tool), you can still get the messages by printing the result of GetMessages(), like this: print(arcpy.GetMessages()).
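
The flow of a try/except block can be practiced without ArcGIS at all. In the sketch below, a simple division stands in for the tool call, and safeDivide is a hypothetical name. It also uses two standard Python features that go a step beyond the buffer script: "except Exception as error" to capture the error text, and a finally block, which always runs and is one way to guarantee that a certain line executes at the end.

```python
def safeDivide(numerator, denominator):
    """Hypothetical stand-in for a tool call that might fail."""
    messages = []
    try:
        result = numerator / denominator
        messages.append("All done!")
    except Exception as error:
        # Runs only if the try block raised an error
        result = None
        messages.append("Could not complete the division: " + str(error))
    finally:
        # Runs whether or not an error occurred
        messages.append("Cleaning up")
    return result, messages

print(safeDivide(10, 2))
print(safeDivide(10, 0))
```

In both calls the "Cleaning up" message is recorded last, but only the second call passes through the except block.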

Readings

Read the section of Chapter 5 that talks about working with tool messages (5.10) for another perspective on handling tool output.

KEEP--Project 1, Part I: Modeling precipitation zones in Nebraska

Suppose you're working on a project for the Nebraska Department of Agriculture and you are tasked with making some maps of precipitation in the state. Members of the department want to see which parts of the state were relatively dry and wet in the past year, classified in zones. All you have is a series of weather station readings of cumulative rainfall for 2008 that you've obtained from within Nebraska and surrounding areas. This is a shapefile of points called Precip2008Readings.shp. It is in your Lesson 1 data folder.

Precip2008Readings.shp is a fictional dataset created for this project. The locations do not correspond to actual weather stations. However, the measurements are derived from real 2008 precipitation data created by the PRISM Climate Group at Oregon State University, 2009.

You need to do several tasks in order to get this data ready for mapping:

  • Interpolate a precipitation surface from your points. This creates a raster dataset with estimated precipitation values for your entire area of interest. You've already planned for this, knowing that you are going to use inverse distance weighted (IDW) interpolation. Click the following link to learn how the IDW technique works. You've also selected your points to include some areas around Nebraska to avoid edge effects in the interpolation.
  • Reclassify the interpolated surface into an ordinal classification of precipitation "zones" that delineate relatively dry, medium, and wet regions.
  • Create vector polygons from the zones.
  • Clip the zone polygons to the boundary of Nebraska.
    Precipitation readings to an interpolated surface to a reclassified surface and vectorized zones which results in precipitation zones
    Figure 1.15 Mapping the data.

It's very possible that you'll want to repeat the above process in order to test different IDW interpolation parameters or make similar maps with other datasets (such as next year's precipitation data). Therefore, the above series of tasks is well-suited to ModelBuilder. Your job is to create a model that can complete the above series of steps without you having to manually open four different tools.

Model parameters

Your model should have these (and only these) parameters:

  1. Input precipitation readings - This is the location of your precipitation readings point data. This is a model parameter so that the model can be easily re-run with other datasets.
  2. Power - An IDW setting specifying how quickly the influence of surrounding points decreases as you move away from the point to be interpolated.
  3. Search radius - An IDW setting determining how many surrounding points are included in the interpolation of a point. The search radius can be fixed at a certain distance, including whatever number of points happen to fall within, or its distance can vary in order for it to always include a minimum number of points. When you use ModelBuilder, you don't have to set up any of these choices; ModelBuilder does it for you when you set the Search Radius as a model parameter.
  4. Zone boundaries - This is a table allowing the user of the model to specify the zone boundaries. For example, you could configure precipitation values of 0 - 30000 to result in a reclassification of 1 (to correspond with Zone 1), 30000 - 60000 could result in a classification of 2 (to correspond with Zone 2), and so on. The way to get this table is to make a variable from the Reclassification parameter of the Reclassify tool and set it as a model parameter.
  5. Output precipitation zones - This is the location where you want the output dataset of clipped vector zones to be placed on disk.
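
To make the Power and Search radius settings more concrete, here is a minimal pure-Python sketch of the general inverse distance weighting idea. It is not the code behind the Spatial Analyst IDW tool. The function name idw_estimate, its arguments, and the sample readings are invented for illustration, and a fixed search radius is assumed.

```python
import math

def idw_estimate(x, y, readings, power=2.0, radius=None):
    """Estimate a value at (x, y) from (px, py, value) readings using IDW."""
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, value in readings:
        distance = math.hypot(px - x, py - y)
        if radius is not None and distance > radius:
            continue  # reading lies outside the fixed search radius
        if distance == 0:
            return float(value)  # the location coincides with a reading
        weight = 1.0 / distance ** power  # nearer readings get larger weights
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0:
        return None  # no readings fell inside the search radius
    return weighted_sum / weight_total

# Hypothetical readings: (x, y, precipitation)
stations = [(0, 0, 10.0), (10, 0, 20.0), (0, 10, 30.0)]
low_power = idw_estimate(2, 0, stations, power=1)
high_power = idw_estimate(2, 0, stations, power=3)
```

With the higher power, the estimate at (2, 0) sits closer to the value of the nearest reading, because the influence of the more distant readings falls off faster.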

As you build your model, you will need to configure some settings that will not be exposed as parameters. These include the clip feature, which is the state of Nebraska outline Nebraska.shp in your Lesson 1 data folder. There are many other settings such as "Z Value field" and "Input barrier polyline features" (for IDW) or "Reclass field" (for Reclassify) that should not be exposed as parameters. You should just set these values once when you build your model. If you ever ask someone else to run this model, you don't want them to be overwhelmed with choices stemming from every tool in the model; you should just expose the essential things they might want to change.

For this particular model, you should assume that any input dataset will conform to the same schema as your Precip2008Readings.shp feature class. For example, an analyst should be able to submit a similar Precip2009Readings dataset with the same fields, field names, and data types. However, he or she should not expect to provide any feature class with a different set of fields and field names, etc. As you might discover, handling all types of feature class schemas would make your model more complex than we want for this assignment.

When you double-click the model to run it, the interface should look like the following:

A screen capture of the Create Precipitation Zones dialog box
Figure 1.16 The model interface.

Running the model with the exact parameters listed above should result in the following (I have symbolized the zones in ArcMap with different colors to help distinguish them). This is one way you can check your work:

Model output with parameters shown above
Figure 1.17 The completed model output.

Tips

The following tips may help you as you build your model:

  • Your model needs to include the following tools in this order: IDW (from the Spatial Analyst toolbox), Reclassify, Raster to Polygon, Clip (from the Analysis toolbox).
  • An easy way to find the tools you need in ArcMap is to click Windows > Search and type the name of the tool you want in the search box. Be careful when multiple tools have the same name. You'll typically be using tools from the Spatial Analyst toolbox in this assignment.
  • Once you drag and drop a tool onto the ModelBuilder canvas, double-click it and set all the parameters the way you want. These will be the default settings for your model.
  • If there is a certain parameter for a tool that you want to expose as a model parameter, right-click the tool in the ModelBuilder canvas, then click Make Variable > From Parameter and choose the parameter. Once the oval appears for the variable, right-click it and click Model Parameter.
  • If you receive errors that a tool is not able to run, or that no Spatial Analyst Extension is installed, you may need to enable the extension. In ArcMap, click Customize > Extensions and then check the Spatial Analyst checkbox.

KEEP--Project 1, Part II: Creating contours for the Fox Lake DEM

The second part of Project 1 will help you get some practice with Python. At the end of Lesson 1, you saw three simple scripting examples; now your task is to write your own script. This script will create vector contour lines from a raster elevation dataset. Don't forget that the ArcGIS Desktop Help can indeed be helpful if you need to figure out the syntax for a particular command.

Earlier in the lesson, you were introduced to the Fox Lake DEM in your Lesson 1 data folder. It represents elevation in the Fox Lake Quadrangle, Utah. Write a script that uses the Contour tool in the Spatial Analyst toolbox to create contour lines for the quadrangle. The contour interval should be 25 meters, and the base contour should be 0. Remember that the native units of the DEM are meters, so no unit conversions are required.

Running the script should immediately create a shapefile of contour lines on disk.

Follow these guidelines when writing the script:

  • The purpose of this exercise is just to get you some practice writing Python code. Therefore, you are not required to use arcpy.GetParameterAsText() to get the input parameters. Go ahead and hard-code the values (such as the path name to the dataset).
  • Consequently, you are not required to create a script tool for this exercise. This will be required in Project 2.
  • Your code should run correctly from PythonWin. For full credit, it should also contain comments, attempt to handle errors, and use legal and intuitive variable names.

Deliverables

The deliverables for Project 1 are:

  • The .tbx file of the toolbox containing your Part I model. The easiest way to find it is to right-click your toolbox in the Catalog window, click Properties, and note the Location. If you can't browse to this path in Windows Explorer, you'll need to enable the Windows option to show hidden files and folders.
  • The .py file containing your Part II script.
  • A short writeup (about 400 words) describing how you approached the two problems, any setbacks you faced, and how you dealt with those. These writeups will be required on all projects.

Successful delivery of the above requirements is sufficient to earn 90% on the project. The remaining 10% is reserved for efforts that go "over and above" the minimum requirements. For Part I, this could include (but is not limited to) meaningful labels on and around model elements, analysis of how different input values affect the output, substitution of some other interpolation method instead of IDW (for example Kriging), documentation for your model parameters that appears in the side-panel help, or demonstration of how your model was successfully run on a different input dataset. For Part II, in addition to the Contour tool, you could run some other tool that also takes a DEM as an input.

As a general rule throughout the course, full credit in the "over and above" category requires the implementation of 2-4 different ideas, with more complex ideas earning more credit. Note that for future projects, we won't be listing ideas as we've done here; otherwise, it wouldn't really be an over and above requirement.

Finishing Lesson 1

To complete Lesson 1, please zip all your Project 1 deliverables (for parts I and II) into one file and submit them to the Project 1 Drop Box in Canvas. Then take the Lesson 1 Quiz if you haven't taken it already.

KEEP - OLD - Lesson 2: Python and programming basics

KEEP--2.2.3 Printing messages from the Esri geoprocessing framework

When you work with geoprocessing tools in Python, sometimes a script will fail because something went wrong with the tool. It could be that you wrote flawless Python syntax, but your script doesn't work as expected because Esri geoprocessing tools cannot find a dataset or otherwise digest a tool parameter. You won't be able to catch these errors with the debugger, but you can get a view into them by printing the messages returned from the Esri geoprocessing framework.

Esri has configured its geoprocessing tools to frequently report what they're doing. When you run a geoprocessing tool from ArcMap or ArcCatalog, you see a box with these messages, sometimes accompanied by a progress bar. You learned in Lesson 1 that you can use arcpy.GetMessages() to access these messages from your script. If you only want to view the messages when something goes wrong, you can include them in an except block of code, like this.

try:
    . . .
except:
    print (arcpy.GetMessages())

Remember that when using try/except, in the normal case Python will execute everything in the try-block (= everything following the "try:" that is indented relative to it) and then continue after the except-block (= everything following the "except:" that is indented relative to it). However, if some command in the try-block fails, the program execution directly jumps to the beginning of the except-block and, in this case, prints out the messages we get from arcpy.GetMessages(). After the except-block has been executed, Python will continue with the next statement after the except-block.

Geoprocessing messages have three levels of severity: Message, Warning, and Error. You can pass an index to the arcpy.GetMessages() method to filter through only the messages that reach a certain level of severity. For example, arcpy.GetMessages(2) would return only the messages with a severity of "Error". Error and warning messages sometimes include a unique code that you can use to look up more information about the message. The ArcGIS Desktop Help contains topics that list the message codes and provide details on each. Some of the entries have tips for fixing the problem.
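
As a rough sketch of severity filtering, the helper below collects only the warnings and errors from the most recent tool run. The function name summarize_tool_messages is hypothetical, and the arcpy module is passed in as the argument gp so that the logic can be read (and tried with a stand-in object) on a machine without ArcGIS.

```python
def summarize_tool_messages(gp):
    """Return warning and error text from the last geoprocessing tool run.

    gp is expected to be the arcpy module; it is a parameter here only
    so the function can be exercised without ArcGIS installed.
    """
    return {
        "warnings": gp.GetMessages(1),  # severity 1 = Warning
        "errors": gp.GetMessages(2),    # severity 2 = Error
    }
```

In a script, you might call summarize_tool_messages(arcpy) inside an except block and print the "errors" entry.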

Try/except can be used in many ways and at different indentation levels in a script. For instance, you can have a single try/except construct as in the example above, where the try-block contains the entire functionality of your program. If something goes wrong, the script will output some error message and then terminate. In other situations, you may want to split the main code into different parts, each contained in its own try/except construct to deal with the particular problems that may occur in that part of the code. For instance, the following structure may be used in a batch script that performs two different geoprocessing operations on a set of shapefiles, where an attempt should be made to perform the second operation even if the first one fails.

for featureClass in fcList: 
    try:
       . . .   # perform geoprocessing operation 1
    except:
       . . .   # deal with failure of geoprocessing operation 1

    try:
       . . .   # perform geoprocessing operation 2
    except:
       . . .   # deal with failure of geoprocessing operation 2

Let us assume the first geoprocessing operation fails for the first shapefile in fcList. As a result, the program execution will jump to the first except-block, which contains the code for dealing with this error situation. In the simplest case, it will just produce some output messages. After the except-block has been executed, the program execution will continue as normal by moving on to the second try/except statement and attempting to perform the second geoprocessing operation. Either this one succeeds or it fails too, in which case the second except-block will be executed. The last thing to note is that both try/except statements are contained in the body of the for-loop going through the different shapefiles. So even if both operations fail for one of the files, the script will always jump back to the beginning of the loop body and continue with the next shapefile in the list, which is often the desired behavior of a batch script.
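
The control flow described above can be tried out without ArcGIS. The sketch below uses two made-up stand-in functions (fails_on_roads and always_works) in place of real geoprocessing operations, and a hypothetical run_batch helper that records what happened. It catches Exception rather than using a bare except, which is a small deviation from the structure shown above.

```python
def run_batch(items, operation_1, operation_2):
    """Apply two independent operations to each item, logging failures."""
    log = []
    for item in items:
        try:
            operation_1(item)
            log.append((item, "operation 1", "ok"))
        except Exception as error:
            log.append((item, "operation 1", "failed: " + str(error)))

        # This block runs whether or not operation 1 succeeded.
        try:
            operation_2(item)
            log.append((item, "operation 2", "ok"))
        except Exception as error:
            log.append((item, "operation 2", "failed: " + str(error)))
    return log


# Stand-in operations; a real script would call geoprocessing tools here.
def fails_on_roads(name):
    if name == "roads.shp":
        raise ValueError("cannot process " + name)

def always_works(name):
    return name

results = run_batch(["roads.shp", "cities.shp"], fails_on_roads, always_works)
```

Even though the first operation fails for roads.shp, the second operation is still attempted for it, and the loop then moves on to cities.shp.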

Further reading

Please take a look at the official ArcGIS documentation for more detail about geoprocessing messages. Be sure to read these topics:

  • Understanding message types and severity
  • Understanding geoprocessing tool errors and warnings - This is the gateway into the error and warning reference section of the help that explains all the error codes. Sometimes you'll see these codes in the messages you get back, and the specific help topic for the code can help you understand what went wrong. The article also talks about how you can trap for certain conditions in your own scripts and cause specific error codes to appear. This type of thing is optional, for over and above credit, in this course.

 

KEEP--Project 2: Batch reprojection tool for vector datasets

Some GIS departments have determined a single, standard projection in which to maintain their source data. The raw datasets, however, can be obtained from third parties in other projections. These datasets then need to be reprojected into the department's standard projection. Batch reprojection, or the reprojection of many datasets at once, is a task well suited to scripting.

In this project, you'll practice Python fundamentals by writing a script that re-projects the vector datasets in a folder. From this script, you will then create a script tool that can easily be shared with others.

The tool you will write should look like the image below. It has two input parameters and no output parameters. The two input parameters are:

  1. A folder on disk containing vector datasets to be re-projected.
  2. The path to a vector dataset whose spatial reference will be used in the re-projection. For example, if you want to re-project into NAD 1983 UTM Zone 10, you would browse to some vector dataset already in NAD 1983 UTM Zone 10. This could be one of the datasets in the folder you supplied in the first parameter, or it could exist elsewhere on disk.
    Screen capture showing the project 2 tool
    Figure 2.1 The Project 2 tool with two input parameters and no output parameters.

Running the tool causes re-projected datasets to be placed on disk in the target folder.

Requirements

To receive full credit, your script:

  • must re-project shapefile vector datasets in the folder to match the target dataset's projection;
  • must append "_projected" to the end of each projected dataset name. For example: CityBoundaries_projected.shp;
  • must skip projecting any datasets that are already in the target projection;
  • must report a geoprocessing message telling which datasets were projected. In this message, the dataset names can be separated by spaces. In the message, do not include datasets that were skipped because they were already in the target projection. This must be a single message, not one message per projected dataset. Notice an example of this type of custom message below in the line "Projected . . . :"
    Screen capture showing the project 2 tool after running
    Figure 2.2 Your script must report a geoprocessing message telling which datasets were projected.
  • must not contain any hard-coded values such as dataset names, path names, or projection names;
  • must be made available as a script tool that can be easily run from ArcToolbox by someone with no knowledge of scripting.

Successful completion of the above requirements is sufficient to earn 90% of the credit on this project. The remaining 10% is reserved for "over and above" efforts which could include, but are not limited to, the following:

  • Your geoprocessing message of projected datasets contains commas between the dataset names, with no extra "trailing" comma at the end.
  • Side panel help documentation is provided for your script tool. This means that when you open the tool dialog and click Show Help, instructions for each parameter appear on the side. The ArcGIS Desktop Help can teach you how to do this.
  • Your script tool uses relative paths to the .py file and is easily deployed without having to "re-wire" the toolbox to the script.
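
For the comma-separated message, one common Python idiom is to collect the names in a list and join them at the end, which avoids a trailing separator. The list contents and variable names below are placeholders for illustration only, not part of the project data requirements.

```python
# Hypothetical list of dataset names collected while looping
projected = ["CityBoundaries", "Ferries", "StateRoutes"]

# join() places the separator only between items, never after the last one
message = "Projected: " + ", ".join(projected)
```

In a script tool, a string built this way could then be reported once with arcpy.AddMessage().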

You are not required to handle datum transformations in this script. It is assumed that each dataset in the folder uses the same datum, although the datasets may be in different projections. Handling transformations would cause you to have to add an additional parameter in the Project tool and would make your script more complicated than you would probably like for this assignment.

Sample data

The Lesson 2 data folder contains a set of vector shapefiles for you to work with when completing this project (delete any subfolders in your Lesson 2 data folder—you may have one called PracticeData—before beginning this project). These shapefiles were obtained from the Washington State Department of Transportation GeoData Distribution Catalog, and they represent various geographic features around Washington state. For the purpose of this project, I have put these datasets in various projections. These projections share the same datum (NAD 83) so that you do not have to deal with datum transformations.

The datasets and their original projections are:

  • CityBoundaries and StateRoutes - NAD_1983_StatePlane_Washington_South_FIPS_4602
  • CountyLines - NAD_1983_UTM_Zone_10N
  • Ferries - USA_Contiguous_Lambert_Conformal_Conic
  • PopulatedPlaces - GCS_NorthAmerican_1983

Deliverables

Deliverables for this project are as follows:

  • the source .py file containing your script;
  • the .tbx file containing your script tool;
  • a short writeup (about 300 words) describing how you approached the project, how you successfully dealt with any roadblocks, and what you learned along the way. You should include which requirements you met, or failed to meet. If you added some of the "over and above" efforts, please point these out so the grader can look for them.

Tips

The following tips can help improve your possibility of success with this project:

  • Do not use the Esri Batch Project tool in this project. In essence, you're required to make your own variation of a batch project tool in this project by running the Project tool inside a loop. Your tool will be easier to use because it's customized to the task at hand.
     
  • There are a lot of ways to insert "_projected" in the name of a dataset, but you might find it useful to start by temporarily removing ".shp" and adding it back on later. To make your code work for both a shapefile (which has the extension .shp) and a feature class in a geodatabase (which does not have the extension .shp), you can use the following:

    rootName = fc
    if rootName.endswith(".shp"):
        rootName = rootName.replace(".shp", "")

    In the above code, fc is your feature class name. If it is the name of a shapefile, it will include the .shp extension. The replace function searches for the string ".shp" (the first parameter) in the file name and replaces it with nothing (symbolized in the second parameter by empty quotes ""). So after running this code, the variable rootName will contain the feature class name without the ".shp". Since replace(...) does not change anything if the string given as the first parameter does not occur in fc, the code above can be replaced by just a single line:

    rootName = fc.replace(".shp", "")

    You could also potentially chop off the last four characters using something like

    rootName = fc[:-4]

    but hard-coding numbers other than 0 or 1 in your script can make the code less readable for someone else. Seeing a function like replace is a lot easier for someone to interpret than seeing -4 and trying to figure out why that number was chosen. You should therefore use replace(...) in your solution instead.

  • To check if a dataset is already in the target projection, you will need to obtain a Spatial Reference object for each dataset (the dataset to be projected and the target dataset). You will then need to compare the spatial reference names of these two datasets. Be sure to compare the Name property of the spatial references; do not compare the spatial reference objects themselves. This is because you can have two spatial reference objects that are different entities (and are thus "not equal"), but have the same name property.

    You should end up with a line similar to this:
    if fcSR.Name != targetSR.Name: 
    where fcSR is the spatial reference of the feature class to be projected and targetSR is the target spatial reference obtained from the target projection shapefile.
     
  • If you want to show all the messages from each run of the Project tool, add the line: arcpy.AddMessage(arcpy.GetMessages()) immediately after the line where you run the Project tool. Each time the loop runs, it will add the messages from the current run of the Project tool into the results window. It's been my experience that if you wait to add this line until the end of your script, you only get the messages from the last run of the tool, so it's important to put the line inside the loop. Remember that while you are first writing your script, you can use print statements to debug, then switch to arcpy.AddMessage() when you have verified that your script works and you are ready to make a script tool.
  • If you need extra help with making the script tool, refer back to Lesson 1.7.1 and also read Zandbergen 13.1 - 13.10 where he goes in depth about making your own tools.
  • If, after all your best efforts, you ran out of time and could not meet one of the requirements, comment out the code that is not working (using a # sign at the beginning of each line) and send the code anyway. Then explain in your brief writeup which section is not working and what troubles you encountered. If your commented code shows that you were heading down the right track, you may be awarded partial credit.
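
To illustrate the naming tip above, here is a small sketch that removes ".shp", appends the suffix, and adds the extension back only when it was there to begin with. The function name projected_name and its suffix argument are made up for this example, and it assumes it is given a dataset name rather than a full path.

```python
def projected_name(fc, suffix="_projected"):
    """Return the dataset name with the suffix placed before any .shp extension."""
    root_name = fc.replace(".shp", "")
    # Only shapefiles get the extension added back on
    extension = ".shp" if fc.endswith(".shp") else ""
    return root_name + suffix + extension

shapefile_result = projected_name("CityBoundaries.shp")
geodatabase_result = projected_name("CityBoundaries")
```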

KEEP - OLD - Lesson 3: GIS data access and manipulation with Python

KEEP--Project 3: Extracting amenities from OpenStreetMap data

In this project, you'll use your new skills working with selections and cursors to process some data from a "raw" format into a more specialized group of datasets for a specific mapping purpose. The data for this exercise was derived from OpenStreetMap, a free map of the world where anyone can contribute anything, similar in nature to Wikipedia. Although it's easy to put data into OpenStreetMap (see Lesson 9 of GEOG 585 if you're interested in trying it), it can be more difficult to get the data out and filter it down to meet specific needs.

Download the data for this project

Background

In this exercise, suppose you are making a humanitarian-themed map for a non-profit agency in El Salvador. You want to show hospitals, schools, and places of worship, with the intent that these facilities could be utilized for shelter or medical care in times of natural disaster.

You currently have a messy shapefile with all manner of points covering a broad area that have been exported out of OpenStreetMap (OSMpoints.shp). You also have a shapefile of country boundaries in Central America (CentralAmerica.shp). You want to make shapefiles of just the things you're interested in, and just within El Salvador.

Task

Write a script that makes a separate shapefile for each of these types of amenities (schools, hospitals, places of worship) within the boundary of El Salvador. Write this script so that the user can change the country or list of amenity types simply by editing a couple of lines of code at the top of the script, like this:

import arcpy
amenities = ['school','hospital','place_of_worship']
country = 'El Salvador'
. . .

As part of this extraction task, you should also add a text field named "source" to each new shapefile. For every record, populate this field with the value "OpenStreetMap" so that future users know where the data came from. (This could be valuable in case they later decide to add or merge in other features.)

Your result should look something like this if viewed in ArcMap:

Example of Project 3 result
Figure 3.5  Example output from Project 3, viewed in ArcMap.

The above requirements are sufficient for receiving 90% of the credit on this assignment. The remaining 10% is reserved for "Over and above" efforts, such as making an ArcToolbox script tool, or extending the script to handle other amenity types, multiple countries, etc. For these over and above efforts, we prefer that you submit two copies of the script: one with the basic functionality and one with the extended functionality. This will make it more likely that you'll receive the base credit if something fails with your over and above coding.  

Deliverables

Deliverables for this project are as follows:

  • The source .py file containing your script
  • A short writeup (about 300 words) describing how you approached the project, how you successfully dealt with any roadblocks, and what you learned along the way. You should include which requirements you met, or failed to meet. If you added some of the "over and above" efforts, please point these out so the grader can look for them.

You do not have to create an ArcToolbox script tool for this assignment; you can hard-code the initial parameters. Nevertheless, put all the parameters at the top so they can be easily manipulated by whoever tests the script.

Once you get everything working, creating an ArcToolbox script tool is a good way to achieve the "over and above" credit for this assignment. If you do this, then please zip all supporting files before placing them in the drop box.

Notes about the data

Take a look at the OSMpoints.shp dataset in ArcMap, particularly the attribute table. It contains an "amenity" field that you can use to select the things you're interested in. This field corresponds to the amenity tag in OpenStreetMap. If you take a look at the amenity tag documentation, you will see that the official strings for your amenities of interest are: 'hospital', 'school', and 'place_of_worship'...but you want to be able to write a script that's capable of handling other amenities, too.

The Central America shapefile has a field called "NAME". You can make an attribute selection on this field to select El Salvador, then follow that up with a spatial selection to grab all the amenity points that fall within this country. Finally, narrow down those amenity points to just the ones that satisfy your desired amenity.

Tips

As mentioned above, you'll be starting out by making a list of all the things you want to capture:

amenities = ['school','hospital','place_of_worship']

Then, later in your code, you can loop through this list:

for amenity in amenities:
    amenitySelectionClause = '"amenity" = ' + "'" + amenity + "'"
    # select each amenity and make a new layer

Once you've got the selection made, use the Copy Features tool to save the selected features into a new file, in the same fashion as in the practice exercises.
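
Because the nested quotes in the selection clause are easy to get wrong, it can help to build the string in a small helper and print it before passing it to a tool. The helper name build_where_clause is hypothetical. It follows the same quoting pattern as the loop above (double quotes around the field name, single quotes around the value) and does not escape quotes inside the value.

```python
def build_where_clause(field, value):
    """Build an SQL expression such as "amenity" = 'school'."""
    return "\"{0}\" = '{1}'".format(field, value)

amenities = ['school', 'hospital', 'place_of_worship']
clauses = [build_where_clause("amenity", amenity) for amenity in amenities]

for clause in clauses:
    print(clause)
```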

Take this project one step at a time. It's probably easiest to tackle the extraction-into-shapefile portion first.  Once you have all the new amenity shapefiles created, go through them one by one, use the "Add Field" tool to add the "source" field, followed by an UpdateCursor to loop through all rows and populate this new field with 'OpenStreetMap'.

It might be easiest to get the whole process working with a single amenity, then add the loop for all the amenities later after you have finalized all the other script logic.

For the purposes of this exercise, don't worry about capturing points that fall barely outside the edge of the boundary (e.g., points in coastal cities that appear in the ocean). To capture all these in real life, you would just need to obtain a higher resolution boundary file. The code would be the same.

You should be able to complete this exercise in about 50 lines of code (including whitespace and comments). If your code gets much longer than this, you are probably missing an easier way.

Note:

In some cases, students have reported that the values in the amenity field contain a leading space (e.g., " school" rather than "school"). If this happens, your input file got corrupted, most likely by applying AddField(...) and/or an update cursor to a layer in memory. In this case, please re-download the input files and make sure that you only use AddField(...) and the update cursor on the produced output shapefiles, not on a layer.

KEEP--Project 3: Extracting data for a pro hockey team's scouting department

In this project you'll use your new skills working with selections and cursors to process some data from a "raw" format into a more specialized dataset for a specific mapping purpose. The data from this exercise was retrieved from the National Hockey League's undocumented statistics API.

Download the data for this project

Background

In this exercise, suppose you are a data analyst for an NHL franchise.  In preparation for the league draft, the team general manager has asked you to make it possible for him to retrieve all current players born in a particular country (say, Sweden) broken down by position.  Your predecessor passed along to you the player shapefile, but unfortunately the player's birth country is not included as one of the attributes.  You do have a second shapefile of world country boundaries though...

Task

Write a script that makes a separate shapefile for each of the three forward positions (center, right wing, and left wing) within the boundary of Sweden. Write this script so that the user can change the country or list of positions simply by editing a couple of lines of code at the top of the script.

In browsing the attribute table, you'll note that player heights and weights are stored in imperial units (feet & inches and pounds).  As part of this extraction task, to simplify comparisons against scouting reports written using metric units, you should also add two new numeric fields to the attribute table -- to the new shapefiles only, not the original nhlrosters.shp -- to hold height and weight values in centimeters and kilograms.  For every record, populate these fields with values based on the following formulas:

height_cm = height (in inches) * 2.54

weight_kg = weight (in pounds) * 0.453592

Your result should look something like this if viewed in ArcMap:

Graphic showing required Project 3 output

Figure 3.5  Example output from Project 3, viewed in ArcMap.

The above requirements are sufficient for receiving 90% of the credit on this assignment. The remaining 10% is reserved for "Over and above" efforts, such as making an ArcToolbox script tool, or extending the script to handle other positions, multiple countries, etc. For these over and above efforts, we prefer that you submit two copies of the script: one with the basic functionality and one with the extended functionality. This will make it more likely that you'll receive the base credit if something fails with your over and above coding.  

Deliverables

Deliverables for this project are as follows:

  • The source .py file containing your script
  • A short writeup (about 300 words) describing how you approached the project, how you successfully dealt with any roadblocks, and what you learned along the way. You should include which requirements you met, or failed to meet. If you added some of the "over and above" efforts, please point these out so the grader can look for them.

You do not have to create an ArcToolbox script tool for this assignment; you can hard-code the initial parameters. Nevertheless, put all the parameters at the top so they can be easily manipulated by whoever tests the script.

Once you get everything working, creating an ArcToolbox script tool is a good way to achieve the "over and above" credit for this assignment. If you do this, then please zip all supporting files before placing them in the drop box.

Notes about the data

Take a look at the provided datasets in ArcMap, particularly the attribute tables.  The nhlrosters shapefile contains a field called "position" that provides the values you need to examine (RW = Right Wing, LW = Left Wing, C = Center, D = Defenseman, G = Goaltender).  As mentioned, you've been asked to extract the forward positions (RW, LW, and C) to new shapefiles, but you want to write a script that's capable of handling some other combination of positions, too.  There are several other fields that offer the potential for interesting queries as well, if you're looking for over and above ideas.

The Countries_WGS84 shapefile has a field called "CNTRY_NAME". You can make an attribute selection on this field to select Sweden, then follow that up with a spatial selection to grab all the players that fall within this country. Finally, narrow down those players to just the ones that play the desired position.
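The attribute queries described above are just strings, so you can build them from the parameters at the top of your script. Here is a minimal sketch in plain Python (no arcpy calls); the helper name build_where_clause is made up for illustration, and the double-quoted field name follows the SQL style typically used with shapefiles. Strings like these could then be passed as the where clause to a selection tool.

```python
# Parameters a tester might edit (values taken from the project description).
target_country = "Sweden"
positions = ["C", "RW", "LW"]


def build_where_clause(field_name, value):
    """Hypothetical helper: return an expression of the form "FIELD" = 'value'."""
    return '"{0}" = \'{1}\''.format(field_name, value)


# One query for the country, and one query per position.
country_query = build_where_clause("CNTRY_NAME", target_country)
position_queries = {pos: build_where_clause("position", pos) for pos in positions}

print(country_query)
for pos in positions:
    print(position_queries[pos])
```

Because the country and the position list live at the top, changing either one changes every query without touching the rest of the script.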

Tips

Once you've selected Swedish players at the desired position, use the Copy Features tool to save the selected features into a new feature class, in the same fashion as in the practice exercises.

Take this project one step at a time. It's probably easiest to tackle the extraction-into-shapefile portion first.  Once you have all the new position shapefiles created, go through them one by one, use the "Add Field" tool to add the "height_cm" and "weight_kg" fields, followed by an UpdateCursor to loop through all rows and populate these new fields with appropriate values.

It might be easiest to get the whole process working with a single position, then add the loop for all the positions later after you have finalized all the other script logic.

The height field is of type Text to allow for the ' and " characters.  The string slicing notation covered earlier in the course can be used to obtain the feet and inches components of the player height.  Use the inches to centimeters formula shown above to compute the height in metric units.
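As a rough illustration of the slicing idea, the sketch below converts a height string to centimeters and a weight to kilograms using the formulas given above. It assumes the height text looks like 6' 2" (feet, a foot mark, inches, an inch mark); check the actual values in the attribute table, since the real format may differ. The function names are made up for this example.

```python
def height_to_cm(height_text):
    """Convert a height string such as 6' 2" to centimeters (assumed format)."""
    # The foot mark (') separates the feet from the inches.
    foot_mark = height_text.find("'")
    feet = int(height_text[:foot_mark])
    # Everything after the foot mark, minus the inch mark and spaces, is inches.
    inches_text = height_text[foot_mark + 1:].replace('"', "").strip()
    inches = int(inches_text) if inches_text else 0
    return (feet * 12 + inches) * 2.54


def weight_to_kg(pounds):
    """Convert a weight in pounds to kilograms."""
    return pounds * 0.453592


print(height_to_cm("6' 2\""))
print(weight_to_kg(200))
```

In your script, values like these would be computed inside the update cursor loop and written to the new fields.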

For the purposes of this exercise, don't worry about capturing points that fall barely outside the edge of the boundary (e.g., points in coastal cities that appear in the ocean). To capture all these in real life, you would just need to obtain a higher resolution boundary file. The code would be the same.

You should be able to complete this exercise in about 50 lines of code (including whitespace and comments). If your code gets much longer than this, you are probably missing an easier way.

KEEP - OLD - Lesson 4: Practical Python for the GIS analyst

KEEP--4.2 Python Dictionaries

In programming, we often want to store larger amounts of data that somehow belongs together inside a single variable. In Lesson 2, you already learned about lists, which provide one option to do so. As long as available memory permits, you can store as many elements in a list as you wish and the append(...) method allows you to add more elements to an existing list.

Dictionaries are another data structure that allows for storing complex information in a single variable. While lists store elements in a simple sequence and the elements are then accessed based on their index in the sequence, the elements stored in a dictionary consist of key-value pairs and one always uses the key to retrieve the corresponding values from the dictionary. It works like in a real dictionary where you look up information (the stored value) under a particular keyword (the key).

Dictionaries can be useful to realize a mapping, for instance from English words to the corresponding words in Spanish. Here is how you can create such a dictionary for just the numbers from one to four:

>>> englishToSpanishDic = { "one": "uno", "two": "dos", "three": "tres", "four": "cuatro"  }

The curly brackets { } delimit the dictionary similarly to how square brackets [ ] do for lists. Inside the dictionary, we have four key-value pairs separated by commas. The key and value for each pair are separated by a colon. The key appears on the left of the colon, while the value stored under the key appears on the right side of the colon.

We can now use the dictionary stored in variable englishToSpanishDic to look up the Spanish word for an English number, e.g.

>>> print (englishToSpanishDic["two"])
dos

To retrieve a value stored in the dictionary, we use the name of the variable followed by square brackets containing the key under which the value is stored. If we use the same notation on the left side of an assignment operator (=), we can add a new key-value pair to an existing dictionary:

>>> englishToSpanishDic["five"] = "cinco"    
>>> print (englishToSpanishDic)
{'four': 'cuatro', 'three': 'tres', 'five': 'cinco', 'two': 'dos', 'one': 'uno'}

Here we added the value "cinco", which appears on the right side of the equals sign, to the dictionary under the key "five". If something had already been stored under the key "five", the stored value would have been overwritten. You may have noticed that the order of the elements in the output differs from the order in which we added them. Python versions before 3.7 do not guarantee any particular order for dictionary elements (from 3.7 on, insertion order is preserved), but that doesn’t matter since we always access the elements in a dictionary via their key. If our dictionary contained many more word pairs, we could use it to build a very primitive translator that goes through an English text word by word and replaces each word with the corresponding Spanish word retrieved from the dictionary. Admittedly, this simple approach would probably result in pretty hilarious translations.
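To make the translator idea concrete, here is a minimal sketch. The function name translate is made up for this example, and leaving unknown words unchanged is just one possible design choice.

```python
englishToSpanishDic = {"one": "uno", "two": "dos", "three": "tres", "four": "cuatro"}


def translate(text, dictionary):
    """Replace each word found in the dictionary; keep unknown words as they are."""
    translatedWords = []
    for word in text.split():
        # get(...) returns its second argument when the key is not in the dictionary
        translatedWords.append(dictionary.get(word.lower(), word))
    return " ".join(translatedWords)


print(translate("one plus two is three", englishToSpanishDic))
```

Running this prints "uno plus dos is tres": the words "plus" and "is" are not in the dictionary, so they pass through untranslated.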

Now let’s use Python dictionaries to do something a bit more complex. Let’s simulate the process of creating a book index that lists the page numbers on which certain keywords occur. We want to start with an empty dictionary and then go through the book page-by-page. Whenever we encounter a word that we think is important enough to be listed in the index, we add it and the page number to the dictionary.

To create an empty dictionary in a variable called bookIndex, we use the notation with the curly brackets but nothing in between:

>>> bookIndex = {}
>>> print (bookIndex)
{} 

Now let’s say the first keyword we encounter in the imaginary programming book we are going through is the word "function" on page 2. We now want to store the page number 2 (value) under the keyword "function" (key) in the dictionary. But since keywords can appear on many pages, what we want to store as values in the dictionary are not individual numbers but lists of page numbers. Therefore, what we put into our dictionary is a list with the number 2 as its only element:

>>> bookIndex["function"] =  [2]
>>> print (bookIndex)
{'function': [2]} 

Next, we encounter the keyword "module" on page 3. So we add it to the dictionary in the same way:

>>> bookIndex["module"] =  [3]
>>> print (bookIndex)
{'function': [2], 'module': [3]}

So now our dictionary contains two key-value pairs, and for each key it stores a list with just a single page number. Let’s say we next encounter the keyword “function” a second time, this time on page 5. Our code to add the additional page number to the list stored under the key “function” now needs to look a bit different because we already have something stored for it in the dictionary and we do not want to overwrite that information. Instead, we retrieve the currently stored list of page numbers and add the new number to it with append(…):

>>> pages = bookIndex["function"] 
>>> pages.append(5)
>>> print (bookIndex)
{'function': [2, 5], 'module': [3]}
>>> print (bookIndex["function"])
[2, 5]

Please note that we didn’t have to put the list of page numbers stored in variable pages back into the dictionary after adding the new page number. Both the variable pages and the dictionary refer to the same list, so appending the number changes both. Our dictionary now contains a list of two page numbers for the key “function” and still a list with just one page number for the key “module”. Surely you can imagine how we would build up a large dictionary for the entire book by continuing this process.

Dictionaries can be used in concert with a for loop to go through the keys of the elements in the dictionary. This can be used to print out the content of an entire dictionary:

>>> for k in bookIndex:  # loop through keys of the dictionary
...    print ("keyword: " + k)                # print the key
...    print ("pages: " + str(bookIndex[k]))  # print the value
...
keyword: function
pages: [2, 5]
keyword: module
pages: [3]

When adding the second page number for “function”, we ourselves decided that this needs to be handled differently than when adding the first page number. But how could this be realized in code? We can check whether something is already stored under a key in a dictionary using an if-statement together with the “in” operator:

>>> keyword = "function"
>>> if keyword in bookIndex: 
...    print ("entry exists")
... else:
...    print ("entry does not exist")
... 
entry exists

So assuming we have the current keyword stored in variable word and the corresponding page number stored in variable pageNo, the following piece of code would decide by itself how to add the new page number to the dictionary:

word = "module"
pageNo = 7

if word in bookIndex:
    # entry for word already exists, so we just add page
    pages = bookIndex[word]
    pages.append(pageNo)
else:
    # no entry for word exists, so we add new entry
    bookIndex[word] = [pageNo]

A more sophisticated version of this code would also check whether the list of page numbers retrieved in the if-block already contains the new page number to deal with the case that a keyword occurs more than once on the same page. Feel free to think about how this could be included.
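If you want to compare notes after trying it yourself, here is one possible way to add that check, wrapped in a function so it can be reused for every keyword. The function name addPage is made up for this sketch.

```python
def addPage(bookIndex, word, pageNo):
    """Add pageNo to the index entry for word, skipping duplicate page numbers."""
    if word in bookIndex:
        pages = bookIndex[word]
        # only append if this page is not already listed for the keyword
        if pageNo not in pages:
            pages.append(pageNo)
    else:
        # no entry for word exists, so we add a new entry
        bookIndex[word] = [pageNo]


bookIndex = {"function": [2, 5], "module": [3]}
addPage(bookIndex, "module", 7)
addPage(bookIndex, "module", 7)   # same keyword on the same page: ignored
addPage(bookIndex, "loop", 9)
print(bookIndex)
```

The "not in" operator works on lists the same way "in" works on dictionary keys, so the duplicate check needs only one extra if-statement.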

Readings

Read Zandbergen section 6.8 for more information and examples using Python dictionaries.

KEEP--4.7 Working with map documents

To this point, we've talked about automating geoprocessing tools, updating GIS data, and reading text files. However, we've not covered anything about working with an Esri map document. There are many tasks that can be performed on a map document that are well-suited for automation. These include:

  • finding and replacing text in a map or series of maps; for example, a copyright notice for 2015 becomes 2016;
  • repairing layers that are referencing data sources using the wrong paths; for example, your map was sitting on a computer where all the data was in C:\data and now it is on a computer where all the data is in D:\myfolder\mydata;
  • printing a series of maps or data frames;
  • exporting a series of maps to PDF and joining them to create a "map book";
  • making a series of maps available to others on ArcGIS Server.

Esri map documents are binary files, meaning they can't be easily read and parsed using the techniques we covered earlier in this lesson. Before ArcGIS 10.0, the only way to automate anything with a map document was to use ArcObjects, which is somewhat challenging for beginners and requires using a language other than Python. With the release of ArcGIS 10.0, Esri added a Python module for automating common tasks with map documents.

The arcpy.mapping module

arcpy.mapping is a module you can use in your scripts to work with map documents. Please take a detour at this point to read the Esri Introduction to arcpy.mapping.

The most important object in this module is MapDocument. This tells your script which map you'll be working with. You can get a MapDocument by referencing a path, like this:

mxd = arcpy.mapping.MapDocument(r"C:\data\Alabama\UtilityNetwork.mxd")

Notice the use of r in the line above to denote a raw string literal. In other words, if you include r right before you begin your string, Python treats each backslash \ as an ordinary character rather than as the start of an escape sequence, so it's safe to write Windows paths with single backslashes. I've done it here because you'll see it in a lot of the Esri examples with arcpy.mapping.

Instead of directly using a string path, you could alternatively put a variable holding the path. This would be useful if you were iterating through all the map documents in a folder using a loop, or if you previously obtained the path in your script using something like arcpy.GetParameterAsText().
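As a sketch of the folder-iteration idea, the snippet below collects the paths of all map documents in a folder using only the standard library. The function name listMapDocuments and the folder path are assumptions for illustration; the arcpy call appears only in a comment, so the snippet runs without ArcGIS.

```python
import glob
import os


def listMapDocuments(folder):
    """Return the sorted full paths of all .mxd files directly inside folder."""
    return sorted(glob.glob(os.path.join(folder, "*.mxd")))


# Hypothetical folder; replace it with a folder that holds your map documents.
for mxdPath in listMapDocuments(r"C:\data\Alabama"):
    # In ArcGIS, each path could be passed along like this:
    # mxd = arcpy.mapping.MapDocument(mxdPath)
    print(mxdPath)
```

If the folder does not exist or holds no map documents, the function simply returns an empty list and the loop body never runs.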

It can be convenient to work with arcpy.mapping in the Python window in ArcMap. In this case, you do not have to put the path to the MXD. There's a special keyword "CURRENT" that you can use to get a reference to the currently-open MXD.

mxd = arcpy.mapping.MapDocument("CURRENT")

Once you get a MapDocument, then you do something with it. Most of the functions in arcpy.mapping take a MapDocument object as a parameter. Let's look at this first script from the Esri help topic linked above and scrutinize what is going on. I've added comments to each line.

import arcpy

# Create a MapDocument object referencing the MXD you want to update
mxd = arcpy.mapping.MapDocument(r"C:\GIS\TownCenter_2015.mxd")

# Loop through each text element in the map document
for textElement in arcpy.mapping.ListLayoutElements(mxd, "TEXT_ELEMENT"):

    # Check if the text element contains the out of date text
    if textElement.text == "GIS Services Division 2015":

        # If out of date text is found, replace it with the new text
        textElement.text = "GIS Services Division 2016"

# Export the updated map to a PDF
arcpy.mapping.ExportToPDF(mxd, r"C:\GIS\TownCenterUpdate_2016.pdf")

# Clean up the MapDocument object by deleting it
del mxd

The above example begins by getting a MapDocument object referencing C:\GIS\TownCenter_2015.mxd. The example then employs two functions from arcpy.mapping. The first is ListLayoutElements. Notice that the parameters for this function are a MapDocument and the type of layout element you want to get back, in this case, "TEXT_ELEMENT". (Examine the documentation for List Layout Elements to understand the other types of elements you can get back.)

The function returns a Python list of TextElement objects representing all the text elements in the map document. You know what to do if you want to manipulate every item in a Python list. In this case, the example uses a for loop to check the TextElement.text property of each element. This property is readable and writeable, meaning if you want to set some new text, you can do so by simply using the equals sign assignment operator, as in textElement.text = "GIS Services Division 2016".

The ExportToPDF function is very simple in this script. It takes a MapDocument and the path of the output PDF as parameters. If you look at the documentation for ExportToPDF, you'll notice a lot of other optional parameters for exporting PDFs, such as whether to embed fonts, that are just left as defaults in this example.

Learning arcpy.mapping

The best way to learn arcpy.mapping is to try to use it. Because of its simple, "one-line-fix" nature, it's a good place to practice your Python. It's also a good way to get used to the Python window in ArcMap, because you can immediately see the results of your actions.

Although there is no arcpy.mapping component to this lesson's project, you're welcome to use it in your final project. If you've already submitted your final project proposal, you can amend it to use arcpy.mapping by emailing and obtaining approval from the instructors. If you use arcpy.mapping in your final project, you should attempt to incorporate several of the functions or mix it with other Python functionality you've learned, making something more complex than the "one line fix" type of script I mentioned above.

By now, you'll probably have experienced the reality that your code does not always run as expected on the first try. Before you start running arcpy.mapping commands on your production MXDs, I suggest making backup copies.

Here are a few additional places where you can find excellent help on learning arcpy.mapping:

  • Zandbergen chapter 10. I recommend that you at least skim this chapter to see the types of examples that are included.
  • The Arcpy Mapping module book in the ArcGIS Desktop Help
  • Intro to arcpy.mapping technical workshop from the 2015 Esri User Conference

KEEP - OLD - Lesson 4 Practice Exercise B

This practice exercise does not do any geoprocessing or GIS, but it will help you get some experience working with functions and dictionaries. The latter will be especially helpful as you work on Project 4.

The objective

You've been given a text file of (completely fabricated) soccer scores from some of the most popular teams in Buenos Aires. Write a script that reads through the scores and prints each team name, followed by the maximum number of goals that team scored in a game, for example:

River: 5
Racing: 4
etc.

Keep in mind that the maximum number of goals scored might have come during a loss.

You are encouraged to use dictionaries to complete this exercise. This is probably the most efficient way to solve the problem. You'll also be able to write at least one function that will cut down on repeated code.

I have purposefully kept this text file short to make things simple to debug. This is an excellent exercise in using the debugger, especially to watch your dictionary as you step through each line of code.

This file is space-delimited, so you must explicitly set up the CSV reader to use a space as the delimiter instead of the default comma. The syntax is as follows:

csvReader = csv.reader(scoresFile, delimiter=" ")
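To see the reader in action without the real file, the sketch below parses two made-up lines held in memory. The column order (team, goals, team, goals) and the sample scores are assumptions for illustration; open the actual text file to confirm its layout before relying on these positions.

```python
import csv
import io

# Made-up sample rows; the column order is an assumption for illustration.
sampleText = "River 2 Boca 1\nRacing 4 Independiente 4\n"
scoresFile = io.StringIO(sampleText)   # stands in for the opened text file

csvReader = csv.reader(scoresFile, delimiter=" ")
games = []
for row in csvReader:
    # the reader returns every item as a string, so convert the goals to numbers
    winner, winnerGoals = row[0], int(row[1])
    loser, loserGoals = row[2], int(row[3])
    games.append((winner, winnerGoals, loser, loserGoals))

print(games)
```

Converting the goal counts with int(...) right away means later comparisons are numeric rather than alphabetical.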

Tips

If you want a challenge, go ahead and start coding. Otherwise, here are some tips that can help you get started:

  • Create variables for all your items of interest, including winner, winnerGoals, loser, and loserGoals and assign them appropriate values based on what you parsed out of the line of text. Because there are often ties in soccer, the term "winner" in this context means the first score given and the term "loser" means the second score given.
  • Review chapter 6.8 on dictionaries in the Zandbergen text. You want to make a dictionary that has a key for each team, and an associated value that represents the team's maximum number of goals. If you looked at the dictionary in the debugger it would look like {'River': '5', 'Racing': '4', etc.}
  • You can write a function that takes in three things: the key (team name), the number of goals, and the dictionary name. This function should then check if the key has an entry in the dictionary. If not, a key should be added and its value set to the current number of goals. If a key is found, you should perform a check to see if the current number of goals is higher than the value associated with that key. If so, you should set a new value. Notice how many "ifs" appear in the preceding sentences.
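If you get stuck on the last tip, here is one possible shape for such a function. The name updateMaxGoals and the sample scores are made up for this sketch, and the goals are stored as integers (an assumption) so that the comparison is numeric.

```python
def updateMaxGoals(team, goals, maxGoals):
    """Record goals for team if it is the team's first or highest score so far."""
    if team not in maxGoals:
        # first time we see this team: add a new key
        maxGoals[team] = goals
    elif goals > maxGoals[team]:
        # team already has an entry: keep only the larger value
        maxGoals[team] = goals


maxGoals = {}
# made-up (team, goals) pairs standing in for values parsed from the file
for team, goals in [("River", 2), ("Racing", 4), ("River", 5), ("Racing", 1)]:
    updateMaxGoals(team, goals, maxGoals)

for team in maxGoals:
    print(team + ": " + str(maxGoals[team]))
```

Calling the same function once for the first team and once for the second team on each line avoids repeating the if/elif logic.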

KEEP--Project 4: Parsing rhinoceros sightings

In this project, you're working for a wildlife conservation group that is tracking rhinos in the African savannah. Your field workers' software resources and GIS expertise are limited, but you have managed to obtain a CSV spreadsheet showing the positions of several rhinos over time. Each record in the spreadsheet shows the latitude/longitude coordinate of a rhino along with the rhino's name (these rhinos are well known to your field workers).

Your task is to write a script that will turn the readings in the spreadsheet into a vector dataset that you can place on a map. This will be a polyline dataset showing the tracks the rhinos followed over the time the data was collected. You are required to use the Python csv module to parse the text and arcpy geometries to write the polylines.

Please carefully read all the following instructions before beginning the project. You are not required to use functions in this project but you can gain over & above points by breaking out repetitive code into functions.

Deliverables

This project has the following deliverables:

  1. Your plan of attack for this programming problem, written in pseudocode in any text editor. This should consist only of short, focused steps describing what you are going to do to solve the problem. This is a separate deliverable from your customary project writeup.
  2. A Python script that reads the data from the spreadsheet and creates, from scratch, a polyline shapefile with n polylines, n being the number of rhinos in the spreadsheet. Each polyline should represent a rhino's track chronologically from the beginning of the spreadsheet to the end of the spreadsheet. Each polyline should also have a text attribute containing the rhino's name. The shapefile should use the WGS 1984 geographic coordinate system.
  3. A short writeup (~300 words) explaining what you learned during this project and which requirements you met, or failed to meet. Also describe any "over and above" efforts here so that the graders can look for them.

Successful delivery of the above requirements is sufficient to earn 90% on the project. The remaining 10% is reserved for efforts that go "over and above" the minimum requirements. This could include (but is not limited to) a batch file that could be used to automate the script, creation of the feature class in a file geodatabase instead of a shapefile, or the breaking out of repetitive code into functions and/or modules.

Challenges

You may already see several immediate challenges in this task:

  • The rhinos in the spreadsheet appear in no guaranteed order, and not all the rhinos appear at the beginning of the spreadsheet. As you parse each line, you must determine which rhino the reading belongs to and update that rhino's polyline track accordingly. You are not allowed to sort the Rhino column in a spreadsheet program before you write the script. Your script must be "smart" enough to work with an unsorted spreadsheet in the order that the records appear.
  • You do not immediately know how many rhinos are in the file or even what their names are. Although you could visually comb the spreadsheet for this information and hard-code each rhino's name, your script is required to handle all the rhino names programmatically. The idea is that you should be able to run this script on a different file, possibly containing more rhinos, without having to make many manual adjustments.
  • You have not previously created a feature class programmatically. You must find and run ArcGIS geoprocessing tools that will create an empty polyline shapefile with a text field for storing the rhino's name. You must also assign the WGS 1984 geographic coordinate system as the spatial reference for this shapefile.

Hints

  • Before you start writing code, write a plan of attack describing the logic your script will use to accomplish this task. Break up the original task into small, focused chunks. You can write this in Word or even Notepad. Your objective is not to write fancy prose, but rather short, terse statements of what your code will do: in other words, pseudocode. Here's an example of some pseudocode that might appear in your file:

    . . .
    Read the next line.
    Pull out the pieces of info needed from the line (lat, lon, name)
    Determine if the dictionary has a key for the rhino name.
    If no key exists, create a new array object.
    Create a new point object.
    Assign the lon reading to the X coordinate of the point.
    Assign the lat reading to the Y coordinate of the point.
    Add the point to the array.
    Add the array to the dictionary using the rhino name as the key.
    . . .

    If you do a good job writing your pseudocode, you'll find that each line translates into about one line of code. Writing your script then becomes a matter of translating from English to code. You may also find it helpful to sketch out a diagram of the workflow and logistical branches in your script.
  • You will have a much easier time with this assignment if you first create the array objects representing each rhino track, then use insert cursors to add the arrays once they are completed. Not only is this easier to code, it's better for performance to open the insert cursor only once near the end of the script.
  • A Python dictionary is an excellent structure for storing a rhino name coupled with the rhino's array of observed locations. A dictionary is similar to a list, but it stores items in key-value pairs. For example, a key could be a string representing the rhino name, and that key's corresponding value could be an ArcGIS Array object containing all the points where the rhino was observed. You can retrieve any value based on its key, and you can also check whether a key exists using a simple if key in dictionary: check.

    We have not worked with dictionaries much in this course, but your Zandbergen text has an excellent section about them and there are abundant Python dictionary examples on the Internet.

    You can alternatively use lists to keep track of the information, but this will probably take more code. Using dictionaries, I was able to write this script in under 65 lines (including comments and whitespace). If you find yourself getting confused or writing a lot of code with lists, you may try to switch to dictionaries.
  • To create your shapefile programmatically, use the Create Feature Class tool (arcpy.CreateFeatureclass_management). The ArcGIS Desktop Help has several examples of how to use this tool. If you can't figure this part out, I suggest you create the feature class manually and work on writing the rest of the script. You can then return to this part at the end, if you have time.
  • In order to get the shapefile in WGS 1984, you'll need to create a spatial reference object that you can assign to the shapefile at the time you create it. I recommend creating it with the arcpy.SpatialReference class. Be warned that if you do not correctly apply the spatial reference, your polyline precision could be diluted.
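The dictionary-based grouping described in the hints can be tried out without arcpy. In the rough sketch below, plain lists of (lon, lat) tuples stand in for the arcpy Array and Point objects, and the header names, rhino names, and coordinates are all made up; your real spreadsheet will differ.

```python
import csv
import io

# Made-up readings; the header names and values are assumptions.
sampleText = (
    "Lat,Lon,Rhino\n"
    "-1.50,34.10,Tulip\n"
    "-1.52,34.20,Bo\n"
    "-1.51,34.12,Tulip\n"
)

rhinoTracks = {}   # key: rhino name, value: list of (lon, lat) tuples
reader = csv.reader(io.StringIO(sampleText))

# Find the column positions from the header instead of hard-coding them.
header = next(reader)
latIndex = header.index("Lat")
lonIndex = header.index("Lon")
nameIndex = header.index("Rhino")

for row in reader:
    name = row[nameIndex]
    if name not in rhinoTracks:
        # first reading for this rhino: start a new, empty track
        rhinoTracks[name] = []
    # X comes from the longitude, Y from the latitude
    rhinoTracks[name].append((float(row[lonIndex]), float(row[latIndex])))

for name in rhinoTracks:
    print(name, rhinoTracks[name])
```

Notice that the rows do not need to be sorted: each reading is appended to whichever track matches its name, in the order the rows appear.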

If you do things right, your polylines should look like this (points are included only for reference):

Final rhino tracks

Note:

Although I have placed the data in an African context (who heard of rhinos wandering New York City?), it is completely fabricated and does not resemble the path of any actual rhino, living or dead. If you exhibit a stellar performance on this project, you may choose the option of having a rhino named after you in a future offering of this course!

KEEP - OLD - Final Project and Review Quiz

By this time, you should have submitted your Final Project proposal and received a response from one of your instructors. You have the final two weeks of the course to work on your individual projects. Please submit your agreed-upon deliverables to the Final Project Drop Box by the course end date on the Canvas calendar.

Just as with all the other projects, your Final Project submission should include a writeup discussing how you approached the problem and what you learned. You don't have to rehash your proposal, but you should include a short summary of what your script or tool does, similar to what you would provide to a colleague if you were to share this in the future.

Your writeup should also include a set of numbered steps that graders can follow in order to evaluate your project. If your script or tool requires entering parameters, please provide sample values that the graders could supply for each parameter. If the graders cannot figure out how to run your project, they may deduct functionality points.

If you are providing any sample dataset larger than about 20 MB, please keep in touch with your grader to ensure that he or she can successfully get the data. You might want to break up the dataset and e-mail it to your grader in chunks. Ideally you can clip your dataset so that it does not take so much space.

In addition to the Final Project, there is a Review Quiz that you can take at any time between now and the end of the course. The Review Quiz is comprehensive, contains 20 questions, and is worth 10% of your grade for the course.

Please keep in touch with your instructors and fellow students during this time period. I encourage you to help each other in the course forums. It may be that others are encountering the same challenges that you are while working through their projects.

This course has been a pleasure, and I wish you the best in your future Python and programming endeavors!