Lesson 3: Building a Postgres/PostGIS Database

Lesson 3: Building a Postgres/PostGIS Database jls27

Overview

Overview jls27

Now that you have a solid handle on the basics of relational database design and query writing, we're ready to dive into spatial database technology. Over the next two lessons, we'll experiment with the open-source RDBMS PostgreSQL (pronounced pōst-grɛs kyü'-ɛl ) and its spatial extension PostGIS (pronounced pōst-jis). This software combination is quite popular for those looking for an alternative to vendor solutions that are often more costly than their organization can afford.

Unlike MS-Access, which is intended for relatively small projects, Postgres is a full-fledged enterprise RDBMS more akin to the leading vendor products (e.g., Oracle and SQL Server). Though there are certainly differences between Postgres and Access, you should find that the concepts you learned earlier in the course will transfer over to this new environment.

After orienting you to working with Postgres, we'll get into the spatial functionality provided by PostGIS.

Objectives

At the successful completion of this lesson, students should be able to:

  • create a new database schema in Postgres;
  • use the Postgres shapefile loader plugin to import data in Esri's shapefile format;
  • write and execute queries in Postgres;
  • describe the spatial data types built into PostGIS;
  • translate features in textual format into PostGIS geometries;
  • imagine scenarios for storing multiple geometry columns or multiple geometry types in the same PostGIS table;
  • view PostGIS tables as layers in QGIS;
  • use QGIS to conduct basic GIS operations (such as creating a simple map).

Questions?

If you have any questions now or at any point during this week, please feel free to post them to the Lesson 3 Discussion Forum.

Checklist

Checklist jls27

Lesson 3 is one week in length. See the Canvas Calendar for specific due dates. To finish this lesson, you must complete the activities listed below:

  1. If you haven't already done so, please download and install Postgres, PostGIS, and QGIS (an open-source desktop GIS package we'll use to view our PostGIS data):
    1. Download and install Postgres 17.x, accepting all of the default settings. Go with the 64-bit version unless your computer does not support it. When running the installation, you will need to have access to the Internet.
      1. After Postgres is installed, you'll be asked if you want to launch the Stack Builder, a separate package that allows for the installation of add-ons to Postgres. Check the box for Stack Builder may be used... and in the next window, choose PostgreSQL 17 on Port 5432 as the installation you are installing software for.
      2. Under the category of Spatial Extensions, choose the PostGIS 3.x Bundle... Installing this add-on will enable you to execute the spatial queries covered in Lessons 3-4.
      3. Make your way through the installation, accepting the defaults. When the Choose Components dialog appears, check the Create spatial database box.
      4. For the Database Connection, leave the User Name set to postgres and Port set to 5432. Set the Password to postgres, same as the User Name, making it easy to remember.
      5. For the Database Name, leave it set to the default of postgis_3x_sample.
    2. Installing PostGIS if Postgres is already installed - A separate installer file is available for the version that accompanies the 64-bit version of Postgres. If for some reason you do not use the Stack Builder noted above, you can use this installer. Locating the installer at the postgis.net site is a bit tricky, so here is a direct link.
    3. Install QGIS - click the OSGeo4W Network Installer link and run the downloaded setup file to install QGIS. When prompted, choose the Express Install option.
  2. In addition to these software downloads, please also download and unzip the data needed for Lesson 3.
  3. Work through Lesson 3.
  4. Complete Project 3, and upload its deliverables to the Project 3 Dropbox.
  5. Complete the Lesson 3 Quiz.

Getting Started with PostGIS

Getting Started with PostGIS jls27

In this first part of the lesson, you'll get an introduction to Postgres's graphical interface called pgAdmin. You'll also import a shapefile, load data from a text file, and see how queries are performed in pgAdmin.

  1. Create a new schema
    1. Open pgAdmin 4. The application should open with a pane on the left side of the window labeled Object Explorer. Beneath that, you should see a tree with Servers at the top. If you expand the Servers node, you should see a Postgres 17 server.
    2. Double-click on that server to open a connection to it. You will be logging in with the default user name of postgres.
      Enter the password you defined earlier for the postgres account when you installed the software. You should now see 3 nodes beneath the localhost server: Databases, Login/Group Roles, and Tablespaces.
    3. Expand the Databases list. You should see at least one "starter" database: postgres. It was created when you installed Postgres.

      We want to create a new database that we'll use while exploring PostGIS (Postgres's spatial extension) functionality.
       
    4. Right-click on the Databases list and choose Create > Database.
    5. In the Create - Database dialog, set the Database to Lesson3db, and from the Owner list, select the postgres user name.

      Hit Save.
       
    6. Now, click on the Lesson3db database to expand its list of contents.

      Right-click on Extensions, and select Create > Extension.
       
    7. In the Create Extension dialog under the General tab, choose postgis from the Name dropdown.

      In the same dialog, select the Definition tab, and set the Version to 3.5.x.
      (The settings you just established are reflected under the SQL tab.)

      Click Save to dismiss the Create Extension dialog.

      Next, let's talk about schemas. In Postgres, schemas are the containers for a set of related tables. Generally speaking when you begin a new project, you'll want to create a new schema.
       
    8. Expand the Schemas list. At this point, you should see only one schema: public. We'll have a look at the public schema soon; but for now, let's create a new schema. This schema will store data for the United States that we will use in the next two lessons.
    9. Right-click on Schemas, and select Create > Schema.
    10. In the Create Schema dialog, specify a name of usa.
    11. Set the Owner of the schema to postgres.
    12. Click Save to create the schema.
  2. Load data from a shapefile

    A common workflow for PostGIS users is to convert their data from Esri shapefile format to PostGIS tables. Fortunately, some PostGIS developers have created a Shapefile Import/Export Manager that makes this conversion easy. In prior versions of pgAdmin, the shapefile importer was accessible as a plug-in. In pgAdmin 4, it must be run as a separate application.

    1. In Windows File Explorer, browse to the following folder:
      C:\Program Files\PostgreSQL\17\bin\postgisgui
    2. Run the executable shp2pgsql-gui.exe.

      Note

      If you encounter an error that the file "libintl-9.dll is missing," the easiest fix for this problem is to navigate up to the bin folder where libintl-9.dll is found, copy it, and paste it into the postgisgui folder.
      Since we will be using this executable several times, I suggest that you make a desktop shortcut for it.

    3. At the top of the application window, click the View connection details button.
    4. Confirm that the PostGIS Connection parameters are set as follows:

      Username: postgres
      Password: <the password you set when installing the software>
      Server Host: localhost (port 5432)
      Database: Lesson3db

      If all goes well, you should see a 'Connection succeeded' message in the Log Window at the bottom of the app.
       
    5. In the middle of the dialog, click the Add File button and navigate to the location of your Lesson 3 data.
    6. Select states.shp and click Open.
    7. In the Import List section of the dialog (above the Add File button) supply the following settings by clicking on the current values under each heading:

      Schema: usa
      Table: states
      Geo Column: geom
      SRID: 4269 (After setting this last value, be sure to click away from the row of specifications or hit Enter; otherwise, the last value changed may revert to its original value of 0.)

      Before performing the import, let's spend a moment discussing the SRID (Spatial Reference IDentification) setting. This ID is set to 0 by default, a value that indicates the spatial reference of the shapefile is unknown. As with other GIS applications, defining a dataset's spatial reference is critical in enabling most of the functionality we typically need. We'll talk more about SRIDs later. For now, it's sufficient to know that 4269 is the ID associated with the decimal degree/NAD83 coordinate system used by the Lesson 3 data.
       
    8. Click Import. After just a moment, the Log Window area of the dialog should report that the import process has been completed.
    9. Close the Import/Export Manager application by clicking the X button in the upper right of the dialog or on the Cancel button.
    10. Back in pgAdmin, expand the object list associated with the usa schema.
    11. Click on Tables. You should now see the newly imported states table.

      Note

      It's sometimes necessary to refresh the GUI after creating new objects like this. This can be done by right-clicking on the schema or Tables node in the Browser and selecting Refresh (or hitting F5 on the keyboard).

    12. Right-click on the states table and select View/Edit Data > First 100 Rows. Note the other options in this context menu, which are rather straightforward to understand.

      While looking at the table, note that the column headers include not just the names of the columns but also their data types. The gid column is an auto-incrementing integer column that was added by the importer. The presence of [PK] in its header indicates that it was also designated as the table's primary key.

      Also note that the geom column header contains a "map" icon.  Clicking it allows you to see the table's geometries on a simple map.
       
    13. Repeat these steps to import the cities shapefile. Also note for future reference that although we didn't do so here, you may import multiple shapefiles at a time.
    14. Click Cancel to close the Shapefile Import/Export Manager.
  3. Create a new table

    Loading data from a comma-delimited text file is a common workflow for database developers. Let's see how this can be done in Postgres by loading some state demographic info from the 2020 Census. We'll begin by creating a new blank table.
    1. To create a new table, right-click on Tables under the usa schema and select Create > Table.
    2. Under the General tab, set the table's Name to census2020 and the Owner to postgres.
    3. Under the Columns tab, click the + button. You should see an entry appear for setting the new column's properties.
    4. Set the column's Name to state, its Data type to character varying and its Length to 50. Finally, toggle the switch under the Primary key? heading to on.
    5. Repeat the last two steps to add a column with the Name total and Data type of integer. The length property need not be set for the integer type and the column should not be defined as the primary key.
    6. Add the following additional columns to the table, all as integer data type.

      Note

      Instead of having to expand the Data Type pick list, you can start typing the word integer, and the slot will let you autofill with choices. After you type inte, you can pick "integer." Be sure to add the columns in this order; otherwise, the data load will not work properly.

    7. Table 3.1: Additional Integer columns to be added to your usa.census2020 table.
      Integer Columns
      male
      female
      white
      black
      amind
      asian
      hawaiian
      other
      multi_race
    8. Click Save to finish creating the table.
  4. Load data using the COPY command

    Before executing the command that will import the data into the table, let's have a look at the data file in a plain text editor and also note its location.

    1. In Windows Explorer, right-click on the census2020.csv file and select Open With > Notepad. Note that the values in the file are comma-delimited and that the file includes a header row. Close the file when finished examining it.
    2. Determine the full path name of the location of your census2020.csv file. The format that you will need to use is as follows: with the drive letter, followed by the folder names, and finally the file name. Something like C:\Users\Smith\Documents\Lesson3data\census2020.csv
    3. In pgAdmin, select Tools > Query Tool. The Query Tool dialog is built such that the SQL code is entered at the top and the query's output appears at the bottom.
    4. Enter the following command, changing the path to reflect the census2020.csv file location on your machine.

      COPY usa.census2020 FROM 'C:\PSU\Geog868\Lesson3data\census2020.csv'
      WITH (FORMAT csv, HEADER True);

      Note

      If you encounter a "permission denied" error, it means the "postgres" database login doesn't have permission to read the csv file where it is currently located. Try copying it to a sub-directory belonging to the "Public" user (e.g., 'C:\Users\Public\Public Documents') or to a location that has no permission restrictions (e.g., 'C:\temp'). You could also reset the permissions on the folder that stores the CSV file, as outlined in this stackoverflow thread.

      Let's look for a moment at the options set in the WITH clause. If our input file were tab-delimited, we would use a FORMAT setting of text rather than csv. We set the HEADER option to True since our file contains a header row. A number of other options are available and can be found on the COPY command's page in the documentation. One option you may need to set is DELIMITER. This defaults to the comma for csv files and to tab for text files. If your file uses another delimiter, such as the pipe character (|), you can indicate that using the DELIMITER option. Another option to note is QUOTE. Delimited text files often have text strings embedded within quotation marks. This is done to explicitly differentiate those values from numeric or Boolean values. Typically, this quoting is done with the double-quote character. When Postgres copies the data from the text file to your table, it omits the quoting character. For example, "SMITH" in the text file becomes SMITH in the table. By default, Postgres assumes that such quoting will be done with the double quote. However, if your file's strings are instead denoted by a single quote or some other character, you can specify that by setting the QUOTE option to the appropriate character.

      The COPY command attempts to insert values from the first column of the input file into the first column of the table, values from the second column of the input file into the second column of the table, etc. The HEADER option simply tells Postgres to skip the first line; it does not tell it to read the column headers and intelligently match the columns of the input file to the columns of the table. If your table happens to have more columns than the input file, and/or the columns are in a different order, you can deal with this by supplying to the COPY command a list of column names that matches the input file after the table name. For example:

      COPY usa.census2020 (state, total, male....) FROM ....
    5. Execute the command by clicking the Execute button ("Play" icon) just above the SQL code box. You should receive a message that the query returned successfully with 52 rows affected.
    6. Confirm that the data loaded properly using the method you used for the states table.
  5. Write queries in pgAdmin

    1. Click on Tools > Query Tool again to open a new query tab.
    2. In the SQL box, enter the following query to identify the states where most of the population uses the term 'Soda' when referring to soft drinks:

      SELECT name, sub_region
      FROM states
      WHERE sub_region = 'Soda';
    3. Run the query by clicking the Execute button on the toolbar. You should receive an error message that the relation "states" does not exist.
    4. The reason for this error has to do with pgAdmin's search path. Among other things, the search path determines which schema(s) will be scanned when tables are specified using unqualified names (e.g., like we just did with "states"). There are two solutions to this problem. The first is to qualify all table names with their parent schema. For example:

      SELECT name, sub_region
      FROM usa.states
      WHERE sub_region = 'Soda';

      The second solution is to reset pgAdmin's search path so that the schema you're using is part of that path. By default, pgAdmin searches only the public schema. We will take this second approach since it allows us to omit the schema qualifier.

    5. So, highlight the text of your query and cut it out of the editor window. We'll be pasting it back in momentarily.
    6. Enter the following statement into the SQL Editor:

      SET search_path TO usa, public;
    7. Run this query by clicking the Execute button. You should receive a message that the query returned successfully, though you should expect no tabular output from a query like this. pgAdmin will now look for unqualified tables first in the usa schema, then in the public schema. We include the public schema because the search path is used not just for searching for tables but also for functions. When we move on to spatial queries, we'll need to have access to some of the functions available in the public schema.

      If you are curious, you can run the following query to find out what the search path is set to.

      SHOW search_path;
    8. You can now retry the query that you cut out. You should see a list of 17 states in the Output pane at the bottom of the window.

      Note

      Suppose the table was named States rather than states. pgAdmin converts all table/column names to lower-case prior to execution by default. Thus, if you had written your FROM clause to read "FROM States", it would be evaluated as "FROM states" and pgAdmin would not find a matching table. To override this case conversion, you can put the table/column name in double quotes like this:

      SELECT name, sub_region
      FROM "States"
      WHERE sub_region = 'Soda';

      To avoid having to qualify your table/column names in this way, it's best to use lower case in your naming.

    9. While on the subject of case, you may have noticed that my examples place all SQL keywords in upper case and table/column names in lower case. This is a convention that is followed by many SQL developers because it makes it easy to tell at a glance which parts of the statement are SQL keywords and which are schema elements. This is just a convention, not a requirement, so you should feel free to deviate from it if you prefer. For example, this query will produce the same results:

      select name, sub_region
      from states
      where sub_region = 'Soda';

    Now that you have a feel for how Postgres works, go on to the next page to practice writing queries in pgAdmin.

 

Query-Writing Practice Exercises

Query-Writing Practice Exercises jls27

To help you get oriented to writing SQL queries on the pgAdmin command line, try your hand at the following exercises. Recall that the 2018 population data, soft drink data, and geometries are in the states table, and that the 2020 data are in the census2020 table.

  1. Select the states with a 2018 population over 10 million.
  2. Select the state capitals.
  3. Select the states whose names begin with the word "New".
  4. Select the cities whose names contain the letter "z".
  5. Sort the states by their 2018 population from high to low.
  6. Sort the states first by soft drink name, then by state name.
  7. Select the states with a 2018 population over 10 million and where the majority of the population refers to soft drinks as pop.
  8. Select cities in the states of NY, NJ, and PA (using the stateabb column).
  9. For each state, compute the percentage of the 2020 population that is white. Give this output column an alias of pctwhite. Besides pctwhite, include only the name of the state in the output. Note: the columns involved in this calculation are defined as integers, which means the resulting value will be rounded to the nearest integer (0). To avoid this rounding and obtain the desired percentages, add '::double precision' after the white column in the calculation. This will convert the integer values to double precision values prior to the calculation. It is only necessary to perform this conversion for one of the columns involved in the calculation.
  10. Sum the 2018 state populations across the soft drink categories (i.e., What is the population of the 'soda' states? Of the 'pop' states? Of the 'coke' states?).
  11. Bring together data from the states and census2020 tables, outputting the name from the states table, total population from the census2020 table, and geom from the states table.
  12. Calculate the average 2020 male population across the soft drink categories.

Lesson 3 Practice Exercise Solutions (This link takes you to the end of Lesson 3. Be careful not to skip the pages that precede it.)

Introduction to Spatial Select Queries

Introduction to Spatial Select Queries jls27

What sets spatial databases apart from their non-spatial counterparts is their support for answering geometric and topological questions. Let's have a look at some simple examples to demonstrate. We'll continue working with the states table we created in the last section.

  1. Return to the Query dialog in pgAdmin and execute the following query:

    SELECT name, ST_Centroid(geom) AS centroid
    FROM states
    WHERE sub_region = 'Soda';

    The obvious difference between this and our earlier queries is that it calls upon a function called ST_Centroid(). Like the functions we worked with in Lesson 1, the ST_Centroid() function accepts inputs and returns outputs. Here, we supply the geom column as an input to the function, and it returns the geometric centers of the shapes stored in that column.

    You've probably noticed that the output from ST_Centroid() is not human friendly. It contains the coordinates of a point in the coordinate system of the input column, but expressed in hexadecimal notation. To display the coordinate values in a more readable form, we can nest the call to the ST_Centroid() function within a call to a function named ST_AsText().

  2. Modify the query as follows, then execute:

    SELECT name, ST_AsText(ST_Centroid(geom)) AS centroid
    FROM states
    WHERE sub_region = 'Soda';

    In this version of the query, the hexadecimal value returned by ST_Centroid() is immediately passed to the ST_AsText() function, which returns a value formatted for human consumption (with longitude listed first, latitude second).

    Now, let's try retrieving the areas of the states using ST_Area().

  3. Modify the query as follows, then execute:

    SELECT name, ST_AsText(ST_Centroid(geom)) AS centroid, ST_Area(geom) AS area
    FROM states
    WHERE sub_region = 'Soda';

    In the area column, take note of the values returned by ST_Area(). They are in the units of the input geometry, squared. Recall that the Lesson 3 shapefiles are in latitude/longitude coordinates, which means the area values we're seeing are in square degrees. Hopefully, you recognize that this is a poor way to compute area, since a square degree represents a different area depending on the part of the globe you're dealing with. The ST_Transform() function exists exactly for situations like this. It takes a geometry (in whatever spatial reference) as input and re-projects it into some other spatial reference.

  4. Modify the query as follows, then execute:

    SELECT name, ST_AsText(ST_Centroid(geom)) AS centroid, ST_Area(ST_Transform(geom,2163)) AS area
    FROM states
    WHERE sub_region = 'Soda';

    Take note of the values now displayed in the area column. In this version of the query, the ST_Transform() function is first used to re-project the geometry into the spatial reference 2163 before ST_Area() is called. That spatial reference is an equal-area projection in meters that is suitable for the continental U.S. Don't worry, we'll discuss how you'd find that information later in this lesson.

    Note: ST_Transform() re-projects input geometries in memory only, the input geometries stored in the table remain in the same spatial reference they had before.

We'll spend much more time discussing the spatial functions that are available in PostGIS later. Right now, let's go over the geometry types that are supported.

PostGIS Geometry Types

PostGIS Geometry Types jls27

In the last section, we worked with a table – usa.states – containing geometries of the type POLYGON. The other basic geometry types are POINT and LINESTRING. As we'll see momentarily, there are numerous other geometry types available in PostGIS that allow for the storage of multipart shapes, 3-dimensional shapes, and shapes that have a measure (or M value) associated with its vertices. If keeping all of the various types straight becomes difficult, it may help to remember that the simple geometries we deal with most often are POINT, LINESTRING, and POLYGON.

To demonstrate some of the concepts in this section, we're going to create a new schema to store points of interest in New York City. Unlike the last schema where we used the Shapefile Import/Export Manager to both create and populate a table at the same time, here we'll carry out those steps separately.

A. Create a new empty spatial table

  1. In the pgAdmin Object Explorer, right-click on the Schemas node beneath the Lesson3db database and select Create > Schema.
  2. Set the Name of the schema to nyc_poi and its Owner to postgres, then click Save to create the schema.
  3. Expand the object listing associated with the nyc_poi schema.
  4. Right-click on the Tables node, and select Create > Table.
  5. Set the table's Name to pts and its Owner to postgres.
  6. Under the Columns tab, click the + button.
  7. For the new column, set the Name to gid (geometry ID) and the Data type to serial. This data type is roughly equivalent to the AutoNumber type we saw in Access. Define this column as the table's Primary key.
  8. Repeat the previous step to create a column called name. Set its Data type to character varying and its Length to 50. This column should not be the Primary key.

    The last column we want to add to the table is one that will hold the geometries. While it's possible to add a column of type 'point' through the GUI, there are a number of other important settings that should be made when adding a geometry column (such as its spatial reference ID, or SRID). These settings are all handled by a PostGIS maintenance function called AddGeometryColumn(), so that is the route we will take.
  9. Click Save to dismiss the dialog and create the table. Before adding the geometry column to the table, let's recall the pgAdmin search path. It's not set to include the nyc_poi schema, so let's do that first.
  10. Reset the search path by executing the following statement in a Query window:

    SET search_path TO nyc_poi, public;
  11. Now add a geometry column called geom to the table by executing this statement. (You can re-use the Query window you already have open for this and subsequent queries.)

    SELECT AddGeometryColumn('nyc_poi','pts','geom',4269,'POINT',2);

    First, let's address the unusual syntax of this statement. You've no doubt grown accustomed to listing column names (or *) in the SELECT clause, but here we're plugging in a function without any columns. We're forced to use this awkward syntax because SQL rules don't allow for invoking functions directly. Function calls must be made in one of the statement types we've encountered so far (SELECT, INSERT, UPDATE, or DELETE). In this situation, a SELECT statement is the most appropriate.

    The arguments to the function in this statement are, in order: the schema name, the table name, the name to be given to the geometry column, its spatial reference ID, the type of geometry it will store, and the dimension of the coordinates it will hold.

    Before moving on, an explanation of this dimension parameter is in order. In most cases, you're likely to be storing just X and Y coordinates. When that's the case, you should assign a dimension value of 2, as we just did. However, it is also possible that you want to store the elevation (Z value) of the points. In that case, you assign a dimension value of 3. As mentioned above, it is also possible to store some type of measure (M value) with each point (e.g., the time the point was recorded with a GPS unit). In that scenario, you would also assign a dimension value of 3. You would differentiate that XYM type of point from an XYZ point by setting the geometry type to POINTM instead of POINT. Finally, it is possible to store all four values (X, Y, Z and M). In that situation, you would assign a dimension value of 4.

B. Add rows to the spatial table

We're about to add rows to our pts table through a series of INSERT statements. You'll find it much easier to copy and paste these statements rather than typing them manually, if not now, then certainly when we insert polygons later using long strings of coordinates.

  1. Execute the following statement to insert a row into the pts table.

    INSERT INTO pts (name, geom)
    VALUES ('Empire State Building', ST_GeomFromText('POINT(-73.985744 40.748549)',4269));

    The key point to take away from this statement (no pun intended) is the call to the ST_GeomFromText() function. This function converts a geometry supplied in text format to the hexadecimal form that PostGIS geometries are stored in. The other argument is the spatial reference of the geometry. This argument is required in this case because when we created the geom column using AddGeometryColumn(), it added a constraint that values in that column must be in a particular spatial reference (which we specified as 4269).

  2. Execute the statements below to add a couple more rows to the table. Note that while we've executed single statements thus far in the lesson, you are also allowed to execute multiple statements in succession.

    INSERT INTO pts (name, geom)
    VALUES ('Statue of Liberty', ST_GeomFromText('POINT(-74.044508 40.689229)',4269));
    
    INSERT INTO pts (name, geom)
    VALUES ('World Trade Center', ST_GeomFromText('POINT(-74.013371 40.711549)',4269));
  3. Next, execute the statement below, noting the use of the ST_MakePoint() function nested within the ST_SetSRID() function. In many situations, PostGIS offers multiple ways to accomplish the same task. In this case, the inner function (ST_MakePoint) is executed first, creating a POINT geometry, which then serves as an input to the outer function (ST_SetSRID), which sets the SRID of the POINT.

    INSERT INTO pts (name, geom)
    VALUES ('Grand Central Station', ST_SetSRID(ST_MakePoint(-73.976522, 40.7528),4269));
  4. Finally, add two more rows using the statement below. Note that in this step you're adding multiple rows using a single statement.

    INSERT INTO pts (name, geom)
    VALUES ('Radio City Music Hall', ST_GeomFromText('POINT(-73.97988 40.760171)',4269)),
    ('Madison Square Garden', ST_GeomFromText('POINT(-73.993544 40.750541)',4269));
  5. In the pgAdmin window, right-click on the pts table and select View/Edit Data > All Rows to confirm that the INSERT statements executed properly.

C. Create and populate a table of linestrings

  1. Repeat the steps (in Part A above) to create a new table within the nyc_poi schema, that will hold NYC line features. Pay particular attention to these differences:
    • Give the table a name of lines.
    • The table should have the same column definitions, with the exception that the geometry type should be set to LINESTRING rather than POINT.
    • No need to set the search path again, as it will already include the nyc_poi schema.
  2. Execute the following statement to insert 3 new rows into the lines table:

    INSERT INTO lines (name, geom)
    VALUES ('Holland Tunnel',ST_GeomFromText('LINESTRING(
    -74.036486 40.730121,
    -74.03125 40.72882,
    -74.011123 40.725958)',4269)),
    ('Lincoln Tunnel',ST_GeomFromText('LINESTRING(
    -74.019921 40.767119,
    -74.002841 40.759773)',4269)),
    ('Brooklyn Bridge',ST_GeomFromText('LINESTRING(
    -73.99945 40.708231,
    -73.9937 40.703676)',4269));

    Note that I've split this statement across several lines to improve its readability, not for any syntax reasons. You should feel welcome to format your statements however you see fit.

    As you can see, the syntax for constructing linestrings is similar to that of points. The difference is that instead of supplying just one X/Y (lon/lat in this case) pair, you supply however many pairs are needed to delineate the feature, with the pairs being connected sequentially by straight line segments. The longitude (X) value comes first and is separated from the latitude (Y) by a space. The lon/lat pairs are in turn separated by commas. For simplicity's sake, I gave you the coordinates of some very simple straight lines. In a real-world situation, you would likely need many lon/lat pairs for each line feature, the number depending on the curviness of the feature.

D. Create and populate a table of polygons

  1. Repeat the steps (in Part A) to create a new table within the nyc_poi schema, with the following exceptions:
    • Give the table a name of polys.
    • Set the geometry type of the geom column to POLYGON rather than LINESTRING.
  2. Execute the following statement to add a row to your polys table:

    INSERT INTO polys (name, geom)
    VALUES ('Central Park',ST_GeomFromText('POLYGON((
    -73.973057 40.764356,
    -73.981898 40.768094,
    -73.958209 40.800621,
    -73.949282 40.796853,
    -73.973057 40.764356))',4269));

    While the syntax for constructing a polygon looks very similar to that of a linestring, there are two important differences:

    • The first X/Y (lon/lat) pair should be the same as the last (to close the polygon).
    • Note that the coordinate list is enclosed in an additional set of parentheses. This set of parentheses is required because polygons are actually composed of potentially multiple rings. Every polygon has a ring that defines its exterior. Some polygons also have additional rings that define holes in the interior. When constructing a polygon with holes, the exterior ring is supplied first, followed by the interior rings. Each ring is enclosed in a set of parentheses, and the rings are separated by commas.
      To see an example, let's add Central Park again, this time cutting out the large reservoir near its center.
  3. First, let's remove the original Central Park row. In pgAdmin, right-click on the polys table and select Truncate > Truncate. Note that this deletes all rows from the table.
  4. Add Central Park (minus the reservoir) back into the table using this statement:

    INSERT INTO polys (name, geom)
    VALUES ('Central Park',ST_GeomFromText('POLYGON((
    -73.973057 40.764356,
    -73.981898 40.768094,
    -73.958209 40.800621,
    -73.949282 40.796853,
    -73.973057 40.764356),
    (-73.966681 40.785221,
    -73.966058 40.787674,
    -73.965586 40.788064,
    -73.9649 40.788291,
    -73.963913 40.788194,
    -73.963333 40.788291,
    -73.962539 40.788259,
    -73.962153 40.788389,
    -73.96181 40.788714,
    -73.961359 40.788909,
    -73.960887 40.788925,
    -73.959986 40.788649,
    -73.959492 40.788649,
    -73.958913 40.78873,
    -73.958269 40.788974,
    -73.957797 40.788844,
    -73.957497 40.788568,
    -73.957497 40.788259,
    -73.957776 40.787739,
    -73.95784 40.787057,
    -73.957819 40.786569,
    -73.960801 40.782394,
    -73.961145 40.78215,
    -73.961638 40.782036,
    -73.962518 40.782199,
    -73.963076 40.78267,
    -73.963677 40.783661,
    -73.965694 40.784457,
    -73.966681 40.785221)
    )',4269));

E. 3- and 4-dimensional geometries

Earlier in this section, we discussed 3-dimensional (XYZ and XYM) and 4-dimensional (XYZM) geometries in the context of properly specifying the dimension argument to the AddGeometryColumn() function. We won't be doing so in this course, but let's look for a moment at the syntax used for creating these geometries.

To define a column that can store M values as part of the geometry, use the POINTM, LINESTRINGM, and POLYGONM data types. When specifying objects of these types, the M value should appear last. For example, an M value of 9999 is attached to each coordinate in these features from our nyc_poi schema:

POINTM(-73.985744 40.748549 9999)

LINESTRINGM(-74.019921 40.767119 9999, -74.002841 40.759773 9999)

POLYGONM((-73.973057 40.764356 9999,
-73.981898 40.768094 9999,
-73.958209 40.800621 9999,
-73.949282 40.796853 9999,
-73.973057 40.764356 9999)

Perhaps the most common usage of M coordinates is in linear referencing (e.g., to store the distance from the start of a road, power line, pipeline, etc.). This Wikipedia article on Linear Referencing provides a good starting point if you're interested in learning more.

To define a column capable of storing Z values along with X and Y, use the "plain" POINT, LINESTRING and POLYGON data types rather than their "M" counterparts. The syntax for specifying an XYZ coordinate is the same as that for an XYM coordinate. The "plain" data type name tells PostGIS that the third coordinate is a Z value rather than an M value. For example, we could include sea level elevation in the coordinates for the Empire State Building (in feet):

POINT(-73.985744 40.748549 190).

Finally, in the event you want to store both Z and M values, again use the "plain" POINT, LINESTRING and POLYGON data types. The Z value should be listed third and the M value last. For example:

POINT(-73.985744 40.748549 190 9999)

F. Multipart geometries

PostGIS provides support for features with multiple parts through the MULTIPOINT, MULTILINESTRING, and MULTIPOLYGON data types. A classic example of multipart geometry is the state of Hawaii, which is composed of multiple disconnected islands. The syntax for specifying a MULTIPOLYGON builds upon the rules for a regular POLYGON; the parts are separated by commas and an additional set of parentheses is used to enclose the full coordinate list. The footprints of the World Trade Center Towers 1 and 2 (now fountains in the 9/11 Memorial) can be represented as a single multipart polygon as follows:

MULTIPOLYGON(((-74.013751 40.711976, -74.01344 40.712439,
-74.012834 40.712191,
-74.013145 40.711732,
-74.013751 40.711976)),
((-74.013622 40.710772,
-74.013311 40.711236,
-74.012699 40.710992,
-74.013021 40.710532,
-74.013622 40.710772)))

This basic example shows the syntax for storing just X and Y coordinates. Keep in mind that Z values and M values are also supported for multipart geometries. As you might guess, the "MULTI" data types have "M" counterparts too: MULTIPOINTM, MULTILINESTRINGM and MULTIPOLYGONM.

G. Mixing geometries

The tables we've created so far reflect a bias toward Esri-centric design with each table storing a single column of homogeneous geometries (i.e. all points, or all lines, or all polygons, but not a mix). However, PostGIS supports two design approaches that are good to keep in mind when putting together a database:

  • It is possible to store multiple geometry columns in a table. This capability could be used to store data in two or more different spatial reference systems. Though one should do so under limited circumstances, given the existence of the ST_Transform() function and the additional maintenance such a design would require.
  • It is possible to store multiple geometry types in a single column. This capability can simplify schema design and certain types of queries, though there are drawbacks to this approach, such as the fact that some third-party tools can't deal with mixed-geometry tables.

Let's see how this heterogeneous column approach can be used to store all of our nyc_poi data in the same table.

  1. Repeat the steps (in Part A) to create a new table within the nyc_poi schema. Pay particular attention to these differences:
    • Give the table a name of mixed.
    • The table should have the same column definitions, with the exception that the geometry type should be set to GEOMETRY.
  2. Add the same features to this new table by executing the following statement:

    INSERT INTO mixed (name, geom)
    VALUES ('Empire State Building', ST_GeomFromText('POINT(-73.985744 40.748549)',4269)),
    ('Statue of Liberty', ST_GeomFromText('POINT(-74.044508 40.689229)',4269)),
    ('World Trade Center', ST_GeomFromText('POINT(-74.013371 40.711549)',4269)),
    ('Radio City Music Hall', ST_GeomFromText('POINT(-73.97988 40.760171)',4269)),
    ('Madison Square Garden', ST_GeomFromText('POINT(-73.993544 40.750541)',4269)),
    ('Holland Tunnel',ST_GeomFromText('LINESTRING(
    -74.036486 40.730121,
    -74.03125 40.72882,
    -74.011123 40.725958)',4269)),
    ('Lincoln Tunnel',ST_GeomFromText('LINESTRING(
    -74.019921 40.767119,
    -74.002841 40.759773)',4269)),
    ('Brooklyn Bridge',ST_GeomFromText('LINESTRING(
    -73.99945 40.708231,
    -73.9937 40.703676)',4269)),
    ('Central Park',ST_GeomFromText('POLYGON((
    -73.973057 40.764356,
    -73.981898 40.768094,
    -73.958209 40.800621,
    -73.949282 40.796853,
    -73.973057 40.764356))',4269));
  3. In the pgAdmin window right-click on the mixed table and select View/Edit Data > All Rows to confirm that the INSERT statement executed properly.

At some point in this lesson, you probably thought to yourself, "This is fine, but what if I want to see the geometries?" Well, you can get a quick look at the geometries returned by a query in pgAdmin by clicking on the "map" icon that appears on the right side of the geometry column header. But you'll likely want to go beyond this, for example, to utilize your geometries in the context of other data layers. That is the focus of the next part of the lesson, where we will use the third-party application Quantum GIS (QGIS) to view our PostGIS data.

Viewing Data in QGIS

Viewing Data in QGIS jls27

Quantum GIS (QGIS, pronounced kyü'-jis) is a free and open-source desktop GIS package analogous to Esri's ArcMap/ArcGIS Pro. Because of its support for viewing PostGIS data and strong cartographic capabilities, QGIS and PostGIS are often found paired together. (OpenJUMP is another desktop application often used in combination with PostGIS, though its strengths are in spatial querying and geoprocessing.)

A. Add PostGIS data to QGIS

Let's see how QGIS can be used to view the tables we created and populated in the previous section.

  1. Open QGIS. It'll be the QGIS Desktop 3.x choice from the Start menu.

    Double-click the New Empty Project option under Project Templates.

    The basic elements of the application GUI are similar to ArcMap/ArcGIS Pro's. The Layers panel in the lower left of the window lists the project layers and their symbology, while the much wider pane to the right displays the layer features themselves. Across the top of the window is a set of toolbars that can be moved to custom positions by the user.

    Above the Layers panel is the Browser panel, which provides an interface for browsing data sources. Moving from top to bottom:
    • Favorites - for enabling easy access to frequently used folders on your file system
    • Spatial Bookmarks - for saving map extents that you'd like to return to later
    • Home - for accessing data located within your folder in C:\Users
    • C:\ - for accessing data anywhere on your hard drive

      The folder navigation items above can be used to add file-based datasets to your project, such as Esri shapefiles or CSV files.
    • GeoPackage - for data in an interchange format from the Open Geospatial Consortium (OGC)
    • SpatiaLite - for data stored in a SpatiaLite database (another free and open-source spatial database extension similar to PostGIS, built to add spatial functionality to an RDBMS called SQLite)
    • PostgreSQL - for PostGIS data
    • SAP HANA - for data stored in SAP HANA spatial databases
    • MS SQL Server - for data stored in Microsoft SQL Server spatial databases
    • Oracle - for data stored in Oracle Spatial databases
    • WMS/WMTS - for data streamed via a Web Mapping Service or Web Map Tile Service
    • Vector Tiles - for adding vector-tiled basemaps
    • XYZ Tiles - for adding raster-tiled basemaps
    • WCS - for data streamed via a Web Coverage Service
    • WFS - for data streamed via a Web Feature Service
    • ArcGIS REST Servers - for map service or feature service data published via an ArcGIS Server instance
  2. Right-click on the PostgreSQL entry and select New Connection.
  3. In the Create a New PostGIS connection dialog, supply the following information. (See Figure 3.1, below.)
    • Name: Lesson3
    • Service - (leave blank)
    • Host: localhost
    • Port: 5432
    • Database: Lesson3db
    • SSL mode: disable
    • Session role - (leave blank)

      Supplying your authentication information can be done in a simple, but insecure way or a more secure way that requires a bit more effort. The simple way is to enter your authentication parameters under the Basic tab. They will be stored as part of the QGIS project file (.qgs), assuming you save your work, where they could be read fairly easily. The more secure way is to create a configuration under the Configurations tab. To do this, you would click the + icon, and, in the resulting dialog, give your configuration a name (e.g., Local Server), specify the URL of the PostGIS instance (or leave blank if using the one on your own machine), then supply the authentication parameters. If going this route, your credentials are stored in encrypted form in a QGIS authentication database on your machine.

      You're welcome to choose either method, though for the purposes of this class, it should be fine to select the Basic method.

      see caption
      Figure 3.1: The Create a New PostGIS connection dialog, showing the correct information entered.
  4. You should click the Test Connection button to make sure you have typed things correctly.
  5. Click OK to create the connection and dismiss the dialog.
    You should now see an arrow next to the PostgreSQL heading indicating that the project now has a Postgres data source connection available. Click the arrow to expand and reveal the Lesson3 connection.
  6. Now, click the arrow next to the Lesson3 entry to expand the list of schemas made available through that connection.
    If an Enter Credentials dialog pops up, just supply the postgres user name and the password you established for it.
  7. Expand the list of layers available in the nyc_poi schema. You should see the pts, lines, and polys tables created early in the last section and also three layers that reference the mixed table created later (one layer for each geometry type). One of the convenient features of QGIS is that it automatically creates separate layers for each geometry type in a multi-geometry table, like our mixed table. The mixed layers can be differentiated by the icons that indicate the geometry type.
  8. Highlight all 6 nyc_poi datasets and drag them onto the map canvas (or right-click and choose Add Selected Layers to Project). You should see a Select Transformation dialog since the SRID assigned to the geometries in the selected tables (4269, NAD83) is of a different datum than the project's datum (WGS84, from SRID 4326). Click OK to accept the pre-selected NAD83 to WGS84 transformation. You should see your data displayed in the map display area and the layers listed in the Layers Panel on the bottom left of the window.

    Now let's take a quick tour of how some common GIS operations are performed in QGIS.

B. Explore basic functions of QGIS

  1. Zoom to the full extent of all the layers by selecting View > Zoom Full.
  2. Click and drag to rearrange the layers. Put them in the following order, from top to bottom: pts, lines, polys, and then the mixed pts, lines, and polys layers.
    Note: If you'd like a basemap, OpenStreetMap is available as a pre-loaded XYZ Tiles option.
  3. The pts, lines, and polys layers are redundant with the three layers based on the mixed table, so turn off pts, lines, and polys by clicking the next to each layer.

    While playing with layer visibility, you may note that the polys layer contains a hole in the Central Park polygon, whereas the "mixed" version of that polygon does not. This is to be expected, since we didn't bother to create that inner ring in the mixed version.
  4. Each layer based on the mixed table, has the same name. Go ahead and re-name these layers: mixed - lines, mixed - pts, and mixed - polys by right-clicking on the layers one at a time and selecting the Rename command. (Be sure you're working in the Layers panel, not the Browser panel.)

    Now, let's see how we can restrict the features that are displayed in a layer by setting a property analogous to the "definition query" in ArcGIS.
  5. Double-click on the mixed - lines layer to open its Layer Properties dialog. You should see tabs along the left side.

    The Symbology tab is where you'd go to change the way a layer is symbolized. Note the pick list at the very top of the dialog, which provides Single Symbol, Categorized, and Graduated options, etc.

    The Actions tab provides functionality similar to ArcGIS's Hyperlink settings for launching external applications to view data found in the attribute table, such as images and URLs.

    The Joins tab is where you'd go if you need to join data from another table to the layer's attribute table.

    The Diagrams tab provides settings for creating pie chart and histogram (bar) chart overlays from numeric data in the attribute table.

    You can investigate the other elements of the Layer Properties at your leisure.
  6. Now, still in the dialog for the mixed - lines Layer Properties, note the locations of the three line features, then go to the Source tab. Recall that the three line features represent two tunnels (the longer line features) and a bridge. We are going to restrict the layer to showing just tunnels by doing the following:
    • Click on the Query Builder button at the lower right of the Layer Properties dialog.
    • In the Provider Feature Filter box, compose the expression shown below, or just copy-paste it.

       
      "name" LIKE '%Tunnel'

      The % character is the wildcard character in Postgres; we saw it was the * character in MS-Access in Lesson 1.

    • Click the Test button to verify the veracity of your expression, then OK after confirming that it returns 2 rows.
    • Click OK to dismiss the Query Builder dialog.
      You'll see the expression mirrored in the Provider feature filter box of the Layer Properties dialog.
    • Click OK to dismiss the Layer Properties dialog.
      You should see that the Brooklyn Bridge is no longer displayed as part of the layer.
  7. An intuitive set of zoom/pan tools can be found on the Map Navigation Toolbar (the bar containing the "hand" icon). Clicking on the Pan tool activates it; you can then click-and-drag in the map display area to alter the visible extent.
  8. To the right of the Pan tool, you'll see a number of useful tools for zooming: in/out, to the full extent, to the extent of a layer, to the extent of the selected features, and backward and forward in the extent history, along with tools associated with map tips and bookmarks.
  9. To the right of the Map Navigation toolbar are the Selection Toolbar and the Attributes Toolbar. The controls on these toolbars include tools for selecting (the layer needs to be highlighted first)/unselecting features, the Identify Features tool, a tool for opening the attribute table of the layer selected in the Layers Panel, a Field Calculator, etc.
  10. Right-click on the mixed - pts layer, and select Open Attribute Table. Again, you should find the table dialog has functionality that is very similar to what is found in ArcGIS.
  11. Find and click the Select features using an expression button (it has a script capital E on it), and enter the following example expression:

    gid < 4

    Note that you can build the expression graphically by expanding the Fields and Values and Operators lists, then double-clicking on items in those lists.

  12. Click Select features to evaluate the selection expression, then click Close to dismiss the Select by expression dialog, and note the selected features in the attribute table and on the map.
  13. Close the Attribute table.
  14. Finally, select Project > Properties.

    Under the General tab, you can: set the selection and background colors, specify whether references to data sources should be stored with relative or absolute paths, and set the display units of the project.

    Under the CRS (Coordinate Reference System) tab, you can specify on-the-fly coordinate system transformation settings. Let's re-project our data into the New York East State Plane system.
  15. Click on the CRS tab.
  16. The bottom of the dialog should show that NAD83 is the current CRS. This setting was inherited from the first data layer added to the project. Scroll to the top of the Predefined Coordinate Reference Systems list and click the minus sign [-] sign box to collapse the list of Geographic (2D) reference systems.This should enable you to see the other major categories, including Projected reference systems, which is where we'd expect to find the New York East system. QGIS doesn't include a State Plane sub-category in its Projected reference system list; all of the state plane projections can be found in either the Lambert Conformal Conic or Transverse Mercator sub-categories.
  17. Scrolling through those long lists can be tedious, so go to the Filter box near the top of the dialog and enter new york. The coordinate system listing should now only include those with 'new york' in the name.
  18. Find NAD83 / New York Long Island EPSG: 32118 under the Lambert Conformal Conic category and select it. Note that the area the coordinate system is designed for is helpfully highlighted in red on the mini map at the bottom right of the dialog.
  19. Click OK to accept the selected CRS.
  20. If your data are no longer visible, select View > Zoom Full. You should note that the Coordinate readout at the bottom of the application window now reports values in the ballpark of 300000,65000.
  21. Finally, save your project by clicking the Save button. Give the project a name of Lesson3 and save it to an appropriate folder. Note that the file extension for QGIS project files is .qgs.

That completes our quick tour of QGIS. In the next section, we'll return to pgAdmin to see how queries can be saved for later re-use.

Working with Views

Working with Views jls27

In Lesson 1, we saved a number of our MS-Access queries so that we could easily re-run them later and, in a couple of cases, to build a query upon another query rather than a table. In Postgres and other sophisticated RDBMSs, stored SQL statements like these are called views. In this section, we'll see how views can be created in Postgres.

A. Create a view based on attribute criteria

  1. In the pgAdmin Query dialog, execute the following query (which identifies the state capitals):
     
    SELECT * FROM usa.cities WHERE capital = 1 ORDER BY stateabb;
    (Sorry Montpelier, I guess you were too small.)
  2. After confirming that the query returns the correct rows, copy the SQL to your computer's clipboard (Ctrl-C).
  3. Back in the main pgAdmin window, navigate to the usa schema.
  4. Right-click on the Views node and select Create > View.
  5. Set the view's Name to vw_capitals and its Owner to postgres.
  6. Click on the Code tab, and paste the SQL statement held on the clipboard (Ctrl-V) into the text box.
  7. Click Save to complete creation of the view.
  8. Select View > Refresh (or hit F5) if you don't see the new view listed.
  9. Assuming the original query and its output is still being displayed, click the X in the upper right of the page to close the query, then click Don't Save.
  10. Right-click on vw_capitals in the Browser pane, and select View/Edit Data > All Rows. The Data Output pane will re-fill with the results of having ‘run’ the view.

    You could now use this view any time you want to work with state capitals. It's important to note that the output from the view is not a snapshot from the moment you created the view; if the underlying source table were updated, perhaps to add Montpelier, those updates would automatically be reflected in the view.

B. Build a query based on a view

Just as we saw in MS-Access, the records returned by views can be used as the source for a query.

  1. Open a new Query Tool tab and execute the following query, which identifies the 18 relatively small capitals. Note the use of the view we just created.
     
    SELECT * FROM usa.vw_capitals WHERE popclass = 2;

C. Create a view based on a spatial function

Views can also include spatial functions, or a combination of spatial and non-spatial criteria, in their definition. To demonstrate this, let's create views that re-project our states and cities data on the fly.

  1. Study then execute the following query:
     
    SELECT gid, name, pop2018, sub_region, ST_Transform(geom,2163) AS geom FROM usa.states;
  2. Follow the procedure outlined above to create a new view based on this query. Assign a name of vw_states_2163 to this view.
  3. Again, repeat this process to create an on-the-fly re-projection of the cities data called vw_cities_2163. Define the view using the following query:
     
    SELECT *, ST_Transform(geom,2163) AS geom_2163 FROM usa.cities;

D. Display views in QGIS

QGIS makes it possible to add both tables and views as layers. We'll take advantage of this feature now by creating layers from the views we just created.

  1. Open a new project in QGIS.
  2. Go to Project > Properties, check the No CRS check box, and click OK.
  3. Now, look under the PostGIS heading in the Browser Panel.
  4. Your Lesson3 connection should be remembered in the Browser Panel. If your Lesson3 connection is not available, you will need to recreate it.

    Click the arrow next to your Lesson3 connection selected to view the available schemas.

    If an Enter Credentials dialog pops up, just supply the postgres user name and the password you established for it.
  5. Expand the object list associated with the usa schema. You should see the original cities and states tables. You should also see vw_capitals, vw_states_2163 and two versions of vw_cities_2163.

    You see two versions of vw_cities_2163 because that view outputs all of the columns from the cities table, including geom, plus a column of geometries re-projected into SRID 2163 (geom_2163).
  6. Add the six usa schema layers to the QGIS project.
  7. Go back to Project > Properties. You should see that the checkbox for No CRS is now unchecked. Apparently, QGIS detected the fact that the spatial reference property settings for the layers we added are not all the same, so it engaged the on-the-fly projection capability.
  8. Check the No CRS box again, and hit the Apply button.
    Close the Project Properties dialog.
  9. Spend a few minutes playing with the view (zoom to the extent of the different layers and turn the views on and off). You will note that, indeed, without the on-the-fly projection engaged, the fact that some of the layers are in lon/lat coordinates and some are in meters based on an equal area map projection becomes apparent; cities, states, vw_capitals and vw_cities_2163.geom align correctly with one another, but they do not align with vw_cities_2163.geom_2163 and vw_states_2163.
    To get all of the layers to realign with one another, we would need to re-enable on-the-fly re-projection.

This section showed how to save queries as views, which can then be utilized in the same way as tables. In the next section, we'll go into a bit more detail on the topic of spatial references.

Spatial Reference Considerations

Spatial Reference Considerations jls27

A. Spatial Reference ID lookup

As we’ve seen, populating a geometry column with usable data requires specifying the spatial reference of the data. We also saw that geometries can be re-projected from one spatial reference to another using the ST_Transform() function. In both cases, it is necessary to refer to spatial reference systems by an SRID (Spatial Reference ID). So, where do these IDs come from, and where can a list of them be found?

The answer to the question of where the IDs come from is that PostGIS uses the spatial reference IDs defined by the European Petroleum Survey Group (EPSG). As for finding the ID for a spatial reference you want to use, there are a few different options.

Using pgAdmin

All of the spatial reference IDs are stored in a Postgres table in the public schema called spatial_ref_sys.

  1. In pgAdmin, navigate to the spatial_ref_sys table and view its first 100 rows. Make particular note of the srid and srtext columns.

    One way to find an SRID is to query the spatial_ref_sys table.
  2. Open the Query Tool and execute the following query:
     
    SELECT srid, srtext FROM spatial_ref_sys
    WHERE srtext LIKE '%Pennsylvania%';
    This query shows the SRIDs of each Pennsylvania-specific spatial reference supported in PostGIS.

Using QGIS

Another way to find SRIDs is to look them up in QGIS.

  1. In QGIS, go to Project > Project Properties.

    Under the CRS tab, recall that the various coordinate systems are categorized as Geographic or Projected. If you’re an Esri ArcMap user, this sort of interface should feel familiar. Let’s say you wanted to find the ID for UTM zone 18N, NAD83.
  2. Expand the Projected Coordinate Systems category, then expand the Universal Transverse Mercator (UTM) category.
  3. To easily get to this sublist, you might want to use the Filter capability.
  4. Scroll down through the list and find NAD83 / UTM zone 18N.
  5. On the right side of the dialog, you should see a column called Authority ID. Note that most of the Authority ID values are prefixed with EPSG, which means those are the values you should use in PostGIS. In this case, you would find that the desired SRID for UTM NAD83 Zone 18N is 26918.

Using the Prj2EPSG service

The Prj2EPSG website provides an easy-to-use interface for finding EPSG IDs. As its name implies, it allows the user to upload a .prj file (used by Esri to store projection metadata) and get back the matching EPSG ID. The site also makes it possible to enter search terms. My test search for ‘pennsylvania state plane’ yielded some garbage matches, but also the ones that I would expect.

This service appears to be down as of 5/29/2020. I leave this section here in case anyone is aware of a web-based alternative. If so, please share with the class in the discussion forum!

B. Geometry metadata in PostGIS

We’ve seen that the public schema contains a table called spatial_ref_sys that stores all of the spatial references supported by PostGIS. Another important item in that schema is the geometry_columns view. Have a look at the data returned by that view and note that it includes a row for each geometry column in the database. Among the metadata stored here are the parent schema, the parent table, the geometry column’s name, the coordinate dimension, the SRID and the geometry type (e.g., POINT, LINESTRING, etc.). Being able to conduct spatial analysis with PostGIS requires accurate geometry column information, so the PostGIS developers have made these data accessible through a read-only view rather than a table.

Earlier in the lesson, we used the AddGeometryColumn() function instead of adding the geometry column through the table definition GUI. An important reason for adding the geometry column in that manner is that it updates the geometry metadata that you can see through the geometry_columns view, something that would not happen if we had used the GUI.

C. Spherical measurements and the geography data type

We’ll talk more about measuring lengths, distances, and areas in the next lesson, but while we’re on the topic of spatial references, it makes sense to consider 2D Cartesian measurement in the context of planimetric map data versus measurement in the context of the spherical surface of the Earth. For example, the PostGIS function ST_Distance() can be used to calculate the distance between two geometries. When applied to geometries of the type we’ve dealt with so far, ST_Distance() will calculate distances in 2D Cartesian space. This is fine at a local or regional scale, since the impact of the curvature of the earth at those scales is negligible, but, over a continental or global scale, a significant error would result.

PostGIS offers a couple of alternative approaches to taking the earth’s curvature into account. Let’s assume that we wanted to measure the distance between points in the (U.S.) cities table that we created earlier in the lesson. We could use a version of the ST_Distance() function called ST_Distance_Spheroid(). As its name implies, this function is designed to calculate the minimum great-circle distance between two geometries.

The other approach is to store the features using a data type introduced in PostGIS 1.5 called geography. Unlike the geometry data type, the geography data type is meant for storing only latitude/longitude coordinates. The advantage of the geography data type is that measurement functions like ST_Distance(), ST_Length() and ST_Area() will return measures calculated in 3D space rather than 2D space. The disadvantage is that the geography data type is compatible with a significantly smaller subset of functions as compared to the geometry type. Calculating spherical measures can also take longer than Cartesian measures, since the mathematics involved is more complex.

The take-away message is that the geography data type can simplify data handling for projects that cover a continental-to-global scale. For projects covering a smaller portion of the earth’s surface, you are probably better off sticking with the geometry data type.

With that, we've covered all of the content for Lesson 3. In the next section, you'll find a project that will allow you to put what you've learned to use.

Project 3: Mapping the Class Roster

Project 3: Mapping the Class Roster jls27

For Project 3, I would like you to map the hometowns of everyone on the class roster using Postgres/PostGIS and QGIS. Included in the data you downloaded at the beginning of the lesson were U.S. state and counties shapefiles, along with a comma-separated values file called postal_codes.txt that stores U.S. and Canadian postal codes and the coordinates of their centroids.

Right-click here to download the class roster with postal codes. (If there are any students from outside the U.S. and Canada, they will appear at the bottom of the file with a special code. There will be a matching record at the bottom of the postal_codes.txt file.)

Here are the broad steps you should follow:

  1. Import the counties shapefile into Postgres.
  2. Create a table to store the postal code centroids.
  3. Load the centroid data from the text file.
  4. Add a geometry column and use an UPDATE query to populate it with POINT geometries.
  5. Create a table to store the class roster.
  6. Load the class roster from the text file.
  7. Determine a way to associate the geometries in the postal codes table with the records in the roster table.
  8. Use QGIS to create a map of the hometowns. Include state and county boundaries for some context.

Tips:

  • The copy command can be used to load data from a text file into a table. Use the Postgres documentation to determine the correct usage for this command. Navigate to the command's page in the documentation either through the Table of Contents or by using the Search function. Pay particular attention to the following options: FORMAT, HEADER, DELIMITER, and QUOTES. The copy command expects the columns of the "to" table to match and be in the same order as the columns of the "from" file.
  • You can assume that the centroid coordinates are in the NAD83/Geographic coordinate system.
  • Assuming your postal codes table contains columns named lat and lon you could use the following expression in an UPDATE statement to create a textual representation of the POINT geometry:
     

    'POINT(' || lon || ' ' || lat || ')'


    Note that the double-pipe character string (||) is the concatenation operator in Postgres.

  • A table/view must include a column that contains unique values in order for it to be added as a layer in QGIS. This column could be one that you've explicitly defined as a PK, but not necessarily. Also, this column need not be numeric. (Earlier versions of QGIS required tables/views to include an integer primary key column.)
  • There are multiple ways to address step 7, some more well-designed than others. The ideal design would allow you to make changes to the class roster table (e.g., add new students or change erroneous postal codes) and those changes will automatically be reflected in QGIS (i.e., it should not be necessary to update a geometry column when a new student is added). Coming up with this design will be worth 10% of the total project points.
  • Draw the state and county boundaries in light hues, so they don't distract from the main thing the map is meant to convey.
  • If there are students from outside North America, create one map focused on North America and another that shows the whole class. You'll probably want to exclude the county boundaries for this second map.

Deliverables

This project is one week in length. Please refer to Canvas Calendar for the due date.

  1. In the Project 3 Dropbox, upload a write-up that includes the map and describes the process you used to produce it. Your write-up will be evaluated according to the following criteria:
    • Workflow (60 of 100 points)
    • Write-up quality (20 of 100 points)
    • Aesthetic quality of the map (10 of 100 points)
    • Ease of making updates/insertions to the class roster (10 of 100 points)
  2. Complete the Lesson 3 quiz.

Lesson 3 Practice Exercise Solutions

Lesson 3 Practice Exercise Solutions jls27
  1. SELECT * FROM states WHERE pop2018 > 10000000;
  2. SELECT * FROM cities WHERE capital = 1;
  3. SELECT * FROM states WHERE name LIKE 'New%';
  4. SELECT * FROM cities WHERE name LIKE '%z%';
  5. SELECT * FROM states ORDER BY pop2018 DESC;
  6. SELECT * FROM states ORDER BY sub_region, name;
  7. SELECT * FROM states WHERE pop2018 > 10000000 AND sub_region = 'Pop';
  8. SELECT * FROM cities WHERE stateabb IN ('US-NY','US-NJ','US-PA');
  9. SELECT state, (white::double precision/total) * 100 AS pctwhite FROM census2020;
  10. SELECT sub_region, Sum(pop2018) FROM states GROUP BY sub_region;
  11. SELECT states.name, census2020.total, states.geom
    
        FROM states INNER JOIN census2020
    
        ON states.name = census2020.state;
  12. SELECT states.sub_region, Avg(census2020.male)
    
        FROM states INNER JOIN census2020
    
        ON states.name = census2020.state
    
        GROUP BY states.sub_region;