Shapely Polygons: Coloring Shapefile Polygons

In my previous two posts, I showed how to (1) read and plot shapefile geometries using the pyshp library and (2) plot polygons using shapely and descartes. So the obvious next step is to combine the two! And that’s what I’ll cover today, again using my learning_shapefiles github repo along with the shapefile of state boundaries from census.gov.

The Final Map

In case you don’t care about the Python and are just curious about the end product, here’s the final map where the color of each state reflects its total land area:

shapefile_us_colored_by_area_sat

It’s kind of neat to see the gradient of state size from east to west, reflecting the historical expansion of the U.S. westward, but other than that, there’s not much to the map. But it does serve as a simple case for learning to manipulate shapefiles.

The Code

There are two scripts in learning_shapefiles/src of relevance for today’s post: basic_readshp_plotpoly.py and read_shp_and_rcrd.py. The first script is a simple combination of basic_read_plot.py and simple_polygons.py (from my previous two posts), plotting the shapefile geometries using polygons instead of lines, so let’s start there.

basic_readshp_plotpoly.py

The code starts out the same as basic_read_plot.py, but now also imports Polygon and PolygonPatch from shapely and descartes, before reading in the shapefile:

import shapefile
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import Polygon
from descartes.patch import PolygonPatch

"""
 IMPORT THE SHAPEFILE 
"""
shp_file_base='cb_2015_us_state_20m'
dat_dir='../shapefiles/'+shp_file_base +'/'
sf = shapefile.Reader(dat_dir+shp_file_base)

The next part of the code plots a single geometry from the shapefile. This is super easy because shapefile.Reader reads a shapefile geometry as a list of points, which is exactly what the Polygon function needs. So we can just give that list of points directly to the Polygon function:

plt.figure()
ax = plt.axes()
ax.set_aspect('equal')

shape_ex = sf.shape(5) # could break if selected shape has multiple polygons. 

# build the polygon from exterior points
polygon = Polygon(shape_ex.points)
patch = PolygonPatch(polygon, facecolor=[0,0,0.5], edgecolor=[0,0,0], alpha=0.7, zorder=2)
ax.add_patch(patch)

# use bbox (bounding box) to set plot limits
plt.xlim(shape_ex.bbox[0],shape_ex.bbox[2])
plt.ylim(shape_ex.bbox[1],shape_ex.bbox[3])

And we get Washington, now as a colored polygon rather than an outline:

shapefile_single

Woo!

And as before, we can now loop over each shape (and each part of each shape), construct a polygon and plot it:

""" PLOTS ALL SHAPES AND PARTS """
plt.figure()
ax = plt.axes() # add the axes
ax.set_aspect('equal')

icolor = 1
for shape in list(sf.iterShapes()):

    # define polygon fill color (facecolor) RGB values:
    R = (float(icolor)-1.0)/52.0
    G = 0
    B = 0

    # check number of parts (could use MultiPolygon class of shapely?)
    nparts = len(shape.parts) # total parts
    if nparts == 1:
       polygon = Polygon(shape.points)
       patch = PolygonPatch(polygon, facecolor=[R,G,B], alpha=1.0, zorder=2)
       ax.add_patch(patch)

    else: # loop over parts of each shape, plot separately
      for ip in range(nparts): # loop over parts, plot separately
          i0=shape.parts[ip]
          if ip < nparts-1:
             i1 = shape.parts[ip+1]-1
          else:
             i1 = len(shape.points)

          polygon = Polygon(shape.points[i0:i1+1])
          patch = PolygonPatch(polygon, facecolor=[R,G,B], alpha=1.0, zorder=2)
          ax.add_patch(patch)

    icolor = icolor + 1

plt.xlim(-130,-60)
plt.ylim(23,50)
plt.show()

In order to distinguish each polygon, I set each shape’s color based on how many shapes have already been plotted:

R = (float(icolor)-1.0)/52.0

This grades the red scale in an RGB tuple between 0 and 1 (since there are 52 shapes), and it is then used in the facecolor argument of PolygonPatch. The coloring is simply a function of the order in which the shapes are accessed:

shapefile_us

The goal, however, is to color each polygon by some sort of data so that we can actually learn something interesting, and that is exactly what read_shp_and_rcrd.py does.

read_shp_and_rcrd.py

Up to now, we’ve only considered the shape geometry, but that is only one part of a shapefile. Also included in most shapefiles are the records, or the data, associated with each shape. When a shapefile is imported,

shp_file_base='cb_2015_us_state_20m'
dat_dir='../shapefiles/'+shp_file_base +'/'
sf = shapefile.Reader(dat_dir+shp_file_base)

The resulting shapefile object (sf in this case) contains records associated with each shape. I wasn’t sure what fields were included for the State Boundary shapefile from census.gov, so I opened up a Python shell in terminal, read in the shapefile then typed

>>> sf.fields

to get a list of available fields:

[('DeletionFlag', 'C', 1, 0), ['STATEFP', 'C', 2, 0], ['STATENS', 'C', 8, 0], ['AFFGEOID', 'C', 11, 0], ['GEOID', 'C', 2, 0], ['STUSPS', 'C', 2, 0], ['NAME', 'C', 100, 0], ['LSAD', 'C', 2, 0], ['ALAND', 'N', 14, 0], ['AWATER', 'N', 14, 0]]

Down towards the end, there’s an interesting entry

['ALAND', 'N', 14, 0]

Though I couldn’t find any documentation on the included fields, I suspected ALAND stood for land area (especially since it was followed by AWATER). So in read_shp_and_rcrd.py, the first thing I do is extract the field names and find the index corresponding the the land area:

""" Find max/min of record of interest (for scaling the facecolor)"""

# get list of field names, pull out appropriate index
# fieldnames of interest: ALAND, AWATER are land and water area, respectively
fld = sf.fields[1:]
field_names = [field[0] for field in fld]
fld_name='ALAND'
fld_ndx=field_names.index(fld_name)

I found this post helpful for extracting the fieldnames of each record.

Next, I loop over the records using the interRecords() object to find the minimum and maximum land area in order to scale the polygon colors:

# loop over records, track global min/max
maxrec=-9999
minrec=1e21
for rec in sf.iterRecords():
    if rec[4] != 'AK': # exclude alaska so the scale isn't skewed
       maxrec=np.max((maxrec,rec[fld_ndx]))
       minrec=np.min((minrec,rec[fld_ndx]))

maxrec=maxrec/1.0 # upper saturation limit

print fld_name,'min:',minrec,'max:',maxrec

I excluded Alaska (if rec[4] != ‘AK’:) so that the color scale wouldn’t be thrown off, and then I also scale the maximum (maxrec=maxrec/1.0) to adjust the color scale manually (more on this later).

Now that I know the max/min, I loop over each shape and (1) calculate the RGB value for each polygon using a linear scale between the max and min and then (2) plot a polygon for each shape (and all the parts of a shape) using that RGB value:

for shapeRec in sf.iterShapeRecords():
    # pull out shape geometry and records 
    shape=shapeRec.shape
    rec = shapeRec.record

    # select polygon facecolor RGB vals based on record value
    if rec[4] != 'AK':
         R = 1
         G = (rec[fld_ndx]-minrec)/(maxrec-minrec)
         G = G * (G<=1) + 1.0 * (G>1.0)
         B = 0
    else:
         R = 0
         B = 0
         G = 0

    # check number of parts (could use MultiPolygon class of shapely?)
    nparts = len(shape.parts) # total parts
    if nparts == 1:
       polygon = Polygon(shape.points)
       patch = PolygonPatch(polygon, facecolor=[R,G,B], edgecolor=[0,0,0], alpha=1.0, zorder=2)
       ax.add_patch(patch)
    else: # loop over parts of each shape, plot separately
       for ip in range(nparts): # loop over parts, plot separately
           i0=shape.parts[ip]
           if ip < nparts-1:
              i1 = shape.parts[ip+1]-1
           else:
              i1 = len(shape.points)

          # build the polygon and add it to plot 
          polygon = Polygon(shape.points[i0:i1+1])
          patch = PolygonPatch(polygon, facecolor=[R,G,B], alpha=1.0, zorder=2)
          ax.add_patch(patch)

plt.xlim(-130,-60)
plt.ylim(23,50)
plt.show()

One import thing not to miss is that on the first line, I loop over the iterShapeRecords iterable rather than using iterShapes. This is neccesary so that I have access to both shape geometry and the associated records, rather than just the shapes (iterShapes) or just the records (iterRecords).

Running the above code will produce the following map:

shapefile_us_colored_by_area

Because Texas is so much larger than the rest of the states, we don’t see much of a difference between the states. But we can adjust this by decreasing the max value using in the scaling. So after finding the max/min value, I set

maxrec=maxrec/2.0 # upper saturation limit

and end up with the following map that brings out more of the variation in the states’ land area (same map as in the very beginning of this post):

shapefile_us_colored_by_area_sat

Note that because I’m decreased the maxvalue for scaling, I had to ensure that the RGB value did not exceed 1, which is why I had the following lines limiting the green value (G):

    if rec[4] != 'AK':
         R = 1
         G = (rec[fld_ndx]-minrec)/(maxrec-minrec)
         G = G * (G<=1) + 1.0 * (G>1.0)

So that’s about it! That’s how you can read in a shapefile and plot polygons of each shape colored by some data (record) associated with each shape. There are plenty of more sophisticated ways to do this exercise, and I’ll be looking into some other shapefile Python libraries for upcoming posts.

2 thoughts on “Shapely Polygons: Coloring Shapefile Polygons

  1. Pingback: Mapping some things! | Chris Havlin

  2. Pingback: Shapefiles in Python: converting contours to shapes | Chris Havlin

Comments are closed.