Shapefiles in Python: converting contours to shapes

I’m in the process of finishing up a web-app that I’ve written using Dash/Plotly and as someone who is not an expert in front end web development, I’ve really enjoyed how easy it is to build an interactive app without getting bogged down by all the javascript/html/css (though knowing some definitely helps!). But recently I’ve run into an issue while figuring how to plot various datasets as a background contour on a number of interactive maps: while Dash/Plotly is great at plotting interactive points on a map, there’s not a simple way to contour a dataset and use it as the base layer (there is the new scattermapbox class, but there is not yet a simple contouring method for maps like, for example, contourf in the basemap library).

But Dash/Plotly is very good at chloropleth (colored shapes) plots, so I figured that if I could convert a matplotlib.pyplot.contourf plot into a shapefile of polygons, I could then load that shapefile in Dash/Plotly as needed and plot it as a chloropleth plot, which seems easier to me than rendering a bitmap image of the contour plot in the background (the only other option I can think of)… and so I spent the morning experimenting with how to do this. There’s a few scattered stackoverflow questions that related to this, but I had to pull together a bunch of pieces to get this working,  so I figured a short writeup here might be useful to others out there.

So with that said, here are they python libraries that you’ll need to run my test script:

  • shapely (for polygons, used previously here, here and here)
  • fiona (for writing a shapefile)
  • descartes (for making some plots)
  • matplotlib (for the initial contouring of the data)
  • numpy (for making up some initial data and some array manipulation)

The full code is available in my learning_shapefiles repo and the new script is contourf_to_shp.py. And some useful stackoverflow answers that helped significantly: Converting Matplotlib contour objects to Shapely objects, matplotlib – extracting values from contour lines.

Creating data to contour 

To start off, I wanted to work with a small test data set that would include multiple domains at the same contour level, so I created a lat/lon grid and then superimposed two 2d gaussian curves with different amplitudes and decay rates and plotted those up:

from shapely import geometry
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np
import fiona
import os,json
from descartes.patch import PolygonPatch

# create some test data with multiple peaks
lon = np.linspace(0,45,100)
lat = np.linspace(-20,32,90)
long,latg=np.meshgrid(lon,lat)
C1=np.sqrt((long-5.)**2+(latg-25)**2)/30.
C2=np.sqrt((long-30.)**2+(latg-1)**2)/10.
m = 30*np.exp(-C1**2)+20.*np.exp(-C2**2)

# make the contourf plot, storing the resulting ContourSet in cs
plt.figure(figsize=[10,5])
plt.subplot(1,2,1)
Nlevels=10
cs = plt.contourf(lon,lat,m,Nlevels,cmap='gist_heat')
plt.title('contourf figure with Nlevels='+str(Nlevels))

In my actual use-case I’ll be loading a 2D dataset, so most of this will be replaced. But the important bit is that I store the object that results from the contourf call, that cs object contains all the data on the contours.

Storing the contour results

The first step in saving the contoured data is to create a lookup table for the contour levels. The cs object stores each contour in cs.collections and the levels in cs.levels:

# create lookup table for levels
lvl_lookup = dict(zip(cs.collections, cs.levels))

Next, I loop over each contour of cs.collections, convert the contour’s shape into a shapely polygon and store it in a list of dictionaries that also saves that contour’s level value:

# loop over collections (and polygons in each collection), store in list for fiona
PolyList=[]
for col in cs.collections:
    z=lvl_lookup[col] # the value of this level
    for contour_path in col.get_paths():
        # create the polygon for this level
        for ncp,cp in enumerate(contour_path.to_polygons()):
            lons = cp[:,0]
            lats = cp[:,1]
            new_shape = geometry.Polygon([(i[0], i[1]) for i in zip(lons,lats)])
            if ncp == 0:                
                poly = new_shape # first shape
            else:
                poly = poly.difference(new_shape) # Remove the holes
            
            PolyList.append({'poly':poly,'props':{'z': z}})

The list PolyList now holds a shapely polygon as well as the z-value (or level) of that polygon.

Writing the data

In order to use these polygons elsewhere, the easiest thing to do is simply save them within a shapefile. I used the Fiona library for writing, which entails writing the geometry (the polygons) and the properties for each polygon that I stored earlier:

# define ESRI schema, write each polygon to the file
outfi=os.path.join(outname,'shaped_contour.shp')
schema = {'geometry': 'Polygon','properties': {'z': 'float'}}
with fiona.collection(outfi, "w", "ESRI Shapefile", schema) as output:
    for p in PolyList:
        output.write({'properties': p['props'],
            'geometry': geometry.mapping(p['poly'])})

In order to reconstruct the contour plot, I need to save the information about the different levels (to get the colormap correct). And while I could load the shapefile and loop over the shapes to calculate min/max values, it saves time to simply write that data to a different file. And rather than write it as metadata of some kind for the shapefile, I just drop it into a json file:

# save the levels and global min/max as a separate json for convenience
Lvls={'levels':cs.levels.tolist(),'min':m.min(),'max':m.max()}
with open(os.path.join(outname,'levels.json'), 'w') as fp:
    json.dump(Lvls, fp)

Plotting the polygons

From there, the shapefile and levels data can be re-loaded and plotted to get the following:

polygon_conts_N10On the left is the original contour plot, on the write is the plot of Polygon patches colored using the same colormap. And it’s an almost perfect replica… I’m not sure if it’s my eyes or screen, but the lightest colors seem just a hair different between the two. But that difference is even less noticeable as I use more contours, e.g. 1000: 

polygon_conts_N1000

The final bit of code is mostly self explanatory: looping over the shapes in the shapefile, then plotting colored polygon patches like in previous posts. The only tricky business is making sure the polygon facecolor matches the filled contour color. To do this, I load in the contour level json file that I saved off, and load the matplotlib colormap that I used in the initial call to contourf:

# read in levels, define colormap
with open(os.path.join(outname,'levels.json')) as jfile:
    Lvls=json.load(jfile)
levels=np.array(Lvls['levels'])
cmap=plt.cm.gist_heat
lv_range=[Lvls['min'],Lvls['max']]

And then for each polygon, I pull out the z-value that I saved and find the appropriate RGB value for that level with:

        lv=shp['properties']['z'] # this shape's level
        clr=cmap((lv - lv_range[0])/(lv_range[1]-lv_range[0]))

The bit of math with lv_range is ensure that the value is between 0 and 1. The clr variable is a tuple that can be given to the descartes PolygonPatch:

patch = PolygonPatch(poly, facecolor=clr, edgecolor=clr)

And that’s pretty much it. I still have some experimenting to do with how many contour levels I need for the datasets I’m working with, and I expect to run into some scaling issues (the full dataset fields I’m working with are 100s of MB), but at least I have a feasible approach!

Full code here.

Shapely Polygons: Expansion!

Today I found myself needing to scale a large number of geometric shapes corresponding to volcanic fields in the Western United States. Shapely (used in previous posts) makes this easy with the shapely.affinity.scale() method (an affine transformation preserves lines and points, and the scale() method is a simplified transformation without adding rotation or skew).

So I put together a test script using the scale() method (on my github here) that takes the scaling factor as input. Since I’ll be using this with geometric coordinates and will be testing sensitivity of measurements to different sized shapes, I added the option to scale the shape by a a sequences of distances (angular or linear). The code is fairly straight forward and I don’t have time for a detailed post, so for now, here’s a trippy plot showing a squiggly-circle shape scaled by different amounts:

affine_expand

To generate the above plot (assuming you’ve cloned my learning_shapefiles repo and are in the directory containing affine_expansion.py).:

python affine_expansion.py -f ../data/expansion_test_data.csv -exp 0.25,.5,.75,1
.25,1.5,1.75,2

 

merging shapes and plotting the physiographic boundary of the Colorado Plateau

Today I found myself needing to plot the physiographic boundary of the Colorado Plateau in Python. It’s been a while since I’ve touched on shapefiles (or anything on the blog) so I figured I’d write a quick blurb on reading and plotting this particular shapefile.

Data: shapefile of data from  Fenneman and Johnson 1946 [1] available at https://water.usgs.gov/GIS/dsdl/physio_shp.zip

Code to load & plot & write processed data: colorado_plateau.py

Python requirements: pyshp, shapely, matplotlib

What you’ll learn: reading shapefiles, merging polygon shapes in Python with shapely

The Data

The first challenge was finding the actual lat/lon coordinates defining the edge of the Colorado Plateau… it’s amazing how many papers in geology/geophysics plot the boundary but don’t actually reference where the heck they got their coordinates from. After much digging I FINALLY found a paper that actually cited their source: Hopper and Fischer 2018 [2] reference a 1946 publication by Fenneman and Johnson [1] titled “Physiographic divisions of the conterminous U. S.” and after a quick search I found the digitized data from that publication online at water.usgs.gov.

Here’s the summary page containing metadata: https://water.usgs.gov/GIS/metadata/usgswrd/XML/physio.xml

and a direct link to the zipped shapefile:  https://water.usgs.gov/GIS/dsdl/physio_shp.zip.

The dataset contains a large number of physiographic regions and the Colorado Plateau is subdivided into multiple regions, so the code below pulls out the regions within the Colorado Plateau and joins them into a single shape defining the full boundary. To run the code below, unpack physio_shp.zip wherever you downloaded it to and rename the folder to physio (to match expectations for the pyshp shapefile reader).

The Code

The full code is here.

The XML data for the shapefile defines a province code for different provinces, for which the Colorado Plateau sub-regions have a value of 21. So the code (1) reads the shapefiles, (2) finds the shapes with a province code of 21 and (3) combines them.

Step 1:  imports, reading arguments, reading the shapefile.

shapefile is the library for pyshp, otherwise pretty self explanatory:

import shapefile, os,sys
import matplotlib.pyplot as plt
from shapely.geometry import Polygon
from shapely.ops import cascaded_union

# read the arguments
fname=sys.argv[1] # path to physio.shp
outfolder=None
if len(sys.argv)>2:
    outfolder=sys.argv[2] # folder to store output

# read the shapefile
sf = shapefile.Reader(fname)

Step 2: Find the Colorado Plateau shapes.

The shapes are described in the records list of the shapefile object:

sf.records()

records() is a list of attributes for each shape and a single record looks like

[3.886, 9.904, 220, 15, 212, '21b', 'INTERMONTANE PLATEAUS', 'COLORADO PLATEAUS', 'UINTA BASIN', 21]

The final value is the province code — so we just need to save off the indeces for which that value is 21. It turns out the 3rd value in the record list is actually a cross-reference to a shape ID, but for some reason the indexing is offset by 2 when reading this shapefile with python. So the shape data for this shape would be accessed with:

sf.shapes()[218]

rather than 220. Not sure why it’s off by 2 (would expect it to be off by 1 due to python indexing), but in any case, my code simply records the list index as python sees it:

# find the record indeces for colorado plateau (province ID = 21)
i_rec=0
recs_to_plot=[]
for rec in sf.records():
    if rec[-1]==21:
        print(rec)
        print(i_rec)
        recs_to_plot.append(i_rec)
    i_rec=i_rec+1

# plot the individual records
plt.subplot(1,2,1)
for rec in recs_to_plot:
    pts=sf.shapes()[rec].points
    lons=[pt[0] for pt in pts]
    lats=[pt[1] for pt in pts]
    plt.plot(lons,lats,'.k')

As seen above — the coordinates for the shape boundaries for a record are in

sf.shapes()[rec].points

which is a list of longitude and latitude points (which the code unpacks for plotting). This section of code will generate the following outline of the Colorado Plateau regions:

Step 3: merging shapes

This is the fun bit! What we want is just the outer boundary of the union of all the shapes. The python library shapely lets us do this very easily by creating a list of shapely Polygon objects then combining them with the cascaded_union method:

# create a single shape for Colorado Plateau from union of sub-shapes
polies=[]
for rec in recs_to_plot:
    polies.append(Polygon(sf.shapes()[rec].points))
CP_bound=cascaded_union(polies)

# plot the exterior shape
lon,lat = CP_bound.exterior.xy
plt.subplot(1,2,2)
plt.plot(lon,lat,'.k')

and the resulting plot of just the exterior boundary:

Step 4: output the processed data 

The code also exports the lat/lon points defining that exterior boundary with:

# export the final shape as a CSV of boundary points
if outfolder is not None:
    f=open(os.path.join(outfolder,'ColoradoPlateauBoundary.csv'),'w')
    f.write("lon,lat\n")
    for i,j in zip(lon,lat):
        f.write(str(i)+","+str(j)+"\n")
    f.close()

I could have written some code to save the data in a shapefile format, but for such a small amount of data I find it easier to save a CSV and just create a Polygon from the list of points as I need it. I’m actually planning to create a Polygon that will be combined with GeoPandas to find sets of points falling within the plateau (GeoPandas lets you do database joins on geospatial data, it’s awesome!).

Running the Code

To run the code:

python colorado_plateau.py /path/to/physio/physio.shp /folder/for/output/

where the first argument is the path to the downloaded and unpacked shapefile and the second argument is the location to save the CSV file (this argument is optional — no data will be saved if not included).

References

[1] Fenneman, N. M., & Johnson, D. W. (1946). Physiographic
divisions of the conterminous U.S. Reston, VA: US Geological Survey,
Physiographic Committee Special Map. https://water.usgs.gov/GIS/metadata/usgswrd/XML/physio.xml

[2] Hopper, E., & Fischer, K. M. (2018), The changing face of the lithosphere-asthenosphere boundary: Imaging continental scale patterns in upper mantle structure across the contiguous U.S. with Sp converted waves. Geochemistry, Geophysics, Geosystems, 19 , 2 593 – 2 614 . https://doi.org/10. 1029/2018GC007476

Mapping some things!

Since publishing my series of posts on manipulating shapefiles in Python (1, 2 and 3), I’ve been exploring different open data catalogs so I thought I’d share some of the maps I’ve mapped! All these were produced using scripts in my learning_shapefiles repository on GitHub (see denver_stack.py and denver_tree_canopy.py).

Downtown Denver

denver_downtown

MapZen is a pretty sweet service! You can draw a regional box then the folks at MapZen will take that box and extract all the OpenStreetMap data within the box then give you the shapefiles with all that info. First thing I (stupidly) did after downloading my data extraction from MapZen was to just plot all of the shapes… and I then proceeded to sit around for quite some time while my laptop chugged away and produced a highly detailed map of all the things in Denver County. I zoomed in to downtown Denver for the above image to show off the detail.

Metro-Denver

denver_road_rivers

A more abstract representation of the primary roadways in metro Denver. I figured out how the MapZen/OpenStreetMap shapefile was organized and only plotted motorways, primary and secondary roads (bright white in the map). I also created a grid containing the shortest distance to a roadway (using  the distance method available for shapely.geometry.LineString) and contoured the inverse distance (1/x) to evoke the topographic contours along rivers.

Tree Cover in Denver County

denver_1

Denver’s Open Data Catalog has a bunch of databases with shapefiles galore. I downloaded one for the Tree Canopy and then plotted up the percent tree cover in each geometry. This is what actually lead me to learn how to plot roadways… and here I overlaid the tree cover on the same MapZen extraction of OpenStreeMap data. Along with the roadways underneath, it forms a sort of abstract tree.

So that’s it for today. Three maps from a couple different open data sources using some inefficient Python. Not going to go into detail on the code this time, because, well, it’s slooow. I’m using modifications of the simple scripts that I used in my shapefile tutorials because I was excited to just plot things! But there are much better ways to handle shapefiles with hundreds of thousands of geometries. So until next time (after I figure out how to use fiona or geopandas), just enjoy the visuals!

Shapely Polygons: Coloring Shapefile Polygons

In my previous two posts, I showed how to (1) read and plot shapefile geometries using the pyshp library and (2) plot polygons using shapely and descartes. So the obvious next step is to combine the two! And that’s what I’ll cover today, again using my learning_shapefiles github repo along with the shapefile of state boundaries from census.gov.

The Final Map

In case you don’t care about the Python and are just curious about the end product, here’s the final map where the color of each state reflects its total land area:

shapefile_us_colored_by_area_sat

It’s kind of neat to see the gradient of state size from east to west, reflecting the historical expansion of the U.S. westward, but other than that, there’s not much to the map. But it does serve as a simple case for learning to manipulate shapefiles.

The Code

There are two scripts in learning_shapefiles/src of relevance for today’s post: basic_readshp_plotpoly.py and read_shp_and_rcrd.py. The first script is a simple combination of basic_read_plot.py and simple_polygons.py (from my previous two posts), plotting the shapefile geometries using polygons instead of lines, so let’s start there.

basic_readshp_plotpoly.py

The code starts out the same as basic_read_plot.py, but now also imports Polygon and PolygonPatch from shapely and descartes, before reading in the shapefile:

import shapefile
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import Polygon
from descartes.patch import PolygonPatch

"""
 IMPORT THE SHAPEFILE 
"""
shp_file_base='cb_2015_us_state_20m'
dat_dir='../shapefiles/'+shp_file_base +'/'
sf = shapefile.Reader(dat_dir+shp_file_base)

The next part of the code plots a single geometry from the shapefile. This is super easy because shapefile.Reader reads a shapefile geometry as a list of points, which is exactly what the Polygon function needs. So we can just give that list of points directly to the Polygon function:

plt.figure()
ax = plt.axes()
ax.set_aspect('equal')

shape_ex = sf.shape(5) # could break if selected shape has multiple polygons. 

# build the polygon from exterior points
polygon = Polygon(shape_ex.points)
patch = PolygonPatch(polygon, facecolor=[0,0,0.5], edgecolor=[0,0,0], alpha=0.7, zorder=2)
ax.add_patch(patch)

# use bbox (bounding box) to set plot limits
plt.xlim(shape_ex.bbox[0],shape_ex.bbox[2])
plt.ylim(shape_ex.bbox[1],shape_ex.bbox[3])

And we get Washington, now as a colored polygon rather than an outline:

shapefile_single

Woo!

And as before, we can now loop over each shape (and each part of each shape), construct a polygon and plot it:

""" PLOTS ALL SHAPES AND PARTS """
plt.figure()
ax = plt.axes() # add the axes
ax.set_aspect('equal')

icolor = 1
for shape in list(sf.iterShapes()):

    # define polygon fill color (facecolor) RGB values:
    R = (float(icolor)-1.0)/52.0
    G = 0
    B = 0

    # check number of parts (could use MultiPolygon class of shapely?)
    nparts = len(shape.parts) # total parts
    if nparts == 1:
       polygon = Polygon(shape.points)
       patch = PolygonPatch(polygon, facecolor=[R,G,B], alpha=1.0, zorder=2)
       ax.add_patch(patch)

    else: # loop over parts of each shape, plot separately
      for ip in range(nparts): # loop over parts, plot separately
          i0=shape.parts[ip]
          if ip < nparts-1:
             i1 = shape.parts[ip+1]-1
          else:
             i1 = len(shape.points)

          polygon = Polygon(shape.points[i0:i1+1])
          patch = PolygonPatch(polygon, facecolor=[R,G,B], alpha=1.0, zorder=2)
          ax.add_patch(patch)

    icolor = icolor + 1

plt.xlim(-130,-60)
plt.ylim(23,50)
plt.show()

In order to distinguish each polygon, I set each shape’s color based on how many shapes have already been plotted:

R = (float(icolor)-1.0)/52.0

This grades the red scale in an RGB tuple between 0 and 1 (since there are 52 shapes), and it is then used in the facecolor argument of PolygonPatch. The coloring is simply a function of the order in which the shapes are accessed:

shapefile_us

The goal, however, is to color each polygon by some sort of data so that we can actually learn something interesting, and that is exactly what read_shp_and_rcrd.py does.

read_shp_and_rcrd.py

Up to now, we’ve only considered the shape geometry, but that is only one part of a shapefile. Also included in most shapefiles are the records, or the data, associated with each shape. When a shapefile is imported,

shp_file_base='cb_2015_us_state_20m'
dat_dir='../shapefiles/'+shp_file_base +'/'
sf = shapefile.Reader(dat_dir+shp_file_base)

The resulting shapefile object (sf in this case) contains records associated with each shape. I wasn’t sure what fields were included for the State Boundary shapefile from census.gov, so I opened up a Python shell in terminal, read in the shapefile then typed

>>> sf.fields

to get a list of available fields:

[('DeletionFlag', 'C', 1, 0), ['STATEFP', 'C', 2, 0], ['STATENS', 'C', 8, 0], ['AFFGEOID', 'C', 11, 0], ['GEOID', 'C', 2, 0], ['STUSPS', 'C', 2, 0], ['NAME', 'C', 100, 0], ['LSAD', 'C', 2, 0], ['ALAND', 'N', 14, 0], ['AWATER', 'N', 14, 0]]

Down towards the end, there’s an interesting entry

['ALAND', 'N', 14, 0]

Though I couldn’t find any documentation on the included fields, I suspected ALAND stood for land area (especially since it was followed by AWATER). So in read_shp_and_rcrd.py, the first thing I do is extract the field names and find the index corresponding the the land area:

""" Find max/min of record of interest (for scaling the facecolor)"""

# get list of field names, pull out appropriate index
# fieldnames of interest: ALAND, AWATER are land and water area, respectively
fld = sf.fields[1:]
field_names = [field[0] for field in fld]
fld_name='ALAND'
fld_ndx=field_names.index(fld_name)

I found this post helpful for extracting the fieldnames of each record.

Next, I loop over the records using the interRecords() object to find the minimum and maximum land area in order to scale the polygon colors:

# loop over records, track global min/max
maxrec=-9999
minrec=1e21
for rec in sf.iterRecords():
    if rec[4] != 'AK': # exclude alaska so the scale isn't skewed
       maxrec=np.max((maxrec,rec[fld_ndx]))
       minrec=np.min((minrec,rec[fld_ndx]))

maxrec=maxrec/1.0 # upper saturation limit

print fld_name,'min:',minrec,'max:',maxrec

I excluded Alaska (if rec[4] != ‘AK’:) so that the color scale wouldn’t be thrown off, and then I also scale the maximum (maxrec=maxrec/1.0) to adjust the color scale manually (more on this later).

Now that I know the max/min, I loop over each shape and (1) calculate the RGB value for each polygon using a linear scale between the max and min and then (2) plot a polygon for each shape (and all the parts of a shape) using that RGB value:

for shapeRec in sf.iterShapeRecords():
    # pull out shape geometry and records 
    shape=shapeRec.shape
    rec = shapeRec.record

    # select polygon facecolor RGB vals based on record value
    if rec[4] != 'AK':
         R = 1
         G = (rec[fld_ndx]-minrec)/(maxrec-minrec)
         G = G * (G<=1) + 1.0 * (G>1.0)
         B = 0
    else:
         R = 0
         B = 0
         G = 0

    # check number of parts (could use MultiPolygon class of shapely?)
    nparts = len(shape.parts) # total parts
    if nparts == 1:
       polygon = Polygon(shape.points)
       patch = PolygonPatch(polygon, facecolor=[R,G,B], edgecolor=[0,0,0], alpha=1.0, zorder=2)
       ax.add_patch(patch)
    else: # loop over parts of each shape, plot separately
       for ip in range(nparts): # loop over parts, plot separately
           i0=shape.parts[ip]
           if ip < nparts-1:
              i1 = shape.parts[ip+1]-1
           else:
              i1 = len(shape.points)

          # build the polygon and add it to plot 
          polygon = Polygon(shape.points[i0:i1+1])
          patch = PolygonPatch(polygon, facecolor=[R,G,B], alpha=1.0, zorder=2)
          ax.add_patch(patch)

plt.xlim(-130,-60)
plt.ylim(23,50)
plt.show()

One import thing not to miss is that on the first line, I loop over the iterShapeRecords iterable rather than using iterShapes. This is neccesary so that I have access to both shape geometry and the associated records, rather than just the shapes (iterShapes) or just the records (iterRecords).

Running the above code will produce the following map:

shapefile_us_colored_by_area

Because Texas is so much larger than the rest of the states, we don’t see much of a difference between the states. But we can adjust this by decreasing the max value using in the scaling. So after finding the max/min value, I set

maxrec=maxrec/2.0 # upper saturation limit

and end up with the following map that brings out more of the variation in the states’ land area (same map as in the very beginning of this post):

shapefile_us_colored_by_area_sat

Note that because I’m decreased the maxvalue for scaling, I had to ensure that the RGB value did not exceed 1, which is why I had the following lines limiting the green value (G):

    if rec[4] != 'AK':
         R = 1
         G = (rec[fld_ndx]-minrec)/(maxrec-minrec)
         G = G * (G<=1) + 1.0 * (G>1.0)

So that’s about it! That’s how you can read in a shapefile and plot polygons of each shape colored by some data (record) associated with each shape. There are plenty of more sophisticated ways to do this exercise, and I’ll be looking into some other shapefile Python libraries for upcoming posts.

Shapefiles in Python: shapely polygons

In my last post, I described how to take a shapefile and plot the outlines of the geometries in the shapefile. But the power of shapefiles is in the records (the data) associated with each shape. One common way of presenting shapefile data is to plot the shapefile geometry as polygons that are colored by some value of data. So as a prelude to doing just that, this post will cover how to plot polygons using the shapely and descartes libraries. As always, my code is up on my github page.

The two python libraries that I’ll be using are shapely (for constructing a polygon) and descartes (for adding a polygon to a plot). So step 0 is to go install those! I’ll also be using the numpy and matplotlib libraries, but you probably already have those.

Though the documentation for shapely has some nice sample source code, I wrote my own script, simple_polygons.py, to get to know the libraries better. In this approach, there are two steps to building a polygon from scratch: constructing the points that define the polygon’s shape and then mapping those points into a polygon structure. The first step doesn’t require any special functions, just standard numpy. The second step uses the  shapely.geometry.Polygon class to build a polygon from a list of coordinates.

There are limitations for valid polygons, but virtually any shape can be constructed, like the following pacman:

pacman

The first step is to build the list of coordinates defining the exterior points (the outer circle) and a list of interior points to exclude from the polygon (the eyeball). Starting with the exterior points, I calculate the x and y coordinates of unit circle from 0.25pi to 7/4pi (0 to 2pi would map a whole circle rather than a pacman):

theta = np.linspace(0.25*3.14,1.75*3.14,80) 

# add random perturbation 
max_rough=0.05 
pert=max_rough * np.random.rand(len(theta)) 

x = np.cos(theta)+pert 
y = np.sin(theta)+pert

I also add a random, small perturbation to each x-y position to add a bit of roughness to the outer pacman edge, because I wanted some small scale roughness more similar to the shapefiles I’d be plotting later. Next, I build a python list of all those x-y points. This list, ext, is the list of exterior points that I’ll give to shapely:

# build the list of points 
ext = list() 

# loop over x,y, add each point to list 
for itheta in range(len(theta)): 
    ext.append((x[itheta],y[itheta])) 

ext.append((0,0)) # add 0 point

At the end, I add the 0,0 point, otherwise the start and end points on the circle would connect to each other and I’d get a pacman that was punched in the face:

pacman_punch

That takes care of the exterior points, and making the list of interior points is similar. This list, inter, will be a list of points that define interior geometries to exclude from the polygon:

# build eyeball interior points 
theta=np.linspace(0,2*3.14,30) 
x = 0.1*np.cos(theta)+0.2 
y = 0.1*np.sin(theta)+0.7 

inter = list() 
for itheta in range(len(theta)): 
    inter.append((x[itheta],y[itheta])) 
inter.append((x[0],y[0]))

Now that we have the list of exterior and interior points, you just give that to shapely’s polygon function (shapely.geometry.Polygon):

polygon = Polygon(ext,[inter[::-1]])

Two things about passing Polygon the interior list: (1) you can actually pass Polygon a list of lists to define multiple areas to exclude from the polygon, so you have to add the brackets around inter and (2) I haven’t quite figured out the [::-1] that the shapely documentation includes. I know that generally, [::-1] will take all the elements of a list and reverse them, but why does Polygon need the points in reverse? No idea. Without it, I only get an outer edge defining the eyeball:

pacman_badeye

I would love to get some information on why Polygon needs the reversed list, so leave me a note in the comments if you know why.

Regardless, the next step is to add that polygon structure to a plot, with a straightforward use of matplotlib.pyplot (imported as plt) and descartes.patch.PolygonPatch:

 

# initialize figure and axes 
fig = plt.figure() 
ax = fig.add_axes((0.1,0.1,0.8,0.8)) 

# put the patch on the plot 
patch = PolygonPatch(polygon, facecolor=[0,0,0.5], edgecolor=[1,1,1], alpha=1.0) 
ax.add_patch(patch) 

# new axes 
plt.xlim([-1.5, 1.5]) 
plt.ylim([-1.5,1.5]) 
ax.set_aspect(1) 

plt.show()

PolygonPatch’s arguments are pretty self explanatory: facecolor and edgecolor set the colors for the fill and edge of the polygon. Conveniently, facecolor and edgecolor can be specified as RGB values, which I’ll take advantage of for plotting shapefile records in my next post. It can also accept any of the kwargs available to matplotlib.patches.Polygon class (like the transparency,alpha, between 0 and 1).

So that’s it! Pretty easy! And in some ways it is even easier to plot polygons from a shapefile, since pyshp imports shapefile coordinates as a list and you can just give that list directly to Polygon… more on that in the next post.