Bikes! Part 0

Some months ago I discovered that there are a number of bike shares out there that make their data publicly available, and I’ve been meaning to download some of it and poke around. Well today I finally had some time. And though I’m not sure what I’ll be doing with this data set yet, I wanted to share a figure I made in my initial exploration.

The following figure shows the number of rides per day and the median ride distance for the Portland, OR bike share (data from BikeTown: https://www.biketownpdx.com/system-data). I threw the code in a new github repo here so you can take a look if interested, but I’m not going to go into detail yet (the code just downloads their system data files, concatenates the monthly files and does some minimal processing and plotting). In any case, the figure:

The neat (and maybe unsurprising) feature is the strong seasonality to bike share usage, both in terms of just the number of rides per day (high in summer, low in winter) and the median distance of each ride (longer rides in summer). There is an interesting spike in total rides around May 2018 — maybe excitement for springtime? or additional bikes added to the program? A plot of bike usage percent (total rides / available bikes) might be more illustrative.
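(For the curious: the daily aggregation behind this kind of figure is only a few lines of pandas. A sketch, with hypothetical file and column names rather than BikeTown’s exact headers:)

import pandas as pd

# one row per ride; 'StartDate' and 'Distance_Miles' are hypothetical column names
rides = pd.read_csv('biketown_all_months.csv', parse_dates=['StartDate'])
daily = rides.groupby(rides['StartDate'].dt.date).agg(
    n_rides=('Distance_Miles', 'size'),
    median_distance=('Distance_Miles', 'median'))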

So that’s that for now. Hopefully it won’t be too long before I have time for some more in-depth analysis.


Great Circle Paths

Been quite a while since any updates, but here’s a short one!

As a part of a contract I’m working on, I found myself having to plot the major arc of great circle paths on a map. But if you google “how to plot great circle path in [insert python library here]” all the solutions are for plotting minor arcs. Turns out in the end, there’s a really simple trick to plotting the major arc (and I felt pretty dumb when I realized it after wasting a ton of time), but I figured I’d write it up here in case it saves anyone else a bit of time. The short answer: given two points that you want the major arc for, just add the antipodes for each point to the list of points in your path.

First off, in case you need a review of great circles, here’s a globe:

Given two points on the surface of a sphere, there is a single great circle that contains both points (unless the two points are antipodal, in which case there are infinitely many great circles). The short way round is the minor arc (red curve), the long way round is the major arc (green curve). And I needed to plot both of them.

The reason I got into plotting this in the first place is that in seismology, surface waves are described by major and minor arcs. When an earthquake generates seismic waves and is measured at a seismometer somewhere else, the raypath between the epicenter and seismic station falls on a great circle path. And surface waves are referred to in terms of the minor and major arcs: the R1 wave travels the minor arc and the R2 travels the major arc. These waves will actually keep going around the earth’s surface before dissipating: R3 is the R1 after it goes around again, R4 is the R2 after it goes around again, and on and on. So I needed to be able to plot all these.

Ok, so back to plotting…

I was using the plotly library in Python for this plot, so I’ll stick with that for examples here, but there should be similar functions in whatever mapping library you’re using. The full script is on my github page here.

So the important bit is just defining the list of latitude and longitude points. Here, the minor arc points are put into a dictionary:

paths={}
paths['minor_arc']={'lon':[ start_lon, end_lon ],
                    'lat':[start_lat,end_lat], 'clr':'red','dash':None}

When we give this to plotly, we’ll tell it to connect the two points, which will give us the shortest path between the two, the minor arc.

To plot the major arc, we just need to add some points between the start and end so that the path takes the long way around. But how to choose the points? Well, it turns out there are tons of confusing pages out there on the trig used for calculating great circle paths, and I almost started to code some of it up… until I realized I only needed a couple of points. And the antipodes (the point exactly opposite a given point on the surface) are both really easy to calculate and guaranteed to lie on the great circle path. Just add 180 to the longitude and flip the sign of the latitude:

ant1lon=start_lon+180
if ant1lon>360:
    ant1lon= ant1lon - 360
ant1lat=-start_lat

Same for the antipode of the second (end) point, spelled out below. The >360 check is just to make sure longitude remains between 0 and 360 degrees.
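Mirroring the snippet above:

ant2lon=end_lon+180
if ant2lon>360:
    ant2lon= ant2lon - 360
ant2lat=-end_lat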

And now, we can put the antipodes in a list for the major arc:

paths['major_arc']={'lon':[ start_lon,ant2lon,ant1lon, end_lon ],
                    'lat':[start_lat,ant2lat,ant1lat,end_lat], 
                    'clr':'green','dash':None}

In the script, these paths are then added as a data dictionary used in creating the plotly figure:

DataDict=list()
for path in ['minor_arc','major_arc']:
    DataDict.append(
        dict(
            type = 'scattergeo',
            lon = paths[path]['lon'],
            lat = paths[path]['lat'],
            name= path,
            mode = 'lines',
            line = dict(
                width = 2.5,
                color = paths[path]['clr'],
                dash=paths[path]['dash'],
            ),
            opacity = 1.0,
        )
    )

figdata={}
figdata['data']=DataDict

The full script has a bit more where it actually plots the data (to give the image above), but plotly has some really nice tutorials for that already so I won’t bother explaining all that.
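For completeness, here’s a minimal sketch of turning that data list into a figure (assuming a reasonably recent plotly; the full script handles the layout and styling):

import plotly.graph_objects as go

fig = go.Figure(data=DataDict) # the scattergeo traces built above
fig.update_geos(projection_type='orthographic') # globe-style view
fig.show()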

So that’s that! Hope it saves someone else some time.


Quick tutorial on Geocoding with Python

I recently found myself needing to get the latitude/longitude of a list of cities (for this map here) and it turns out, it’s pretty easy now that I know how to do it. Here’s a quick tutorial!

Ok, so the process of taking a city and assigning it a latitude/longitude point is called geocoding. There are many services that offer this (e.g., the Google or Bing Maps APIs), but most of the ones I looked at seemed overkill for a one-time task of assigning lat/lon to about 500 cities. But then I discovered OpenStreetMap’s Nominatim! You can modify the http address to return results in an xml file. For example, the following searches for Providence, RI:

https://nominatim.openstreetmap.org/search.php?q=Providence+RI+USA&format=xml


And returns:

<searchresults timestamp="Thu, 02 Feb 17 16:17:00 +0000" attribution="Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright" querystring="Providence RI USA" polygon="false" exclude_place_ids="158799064,159481664" more_url="https://nominatim.openstreetmap.org/search.php?format=xml&exclude_place_ids=158799064,159481664&accept-language=en-US,en;q=0.8&q=Providence+RI+USA">
<place place_id="158799064" osm_type="relation" osm_id="191210" place_rank="16" boundingbox="41.772414,41.861571,-71.4726669,-71.3736134" lat="41.8239891" lon="-71.4128342" display_name="Providence, Providence County, Rhode Island, United States of America" class="place" type="city" importance="0.80724054252736" icon="https://nominatim.openstreetmap.org/images/mapicons/poi_place_city.p.20.png"/>
<place place_id="159481664" osm_type="relation" osm_id="1840541" place_rank="12" boundingbox="41.7232498,42.0188529,-71.7992521,-71.3177699" lat="41.8677428" lon="-71.5814833" display_name="Providence County, Rhode Island, United States of America" class="boundary" type="administrative" importance="0.58173948152676" icon="https://nominatim.openstreetmap.org/images/mapicons/poi_boundary_administrative.p.20.png"/>
</searchresults>

If you scroll to the right you’ll see:

lat="41.8239891" lon="-71.4128342"

It’s pretty easy to write a python script to request and then parse the xml result for lat and lon. Here’s what that might look like (BUT DON’T DO THIS):

import urllib2

city='Providence, RI'
city_search=city.replace(' ','').split(',') # removes whitespace, splits city/state

# build the http address:
# (results in a string: 'https://nominatim.openstreetmap.org/search.php?q=Providence+RI+USA&format=xml')
osm='https://nominatim.openstreetmap.org/search.php?q=' 
fmt='+USA&polygon=1&format=xml'
srch = osm + city_search[0] + '+' + city_search[1] + fmt

# now use urllib2 to open the url and store the result:
response = urllib2.urlopen(srch)
the_page = response.read().split()

# and now we can parse the resulting string array where the xml info is stored.
# note that it only stores the first lon/lat that it encounters

Lon = 0.0
Lat = 0.0
for iel in range(0,len(the_page)): # loop over the strings in the_page, look for Lat/Lon
    if 'lon=' in the_page[iel] and Lon == 0.0:
        Lon=float(the_page[iel].split("'")[1]) # split on whichever quote character the response actually uses
    if 'lat=' in the_page[iel] and Lat == 0.0:
        Lat=float(the_page[iel].split("'")[1])

So. Why not just loop over your list of cities and repeat this exercise? Well, if you check out Nominatim’s documentation page and take a look at the usage policy, it requires: “(1) No heavy uses (an absolute maximum of 1 request per second). (2) Provide a valid HTTP Referer or User-Agent identifying the application (stock User-Agents as set by http libraries will not do). (3) Clearly display attribution as suitable for your medium. (4) Data is provided under the ODbL license which requires to share alike (although small extractions are likely to be covered by fair usage / fair dealing).” While I don’t think that my case of simply geocoding 500 or so cities counts as heavy usage, and I could just delay my successive calls, I decided to look into their suggestions for other options.
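(For what it’s worth, if you do query Nominatim directly, requirement (2) is easy to satisfy with urllib2. A sketch, with a made-up application name:)

headers = {'User-Agent': 'my-geocoding-script/0.1 (me@example.com)'} # identify your application
req = urllib2.Request(srch, headers=headers)
response = urllib2.urlopen(req)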

In the end I settled on MapQuest’s implementation of Nominatim. It provides access to all the OpenStreetMap data (still open source and subject to the OSM license agreements), and a free MapQuest developer account gets you 15,000 requests per month. Waaay more than I’d need for this project.

So to geocode a list of cities, first sign up for a MapQuest Developer Account. You’ll get an API key assigned to you. Unlike some other APIs, MapQuest doesn’t use any fancy authentication: you just include the API key in the http address of your request. Reaaaaally easy (but not exactly secure).

Then you can run a code very similar to that above. My implementation is here: look_up_latlons.py. But it’s kind of tied to the data that I was mapping.

Some notes on the code.

(1) The API key is passed in through a command line argument, so when you run this code you have to type

$ python look_up_latlons.py AL1243KSFD242332552134KLJ

where that long string of letters/numbers is whatever your API key is.
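Inside the script, grabbing that argument takes just a couple of lines (a sketch; the variable name in look_up_latlons.py may differ):

import sys

API_KEY = sys.argv[1] # the key string passed on the command line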

(2) The formatting of the http address is slightly different from the standard Nominatim API. The same search for Providence, RI looks like:

 http://open.mapquestapi.com/nominatim/v1/search.php?key=API_KEY&format=xml&q=Providence+RI

where API_KEY is, again, your API key.

(3) In my implementation, I have imported a CSV file as a pandas dataframe (called Counts). Each row contains a city name along with the number of people who marched in the Women’s Marches on Jan. 21. The meat of the code is copied below: I iterate over the rows in Counts, find the lat/lon for each row (i.e., each city) and then store that lat/lon in a new dataframe (NewCounts), because it’s bad practice to modify a dataframe while iterating over it. Here’s what that looks like:

osm='http://open.mapquestapi.com/nominatim/v1/search.php?key='+API_KEY+'&format=xml&q='

# loop over cities in crowd counts, find Lat/Lon
NewCounts=Counts.copy()
NewCounts['lon']=np.zeros(len(Counts)) # add new column for lon
NewCounts['lat']=np.zeros(len(Counts)) # add new column for lat
for index, row in Counts.iterrows():

    srch=osm+str(row['City']).replace(' ','+')

    print '\n\nLooking up lat/lon for',row['City'],index
    time.sleep(dt) # dt is the delay between requests (in seconds), set earlier in the script
    response = urllib2.urlopen(srch)
    the_page = response.read().split()

    # store the first lon/lat encountered in the response
    for iel in range(0,len(the_page)):
        if 'lon=' in the_page[iel] and NewCounts['lon'][index]==0.0:
            NewCounts['lon'][index]=float(the_page[iel].split("'")[1])
        if 'lat=' in the_page[iel] and NewCounts['lat'][index]==0.0:
            NewCounts['lat'][index]=float(the_page[iel].split("'")[1])

    print row['City'],NewCounts['lon'][index],NewCounts['lat'][index]

The MapQuest API didn’t have any specific usage constraints on how frequently you can make requests, just an overall number per month, but I added a small delay between calls using the time.sleep() function anyway.

That’s all for now, hopefully some more posts with colorful plots coming soon!


A Python tool for inspecting shapefiles

In my recent coding exploits, I’ve downloaded lots of different shapefiles. Most shapefiles were accompanied by nice .xml documentation with information about the data and how it’s stored or labeled, but a few had hardly any information at all. I knew the general content based on the description from the website where I downloaded the shapefile, but I didn’t know what they had used for the record labels or what the record values were exactly. So the past couple days I sat down and wrote a bit of code to help in unraveling a mysterious shapefile…

Check out (and/or download) the full Python source here: shapefile inspection!

The program is fairly straightforward. It traverses the records of a shapefile, recording the record label (or “field names” as I refer to them in the source) and information about each record. One of the program’s methods uses the Python XML API ElementTree to produce an xml file that you can load in a browser. Here’s a screen shot from using Firefox to view the xml file produced when running the program on the OpenStreetMap shapefile that I extracted via MapZen for my previous post.

xml_sample_1

In a browser, you can shrink or expand the xml attributes to get some basic information about each record: the name or label of the records, the data type and some sort of sample of the data. If the record data is an integer or float, then the sample will be the min/max values of the record while if it’s a string, it will either be a list of the unique strings in the records or just a sample of some of the strings. The OpenStreetMap shapefile contained some record values that were keywords, like the “highway” attribute in the screen shot above. While other records were strings with unique values for each shape, like the “name” attribute below:

xml_sample_2

In addition to generating an xml file, the program allows you to interactively explore a field.

When you run the program from the command line (type python inspect_shapefile.py in the src directory), it’ll ask for your input. It first asks if you want to give it a shapefile; here I said no and used the shapefile hardwired into __main__ of inspect_shapefile.py:

Do you want to enter the path to a shapefile? (Y/N) N
 
Using shapefile specified in __main__ :
directory: ../../learning_shapefiles/shapefiles/denver_maps/grouped_by_geometry_type/
filename: ex_QMDJXT8DzmqNh6eFiNkAuESyDNCX_osm_line

 Loading shapefile ...
... shapefile loaded! 

It then pulls out all the fields in the shapefile records, displays them and asks what you want to do. This is what it looks like using the OpenStreetMaps shapefile:

Shapefile has the following field names
['osm_id', 'access', 'aerialway', 'aeroway', 'amenity', 'area', 'barrier', 'bicycle', 
'brand', 'bridge', 'boundary', 'building', 'covered', 'culvert', 'cutting', 'disused', 
'embankment', 'foot', 'harbour', 'highway', 'historic', 'horse', 'junction', 'landuse', 
'layer', 'leisure', 'lock', 'man_made', 'military', 'motorcar', 'name', 'natural', 
'oneway', 'operator', 'population', 'power', 'place', 'railway', 'ref', 'religion', 
'route', 'service', 'shop', 'sport', 'surface', 'toll', 'tourism', 'tower:type', 
'tracktype', 'tunnel', 'water', 'waterway', 'wetland', 'width', 'wood', 'z_order', 
'way_area', 'tags'] 

Do you want to investigate single field (single)? Generate xml 
file (xml)? Or both (both)? single

Enter field name to investigate: landuse

So you can see all these different fields. I chose to look at a single field (“landuse”) and the program will then look at the “landuse” record value for each shape, record its data type and save new record values:

searching for non-empty entry for landuse ...
data type found: str
Finding unique record values for landuse
1 of 212550 shapes ( 0.0 % )
 new record value: 
93 of 212550 shapes ( 0.04 % )
 new record value: reservoir
6782 of 212550 shapes ( 3.19 % )
 new record value: residential
110432 of 212550 shapes ( 51.95 % )
 new record value: grass
111094 of 212550 shapes ( 52.26 % )
 new record value: construction
Completed field name inspection 

---------------------------------------
Shapefile has the following field names
['osm_id', 'access', 'aerialway', 'aeroway', 'amenity', 'area', 
'barrier', 'bicycle', 'brand', 'bridge', 'boundary', 'building', 
'covered', 'culvert', 'cutting', 'disused', 'embankment', 'foot', 
'harbour', 'highway', 'historic', 'horse', 'junction', 'landuse', 
'layer', 'leisure', 'lock', 'man_made', 'military', 'motorcar', 
'name', 'natural', 'oneway', 'operator', 'population', 'power', 
'place', 'railway', 'ref', 'religion', 'route', 'service', 'shop', 
'sport', 'surface', 'toll', 'tourism', 'tower:type', 'tracktype', 
'tunnel', 'water', 'waterway', 'wetland', 'width', 'wood', 'z_order', 
'way_area', 'tags']

The field name landuse is str
and has 5 unique values
Display Values? (Y/N) Y
 possible values:
['', 'reservoir', 'residential', 'grass', 'construction']

As you can see from the output, there were 4 keywords (reservoir, residential, grass and construction) used to describe the ‘landuse’ field, plus an empty string. So I could now write some code to go into a shapefile and extract only the shapes that have a ‘residential’ value for ‘landuse’ (see the sketch below). But I couldn’t do that until I (1) knew that the landuse field existed and (2) knew the different values used for landuse type.
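A minimal sketch of that extraction with pyshp (path shortened; the field lookup mirrors my earlier shapefile posts):

import shapefile

sf = shapefile.Reader('ex_QMDJXT8DzmqNh6eFiNkAuESyDNCX_osm_line') # path shortened
field_names = [field[0] for field in sf.fields[1:]] # skip the DeletionFlag entry
ndx = field_names.index('landuse')

# keep only the shapes whose landuse record is 'residential'
residential_shapes = [sr.shape for sr in sf.iterShapeRecords()
                      if sr.record[ndx] == 'residential']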

So there it is! That’s the program. Hopefully all the shapefiles you ever download will be well-documented. But if you find one that’s not and you really need to figure it out, this little tool might help!

Some code notes and tips

The xml file that I create doesn’t follow any particular standard or convention, just what I thought might be useful. Perhaps that could be improved?

REMEMBER THAT IN PYTHON, YOU NEED TO EXPLICITLY COPY LISTS! I stupidly forgot that when you make a list

list_a = list()
list_a.append('blah')
list_a.append('d')

And then want to make a copy of the list, if you do this:

list_b = list_a

Then any changes to list_b will change list_a. But if you do

list_b = list_a[:]

You’ll get a new copy that won’t reference back to list_a. This is probably one of the things that I forget most frequently with Python lists. Palm-smack-to-forehead. 
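For reference, a few equivalent ways to get a fresh copy (all shallow copies except the last; nested objects need copy.deepcopy):

import copy

list_b = list_a[:] # slice copy
list_b = list(list_a) # constructor copy
list_b = copy.copy(list_a) # copy module, shallow
list_b = copy.deepcopy(list_a) # recursive copy of nested objects too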

The XML API ElementTree was pretty great to work with. You can very easily define a hierarchy that will produce a nice xml tree (see this example). I did, however, have some trouble parsing the direct output from the type() function. When you calculate a type,

type(0.01)

you get this:

<type 'float'>

When I gave it directly to ElementTree (imported as ET here), like this:

ET.SubElement(attr, "attrtype",name="data type").text = type(0.01)

I would get some errors because ElementTree expects the text field to be a string, not a type object. To get around this, I converted the type output to a string, split it up by the quotes and took the index that would just be the type (int, str, or float):

ET.SubElement(attr, "attrtype",name="data type").text = str(type(0.01)).split("'")[1]
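As an aside, a cleaner way to get just the type name (and skip the quote splitting entirely) is the type’s __name__ attribute:

ET.SubElement(attr, "attrtype",name="data type").text = type(0.01).__name__ # 'float'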

Mapping some things!

Since publishing my series of posts on manipulating shapefiles in Python (1, 2 and 3), I’ve been exploring different open data catalogs so I thought I’d share some of the maps I’ve mapped! All these were produced using scripts in my learning_shapefiles repository on GitHub (see denver_stack.py and denver_tree_canopy.py).

Downtown Denver

denver_downtown

MapZen is a pretty sweet service! You draw a regional box, the folks at MapZen extract all the OpenStreetMap data within it, and then they give you shapefiles with all that info. The first thing I (stupidly) did after downloading my data extraction from MapZen was to just plot all of the shapes… and I then proceeded to sit around for quite some time while my laptop chugged away and produced a highly detailed map of all the things in Denver County. I zoomed in to downtown Denver for the above image to show off the detail.

Metro-Denver

denver_road_rivers

A more abstract representation of the primary roadways in metro Denver. I figured out how the MapZen/OpenStreetMap shapefile was organized and only plotted motorways, primary and secondary roads (bright white in the map). I also created a grid containing the shortest distance to a roadway (using the distance method available for shapely.geometry.LineString, sketched below) and contoured the inverse distance (1/x) to evoke the topographic contours along rivers.
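A minimal sketch of that distance calculation (with made-up coordinates):

from shapely.geometry import LineString, Point

road = LineString([(-105.00, 39.70), (-104.90, 39.80)]) # hypothetical road segment
pt = Point(-104.95, 39.72) # one point of the grid

d = road.distance(pt) # shortest distance from the grid point to the road
inv_d = 1.0 / max(d, 1e-6) # inverse distance, guarded against division by zero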

Tree Cover in Denver County

denver_1

Denver’s Open Data Catalog has a bunch of databases with shapefiles galore. I downloaded one for the Tree Canopy and then plotted the percent tree cover in each geometry. This is what actually led me to learn how to plot roadways… and here I overlaid the tree cover on the same MapZen extraction of OpenStreetMap data. Along with the roadways underneath, it forms a sort of abstract tree.

So that’s it for today. Three maps from a couple different open data sources using some inefficient Python. Not going to go into detail on the code this time, because, well, it’s slooow. I’m using modifications of the simple scripts that I used in my shapefile tutorials because I was excited to just plot things! But there are much better ways to handle shapefiles with hundreds of thousands of geometries. So until next time (after I figure out how to use fiona or geopandas), just enjoy the visuals!

Shapely Polygons: Coloring Shapefile Polygons

In my previous two posts, I showed how to (1) read and plot shapefile geometries using the pyshp library and (2) plot polygons using shapely and descartes. So the obvious next step is to combine the two! And that’s what I’ll cover today, again using my learning_shapefiles github repo along with the shapefile of state boundaries from census.gov.

The Final Map

In case you don’t care about the Python and are just curious about the end product, here’s the final map where the color of each state reflects its total land area:

shapefile_us_colored_by_area_sat

It’s kind of neat to see the gradient of state size from east to west, reflecting the historical expansion of the U.S. westward, but other than that, there’s not much to the map. But it does serve as a simple case for learning to manipulate shapefiles.

The Code

There are two scripts in learning_shapefiles/src of relevance for today’s post: basic_readshp_plotpoly.py and read_shp_and_rcrd.py. The first script is a simple combination of basic_read_plot.py and simple_polygons.py (from my previous two posts), plotting the shapefile geometries using polygons instead of lines, so let’s start there.

basic_readshp_plotpoly.py

The code starts out the same as basic_read_plot.py, but now also imports Polygon and PolygonPatch from shapely and descartes, before reading in the shapefile:

import shapefile
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import Polygon
from descartes.patch import PolygonPatch

"""
 IMPORT THE SHAPEFILE 
"""
shp_file_base='cb_2015_us_state_20m'
dat_dir='../shapefiles/'+shp_file_base +'/'
sf = shapefile.Reader(dat_dir+shp_file_base)

The next part of the code plots a single geometry from the shapefile. This is super easy because shapefile.Reader reads a shapefile geometry as a list of points, which is exactly what the Polygon function needs. So we can just give that list of points directly to the Polygon function:

plt.figure()
ax = plt.axes()
ax.set_aspect('equal')

shape_ex = sf.shape(5) # could break if selected shape has multiple polygons. 

# build the polygon from exterior points
polygon = Polygon(shape_ex.points)
patch = PolygonPatch(polygon, facecolor=[0,0,0.5], edgecolor=[0,0,0], alpha=0.7, zorder=2)
ax.add_patch(patch)

# use bbox (bounding box) to set plot limits
plt.xlim(shape_ex.bbox[0],shape_ex.bbox[2])
plt.ylim(shape_ex.bbox[1],shape_ex.bbox[3])

And we get Washington, now as a colored polygon rather than an outline:

shapefile_single

Woo!

And as before, we can now loop over each shape (and each part of each shape), construct a polygon and plot it:

""" PLOTS ALL SHAPES AND PARTS """
plt.figure()
ax = plt.axes() # add the axes
ax.set_aspect('equal')

icolor = 1
for shape in list(sf.iterShapes()):

    # define polygon fill color (facecolor) RGB values:
    R = (float(icolor)-1.0)/52.0
    G = 0
    B = 0

    # check number of parts (could use MultiPolygon class of shapely?)
    nparts = len(shape.parts) # total parts
    if nparts == 1:
        polygon = Polygon(shape.points)
        patch = PolygonPatch(polygon, facecolor=[R,G,B], alpha=1.0, zorder=2)
        ax.add_patch(patch)
    else: # loop over parts of each shape, plot separately
        for ip in range(nparts):
            i0 = shape.parts[ip]
            if ip < nparts-1:
                i1 = shape.parts[ip+1]-1
            else:
                i1 = len(shape.points)

            polygon = Polygon(shape.points[i0:i1+1])
            patch = PolygonPatch(polygon, facecolor=[R,G,B], alpha=1.0, zorder=2)
            ax.add_patch(patch)

    icolor = icolor + 1

plt.xlim(-130,-60)
plt.ylim(23,50)
plt.show()

In order to distinguish each polygon, I set each shape’s color based on how many shapes have already been plotted:

R = (float(icolor)-1.0)/52.0

This grades the red scale in an RGB tuple between 0 and 1 (since there are 52 shapes), and it is then used in the facecolor argument of PolygonPatch. The coloring is simply a function of the order in which the shapes are accessed:

shapefile_us

The goal, however, is to color each polygon by some sort of data so that we can actually learn something interesting, and that is exactly what read_shp_and_rcrd.py does.

read_shp_and_rcrd.py

Up to now, we’ve only considered the shape geometry, but that is only one part of a shapefile. Also included in most shapefiles are the records, or the data, associated with each shape. When a shapefile is imported,

shp_file_base='cb_2015_us_state_20m'
dat_dir='../shapefiles/'+shp_file_base +'/'
sf = shapefile.Reader(dat_dir+shp_file_base)

The resulting shapefile object (sf in this case) contains records associated with each shape. I wasn’t sure what fields were included for the State Boundary shapefile from census.gov, so I opened up a Python shell in terminal, read in the shapefile then typed

>>> sf.fields

to get a list of available fields:

[('DeletionFlag', 'C', 1, 0), ['STATEFP', 'C', 2, 0], ['STATENS', 'C', 8, 0], ['AFFGEOID', 'C', 11, 0], ['GEOID', 'C', 2, 0], ['STUSPS', 'C', 2, 0], ['NAME', 'C', 100, 0], ['LSAD', 'C', 2, 0], ['ALAND', 'N', 14, 0], ['AWATER', 'N', 14, 0]]

Down towards the end, there’s an interesting entry

['ALAND', 'N', 14, 0]

Though I couldn’t find any documentation on the included fields, I suspected ALAND stood for land area (especially since it was followed by AWATER). So in read_shp_and_rcrd.py, the first thing I do is extract the field names and find the index corresponding to the land area:

""" Find max/min of record of interest (for scaling the facecolor)"""

# get list of field names, pull out appropriate index
# fieldnames of interest: ALAND, AWATER are land and water area, respectively
fld = sf.fields[1:]
field_names = [field[0] for field in fld]
fld_name='ALAND'
fld_ndx=field_names.index(fld_name)

I found this post helpful for extracting the fieldnames of each record.

Next, I loop over the records using the iterRecords() iterator to find the minimum and maximum land area in order to scale the polygon colors:

# loop over records, track global min/max
maxrec=-9999
minrec=1e21
for rec in sf.iterRecords():
    if rec[4] != 'AK': # exclude alaska so the scale isn't skewed
       maxrec=np.max((maxrec,rec[fld_ndx]))
       minrec=np.min((minrec,rec[fld_ndx]))

maxrec=maxrec/1.0 # upper saturation limit

print fld_name,'min:',minrec,'max:',maxrec

I excluded Alaska (if rec[4] != ‘AK’:) so that the color scale wouldn’t be thrown off, and then I also scale the maximum (maxrec=maxrec/1.0) to adjust the color scale manually (more on this later).

Now that I know the max/min, I loop over each shape and (1) calculate the RGB value for each polygon using a linear scale between the max and min and then (2) plot a polygon for each shape (and all the parts of a shape) using that RGB value:

for shapeRec in sf.iterShapeRecords():
    # pull out shape geometry and records
    shape = shapeRec.shape
    rec = shapeRec.record

    # select polygon facecolor RGB vals based on record value
    if rec[4] != 'AK':
        R = 1
        G = (rec[fld_ndx]-minrec)/(maxrec-minrec)
        G = G * (G<=1) + 1.0 * (G>1.0)
        B = 0
    else:
        R = 0
        B = 0
        G = 0

    # check number of parts (could use MultiPolygon class of shapely?)
    nparts = len(shape.parts) # total parts
    if nparts == 1:
        polygon = Polygon(shape.points)
        patch = PolygonPatch(polygon, facecolor=[R,G,B], edgecolor=[0,0,0], alpha=1.0, zorder=2)
        ax.add_patch(patch)
    else: # loop over parts of each shape, plot separately
        for ip in range(nparts):
            i0 = shape.parts[ip]
            if ip < nparts-1:
                i1 = shape.parts[ip+1]-1
            else:
                i1 = len(shape.points)

            # build the polygon and add it to plot
            polygon = Polygon(shape.points[i0:i1+1])
            patch = PolygonPatch(polygon, facecolor=[R,G,B], alpha=1.0, zorder=2)
            ax.add_patch(patch)

plt.xlim(-130,-60)
plt.ylim(23,50)
plt.show()

One important thing not to miss is that on the first line, I loop over the iterShapeRecords iterable rather than using iterShapes. This is necessary so that I have access to both the shape geometry and the associated records, rather than just the shapes (iterShapes) or just the records (iterRecords).

Running the above code will produce the following map:

shapefile_us_colored_by_area

Because Texas is so much larger than the rest of the states, we don’t see much of a difference among the other states. But we can adjust this by decreasing the max value used in the scaling. So after finding the max/min value, I set

maxrec=maxrec/2.0 # upper saturation limit

and end up with the following map that brings out more of the variation in the states’ land area (same map as in the very beginning of this post):

shapefile_us_colored_by_area_sat

Note that because I decreased the max value used for scaling, I had to ensure that the RGB value did not exceed 1, which is why I had the following lines limiting the green value (G):

    if rec[4] != 'AK':
         R = 1
         G = (rec[fld_ndx]-minrec)/(maxrec-minrec)
         G = G * (G<=1) + 1.0 * (G>1.0)

So that’s about it! That’s how you can read in a shapefile and plot polygons of each shape colored by some data (record) associated with each shape. There are plenty of more sophisticated ways to do this exercise, and I’ll be looking into some other shapefile Python libraries for upcoming posts.

Shapefiles in Python: shapely polygons

In my last post, I described how to take a shapefile and plot the outlines of the geometries in the shapefile. But the power of shapefiles is in the records (the data) associated with each shape. One common way of presenting shapefile data is to plot the shapefile geometry as polygons that are colored by some value of data. So as a prelude to doing just that, this post will cover how to plot polygons using the shapely and descartes libraries. As always, my code is up on my github page.

The two python libraries that I’ll be using are shapely (for constructing a polygon) and descartes (for adding a polygon to a plot). So step 0 is to go install those! I’ll also be using the numpy and matplotlib libraries, but you probably already have those.

Though the documentation for shapely has some nice sample source code, I wrote my own script, simple_polygons.py, to get to know the libraries better. In this approach, there are two steps to building a polygon from scratch: constructing the points that define the polygon’s shape and then mapping those points into a polygon structure. The first step doesn’t require any special functions, just standard numpy. The second step uses the  shapely.geometry.Polygon class to build a polygon from a list of coordinates.

There are limitations for valid polygons, but virtually any shape can be constructed, like the following pacman:

pacman

The first step is to build the list of coordinates defining the exterior points (the outer circle) and a list of interior points to exclude from the polygon (the eyeball). Starting with the exterior points, I calculate the x and y coordinates of a unit circle from 0.25pi to 1.75pi (0 to 2pi would map a whole circle rather than a pacman):

theta = np.linspace(0.25*3.14,1.75*3.14,80) 

# add random perturbation 
max_rough=0.05 
pert=max_rough * np.random.rand(len(theta)) 

x = np.cos(theta)+pert 
y = np.sin(theta)+pert

I also add a random, small perturbation to each x-y position to add a bit of roughness to the outer pacman edge, because I wanted some small scale roughness more similar to the shapefiles I’d be plotting later. Next, I build a python list of all those x-y points. This list, ext, is the list of exterior points that I’ll give to shapely:

# build the list of points 
ext = list() 

# loop over x,y, add each point to list 
for itheta in range(len(theta)): 
    ext.append((x[itheta],y[itheta])) 

ext.append((0,0)) # add 0 point

At the end, I add the 0,0 point, otherwise the start and end points on the circle would connect to each other and I’d get a pacman that was punched in the face:

pacman_punch

That takes care of the exterior points, and making the list of interior points is similar. This list, inter, will be a list of points that define interior geometries to exclude from the polygon:

# build eyeball interior points 
theta=np.linspace(0,2*3.14,30) 
x = 0.1*np.cos(theta)+0.2 
y = 0.1*np.sin(theta)+0.7 

inter = list() 
for itheta in range(len(theta)): 
    inter.append((x[itheta],y[itheta])) 
inter.append((x[0],y[0]))

Now that we have the list of exterior and interior points, you just give that to shapely’s polygon function (shapely.geometry.Polygon):

polygon = Polygon(ext,[inter[::-1]])

Two things about passing Polygon the interior list: (1) you can actually pass Polygon a list of lists to define multiple areas to exclude from the polygon, so you have to add the brackets around inter and (2) I haven’t quite figured out the [::-1] that the shapely documentation includes. I know that generally, [::-1] will take all the elements of a list and reverse them, but why does Polygon need the points in reverse? No idea. Without it, I only get an outer edge defining the eyeball:

pacman_badeye

I would love to get some information on why Polygon needs the reversed list, so leave me a note in the comments if you know why.
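A hedged guess, after the fact: a common GIS convention is that exterior rings are wound counter-clockwise and interior rings clockwise, and the fill rule used when the polygon is rendered seems to treat an interior ring as a hole only when it is wound opposite to the exterior. If you’d rather not track winding order by hand, shapely has a helper that normalizes ring orientation:

from shapely.geometry.polygon import orient

# returns a new polygon with the exterior ring CCW and any interior rings CW
polygon = orient(Polygon(ext,[inter]), sign=1.0)

With sign=1.0 (the default), orient makes the exterior counter-clockwise and the holes clockwise, which matches what the [::-1] does by hand.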

Regardless, the next step is to add that polygon structure to a plot, with a straightforward use of matplotlib.pyplot (imported as plt) and descartes.patch.PolygonPatch:


# initialize figure and axes 
fig = plt.figure() 
ax = fig.add_axes((0.1,0.1,0.8,0.8)) 

# put the patch on the plot 
patch = PolygonPatch(polygon, facecolor=[0,0,0.5], edgecolor=[1,1,1], alpha=1.0) 
ax.add_patch(patch) 

# new axes 
plt.xlim([-1.5, 1.5]) 
plt.ylim([-1.5,1.5]) 
ax.set_aspect(1) 

plt.show()

PolygonPatch’s arguments are pretty self-explanatory: facecolor and edgecolor set the colors for the fill and edge of the polygon. Conveniently, facecolor and edgecolor can be specified as RGB values, which I’ll take advantage of for plotting shapefile records in my next post. It can also accept any of the kwargs available to the matplotlib.patches.Polygon class (like the transparency, alpha, between 0 and 1).

So that’s it! Pretty easy! And in some ways it is even easier to plot polygons from a shapefile, since pyshp imports shapefile coordinates as a list and you can just give that list directly to Polygon… more on that in the next post.