Mapping in R

Elizabeth Byerly
Jacob Patterson-Stein

2015-07-31

Press "s" for presenter's notes
Code: mapping_in_r

Purpose

Map graphics communicate spatially distributed data

Quick map plots help identify spatial dependencies (exploratory data analysis)
Presenting results by administrative unit helps inform policy-making
Tying data to relatable landmarks builds compelling narratives

You will leave with:

High-level best practices for map graphics
Tools to begin making map graphics using the R language
Troubleshooting steps when you encounter problems

Agenda

What makes a good map graphic?
Mapping in R
Troubleshooting mapping in R

What makes a good map graphic?

Intuitive
Appropriate
Visually appealing

Intuitive

http://edm.com/articles/2014-10-29/pandora-data-us

These two maps fall into traps. The left is a punchline, but the right is a typical mistake: using an ordered visual, one color hue along scaled saturation and values, to represent categorical values.
Intuitive maps use visual pre-attentive variables: location, size, transparency, texture, color value/hue/saturation, orientation, and shape. Which to use depends on whether your data is quantitative, ordered, categorical, associated, or disassociated. These are general graphic rules; they do not change because you're on a map.
The best test for a map, as with any graphic, is to show it to someone outside your current project and confirm they can parse out the meaning.

Red = Bad
Take advantage of R's color packages
Know your palettes
More can be more, but sometimes more can be less

Appropriate

http://my.ilstu.edu/~jrcarter/Geo204/Choro/Tom/

Visually Appealing

Mapping in R

Why R?
Your first map graphic
Raster and vector maps
Graphing over maps

Why R?

Free
Community supported
Attractive, informative, and fully reproducible graphics

https://procomun.wordpress.com/2012/02/20/maps_with_r_2/

Why R.

http://spatial.ly/2012/02/great-maps-ggplot2/

Your first map


qmap("601 New Jersey Ave NW, Washington, DC")

This is a map you can make by the end of today. We all start somewhere.
`qmap()` means "quick map" from the `ggmap` package. It's an easy-to-use abstraction of the `get_map()` and `ggmap()` functions (about which more later). We feed it an address, a lat-lon set, or a bounding box, and it returns a raster map. In our case, it automagically geocodes a provided address.
A map on its own is a reference map (like we have here) or a topographic map (if we had pulled up a map showing hills, valleys, and bodies of water). What we care about as analysts are thematic maps which overlay information above the spatial and topographic features. You cannot call a map of London an "infographic," not even if it has streets, but you CAN if you visually encode those streets with traffic densities during rush hour.

Your first map plot


eg <- data.frame(geocode(c("601 New Jersey Ave NW, Washington, DC",
                           "Union Station Metro, Washington, DC",
                           "Judiciary Square Metro, Washington, DC")))
qmplot(data = eg, x = lon, y = lat, zoom = 18, f = 1.1, size = I(3))

Rasters and Vectors

Raster

A raster map is an image of a map
Appropriate for spatial point data
More attractive, more default options, less customizable

Vector

A vector map is composed of polygons
Appropriate for area-coded data
Customizable, flexible applications, less attractive for equivalent work

The best source of raster maps is the "ggmaps" package: it is easy to use, well documented, flexible, and created by two respected R developers (Wickham and Kahle).
There is no best source for vector maps, but here are two starting recommendations:

EDA: "maps" and "choroplethr" packages. Easy-to-use datasets for common use cases like the globe, US states/counties, many more. Their weakness is how frequently their maps are updated (there is an apocryphal tale of the analyst mapping Europe and finding the Socialist Federal Republic of Yugoslavia proudly represented).
Production: it is straightforward to load shapefiles from trusted sources (e.g., US Census) into R. We will show an example of how this can be done.

Raster maps from `ggmap()`


ggmap(get_map("601 New Jersey Ave NW, Washington, DC",
              zoom = 12, source = "stamen", maptype = ...)

Vector maps from `maps`


map_data("world")
ggplot(world, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = region), color = "black")

How did we make this map?
First, we pull in a dataset from the `maps` package.
We establish a ggplot object using data from the world object (a dataframe from the "maps" package, which tracks ordered latitude and longitude points tracing the outlines of nations grouped by a categorical variable), defining the aesthetic mappings of x to longitude, y to latitude, and group to group (the value used to track which points belong to which nations). The next line adds a geom layer using the polygon statistic: this uses the group aesthetic to trace contained spaces. The fill color is set to region, a categorical variable in the world dataframe, so ggplot will automatically try to apply appropriate colors to make the regions distinct (it does not work perfectly for data like this, you can notice a few instances where similar colors are adjacent, but the rules assigning color values can be adjusted as part of generating a finished product).
There is a problem with this quick-and-dirty map: can you find the Caspian Sea or the Great Lakes?
An important note for now and moving forward: the R code displayed is a minimal representation of that required to produce the graphic shown. The focus is on the functions and options needed to produce the map. We want to focus on the mapping code, not the housekeeping code that suppresses legends or axis ticks. The full code used to produce these graphics is available: ElizabethAB/mapping_in_r.

Vector maps from `maps`


map_data("world")
ggplot(world, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = region), color = "black") +
  coord_map("ortho", orientation=c(41, -74, 0))

Vector maps from shapefiles


usa <- readOGR(dsn = "Inputs", "cb_2014_us_county_500k")
usa@data$id = rownames(usa@data)
usa.points = fortify(usa, region = "id")
county = join(usa.points, usa@data, by = "id")

Data source
Shapefiles are an industry-standard format for storing spatial relationships data. They rely on multiple individual files on disk in a shared folder and connected by a common base name (in this example, cb_2014_us_county_500k).
Reading a shapefile into R is not difficult, but it is more involved than the methods we have so far reviewed. We use the `readOGR()` function from the "rgdal" package (an R interface to the geospatial data abstraction library GDAL) to identify the folder containing our shapefile set and the common name.
We can use the same code from our previous slide, but using US census shapefiles gives us a better resolution and more confidence that the graphic represents current accepted boundaries.

Mixing rasters and vectors


usa_raster <- get_map(bbox(county[,c("long", "lat")]),
                      maptype = "watercolor", zoom = 6)
ggmap(usa_raster, extent = "device",
      base_layer = ggplot(aes(x = long, y = lat, group = group),
                          data = county)) +
  geom_polygon(aes(fill = STATEFP, color = STATEFP), alpha = .3) +
  coord_map(projection = "mercator")

Graphing Data On Maps

The following slides are examples of four basic graphic map types:

Dot density
Graduated symbol
Choropleth
Isopleth

The data used is the public HUD insured multifamily properties dataset and the US Census Quickfacts dataset.

In the following slides, we will review example cases of four classic map graphic types:

Dot density and graduated symbol maps are glorified scatter plots, but the superimposition of the information over a map allows us to draw inferences apparent only with the added spatial dimension.
Choropleth maps show concentrations or locations of statistical qualities broken down by administrative unit (you've probably seen colored voting pattern maps).
Isopleth maps represent abstract three dimensional surfaces to depict continuous phenomena (that is to say, an isopleth map can show you where in a city are concentrated the most positive Yelp reviews, or the most hipster Yelp reviews)

Dot density


ggplot(aes(x = long, y = lat), data = states) +
  geom_polygon(aes(group = group), color = "grey95") +
  geom_point(aes(x = LON, y = LAT), color = "#2db6e8", alpha = .6,
             data = insured) +
  coord_map()

Graduated symbol


ggplot(aes(x = long, y = lat), data = states) +
  geom_polygon(aes(group = group), color = "grey95") +
  geom_point(aes(x = LON, y = LAT, size = Unit_Total), color = "grey85",
             data = cnty_count, shape = 21, fill = "#2db6e8") +
  coord_map()

Choropleth


ggplot(aes(x = long, y = lat), data = county) +
  geom_polygon(aes(group = group, fill = Trouble)) +
  coord_map()

Isopleth


ggmap(dmv_map, base_layer =
        ggplot(aes(x = LON, y = LAT, fill = CLIENT_GROUP_TYPE),
               data = dmv_insured)) +
  stat_density2d(aes(alpha = ..level..), bins = 3, geom = "polygon")

Troubleshooting

An example: JPS's problem
Local resources
Troubleshooting steps

JPS's Problem

Client presentation using geographic data
Want to highlight relative performance across states
Other mapping methods required proprietary software (SAS, ArcGIS), had limited customization (Excel), or required learning an entirely new open-source software on a short timeline (QGIS)

JPS's Problem

What the data look like:

How we want the data to look:

How the data looked after plotting

Troubleshooting

Typical sources of map graphic errors:

Graphic generator methods (e.g., `ggmap()` not recognizing variables provided by the base layer)
Data organization errors (e.g., ordering of vector points)
Projection mismatching (e.g., vector in WGS84 and data points in NAT83)

Resources

ggmap: Spatial Visualization with ggplot2 - Kahle and Wickham
R packages: `ggplot2` and `ggmap` for graphing, `rgdal` for handling coded geospatial data, and `maps` for EDA
R code to generate the maps in this presentation, including loading and cleaning the data: ElizabethAB/mapping_in_r

Mapping in R

Purpose

Map graphics communicate spatially distributed data

Agenda

What makes a good map graphic?

Intuitive

Appropriate

Visually Appealing

Mapping in R

Why R?

Why R.

Your first map

Your first map plot

Rasters and Vectors

Raster

Vector

Raster maps from `ggmap()`

Vector maps from `maps`

Vector maps from `maps`

Vector maps from shapefiles

Mixing rasters and vectors

Graphing Data On Maps

Dot density

Graduated symbol

Choropleth

Isopleth

Troubleshooting

JPS's Problem

JPS's Problem

How the data looked after plotting

Troubleshooting

Resources

Questions?