R is able to handle basic mapping and GIS applications.

1. GBIF API

rgbif, a GBIF API

A good source of species occurrence data is The Global Biodiversity Information Facility (GBIF) – gbif.org

There is a related R-package called rgbif, which acts as an API for accessing data from GBIF using R (API = Application Programming Interface). Here are links to the instructions and tutorial

Take a look at the website to understand the data before continuing, and refer back to the website to understand the type and structure of data.

Install and load the rgbif package

install.packages("rgbif")
library(rgbif)
## Warning: package 'rgbif' was built under R version 3.3.3

Look for occurrence data on garlic mustard (Alliaria petiolata). Note for that many species there could be thousands to hundreds of thousands of records. This can take a while to download, so we’ll put an upper limit of 200 records to keep it from running too long. We also include only records that contain lat/long data

GMgbif<-occ_search(scientificName="Alliaria petiolata", hasCoordinate=T, limit=200)

The function occ_search() returns a list. It’s worth exploring the different items.

names(GMgbif)
## [1] "meta"      "hierarchy" "data"      "media"     "facets"

We’re most interested in the GMgbif$data layer, which includes lats/longs. You can see the names by using the names() function on the $data slice:

names(GMgbif$data)
##  [1] "name"                                
##  [2] "key"                                 
##  [3] "decimalLatitude"                     
##  [4] "decimalLongitude"                    
##  [5] "issues"                              
##  [6] "datasetKey"                          
##  [7] "publishingOrgKey"                    
##  [8] "publishingCountry"                   
##  [9] "protocol"                            
## [10] "lastCrawled"                         
## [11] "lastParsed"                          
## [12] "crawlId"                             
## [13] "extensions"                          
## [14] "basisOfRecord"                       
## [15] "taxonKey"                            
## [16] "kingdomKey"                          
## [17] "phylumKey"                           
## [18] "classKey"                            
## [19] "orderKey"                            
## [20] "familyKey"                           
## [21] "genusKey"                            
## [22] "speciesKey"                          
## [23] "scientificName"                      
## [24] "kingdom"                             
## [25] "phylum"                              
## [26] "order"                               
## [27] "family"                              
## [28] "genus"                               
## [29] "species"                             
## [30] "genericName"                         
## [31] "specificEpithet"                     
## [32] "taxonRank"                           
## [33] "dateIdentified"                      
## [34] "coordinateUncertaintyInMeters"       
## [35] "year"                                
## [36] "month"                               
## [37] "day"                                 
## [38] "eventDate"                           
## [39] "modified"                            
## [40] "lastInterpreted"                     
## [41] "references"                          
## [42] "license"                             
## [43] "identifiers"                         
## [44] "facts"                               
## [45] "relations"                           
## [46] "geodeticDatum"                       
## [47] "class"                               
## [48] "countryCode"                         
## [49] "country"                             
## [50] "rightsHolder"                        
## [51] "identifier"                          
## [52] "verbatimEventDate"                   
## [53] "datasetName"                         
## [54] "collectionCode"                      
## [55] "gbifID"                              
## [56] "verbatimLocality"                    
## [57] "occurrenceID"                        
## [58] "taxonID"                             
## [59] "catalogNumber"                       
## [60] "recordedBy"                          
## [61] "http...unknown.org.occurrenceDetails"
## [62] "institutionCode"                     
## [63] "rights"                              
## [64] "eventTime"                           
## [65] "identificationID"                    
## [66] "locality"                            
## [67] "individualCount"                     
## [68] "continent"                           
## [69] "municipality"                        
## [70] "county"                              
## [71] "identificationVerificationStatus"    
## [72] "language"                            
## [73] "type"                                
## [74] "occurrenceStatus"                    
## [75] "vernacularName"                      
## [76] "taxonConceptID"                      
## [77] "informationWithheld"                 
## [78] "endDayOfYear"                        
## [79] "startDayOfYear"                      
## [80] "datasetID"                           
## [81] "accessRights"                        
## [82] "higherClassification"                
## [83] "habitat"                             
## [84] "occurrenceRemarks"                   
## [85] "identifiedBy"                        
## [86] "stateProvince"                       
## [87] "associatedReferences"                
## [88] "locationID"                          
## [89] "nomenclaturalCode"                   
## [90] "dataGeneralizations"                 
## [91] "verbatimCoordinateSystem"            
## [92] "ownerInstitutionCode"                
## [93] "bibliographicCitation"

Finalize data

Extract Lats/longs into a new data.frame object, and rename columns

gbifLOC<-as.data.frame(GMgbif$data[,grep("decimal",names(GMgbif$data))])
names(gbifLOC)<-c("Lat","Long")
head(gbifLOC)
##        Lat       Long
## 1 40.60905 -74.573720
## 2 49.50332   8.556223
## 3 40.60794 -74.445505
## 4 41.32912 -88.143929
## 5 40.66453 -74.646184
## 6 40.72501 -74.418554

2. ggplot2 maps

You can create quick, rudimentary maps in ggplot2.

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.3
MyMap <- ggplot() + borders("world", colour="grey70", fill="black") + theme_bw()
MyMap

Add points

Use geom_point() to add locations from gbif.

MyMap <- MyMap + geom_point(aes(x=Long,y=Lat),data=gbifLOC,colour="green",size=2,alpha=0.1)
MyMap

Zoom

Let’s try to zoom in to the area covered by the points.

MyMap + xlim(-100,50) + ylim(25,75)

What’s wrong with this map? The borders are vectors connecting points that are outside of the limits we set. When we set xlim() + ylim() ggplot cuts all the points outside of the plotting area, forcing redrawn polygons that no longer correspond with the borders. Look at ?borders to find a solution. Note that there is xlim() and ylim() within the borders() function.

MyMap <- ggplot() + borders("world", colour=NA, fill="black",
                            xlim=c(-100,50), ylim=c(30,75)) +
  geom_point(aes(x=Long,y=Lat), data=gbifLOC, colour="green",
             size=2, alpha=0.1) + theme_bw() 
MyMap 

Now the map borders are complete, but to draw them borders() expands the x- and y-limits beyond what we defined so that all the relevant border vector points are not excluded. For example, our right xlim is 50 degrees east, but the plot area extends > 200 to plot the Russian border.

Another solution is coord_fixed().

MyMap + coord_fixed(xlim=c(-100,50), ylim=c(30,75))

Another problem is that we have ‘hard coded’ the x- and y-limits. What if we have a big data project and we want to have the borders adjust based on the data? Try to edit the above code to do that. If you get stuck, click on the box below for one possible solution.

# Add/subtract 1 degree to add a small buffer around the data points
PopRange<-c(min(gbifLOC$Long,na.rm=T)-1,
         min(gbifLOC$Lat,na.rm=T)-1,
         max(gbifLOC$Long,na.rm=T)+1,
         max(gbifLOC$Lat,na.rm=T)+1)
MyMap + coord_fixed(xlim=PopRange[c(1,3)], ylim=PopRange[c(2,4)])

Datum & Projection

Do any of the points in the above graphs look odd (e.g. in the Ocean)? When working with geographic data you have to pay close attention to the projection and datum used. The problem is that lats/longs are based on a spheroid that have to be ‘projected’ to a flat surface for plotting. The datum describes the spheroid surface, since the earth is not a perfect sphere. The projection transforms the locations from a curved surface to a flat map. The datum you have used probably used is probably WGS84, which is the one used by Google Maps.

The datum information for gbif is available in the same data.frame() that we got lat/long info from:

unique(GMgbif$data[46])
## # A tibble: 1 x 1
##   geodeticDatum
##   <chr>        
## 1 WGS84

The misplacement of points is probably due to different projections for the mapping layer and the gbif point layer. Take a look at ?coord_map for info on using projections.

MyMap <- ggplot() + borders("world", colour=NA, fill="black") +
  coord_map(projection="cylindrical") + coord_fixed(xlim=PopRange[c(1,3)], ylim=PopRange[c(2,4)])
MyMap + geom_point(aes(x=gbifLOC$Long,y=gbifLOC$Lat), colour="green",
             size=2, alpha=0.1) + theme_bw()

If this didn’t fix the problem, you might need a more complicated adjustment using projections in the rgdal package. We’ll come back to this in the next section on ggmap.

3. ggmap maps

The ggmap package in R has nice functions for creating maps and plotting spatial data using online map APIs (e.g. Google Maps).

install.packages("ggmap")
library(ggmap)
## Warning: package 'ggmap' was built under R version 3.3.3

The functions of ggmap are similar to ggplot2, so you might want to brush up on the tutorials for:

1. [qplot visualizations](../RCrashCourse/2_qplot.html)
2. [ggplot visualizations](../RCrashCourse/2_qplot.html)

Then check out this wonderful ggmap cheat sheet by Melanie Frazier

Base map

The ggmap function is the plotting equivalent of ggplot. However, a lot of the action happens with ggmap. There are a lot of nice options, which you can view in the help file ?ggmap

To map the base layer, we have to define a range of lats/longs. We’ll define these values directly from the data (i.e. min/max of each lat/long with 1-degree buffer)

We define a mapping area as an address

ggmap(get_map(location="QUBS",source="google",maptype="satellite",zoom=14))
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=QUBS&zoom=14&size=640x640&scale=2&maptype=satellite&language=en-EN&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=QUBS&sensor=false

or lat/long midpoint. A tribute to Gord Downie: Where the Great Plains begin!

ggmap(get_map(location=c(-100,49),source="google",maptype="satellite",zoom=8))
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=49,-100&zoom=8&size=640x640&scale=2&maptype=satellite&language=en-EN&sensor=false

Zoom

If you look at ?ggmap you can see there is also an option for a bounding box: c(left-Long, bottom-Lat, right-Long, top-Lat). Rather than hard-code, let’s use the data to define it:

PopRange<-c(min(gbifLOC$Long,na.rm=T)-1,
         min(gbifLOC$Lat,na.rm=T)-1,
         max(gbifLOC$Long,na.rm=T)+1,
         max(gbifLOC$Lat,na.rm=T)+1)
ggmap(get_map(location=PopRange,source="google",maptype="road"))
## Warning: bounding box given to google - spatial extent only approximate.
## converting bounding box to center/zoom specification. (experimental)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=48.894804,-35.765375&zoom=4&size=640x640&scale=2&maptype=roadmap&language=en-EN&sensor=false

However, this option currently is not working for Google Maps. Try another map source:

ggmap(get_map(location=PopRange,source="osm",maptype="satellite"))
## Warning in download.file(url, destfile = destfile, quiet = !messaging, mode
## = "wb"): cannot open URL 'http://tile.openstreetmap.org/cgi-bin/export?
## bbox=-90.740625,36.172261,19.209875,61.617348&scale=32500000&format=png':
## HTTP status was '400 Bad Request'
## Error: map grabbing failed - see details in ?get_openstreetmap.
ggmap(get_map(location=PopRange,source="stamen",maptype="satellite"))
## Error: invalid stamen maptype, see ?get_stamenmap
ggmap(get_map(location=PopRange,source="cloudmade",maptype="satellite"))
## Error: an api key must be provided for cloudmade maps, see ?get_cloudmademap.

Nope. Let’s use our intuition from ggplot() to try to jury-rig Google Maps

### First create a map object as above, zooming out far enough to cover the area
MyMap <- get_map(location=PopRange,source="google",maptype="satellite",zoom=3) 
## Warning: bounding box given to google - spatial extent only approximate.
## converting bounding box to center/zoom specification. (experimental)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=48.894804,-35.765375&zoom=3&size=640x640&scale=2&maptype=satellite&language=en-EN&sensor=false
### Then set Long & Lat limits
ggmap(MyMap) + coord_fixed(xlim=PopRange[c(1,3)], ylim=PopRange[c(2,4)]) 

Add points

Looks much better. Now let’s add our point data:

### First create a map object as above, zooming out far enough to cover the area
MyMap <- get_map(location=PopRange,source="google",maptype="satellite",zoom=3) 
## Warning: bounding box given to google - spatial extent only approximate.
## converting bounding box to center/zoom specification. (experimental)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=48.894804,-35.765375&zoom=3&size=640x640&scale=2&maptype=satellite&language=en-EN&sensor=false
### Then set Long & Lat limits
ggmap(MyMap) + coord_fixed(xlim=PopRange[c(1,3)], ylim=PopRange[c(2,4)]) +
  geom_point(aes(x=Long,y=Lat), data=gbifLOC, colour="yellow",
             size=2, alpha=0.1) 

Now another problem. Data points are showing up where they shouldn’t (e.g. ocean). One common reason for this is that the map and the data points have different projections. We can use the rgdal package to fix this.

Projections

We have to project the plotting data using the same datum and projection as the background map. For Google Earth Maps the datum is WGS84 and the projection is Cylindrical. This requires additional information attached to our simple lat/long dataset. The package rgdal has tools for this.

install.packages("rgdal")
library(rgdal)
## Loading required package: sp
## rgdal: version: 1.2-4, (SVN revision 643)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.0.1, released 2015/09/15
##  Path to GDAL shared files: C:/Users/rob_c/Documents/R/win-library/3.3/rgdal/gdal
##  Loaded PROJ.4 runtime: Rel. 4.9.2, 08 September 2015, [PJ_VERSION: 492]
##  Path to PROJ.4 shared files: C:/Users/rob_c/Documents/R/win-library/3.3/rgdal/proj
##  Linking to sp version: 1.2-3

First take a look at the type of data we are using:

class(gbifLOC)
## [1] "data.frame"

Spatial data used by rgdal has a special format. To convert to spatial data, define the names of the spatial columns in the data.frame() object using the format ~X+Y.

gbifLOCsp<-gbifLOC
coordinates(gbifLOCsp)<-~Long+Lat
class(gbifLOCsp)
## [1] "SpatialPoints"
## attr(,"package")
## [1] "sp"

Define the projection of the data using proj4string() and CRS(). For more information on the CRS function, see this helpful overview, also by Melanie Frazier.

proj4string(gbifLOCsp)<-CRS("+init=epsg:4326")
proj4string(gbifLOCsp)
## [1] "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

Now we can project the lats/longs of the locations back to the map. But what projection to use?

MyMap <- get_map(location=PopRange,source="google",maptype="satellite",zoom=3)
## Warning: bounding box given to google - spatial extent only approximate.
## converting bounding box to center/zoom specification. (experimental)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=48.894804,-35.765375&zoom=3&size=640x640&scale=2&maptype=satellite&language=en-EN&sensor=false
names(ggmap(MyMap))
## [1] "data"        "layers"      "scales"      "mapping"     "theme"      
## [6] "coordinates" "facet"       "plot_env"    "labels"

Let’s take a look at the coordinates layer.

ggmap(MyMap)$coordinates
## <ggproto object: Class CoordMap, Coord>
##     aspect: function
##     distance: function
##     is_linear: function
##     labels: function
##     limits: list
##     orientation: NULL
##     params: list
##     projection: mercator
##     range: function
##     render_axis_h: function
##     render_axis_v: function
##     render_bg: function
##     render_fg: function
##     train: function
##     transform: function
##     super:  <ggproto object: Class CoordMap, Coord>

Under ‘projection’ it shows mercator. Possibly the Web Mercator projection? You can read about on Wikipedia.

For a more comprehensive database, see spatialreference.org and Proj.4

Here is the Web Mercator projection and a few others.

WebMerc<-CRS("+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext  +no_defs")
Google<-CRS("+init=epsg:4326 +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0") # From Frazier's CRS document, linked above
Earth<-CRS("+proj=longlat +init=epsg:4326") # Alternate Google Earth Projection
Geo <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84")# Projection from: http://pakillo.github.io/R-GIS-tutorial/
GMaps<-CRS("+proj=longlat +init=epsg:3857 +datum=WGS84") # Google Maps Projection
GBIF<-CRS("+init=epsg:4326")

Projections are very confusing!

Let’s try one that seems to be used by Google Maps, defined in the object GMaps, above.

gbifLOCtform<-spTransform(gbifLOCsp,GMaps)

Now that we have projected the datapoints, we can try plotting.

But FIRST, we have to transform back into a data.frame() object for the ggmap() function.

gbifLOCtform_data<-data.frame(gbifLOCtform)
MyMap <- get_map(location=PopRange,source="google",maptype="satellite",zoom=3) 
## Warning: bounding box given to google - spatial extent only approximate.
## converting bounding box to center/zoom specification. (experimental)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=48.894804,-35.765375&zoom=3&size=640x640&scale=2&maptype=satellite&language=en-EN&sensor=false
ggmap(MyMap) + geom_point(aes(x=Long,y=Lat), data=gbifLOCtform_data, colour="yellow", size=2, alpha=0.1) 

Everything looks good, now let’s rescale the map by recalculating the limits from the projected data.

PopRange_tform<-c(min(gbifLOCtform_data$Long,na.rm=T)-1,
         min(gbifLOCtform_data$Lat,na.rm=T)-1,
         max(gbifLOCtform_data$Long,na.rm=T)+1,
         max(gbifLOCtform_data$Lat,na.rm=T)+1)
MyMap <- get_map(location=PopRange,source="google",maptype="satellite",zoom=2) 
## Warning: bounding box given to google - spatial extent only approximate.
## converting bounding box to center/zoom specification. (experimental)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=48.894804,-35.765375&zoom=2&size=640x640&scale=2&maptype=satellite&language=en-EN&sensor=false
ggmap(MyMap)  +
  geom_point(aes(x=Long,y=Lat), data=gbifLOCtform_data, colour="yellow",
             size=2, alpha=0.1) + coord_fixed(xlim=PopRange_tform[c(1,3)], ylim=PopRange_tform[c(2,4)])

There seems to be a problem when we rescale the map with coord_fixed. Compare the two graphs above. What is going wrong?

The points seem to be in the right spot relative to lat/long but the background map is not scaling properly.

We can try the ratio= parameter in coord_fixed:

Rat<-abs((PopRange_tform[2]-PopRange_tform[4])/(PopRange_tform[1]-PopRange_tform[3]))
ggmap(MyMap)  +  geom_point(aes(x=Long,y=Lat), data=gbifLOCtform_data, colour="yellow",size=2, alpha=0.1) + coord_fixed(ratio=Rat)

Still not right. There is no elegant solution here, as far as I know.

get_googlemap()

The best we can do is hack the get_googlemap() function and play around with the size= and scale= parameters.

MidPoint<-round(c((PopRange_tform[1]+PopRange_tform[3])/2,(PopRange_tform[2]+PopRange_tform[4])/2))
MidPoint
## [1] -36  49
MyMap<-get_googlemap(center=c(MidPoint),size=c(640,ceiling(640*Rat)),scale=2,source="google",maptype="satellite",zoom=2) 
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=49,-36&zoom=2&size=640x149&scale=2&maptype=satellite&sensor=false
ggmap(MyMap) +  geom_point(aes(x=Long,y=Lat), data=gbifLOCtform_data, colour="yellow",size=2, alpha=0.1) 

To take this to the next level, you could make the size=c(width,height) paramater into a function of the lats/longs of the datapoints.

4. Other Maps

You can leverage the Google Visualizations API to create a number of visuals, including interactive maps! There are a lot of other great visualizations available too. There is a R package called googleVis that uses the Google Visualizations API: link

install.packages("rgbif")
library(googleVis)
## Warning: package 'googleVis' was built under R version 3.3.3
## Creating a generic function for 'toJSON' from package 'jsonlite' in package 'googleVis'
## 
## Welcome to googleVis version 0.6.2
## 
## Please read Google's Terms of Use
## before you start using the package:
## https://developers.google.com/terms/
## 
## Note, the plot method of googleVis will by default use
## the standard browser to display its output.
## 
## See the googleVis package vignettes for more details,
## or visit http://github.com/mages/googleVis.
## 
## To suppress this message use:
## suppressPackageStartupMessages(library(googleVis))
# Create columns for lat/long info and point labels
gbifLOC$LatLong<-paste(gbifLOC$Lat,gbifLOC$Long,sep=":")
gbifLOC$PopNum<-seq_along(gbifLOC$LatLong)

InterMap<-gvisMap(gbifLOC,location="LatLong",tipvar="PopNum",option=list(
  showTip=T, mapType="hybrid", useMapTypeControl=T,zoomLevel=4))
plot(InterMap)
## starting httpd help server ... done