GPS coordinates can easily be located on a map. However, sometimes our spatial reference is a street address or only the name of a place. It turns out that we can often use these data to find a location on a map. The process of converting an address or place name into map coordinates is called geocoding. This is what Google Maps does when you type in an address and it shows a location on a map. Within R, you can use Google’s mapping database to geocode your data.
You can find lots of useful spatial data online, but rarely will you be lucky enough to find it in shapefile format. More often, it will be embedded in a PDF. You will need to do some work to first get it into a table or spreadsheet that can be read by R. From there, you can geocode.
A PDF report of the National Survey of HIV and Syphilis Prevalence among Women attending Antenatal Clinics in Zimbabwe 2012 is available online. The report includes geographic data on each site, including the name of the site and the district it is located within. However, there are no GPS coordinates or maps included in the report. We’ll see how R can be used to geocode the locations of these facilities and place them on a map.
Of course, first you need to convert the PDF into a spreadsheet. If you have an Adobe license this can be done fairly easily. Otherwise it requires some work: highlight and copy the table, and paste it into an Excel spreadsheet. You’ll see that all the data ends up into a single column. To separate the data, click the Data tab, then click Text to Columns. Choose Delimited. Check Space. Click Finish.
The data still will not be properly organized, so you’ll have to manually move things around until you get the data sorted in columns. Luckily, I already did this for you! I extracted all of the 2012 data from the report and organized it in ANC2012.xlsx.
Now we are ready to geocode in R. We’ll need the ggmap
package to do this. We also need a package for reading Excel files into R. If you’ve closed R since starting this tutorial, you’ll also need to load the rgdal
package again.
# Set working directory
# Load packages
library(rgdal) # 'rgdal' automatically loads 'sp' package along with it
library(ggmap) # has a geocoding function
library(readxl) # for reading excel files
# Make sure prettymapr is NOT loaded; it also has a geocode() function
# Our code is written for ggmap's geocode() function
detach(package:prettymapr)
# Options
options(stringsAsFactors = F)
Import the ANC data using the read_excel()
function from the readxl
package and assign the data to an object called ANC2012
.
# Import ANC data
ANC2012 <- read_excel("ANC2012.xlsx")
# Examine the data
class(ANC2012)
## [1] "tbl_df" "tbl" "data.frame"
head(ANC2012)
## # A tibble: 6 × 23
## province district site name tested.total tested.positive prevalence prev.ci_lower prev.ci_upper res.total_n res.farm_n res.farm_pct res.growthpt_n res.growthpt_pct res.mine_n res.mine_pct res.rural_n res.rural_pct res.urban_n res.urban_pct res.missing_n res.missing_pct classification
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Bulawayo Bulawayo Nkulumane Nkulumane Clinic 596 102 17.1 14.1 20.1 600 1 0.2 2 0.3 0 0.0 7 1.2 581 96.8 9 1.5 Urban
## 2 Chitungwiza Chitungwiza St Marys St Marys Clinic 600 67 11.2 8.6 13.7 600 5 0.8 3 0.5 0 0.0 9 1.5 582 97.0 1 0.1 Urban
## 3 Harare Harare Glenview Glenview 333 34 10.2 6.9 13.5 334 0 0.0 1 0.3 0 0.0 0 0.0 328 98.2 5 1.5 Urban
## 4 Harare Harare Hatcliffe Hatcliffe Clinic 339 33 9.7 6.5 12.9 340 5 1.5 6 1.8 1 0.3 9 2.7 317 93.2 1 0.3 Urban
## 5 Harare Harare Kuwadzana Kuwadzana Clinic 599 70 11.7 9.1 14.3 600 4 0.7 0 0.0 0 0.0 0 0.0 593 98.8 3 0.5 Urban
## 6 Manicaland Mutasa Hauna Hauna District Hospital 224 23 10.3 6.2 14.3 224 23 10.3 17 7.6 8 3.6 165 73.7 10 4.5 1 0.5 Rural
Check out the geocode()
functionality.
# Read the manual for the geocode function
?geocode
From the Help documentation, we learn that the Google Maps limits to 2500 queries per day. We see that we can use geocodeQueryCheck()
to determine how many queries remain today.
#Check how many free geocodes we have
geocodeQueryCheck() #2500 per day limit
## 2500 geocoding queries remaining.
The Help documentation also informed us that we need to give the geocode()
function a single “character vector” for the location
input. It will be helpful to include not just the site, but also the province and country. Let’s string those together using the paste0()
function.
# Make a new string that has all the place details together
place<- paste0(ANC2012$site, ", ", ANC2012$province, ", ", "Zimbabwe")
head(place)
## [1] "Nkulumane, Bulawayo, Zimbabwe" "St Marys, Chitungwiza, Zimbabwe" "Glenview, Harare, Zimbabwe" "Hatcliffe, Harare, Zimbabwe" "Kuwadzana, Harare, Zimbabwe" "Hauna, Manicaland, Zimbabwe"
Note that we are using the site
variable to locate the place. Alternatively, we could include the name
variable to get the more precise location of the health facility. Google knows the locations of many health facilities in Zimbabwe by name. However, Google might not find the location if it is too specific, so we would have to do some cleaning and run a few rounds of geocoding until all locations could be found. For now, let’s just use site
to get the approximate location. This will be adequate for a national-level static map.
# Now try geocoding
geocode_result <- geocode(place, output = "more")
You’ll see some red text start to appear in the Console. This is nothing to worry about; it just shows that queries are running. When the red text stops appearing, you’ll know the geocoding is finished. Check the red text results to make sure no errors occurred. View the results.
head(geocode_result)
## X lon lat type loctype address north south east west neighborhood political locality administrative_area_level_2 administrative_area_level_1 country route bus_station establishment
## 1 1 28.50391 -20.18118 neighborhood approximate nkulumane, bulawayo, zimbabwe -20.16063 -20.20023 28.53330 28.48167 Nkulumane Nkulumane Bulawayo Bulawayo Matabeleland South Province Zimbabwe <NA> <NA> <NA>
## 2 2 31.06667 -17.98333 neighborhood approximate saint marys, zimbabwe -17.98198 -17.98468 31.06802 31.06532 Saint Marys <NA> <NA> <NA> Mashonaland East Province Zimbabwe <NA> <NA> <NA>
## 3 3 30.94747 -17.90892 political approximate glen view, harare, zimbabwe -17.88817 -17.93338 30.96954 30.92469 <NA> Glen View Harare Harare Harare Province Zimbabwe <NA> <NA> <NA>
## 4 4 31.10473 -17.69284 establishment approximate 485 hatcliffe, 20th st, harare, zimbabwe -17.69149 -17.69419 31.10608 31.10339 <NA> <NA> Harare <NA> <NA> Zimbabwe 20th Street <NA> <NA>
## 5 5 30.91304 -17.83039 political approximate kuwadzana, harare, zimbabwe -17.81736 -17.84433 30.94282 30.89076 <NA> Kuwadzana Harare Harare Harare Province Zimbabwe <NA> <NA> <NA>
## 6 6 32.84917 -18.49885 bus_station approximate hauna growth point, honde valley, zimbabwe -18.49750 -18.50020 32.85052 32.84782 <NA> <NA> Honde Valley Mutasa Manicaland Province Zimbabwe <NA> Hauna Growth Point <NA>
We see that Google reported back numerous details. If we only wanted the longitude and latitude, we could have left out the “more” option in our geocode query. However, details are useful for checking the accuracy of the geocoding results. Let’s check whether Google located these points within the same districts that were reported in the ANC survey report.
# Quality check
cbind(ANC2012$district, geocode_result$administrative_area_level_2)
## [,1] [,2]
## [1,] "Bulawayo" "Bulawayo"
## [2,] "Chitungwiza" NA
## [3,] "Harare" "Harare"
## [4,] "Harare" NA
## [5,] "Harare" "Harare"
## [6,] "Mutasa" "Mutasa"
## [7,] "Buhera" "Buhera"
## [8,] "Chimanimani" "Chimanimani"
## [9,] "Nyanga" "Nyanga"
## [10,] "Makoni" "Makoni"
## [11,] "Mutare" "Mutare"
## [12,] "Bindura" "Bindura"
## [13,] "Rushinga" "Rushinga"
## [14,] "Mazowe" "Mazowe"
## [15,] "Guruve" "Guruve"
## [16,] "Mt Darwin" "Mount Darwin"
## [17,] "Shamva" "Shamva"
## [18,] "Centenary" "Centenary"
## [19,] "Seke" NA
## [20,] "Hwedza" "Wedza"
## [21,] "Goromonzi" "Goromonzi"
## [22,] "Marondera" "Marondera"
## [23,] "Murehwa" "Murehwa"
## [24,] "Mutoko" "Mutoko"
## [25,] "Chikomba" "Chikomba"
## [26,] "Zvimba" "Zvimba"
## [27,] "Makonde" "Makonde"
## [28,] "Kadoma" "Kadoma"
## [29,] "Kariba" "Kariba"
## [30,] "Hurungwe" "Hurungwe"
## [31,] "Chiredzi" "Chiredzi"
## [32,] "Chivi" "Chivi"
## [33,] "Gutu" "Gutu"
## [34,] "Masvingo" "Masvingo"
## [35,] "Zaka" "Zaka"
## [36,] "Mwenezi" "Chivi"
## [37,] "Bikita" "Bikita"
## [38,] "Binga" "Binga"
## [39,] "Hwange" "Hwange"
## [40,] "Bubi" "Bubi"
## [41,] "Nkayi" "Nkayi"
## [42,] "Umguza" "Umguza"
## [43,] "Beitbridge" "Beitbridge"
## [44,] "Insiza" "Insiza"
## [45,] "Gwanda" "Gwanda"
## [46,] "Bulilima/Mangwe" "Bulilimamangwe"
## [47,] "Gokwe" "Gokwe South"
## [48,] "Gweru" "Gweru"
## [49,] "Kwekwe" "Kwekwe"
## [50,] "Mberengwa" "Mberengwa"
## [51,] "Chirumanzu" "Chirumhanzu"
## [52,] "Shurugwi" "Shurugwi"
## [53,] "Zvishavane" "Zvishavane"
Looks pretty good! There are a few NA
results where Google didn’t report the district, but if you further inspect the other fields you’ll see that they seem to be in the right area.
Now that we have latitude and longitude, we can convert to a shapefile and map! First, we need to merge the geocoding results with the ANC data. Let’s also export this data frame to a CSV (comma separated values) spreadsheet file so we can access the geocoded data outside of R.
# Merge ANC data with geocoding results
ANC2012_geocoded <- cbind(ANC2012, geocode_result)
head(ANC2012_geocoded)
## province district site name tested.total tested.positive prevalence prev.ci_lower prev.ci_upper res.total_n res.farm_n res.farm_pct res.growthpt_n res.growthpt_pct res.mine_n res.mine_pct res.rural_n res.rural_pct res.urban_n res.urban_pct res.missing_n res.missing_pct classification X lon lat type loctype address north south east west neighborhood political locality
## 1 Bulawayo Bulawayo Nkulumane Nkulumane Clinic 596 102 17.1 14.1 20.1 600 1 0.2 2 0.3 0 0.0 7 1.2 581 96.8 9 1.5 Urban 1 28.50391 -20.18118 neighborhood approximate nkulumane, bulawayo, zimbabwe -20.16063 -20.20023 28.53330 28.48167 Nkulumane Nkulumane Bulawayo
## 2 Chitungwiza Chitungwiza St Marys St Marys Clinic 600 67 11.2 8.6 13.7 600 5 0.8 3 0.5 0 0.0 9 1.5 582 97.0 1 0.1 Urban 2 31.06667 -17.98333 neighborhood approximate saint marys, zimbabwe -17.98198 -17.98468 31.06802 31.06532 Saint Marys <NA> <NA>
## 3 Harare Harare Glenview Glenview 333 34 10.2 6.9 13.5 334 0 0.0 1 0.3 0 0.0 0 0.0 328 98.2 5 1.5 Urban 3 30.94747 -17.90892 political approximate glen view, harare, zimbabwe -17.88817 -17.93338 30.96954 30.92469 <NA> Glen View Harare
## 4 Harare Harare Hatcliffe Hatcliffe Clinic 339 33 9.7 6.5 12.9 340 5 1.5 6 1.8 1 0.3 9 2.7 317 93.2 1 0.3 Urban 4 31.10473 -17.69284 establishment approximate 485 hatcliffe, 20th st, harare, zimbabwe -17.69149 -17.69419 31.10608 31.10339 <NA> <NA> Harare
## 5 Harare Harare Kuwadzana Kuwadzana Clinic 599 70 11.7 9.1 14.3 600 4 0.7 0 0.0 0 0.0 0 0.0 593 98.8 3 0.5 Urban 5 30.91304 -17.83039 political approximate kuwadzana, harare, zimbabwe -17.81736 -17.84433 30.94282 30.89076 <NA> Kuwadzana Harare
## 6 Manicaland Mutasa Hauna Hauna District Hospital 224 23 10.3 6.2 14.3 224 23 10.3 17 7.6 8 3.6 165 73.7 10 4.5 1 0.5 Rural 6 32.84917 -18.49885 bus_station approximate hauna growth point, honde valley, zimbabwe -18.49750 -18.50020 32.85052 32.84782 <NA> <NA> Honde Valley
## administrative_area_level_2 administrative_area_level_1 country route bus_station establishment
## 1 Bulawayo Matabeleland South Province Zimbabwe <NA> <NA> <NA>
## 2 <NA> Mashonaland East Province Zimbabwe <NA> <NA> <NA>
## 3 Harare Harare Province Zimbabwe <NA> <NA> <NA>
## 4 <NA> <NA> Zimbabwe 20th Street <NA> <NA>
## 5 Harare Harare Province Zimbabwe <NA> <NA> <NA>
## 6 Mutasa Manicaland Province Zimbabwe <NA> Hauna Growth Point <NA>
class(ANC2012_geocoded)
## [1] "data.frame"
# Export as a CSV to access outside of R
write.csv(ANC2012_geocoded, file="ANC2012_geocoded.csv")
NOTE: If you were NOT able to successfully geocode, you should download this file to use for the remainder of the workshop. Save it in your Spatial Workshop folder on your Desktop and import it with the following code: ANC2012_geocoded <- read.csv("ANC2012_geocoded.csv")
# Convert to shapefile
ANC_shp <- ANC2012_geocoded
coordinates(ANC_shp) <- ~lon + lat
class(ANC_shp)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
# Import provinces shapefile
provinces_shp <- readOGR("zwe_polbnda_adm1_250k_cso.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "zwe_polbnda_adm1_250k_cso.shp", layer: "zwe_polbnda_adm1_250k_cso"
## with 10 features
## It has 5 fields
# Plot ANC sites with provinces
plot(provinces_shp, col = "azure", main = "Zimbabwe ANC Sentinel Sites 2012")
plot(ANC_shp, add = T, pch = 15)
text(ANC_shp$lon, ANC_shp$lat - 0.1, labels = ANC_shp$site, cex = 0.8)
It’s difficult to make out all of the points clustered in Harare, so let’s create a map of just the Harare province. We can use square brackets to indicate the subset of provinces_shp we want to select. For the labels, subtract a small value (0.01) from the latitude so that the label will appear just below the point.
# Plot only Harare
plot(provinces_shp[provinces_shp$PROVINCE=="Harare", ],
col = "azure", main = "Harare ANC Sentinel Sites 2012")
plot(ANC_shp, add = T, pch = 15)
text(ANC_shp$lon, ANC_shp$lat - 0.01, labels = ANC_shp$site, cex = 0.8)