<< Workshop Home

Mapping Shapefiles

# Set Working Directory
setwd("/Users/Carrie/Desktop/Spatial Workshop") #replace with your directory

Loading Packages

If you want to use an installed package, you will need to load the package using the library() function. Each time you close R and then re-open it, you will need to re-load the packages you want to use.

# Load packages for this working session
library(sp)
library(rgdal)

The ? function is useful for looking up documentation on packages or code. The results will display in the Help tab within R Studio. If you do not find what you are looking for, remember that you can find lots of help by searching online.

# What do these packages do?
?sp #provides classes & methods for spatial data
?rgdal #helps to read shapefiles

Importing Shapefiles

There are many different commands to import data in R, depending on the type of data. We downloaded health facilities and provinces as shapfiles, which means they are spatial vector data. Shapefiles are pretty easy to work with. Use the readOGR() function from the rgdal package to import shapefiles (files ending in .shp). R can easily find the file by name because you set your working directory to the location of the file. Assign your shapefile to an object called facilities_shp, using the <- notation.

# Options
options(stringsAsFactors = FALSE) #import text variables as characters

# Import the health facilities shapefile
facilities_shp <- readOGR("zwe_health_facility_mohcw.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "zwe_health_facility_mohcw.shp", layer: "zwe_health_facility_mohcw"
## with 1480 features
## It has 41 fields

Examining Data

It’s always a good idea to examine your data after you import it. In particular, we need to know what class of data we are working with. Different classes of data require different commands.

# Class of data
class(facilities_shp)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"

We see that facilities_shp is a SpatialPointsDataFrame, which means it contains point vector data which can be readily mapped. Try it out with R’s built-in plot() function!

# Plot as a map
plot(facilities_shp)

# View the dataset
View(facilities_shp) #View opens a new tab showing the data as a spreadsheet
# Look at the first few rows of data in the Console
head(facilities_shp)
##             coordinates ID  DISTRICT LONGITUDE  LATITUDE ELEVATION UPDATED    NAMEOFFACI OWNERSHIP YEARBUILT          TYPEOFFACI NUMOFDOCTO NUMOFNURSE NUMOFNUR_1 NUMOFPCN NUMOFEHTS NUMOFPHARM NUMOFLABTE NUMOFBEDS NUMOFMATER NUMOFGENER CATHMENTPO DISTNEARES HASCOMMUNI HASCOMMU_1 HASWATERPI HASWATERUN HASELECTRI HASELECT_1 DISTNEAR_1 HASSANITAT HASSANIT_1 HASSANIT_2 HASSECURIT HASSECUR_1 HASROADTAR HASROADGRA HASINCINER HASAUTOWAY HASDENTAL COMMENTS TYPE_EDITE
## 1 (31.03853, -18.35294)  1      Seke  31.03853 -18.35294         0    1998 Acton Ronolds         6         0 Rural Health Centre          0          0          0        0         0          0          0         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0         0     <NA>       <NA>
## 2   (31.31681, -17.788)  2 Goromonzi  31.31681 -17.78800         0    1998  Acturus Mine         5         0              Clinic          0          0          0        0         0          0          0         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0         0     <NA>       <NA>
## 3 (30.42028, -16.89056)  3    Zvimba  30.42028 -16.89056         0    1998     ARDA Sisi         5         0      Council Clinic          0          0          0        0         0          0          0         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0         0     <NA>       <NA>
## 4  (30.0675, -17.37389)  4   Makonde  30.06750 -17.37389         0    1998        Alaska         5         0              Clinic          0          0          0        0         0          0          0         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0         0     <NA>       <NA>
## 5 (30.77089, -19.83389)  5  Masvingo  30.77089 -19.83389         0    1998         Alvod         5         0              Clinic          0          0          0        0         0          0          0         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0         0     <NA>       <NA>
## 6 (30.30806, -16.10195)  6    Guruve  30.30806 -16.10195         0    1998         Angwa         5         0              Clinic          0          0          0        0         0          0          0         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0         0     <NA>       <NA>

It appears that our health facilities shapefile came with lots of data about each facility! There are variables including its name, year built, type of facility, number of doctors, number of nurses, etc! Unfortunately it appears that many of these values are missing, judging by all of those zeros. It looks like the data have not been updated since 1998 for these facilities.

# Look at the last few rows of data
tail(facilities_shp)
##                coordinates   ID   DISTRICT LONGITUDE  LATITUDE ELEVATION UPDATED NAMEOFFACI OWNERSHIP YEARBUILT                   TYPEOFFACI NUMOFDOCTO NUMOFNURSE NUMOFNUR_1 NUMOFPCN NUMOFEHTS NUMOFPHARM NUMOFLABTE NUMOFBEDS NUMOFMATER NUMOFGENER CATHMENTPO DISTNEARES HASCOMMUNI HASCOMMU_1 HASWATERPI HASWATERUN HASELECTRI HASELECT_1 DISTNEAR_1 HASSANITAT HASSANIT_1 HASSANIT_2 HASSECURIT HASSECUR_1 HASROADTAR HASROADGRA HASINCINER HASAUTOWAY HASDENTAL
## 1465 (27.71811, -18.17046) 1288      Binga  27.71811 -18.17046      1022    2007    Chibila         8      2002       Council (Construction)          0          0          0        0         0          0          0         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0         0
## 1466 (28.63129, -19.47221) 1289       Bubi  28.63129 -19.47221      1208    2007  Mdutshane         5      2004                       Clinic          0          1          1        1         0          0          0         1          1          0       5744          0          0          0          1          0          1          0          0          1          0          0          0          1          1          0          0          1         0
## 1467 (28.41426, -19.73884) 1290     Umguza  28.41426 -19.73884      1153    2007    Redwood         5      2006                       Clinic          0          0          0        0         0          0          0         0          0          0       7600         52          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0         0
## 1468    (28.006, -19.9121) 1291     Umguza  28.00600 -19.91210      1132    2007      Muntu         5      2004                       Clinic          0          1          1        1         0          0          0         5          4          1        428         35          0          1          1          0          1          0          0          1          1          0          0          1          1          0          0          1         0
## 1469  (27.80185, -20.0903) 1292 Tsholotsho  27.80185 -20.09030      1246    2007   Chefunye         8      2006 Council Clinic(Construction)          0          0          0        0         0          0          0         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0         0
## 1470 (27.18993, -19.48617) 1293 Tsholotsho  27.18993 -19.48617         0    2007   Samahuru         7      2006                       Clinic          0          0          0        0         0          0          0         0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0         0
##                                                                                           COMMENTS TYPE_EDITE
## 1465                                                           Under construction below roof level       <NA>
## 1466 1.Need telephone or radio 2. General beds for patient observation 3. Poor water supply system     Clinic
## 1467                                                                Complete -Opening long overdue     Clinic
## 1468                                              1. Accomodation problems 2.Water engine problems     Clinic
## 1469                                                                            Under construction       <NA>
## 1470                                                                     Complete awaiting opening     Clinic

At the end of the dataset, the updated year is more recent (2007), and there are some values filled in for doctors, nurses, beds, etc. Exciting! Now let’s take a look at the data for some of these variables.

TIP! If you have ever used Stata, you should note a key difference in R is that you can easily work with numerous datasets at a time. So, when you want to access a variable, you need to tell R which dataset the variable is in. This is done using the $ symbol.

# Year updated
table(facilities_shp$UPDATED)
## 
## 1998 2007 
## 1212  258

We see that there are two years for the UPDATED variable: 1998 and 2007. Most of the facilities (1,212 of them) were updated in 1998, while 258 facilities were updated in 2007.

# Number of doctors
table(facilities_shp$NUMOFDOCTO)
## 
##    0    1    2    3 
## 1448   15    5    2

We see that most facilities have 0 recorded for the number of doctors. 15 facilities have 1 doctor, 5 facilities have 2 doctors, and 2 facilities have 3 doctors recorded. It’s likely that many of the 0 values should actually be recorded as missing (NA).

# Number of nurses
table(facilities_shp$NUMOFNURSE)
## 
##    0    1    2    3    4    5    6    7    8    9   12   14   15   17   20   30   41 
## 1299   71   47   15    5    5   10    2    3    1    2    3    2    1    2    1    1

Again, most of the facilities have 0 recorded for the number of nurses. However, there are more facilities with at least 1 nurse, and 1 facility even has 41 nurses recorded.

# Number of beds
summary(facilities_shp$NUMOFBEDS)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   0.0000   0.0000   0.0000   0.8673   0.0000 150.0000

Using the summary command, we can see that at least three-quarters of facilities have 0 beds recorded. The maximum number of beds recorded is 150.

# Types of facilities
table(facilities_shp$TYPEOFFACI)
## 
##                   (New Site)     Acting District Hospital              Airforce-clinic                  Army Clinic                Army Hospital                       clinic                       Clinic        Clinic (construction)        Clinic Private Clinic             Clinic(Not Open)                  Cliniclinic             Conucil Not Open                      Council       Council (Construction)               Council clinic               Council Clinic Council Clinic(Construction) 
##                            3                            1                            1                            2                            2                           16                          772                            2                            5                            8                            1                            3                            1                            2                            5                           76                            4 
##                District Hosp            District Hospital             District Hosptal              Family Planning                  Farm Clinic           Farm Health Clinic             Forestry-Private             General Hospital             Gold Mine Clinic                   Gvt Clinic                  Gvt Council               Gvt Infections                     Hospital                 Hotel Clinic            Industrial Clinic           Matebeleland North                  Mine Clinic 
##                            1                           39                            1                            1                            1                            2                            2                            3                            2                           17                            1                            1                           19                            1                            1                            1                           23 
##               Mission clinic               Mission Clinic             Mission Hospital            Mission Hospitals                 Municipality          Municipality-clinic         National Park-clinic                 New Not Open                     Not Open                  PMD Offices                  Poly Clinic                Prison Clinic       Private-Border Timbers             Private-Forestry                 Private-Mine             Private-Tanganda       Private-Waltle Company 
##                            1                           16                           56                            3                            4                            2                            2                            1                            1                            1                            1                            1                            2                            2                            1                            4                            1 
##               Private Clinic             Private Hospital          Private Mine Clinic          Provincial Hospital              Railways-clinic          Rural Health centre          Rural Health Centre        Rural Health Hospital                   Rural Hosp                  Rural Hosp.               Rural Hospital              Satelite Clinic                School Clinic                 T.B Hospital           Under Construction            University clinic                   ZRP Clinic 
##                           12                            6                            2                            7                            4                            9                          242                            1                            3                            1                           48                            1                            3                            2                            5                            1                            1

There are so many facility types that it is difficult to tell which are most common. Why are they not sorted by order of frequency like they were for doctors, nurses, and beds?

# Check class of data
class(facilities_shp$TYPEOFFACI)
## [1] "character"

After checking the class, we see that facility types are “character” data, meaning they are made up of letter or symbols rather than numbers (called “string” in Stata). So when we use the table() function, R sorts by alphabetical letter. This is helpful for spotting some of the data cleaning needs, but we need to give R special instructions if we want to order by frequency.

#Sort in descending order
sort(table(facilities_shp$TYPEOFFACI),decreasing=TRUE)
## 
##                       Clinic          Rural Health Centre               Council Clinic             Mission Hospital               Rural Hospital            District Hospital                  Mine Clinic                     Hospital                   Gvt Clinic                       clinic               Mission Clinic               Private Clinic          Rural Health centre             Clinic(Not Open)          Provincial Hospital             Private Hospital        Clinic Private Clinic 
##                          772                          242                           76                           56                           48                           39                           23                           19                           17                           16                           16                           12                            9                            8                            7                            6                            5 
##               Council clinic           Under Construction Council Clinic(Construction)                 Municipality             Private-Tanganda              Railways-clinic                   (New Site)             Conucil Not Open             General Hospital            Mission Hospitals                   Rural Hosp                School Clinic                  Army Clinic                Army Hospital        Clinic (construction)       Council (Construction)           Farm Health Clinic 
##                            5                            5                            4                            4                            4                            4                            3                            3                            3                            3                            3                            3                            2                            2                            2                            2                            2 
##             Forestry-Private             Gold Mine Clinic          Municipality-clinic         National Park-clinic       Private-Border Timbers             Private-Forestry          Private Mine Clinic                 T.B Hospital     Acting District Hospital              Airforce-clinic                  Cliniclinic                      Council                District Hosp             District Hosptal              Family Planning                  Farm Clinic                  Gvt Council 
##                            2                            2                            2                            2                            2                            2                            2                            2                            1                            1                            1                            1                            1                            1                            1                            1                            1 
##               Gvt Infections                 Hotel Clinic            Industrial Clinic           Matebeleland North               Mission clinic                 New Not Open                     Not Open                  PMD Offices                  Poly Clinic                Prison Clinic                 Private-Mine       Private-Waltle Company        Rural Health Hospital                  Rural Hosp.              Satelite Clinic            University clinic                   ZRP Clinic 
##                            1                            1                            1                            1                            1                            1                            1                            1                            1                            1                            1                            1                            1                            1                            1                            1                            1

What is the most common type of facility?

Great! Now we have a feel for our facilities data. But all of those points mapped in a blank space are not very helpful for getting a sense of where the facilities are located.

# Import the provinces shapefile
provinces_shp <- readOGR("zwe_polbnda_adm1_250k_cso.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "zwe_polbnda_adm1_250k_cso.shp", layer: "zwe_polbnda_adm1_250k_cso"
## with 10 features
## It has 5 fields
# Class of data
class(provinces_shp)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"

Again, we have spatial vector data, but this time the data are polygons. Let’s see what they look like.

# Plot as a map
plot(provinces_shp)

Neat! But where did our facilities go? We need to add them on top of the provinces.

# Plot facilities on top of provinces
plot(provinces_shp)
plot(facilities_shp, add=TRUE) #add=TRUE adds to previous plot

Selecting a Subset of a Shapefile

Now that we have examined our data and determined that much of the data are out of date, moving forward we just want to use the facilities that were updated more recently. Let’s look at what proportion of the total facilities we will be keeping if we do that.

# Examine the data again, this time with a proportion table
prop.table(table(facilities_shp$UPDATED))
## 
##      1998      2007 
## 0.8244898 0.1755102

With the prop.table() (proportion table) function, we see that 17.55% of facilities that were updated in 2007. Let’s select those facilities using the subset() function:

  • We specify that within the dataset facilities_shp, we want to keep all records where the variable UPDATED is equal to 2007.
  • We assign this subset to a new object called facilities2007_shp.
# Select facilities updated in 2007
facilities2007_shp <- subset(facilities_shp, UPDATED==2007)

Now let’s plot our new shapefile to look at the spatial distribution of facilities with data updated in 2007.

# Plot
plot(provinces_shp)
plot(facilities2007_shp, add=TRUE)

What do you notice? Which provinces did not have any facilities included in the 2007 data update?

Alright, let’s examine the data in our 2007 facilities shapefile. This time we’ll construct a more nicely formatted output with columns for the frequency and proportion of facilities by type. We do this by creating two new objects we’ll call Freq and Pct, and binding them together with the cbind() function. We’ll also use the head() function again, this time specifying that we want to see the first 15 rows.

# Within the recently updated facilities, select the clinics, RHCs, and hospitals.
# Explore the types of facilities
Freq <- sort(table(facilities_shp$TYPEOFFACI), decreasing=TRUE)
Pct <- prop.table(Freq)*100 #multiply by 100 for percentage format
Pct <- round(Pct, 2) # round the Pct values to two decimal points
head(cbind(Freq, Pct),15) #use head( ,15) to just look at first 15 rows of the table
##                     Freq   Pct
## Clinic               772 52.62
## Rural Health Centre  242 16.50
## Council Clinic        76  5.18
## Mission Hospital      56  3.82
## Rural Hospital        48  3.27
## District Hospital     39  2.66
## Mine Clinic           23  1.57
## Hospital              19  1.30
## Gvt Clinic            17  1.16
## clinic                16  1.09
## Mission Clinic        16  1.09
## Private Clinic        12  0.82
## Rural Health centre    9  0.61
## Clinic(Not Open)       8  0.55
## Provincial Hospital    7  0.48

We can see the most of the facilities are some type of clinic, rural health centre, or hospital. Let’s create three new shapefiles, one for each of these facility types. We’ll use the subset() function again. To indicate multiple values to include in a single subset, use the | symbol (meaning “or”). R is case sensitive, so we need to specify that we want both Clinic and clinic values in our new clinics2007_shp object.

# Create a new shapefile with just selected clinics
clinics2007_shp <- subset(facilities2007_shp,
                          TYPEOFFACI=="Clinic" | #this bar signifies "OR"
                          TYPEOFFACI=="clinic")

# Create a new shapefile for RHCs
RHCs2007_shp <- subset(facilities2007_shp,
                       TYPEOFFACI=="Rural Health Centre" |
                       TYPEOFFACI=="Rural Health centre")

# Create a new shapefile for Hospitals
hospitals2007_shp <- subset(facilities2007_shp,
                            TYPEOFFACI=="Hospital" |
                            TYPEOFFACI=="Mission Hospital" |
                            TYPEOFFACI=="Rural Hospital" |
                            TYPEOFFACI=="District Hospital" |
                            TYPEOFFACI=="Provincial Hospital")

Visualizing Qualitative Data

Now let’s map these three new shapefiles using different shapes and colors. Within the plot command, you can specify the following inputs:

  • col: a color
  • pch: a symbol code (1 to 24)
# Plot facilities by type using different shapes and colors
plot(provinces_shp)
plot(clinics2007_shp, add = T, col = "purple", pch = 19) #19=circle symbol
plot(RHCs2007_shp, add = T, col = "blue", pch = 15) #15=square symbol
plot(hospitals2007_shp, add = T, col = "red", pch = 17) #17=triangle symbol

# Tip: view more symbols and colors
?plot #see more plot options, including pch symbol options
colors() #see list of color options

Now add some more plot specifications.

  • Provinces polygons:
    • col: fill color
    • border: border color
    • main: title for the plot
  • Facilities points:
    • pch: this time, specify filled symbols
    • col: outline color
    • bg: fill color
    • alpha: transparency for the fill color
    • cex: font size (1 is default)
# Make it prettier
plot(provinces_shp, col = "grey93", border = "grey",
      main = "Zimbabwe Health Facilities Updated in 2007") #title of plot
plot(clinics2007_shp, add = T,
      pch = 21, #filled circle symbol
      col = "blue4", #outline color
      bg = adjustcolor("blue1", alpha = 0.6), #fill color, alpha=transparency
      cex = 1.5) # increase font size
plot(RHCs2007_shp, add = T,
      pch = 22, #filled square symbol
      col = "purple4",
      bg = adjustcolor("purple1", alpha = 0.7),
      cex = 1.5)
plot(hospitals2007_shp, add = T,
      pch = 24, #filled triangle symbol
      col = "red4",
      bg = adjustcolor("red1", alpha = 0.8),
      cex = 1.5)

# Assign plot to an object for easy modification!
plot_2007 <- recordPlot()

Adding a Legend

Now let’s create a legend. We’ll need a list of the labels we want to give to each symbol.

# Create a vector of labels for the legend
legendLab <- c("Clinic", "Rural Health Centre", "Hospital")
legendLab
## [1] "Clinic"              "Rural Health Centre" "Hospital"
# Plot with legend
plot_2007 # Render the saved plot
legend("bottomleft", legend=legendLab) # Add legend

We can see that just providing the labels was not sufficient. We also need to tell the legend() function the symbols and colors to go with each label.

# Create a vector of symbols for the legend
legendSym <- c(19, 15, 17)
# Create a vector of colors for the legend
legendCol <- c("blue", "purple", "red")

# Re-draw the plot and legend
plot_2007
legend("topleft", legend = legendLab,
       pch = legendSym, 
       col = legendCol,
       cex = 1, # font size
       pt.cex = 1.5, # increase the symbol size
       bty = "n") # turn off the legend border

# Save changes to plot
plot_2007 <- recordPlot()

Adding a Scale Bar and North Arrow

Now let’s add some finishing touches. Every finished map should include a scale bar and north arrow as references.

# Load the prettymapr package
library(prettymapr)

# Try adding these
plot_2007
addscalebar()
addnortharrow()

The default settings are not ideal for our map. The scale bar and legend are important, but should not distract from the map. Let’s investigate how we can customize these so they attract less attention.

# Check how to adjust the scale bar and north arrow
?addscalebar
?addnortharrow
# Try again
plot_2007
addscalebar(style = "ticks")
addnortharrow(cols = c("white", "grey50"), scale = 0.4)

# Save changes to plot
plot_2007 <- recordPlot()

<< Workshop Home