The crime
dataset contains crime reports for Houston from January-August 2010, geocoded with Google Maps.
tibble(crime)
## # A tibble: 86,314 × 17
## time date hour premise offense beat block street type
## <dttm> <chr> <int> <chr> <fct> <chr> <chr> <chr> <chr>
## 1 2010-01-01 01:00:00 1/1/2010 0 18A murder 15E30 9600-… marli… "ln"
## 2 2010-01-01 01:00:00 1/1/2010 0 13R robbery 13D10 4700-… telep… "rd"
## 3 2010-01-01 01:00:00 1/1/2010 0 20R aggrava… 16E20 5000-… wickv… "ln"
## 4 2010-01-01 01:00:00 1/1/2010 0 20R aggrava… 2A30 1000-… ashla… "st"
## 5 2010-01-01 01:00:00 1/1/2010 0 20A aggrava… 14D20 8300-… canyon ""
## 6 2010-01-01 01:00:00 1/1/2010 0 20R burglary 18F60 9300-… rowan "ln"
## 7 2010-01-01 01:00:00 1/1/2010 0 20A burglary 10H60 2500-… south… "blv…
## 8 2010-01-01 01:00:00 1/1/2010 0 20R burglary 13D10 6300-… rupley "cir"
## 9 2010-01-01 01:00:00 1/1/2010 0 20A burglary 3B10 5000-… georgi "ln"
## 10 2010-01-01 01:00:00 1/1/2010 0 20P burglary 20G20 10700… briar… "dr"
## # … with 86,304 more rows, and 8 more variables: suffix <chr>, number <int>,
## # month <ord>, day <ord>, location <chr>, address <chr>, lon <dbl>, lat <dbl>
Lots of useful info: dates, types of crimes, locations by type of place, locations by street, locations by longitude/latitude …
First, let’s get an overview of the crimes on the map. The qmplot
is often recommended as a quick way to do mapping (but we will switch to another approach shortly). We put in longitude (lon
) and latitude (lat
) for the x
and y
parameters and specify crime
as the data set. This plots all of the crimes in the database. (Note that to specify colour, size, etc. as set values, we need to use I()
rather than including them outside of the mapping (aes()
) as in normal ggplot
usage.)
(Example adapted from here.)
## remotes::install_github("bbolker/ggmap")
q1 <- qmplot(lon, lat, data = crime,
maptype = "toner-lite",
## for q* plots need to use I() for direct specifications
colour = I("red"),
size = I(0.9),
alpha=I(.3))
A slightly slower but safer method is to get the map first, then combine it with the data. This way we can retrieve the map and store it; this is both more efficient if we’re going to make a bunch of plots with the same map (almost inevitable if we’re polishing a data visualization), and safer (in case the server goes down/network connection is lost/etc.).
## utility function: extract appropriate components for retrieving a Stamen/OSM map
get_mapzone <- function(data, latcol="lat", loncol="lon") {
lon <- na.omit(data[[loncol]])
lat <- na.omit(data[[latcol]])
return(c(left=min(lon),right=max(lon),bottom=min(lat),top=max(lat)))
}
houston_mz <- get_mapzone(crime) ## find boundaries
## retrieve map
houston_map1 <- get_stamenmap(houston_mz,zoom=7, maptype="toner-lite",
messaging=FALSE)
Now we can combine the map with the data set:
(ggmap(houston_map1)
+ geom_point(data=crime,colour="red",size=0.9,alpha=0.3)
)
This graph is not very good, in particular because of the few crimes that are reported far away from the city but still end up in the database.
Reduce crime to violent crimes in downtown Houston:
violent_crime <-
(crime
%>% filter(
!offense %in% c("auto theft", "theft", "burglary"),
-95.39681 <= lon & lon <= -95.34188,
29.73631 <= lat & lat <= 29.78400
)
%>% mutate(
## drops unused levels, mitigates downstream errors
offense = fct_drop(offense),
offense = fct_relevel(offense,
c("robbery", "aggravated assault", "rape", "murder")
)
)
)
houston_map2 <- get_stamenmap(get_mapzone(violent_crime), maptype="toner-lite",
zoom=14,messaging=FALSE)
(ggmap(houston_map2)
+ geom_point(data = violent_crime, colour = "red",size = 0.9, alpha=.3)
)
Re-use the map, plotting as density contours instead (i.e., transform the points to a density field, then summarize the density field as a set of contours)
(ggmap(houston_map2)
+ geom_density2d(data = violent_crime, aes(x=lon,y=lat), col="red")
)
Or do 2-D (square) binning, with a custom gradient (and log-scaled breaks):
(ggmap(houston_map2)
+ geom_bin2d(data = violent_crime, alpha=0.5)
+ scale_fill_gradient(low="#F0F0FF", high="#131393",
trans=scales::log10_trans())
)
## Warning: Removed 7 rows containing missing values (geom_tile).
The high-density square on Smith Street is messing up our ability to see much else.