Time series data mining in R. Bratislava, Slovakia.
Dangerous streets of Bratislava! Animated maps using open data in R
Written on 2019-11-10
Recently at work, I wanted to build an interesting animated visualization for a start-up pitch (presentation) and got my first experience with spatial data (e.g. polygons). I enjoyed working with this type of data and wanted to get better at it, so I decided to visualize something interesting using Bratislava (Slovakia) open data and OpenStreetMap. I ended up with animated maps of violations on Bratislava streets over a period of two and a half years.
Since this post analyzes spatial time series, it still fits the blog domain, which is time series data mining :)
You can read more about time series forecasting, representations and clustering in my previous blog posts here.
Aaand teaser, what I will create in this blog post:
In this blog post you will learn how to:
get free city district polygon data from the OpenStreetMap API,
get free street line (polygon) coordinates from the OpenStreetMap API,
visualize polygons and street lines with ggplot2 and ggmap,
merge spatial data with violation data represented as time series,
animate spatial data combined with violation time series with gganimate.
Bratislava Open-Data
The ultimate goal is to show where and when the most dangerous places in the capital of Slovakia - Bratislava - are.
For this task, I will use open data covering violations recorded by the city police, with locations (city district and street name) and timestamps.
Firstly, load all the needed packages.
Then, let’s download the violations data from opendata.bratislava.sk webpage and translate Slovak column names to English.
Let’s bind all the data together.
Next, I will transform Date_Time to the POSIXct format and generate time aggregation features - Year and Month - Year_M.
Let’s see how many violations each Place (city district) has over the whole available period (2017 to September 2019):
We can see that the Old-town (Stare Mesto) leads this statistic, obviously, e.g. because of the many tourists. There are also some garbled Slovak letters. We should get rid of them - in this blog post there will be a lot of handling of these special Slovak symbols.
I will extract only the types of violations related to bad behavior of people, like harassment, drinking alcoholic beverages in public spaces and related things. So, I will not extract traffic violations like bad car parking, etc. For this task, I need to extract the codes of violations:
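A minimal sketch of this step in base R, on a toy table. The timestamp format and the time zone are assumptions, the real files may use a different layout.

```r
# Toy stand-in for the violations table (the real column layout is assumed).
violations <- data.frame(
  Date_Time = c("2019-06-01 22:15:00", "2019-06-15 01:30:00", "2018-12-31 23:59:00"),
  stringsAsFactors = FALSE
)

# Parse character timestamps; format string and time zone are assumptions.
violations$Date_Time <- as.POSIXct(violations$Date_Time,
                                   format = "%Y-%m-%d %H:%M:%S",
                                   tz = "Europe/Bratislava")

# Derive the aggregation features used later for grouping and animation.
violations$Year   <- as.integer(format(violations$Date_Time, "%Y"))
violations$Month  <- as.integer(format(violations$Date_Time, "%m"))
violations$Year_M <- format(violations$Date_Time, "%Y-%m")
```

Keeping Year_M as a "YYYY-MM" string means it sorts chronologically, which is handy when it later drives the animation states.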
Polygons of city districts
I want to show the number of crimes per city district and per street (so both pieces of place information) on a map, so I need to get the coordinates of city districts and streets.
Let’s get polygons of city districts first using OpenStreetMap API.
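One way this could be done is with the osmdata package, whose getbb function can return a boundary polygon instead of a bounding box. The sketch below is an assumption about the approach, not necessarily the exact code used for this post. The helper names get_district_polygon and polygon_to_df are hypothetical, and taking only the first polygon is a simplification.

```r
# Hypothetical helper: turn a two-column coordinate matrix into a data frame.
polygon_to_df <- function(coords, district) {
  data.frame(lon = coords[, 1], lat = coords[, 2], Place = district)
}

# Hypothetical helper: query Nominatim (via osmdata) for one district polygon.
# Assumes the osmdata package is installed and the service is reachable.
get_district_polygon <- function(district, city = "Bratislava") {
  res <- osmdata::getbb(paste(district, city), format_out = "polygon")
  # getbb() may return one matrix or a (nested) list of matrices;
  # as a simplification, only the first polygon is kept here.
  while (is.list(res)) res <- res[[1]]
  polygon_to_df(res, district)
}

# Example call (needs internet access, so it is left commented out):
# stare_mesto <- get_district_polygon("Stare Mesto")
```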
Let’s bind all the districts data of Bratislava.
Let’s simply visualize the districts.
For animation and visualization purposes, I need to aggregate violations data by districts (Place) and Year+Month columns (Year_M).
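A minimal sketch of such an aggregation with data.table, on toy data. Only the column names Place and Year_M come from the text, the values are made up for illustration.

```r
library(data.table)

# Toy violations: one row per recorded violation.
violations <- data.table(
  Place  = c("Stare Mesto", "Stare Mesto", "Ruzinov", "Stare Mesto", "Ruzinov"),
  Year_M = c("2019-05", "2019-06", "2019-06", "2019-06", "2019-06")
)

# Count violations per district and year-month.
agg_place <- violations[, .(N = .N), by = .(Place, Year_M)]
```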
The next step is to merge polygon data with aggregated violation data:
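A sketch of the merge on toy data. Because every polygon vertex has to be repeated for every month, this is a many-to-many join, so allow.cartesian = TRUE is set. The vertex column is an assumption added here to keep the drawing order of each polygon explicit.

```r
library(data.table)

# Toy polygons: two triangular "districts" with an explicit vertex order.
polygons <- data.table(
  Place  = rep(c("A", "B"), each = 3),
  vertex = rep(1:3, times = 2),
  lon    = c(17.0, 17.2, 17.1, 17.3, 17.5, 17.4),
  lat    = c(48.1, 48.1, 48.2, 48.1, 48.1, 48.2)
)

# Toy aggregated counts per district and year-month.
agg <- data.table(
  Place  = c("A", "A", "B"),
  Year_M = c("2019-05", "2019-06", "2019-06"),
  N      = c(4L, 7L, 2L)
)

# Each vertex is repeated once per month of its district.
poly_viol <- merge(polygons, agg, by = "Place", allow.cartesian = TRUE)

# Restore the vertex order inside every (district, month) polygon.
setorder(poly_viol, Place, Year_M, vertex)
```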
Let’s also compute the mean coordinates of every district for showing their names on a graph.
Let’s test the visualization of violations in June 2019:
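A short sketch on toy data. Note that the mean of the polygon vertices is not the true geometric centroid, but it is usually good enough for placing a label.

```r
library(data.table)

# Toy polygons for two districts.
polygons <- data.table(
  Place = rep(c("A", "B"), each = 3),
  lon   = c(17.0, 17.2, 17.1, 17.3, 17.5, 17.4),
  lat   = c(48.1, 48.1, 48.4, 48.1, 48.1, 48.4)
)

# One label position per district: the mean of its vertex coordinates.
district_labels <- polygons[, .(lon = mean(lon), lat = mean(lat)), by = Place]
```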
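A minimal ggplot2 sketch of such a test plot on toy polygons. The colors and the theme are arbitrary choices for illustration, and the counts are made up.

```r
library(ggplot2)

# Toy merged data: polygon vertices with a violation count per district and month.
poly_viol <- data.frame(
  Place  = rep(c("A", "B"), each = 4),
  lon    = c(17.0, 17.2, 17.2, 17.0, 17.2, 17.4, 17.4, 17.2),
  lat    = c(48.1, 48.1, 48.2, 48.2, 48.1, 48.1, 48.2, 48.2),
  Year_M = "2019-06",
  N      = rep(c(120, 35), each = 4)
)

# Keep a single year-month and fill each district by its count.
one_month <- subset(poly_viol, Year_M == "2019-06")

p <- ggplot(one_month, aes(x = lon, y = lat, group = Place, fill = N)) +
  geom_polygon(color = "black", alpha = 0.6) +
  scale_fill_gradient(low = "white", high = "firebrick") +
  labs(title = "Violations by district, 2019-06", fill = "Violations") +
  theme_void()
```

Printing p draws the map. The group aesthetic is what keeps the vertices of different districts from being connected into one shape.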
Street lines coordinates
The next step is to download street coordinates from OpenStreetMap.
Firstly, we have to extract unique street names from violation data, and handle Slovak letters and other punctuation for easier matching by street name.
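A sketch of one possible normalization in base R. The function name normalize_street is hypothetical, and the exact cleaning rules used for this post may differ. Diacritics are mapped explicitly with chartr (written as Unicode escapes so the code does not depend on the file encoding), then punctuation and extra spaces are removed.

```r
# Hypothetical helper: make street names comparable across data sources.
normalize_street <- function(x) {
  # Slovak letters with diacritics, lower case then upper case.
  from <- paste0(
    "\u00e1\u00e4\u010d\u010f\u00e9\u00ed\u013a\u013e\u0148",
    "\u00f3\u00f4\u0155\u0161\u0165\u00fa\u00fd\u017e",
    "\u00c1\u00c4\u010c\u010e\u00c9\u00cd\u0139\u013d\u0147",
    "\u00d3\u00d4\u0154\u0160\u0164\u00da\u00dd\u017d"
  )
  to <- "aacdeillnoorstuyzAACDEILLNOORSTUYZ"
  x <- chartr(from, to, x)          # strip diacritics
  x <- tolower(x)                   # case-insensitive matching
  x <- gsub("[[:punct:]]", " ", x)  # drop dots, commas, hyphens, ...
  x <- gsub("\\s+", " ", x)         # collapse repeated whitespace
  trimws(x)
}

normalize_street("\u0160t\u00farova ul.")
```

Applying the same function to both the violation data and the OSM street names gives a common key for matching.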
Let’s get all street coordinates of Bratislava with just one (powerful) command, again using the OSM API:
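With the osmdata package, such a query could look roughly like the sketch below. The wrapper name get_city_streets, the choice of the "highway" key and the timeout value are assumptions, and the call needs internet access.

```r
# Hypothetical wrapper around an Overpass query for all street lines of a city.
# Assumes the osmdata and sf packages are installed.
get_city_streets <- function(city = "Bratislava", timeout = 120) {
  q <- osmdata::opq(bbox = city, timeout = timeout)  # place name is geocoded to a bbox
  q <- osmdata::add_osm_feature(q, key = "highway")  # roads, streets, paths
  res <- osmdata::osmdata_sf(q)                      # download as sf objects
  res$osm_lines                                      # street lines with a `name` column
}

# Example call (downloads a lot of data, so it is left commented out):
# streets <- get_city_streets("Bratislava")
```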
Let’s plot it:
Pretty nice web.
Now, we need to clean the street names downloaded from OSM in the same way. I will keep only the streets that are available in the violation data:
We lost some street data from the dataset because not all street names matched exactly.
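To see how much is lost, the matched and unmatched names can be compared directly. A small sketch on made-up, already normalized street names:

```r
# Toy normalized street names from both sources (values are made up).
violation_streets <- c("michalska", "postova", "obchodna", "neznama ulica")
osm_streets       <- c("michalska", "postova", "obchodna", "hlavne namestie")

matched   <- intersect(violation_streets, osm_streets)  # usable for the map
unmatched <- setdiff(violation_streets, osm_streets)    # lost by exact matching

# Share of violation streets that found an exact match in OSM.
match_share <- length(matched) / length(violation_streets)
```

Looking through unmatched by hand is a quick way to spot systematic differences (abbreviations, typos) that an extra cleaning rule could fix.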
Let’s subset the street data.
Now, I will transform the data from an sf object to a standard lon/lat table (data.table class) (I highly recommend this for later use with ggplot).
Next, I will restrict the streets to the existing district polygons of Bratislava and add a merging column - Street_edit:
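A sketch of such a conversion using sf::st_coordinates, on two toy street lines. The column names lon, lat, line_id and Street are choices made here for illustration.

```r
library(sf)
library(data.table)

# Toy sf object with two street lines (coordinates are made up).
streets_sf <- st_sf(
  name = c("michalska", "postova"),
  geometry = st_sfc(
    st_linestring(matrix(c(17.106, 48.144,
                           17.107, 48.146), ncol = 2, byrow = TRUE)),
    st_linestring(matrix(c(17.108, 48.146,
                           17.109, 48.147,
                           17.110, 48.148), ncol = 2, byrow = TRUE)),
    crs = 4326
  )
)

# st_coordinates() returns X, Y and L1 (the index of the line each point belongs to).
coords <- as.data.table(st_coordinates(streets_sf))
setnames(coords, c("X", "Y", "L1"), c("lon", "lat", "line_id"))

# Attach the street name to every point of its line.
coords[, Street := streets_sf$name[line_id]]
```

With this layout, geom_path(aes(lon, lat, group = line_id)) can draw the streets in ggplot2.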
Let’s also aggregate violation data by street names and Year + Month (Year_M):
Let’s add the look-up column Street_edit to the aggregated data and merge the spatial street data with the violation data.
Now, I will transform the integer numbers of violations into reasonable factor segments (bins):
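Base R’s cut function does this kind of binning. The break points and labels below are assumptions picked for illustration, not the ones used for the maps in this post.

```r
# Toy violation counts per street and month.
n_viol <- c(1, 3, 7, 15, 42)

# Assumed bin edges; intervals are right-closed: (0,2], (2,5], (5,10], ...
seg_breaks <- c(0, 2, 5, 10, 20, Inf)
seg_labels <- c("1-2", "3-5", "6-10", "11-20", "21+")

# Ordered factor-like segments, suitable for a discrete color or size scale.
n_seg <- cut(n_viol, breaks = seg_breaks, labels = seg_labels)
```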
Let’s see what we extracted so far for streets in one example year-month…
ggmap
Since we will use ggmap for the visualizations, we need to get a map image of Bratislava:
Now, let’s put it all together into one visualization - violations by city district and also by street for one timestamp (Year_M).
I will also extract the most violated streets by Year+Month for labeling purposes.
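A sketch of picking the top street per month with data.table, on toy counts. Ties are broken by row order here, which is a simplification.

```r
library(data.table)

# Toy aggregated counts per street and year-month (values are made up).
agg_street <- data.table(
  Street = c("michalska", "postova", "obchodna", "michalska", "postova"),
  Year_M = c("2019-05", "2019-05", "2019-05", "2019-06", "2019-06"),
  N      = c(12L, 5L, 7L, 6L, 9L)
)

# Sort by month and descending count, then keep the first row of every month.
top_streets <- agg_street[order(Year_M, -N), head(.SD, 1), by = Year_M]
```

Replacing head(.SD, 1) with head(.SD, 3) would keep the three most violated streets per month instead.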
As we have noticed so far, the most violated city district is the Old-town, so I will also create a zoomed visualization of this district (using the coord_map function):
Using gganimate
We have everything prepared to create an interesting animated map.
For this purpose, I will use the gganimate package, whose transition_states function combines our previously prepared ggplot2 and ggmap plot with the time feature Year_M.
Let’s first animate the picture of the whole city.
Then, let’s animate the zoomed map of the Old-town district.
Binding GIFs with the magick package
Now, we have to bind these two animations into one GIF. I will do it with functions from the magick package.
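A minimal sketch of how transition_states attaches to a ggplot object, using one toy polygon whose count changes over three months. The frame counts, sizes and the file name in the commented lines are assumptions.

```r
library(ggplot2)
library(gganimate)

# Toy data: one square "district" with a different count in each month.
poly_viol <- data.frame(
  Place  = "A",
  lon    = rep(c(17.0, 17.2, 17.2, 17.0), times = 3),
  lat    = rep(c(48.1, 48.1, 48.2, 48.2), times = 3),
  Year_M = rep(c("2019-04", "2019-05", "2019-06"), each = 4),
  N      = rep(c(10, 25, 40), each = 4)
)

# Static plot; {closest_state} is filled in by gganimate for every frame.
p <- ggplot(poly_viol, aes(lon, lat, group = Place, fill = N)) +
  geom_polygon(color = "black") +
  labs(title = "Violations: {closest_state}")

# One animation state per year-month.
anim <- p + transition_states(Year_M, transition_length = 1, state_length = 2)

# Rendering needs a renderer such as gifski, so it is left commented out:
# gif <- animate(anim, nframes = 30, fps = 5, width = 600, height = 500)
# anim_save("city.gif", gif)
```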
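A sketch of the frame-by-frame binding with magick. To keep it self-contained, two tiny two-frame "animations" are built from blank images. With real GIFs, image_read would load them instead, and both would need the same number of frames.

```r
library(magick)

# Stand-ins for the two animations (each a two-frame image stack).
city    <- c(image_blank(60, 40, "white"), image_blank(60, 40, "black"))
oldtown <- c(image_blank(60, 40, "red"),   image_blank(60, 40, "blue"))

# Frame-by-frame binding only works if both animations have equally many frames.
stopifnot(length(city) == length(oldtown))

# Put frame i of both animations side by side, then join all combined frames.
frames   <- lapply(seq_along(city), function(i) image_append(c(city[i], oldtown[i])))
combined <- image_join(frames)

# Writing the result as an animated GIF (left commented out):
# image_write(image_animate(combined, fps = 5), "both.gif")
```

image_append places images next to each other by default. Setting stack = TRUE would put one animation above the other instead.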
Voilaaa:
For full resolution - right click on the gif and click on the view image option.
We can see that at the end of the observed period, the summer of 2019, the Old-town district has even more violations than usual. That can be caused by multiple factors…
Also notice that peripheral areas of districts such as Ruzinov and Vrakuna have streets with repeated violations.
Streets with the most violations
Let’s look at some of the most violated streets as time series, as we saw in the animation.
Most of the time, Michalska street in the historical center of the Old-town was the most violated street, but for the last three months Postova has been the “winner of the most violated street”…this is maybe because of the new city-police station nearby.
Summary
In this blog post, I showed you how to:
work with different spatial data such as polygons, street lines or map images,
combine these spatial objects with external data such as city violation time series,
create animated maps using packages such as ggplot2, ggmap, gganimate, and magick.
I hope you will use this information with your own spatial and time series data combo to create some interesting visualizations :)