Fascinated by weather-forecast-style graphics on maps, I got an idea for a mini-project during my 2019 New Year vacation in D.C. I searched around for a place to start. There are plenty of packages and functions for plotting on a map, but most tutorials focus on data preprocessing, while the actual plotting is reduced to a single function call. Even after learning how to use these higher-level plotting functions, what happens behind the scenes remains a mystery. Moreover, these tutorials come with pre-loaded data, which is nice for testing but may not carry over to the data we are actually interested in. To understand how to plot data from scratch, I used the CDC's opioid prescribing rate data, which also lets us appreciate how opioid prescribing in this country has changed over time.
The CDC website on opioid prescribing rates provides data for 2006-2017 at both the state and county level. The most straightforward way to get the data would be to save each table manually. Instead, I used the web scraping package rvest so that my code downloads the data rather than me saving the datasets one by one. There are some simple examples of using the rvest package on DataCamp and R-bloggers, and it also helps to learn a bit about CSS selectors.
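The post does the scraping with rvest in R. The same idea, pulling the rows of an HTML table into records instead of saving them by hand, can be sketched in Python using only the standard library. This is a minimal illustration, not the code behind this post: the `TableExtractor` class, the `parse_table` helper, the column names and the sample HTML are all made up for the example, and the real CDC pages may be structured differently.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being filled, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Turn the first row into column names and the rest into dicts."""
    parser = TableExtractor()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up fragment standing in for one year's table (not real CDC data).
SAMPLE_HTML = """
<table>
  <tr><th>County</th><th>State</th><th>FIPS</th><th>Rate</th></tr>
  <tr><td>Example County A</td><td>AA</td><td>01001</td><td>100.5</td></tr>
  <tr><td>Example County B</td><td>BB</td><td>02002</td><td>55.0</td></tr>
</table>
"""

records = parse_table(SAMPLE_HTML)
print(records[0])
```

In a real run, the HTML string would come from fetching each year's page (for example with `urllib.request.urlopen`) inside a loop over the years, which is what removes the manual saving step.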
After getting the opioid prescribing data, we also need longitude and latitude information to map each prescribing rate to the right region. There are several ways to do this. One is to download a U.S. map with latitude and longitude information in a raw format such as geojson; the other is to find a package that comes pre-loaded with latitude and longitude information for the region of interest.
Since the opioid dataset has both county and state prescribing values, I used both methods mentioned above. The raw-format data was downloaded from here because I wanted to take advantage of the FIPS codes in the county-level opioid dataset to get better precision. For the state-level map, I used the mapdata package for its simplicity: the raw geojson data with “State” information from the above website has some missing regions, which left gaps in the final plot.
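Matching rates to county shapes by FIPS code can be sketched as follows. This is a hedged Python illustration rather than the code used for this post: the function names, the record layout and the sample values are assumptions made for the example. One practical detail it shows is that county FIPS codes are five-character strings, so leading zeros lost when a code was read as an integer have to be restored before joining.

```python
def normalize_fips(value):
    """Return a 5-character county FIPS string, restoring lost leading zeros."""
    return str(value).strip().zfill(5)


def join_rates_to_shapes(rates, shapes):
    """Attach a rate to each shape by FIPS; shapes without a rate get None."""
    by_fips = {normalize_fips(r["fips"]): r["rate"] for r in rates}
    joined = []
    for shape in shapes:
        fips = normalize_fips(shape["fips"])
        joined.append({**shape, "fips": fips, "rate": by_fips.get(fips)})
    return joined


# Made-up records: the rate table stored one FIPS code as an integer.
rates = [{"fips": 1001, "rate": 100.5}, {"fips": "02002", "rate": 55.0}]
shapes = [
    {"fips": "01001", "name": "Example County A"},
    {"fips": "02002", "name": "Example County B"},
    {"fips": "03003", "name": "Example County C"},  # no rate available
]

joined = join_rates_to_shapes(rates, shapes)
for row in joined:
    print(row["fips"], row["name"], row["rate"])
```

Shapes that end up with `None` are the ones that would show as gaps on the map, so listing them is a quick way to check the match before plotting.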
After matching the geographic data to the opioid prescribing data, the dataset is ready for plotting. I limited the map to the 48 contiguous states to get a good size, using geographic limits guided by latlong. To make the maps easier to interpret, I color-coded the prescribing rates by quantile. The CDC website has similar plots at the county and state level, but one has to flip through the images to see how the rates change. To show the change over time more clearly, I used FFmpeg to turn the yearly images into movies.
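Color-coding by quantile means computing cut points from the observed rates and assigning each region the color of the bin it falls into. A minimal Python sketch of that step, using only the standard library, is below. The `quantile_bins` helper, the four-color palette and the sample rates are all assumptions made for illustration; the number of bins and the colors in the actual plots may differ.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical light-to-dark palette, one color per quantile bin.
PALETTE = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]


def quantile_bins(values, n_bins=4):
    """Return (bin index per value, cut points) using n_bins quantile bins."""
    cuts = quantiles(values, n=n_bins)  # n_bins - 1 cut points
    # bisect_right counts how many cut points are <= the value,
    # which is the index of the bin the value falls into.
    bins = [bisect_right(cuts, v) for v in values]
    return bins, cuts


# Made-up prescribing rates, not CDC data.
sample_rates = [10, 20, 30, 40, 50, 60, 70, 80]

bins, cuts = quantile_bins(sample_rates, n_bins=len(PALETTE))
colors = [PALETTE[b] for b in bins]

for rate, color in zip(sample_rates, colors):
    print(rate, color)
```

Whether the cut points are computed per year or once across all years is a design choice: per-year cuts highlight the regional pattern within each frame, while shared cuts keep the colors comparable from frame to frame in a movie.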
From both videos we can roughly tell that opioid prescribing rates were highest around 2010. Since 2015 there has been a noticeable drop in opioid prescribing, but rates remain considerable in regions of AR, MS, TN and AL.
The code for this post can be found on my GitHub. The project was inspired by a Japanese cherry blossom visualization post.