SafeGraphR is an R package designed to make it easy to read in and process data from SafeGraph, including data that comes through the SafeGraph COVID-19 consortium or the catalog. You may want to consult the Quick Start Guide, the Awesome SafeGraph Data Science List, the Normalization Best Practices, and especially the SafeGraph Docs.
You can install SafeGraphR directly from GitHub.
# if necessary
# install.packages('remotes')
remotes::install_github('SafeGraphInc/SafeGraphR')
The other pages on this site will walk you through how you can use SafeGraphR to work with the data.
SafeGraphR is currently in beta. All of its functions work, but of course there may be bugs remaining. The code has also not been checked with every possible combination of options that you could pick. Lastly, the SafeGraph data itself changes format on occasion, which may break some SafeGraphR functionality.
If you run into an issue or bug in the code, please raise an Issue on the SafeGraphR GitHub Issues page.
If you’re just having trouble getting things to work, you can find help at the SafeGraph COVID Consortium Slack Channel in the r-troubleshooting room.
Below is a list of what’s in the package with a brief description.
link_poi_naics(): Read in a Core Places file and use it to create a crosswalk between SafeGraph POI codes and NAICS codes.
read_distancing(): Given a list of dates, reads in and aggregates SafeGraph social-distancing v2 files.
read_many_csvs(): Reads a bunch of CSVs in the same folder and row-binds them all together. Useful for stuff like normalization data.
read_patterns(): Reads one or more monthly or weekly patterns .csv.gz files from the same folder, does appropriate processing, and row-binds the results together.
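To make the "read every file in a folder and row-bind" idea concrete, here is a minimal base-R sketch. It is an illustration only: read_folder_csvs() is a hypothetical helper, not a SafeGraphR function, and it does none of the SafeGraph-specific processing that the package's readers perform.

```r
# Illustration only: a base-R version of "read every CSV in a folder and row-bind".
# read_folder_csvs() is a hypothetical helper, not part of SafeGraphR.
read_folder_csvs <- function(dir, pattern = "\\.csv$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  # Read each file into a data.frame, then stack them into one
  tables <- lapply(files, read.csv, stringsAsFactors = FALSE)
  do.call(rbind, tables)
}

# Small self-contained demo using a temporary folder
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, visits = c(10, 20)),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, visits = c(30, 40)),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

combined <- read_folder_csvs(demo_dir)
```

The sketch assumes every file has the same columns, since rbind() will fail otherwise.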
expand_integer_json(): Take SafeGraph data with a column of categorical (named) or numeric (unnamed) JSON data and expand that column, pivot the data to long format, and then aggregate to the desired level.
fips_from_cbg(): Take a census block group identifier and extract the state and/or county FIPS codes.
rbind_by_list_pos(): Takes a list of lists of data.tables and row-binds them by their position in the sub-list. For example, rbind_by_list_pos(list(A,B),list(C,D)) would return list(rbind(A,C),rbind(B,D)). Can be used after read_ functions, which in some cases return a list of data.tables for each file they read.
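The position-wise binding described for rbind_by_list_pos() can be sketched in base R as follows. This is not the package's implementation: bind_by_position() is a hypothetical helper, it uses plain data.frames rather than data.tables, and it assumes the input is a single list whose elements are the sub-lists.

```r
# Illustration only: row-bind a list of lists by position within each sub-list.
# bind_by_position() is a hypothetical base-R helper, not SafeGraphR code.
bind_by_position <- function(list_of_lists) {
  # Map() walks the sub-lists in parallel: first elements together, then second, ...
  do.call(Map, c(f = rbind, list_of_lists))
}

A <- data.frame(x = 1)
B <- data.frame(y = "a")
C <- data.frame(x = 2)
D <- data.frame(y = "b")

result <- bind_by_position(list(list(A, B), list(C, D)))
# result[[1]] stacks A and C; result[[2]] stacks B and D
```

Every sub-list is assumed to have the same length, with matching columns at each position.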
hb_shrink(): Perform hierarchical Bayesian shrinkage on the CBG-to-county or county-to-state level.
ma(): Calculates a (by default) seven day moving average on pre-sorted data with no gaps.
sample_size_adjust(): Adjusts data for differences in sampling rates across geographic locations.
scale_to_date(): Adjusts data to be relative to a specific date.
scale_yoy(): Adjusts data to be relative to the same date the previous year.
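As a rough illustration of two of the adjustments above, the base-R sketch below computes a seven-day moving average and rescales a series relative to a reference date. The helpers trailing_ma() and relative_to_date() are hypothetical, not the package's ma() or scale_to_date(). The trailing window and the simple ratio form are assumptions, and the package functions may align or report results differently.

```r
# Illustration only, in base R. These are hypothetical helpers,
# not SafeGraphR's ma() or scale_to_date().

# Trailing moving average over n observations.
# Assumes x is already sorted by date and has no missing days.
trailing_ma <- function(x, n = 7) {
  # sides = 1 uses only current and past values; the first n - 1 results are NA
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

# Express each value as a ratio to the value on a reference date.
relative_to_date <- function(values, dates, ref_date) {
  base <- values[dates == ref_date]
  stopifnot(length(base) == 1)  # exactly one row for the reference date
  values / base
}

dates  <- seq(as.Date("2020-03-01"), by = "day", length.out = 10)
visits <- c(100, 110, 90, 120, 130, 80, 70, 100, 110, 90)

smoothed <- trailing_ma(visits)
scaled   <- relative_to_date(visits, dates, as.Date("2020-03-01"))
```

Both helpers depend on the same precondition the package states for ma(): the data must be sorted and free of gaps, otherwise a "seven-day" window no longer covers seven calendar days.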
cbg_pop: Population data from the easy census file.
county_pop: Population aggregated to the county level.
fips_to_names: Data set linking state and county FIPS codes to state and county names, for merging in and labeling.
naics_codes: Data set linking NAICS codes to NAICS code titles, for merging in and labeling (or just knowing what you’re looking at).