SafeGraphR is an R package designed to make it easy to read in and process data from SafeGraph, including data that comes through the SafeGraph COVID-19 consortium or the catalog. You may want to consult the Quick Start Guide, the Awesome SafeGraph Data Science List, the Normalization Best Practices, and especially the SafeGraph Docs.

You can install SafeGraphR directly from GitHub.

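# if necessary
# install.packages('remotes')
# install from the SafeGraphInc/SafeGraphR repository on GitHub
remotes::install_github('SafeGraphInc/SafeGraphR')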

The other pages on this site will walk you through how you can use SafeGraphR to work with the data.

Bugs and Help!

SafeGraphR is currently in beta. All of its functions work, but bugs may remain, and the code has not been checked against every possible combination of options. In addition, the SafeGraph data itself changes format on occasion, which may break some SafeGraphR functionality.

If you run into an issue or bug in the code, please raise an Issue on the SafeGraphR GitHub Issues page.

If you’re just having trouble getting things to work, you can find help at the SafeGraph COVID Consortium Slack Channel in the r-troubleshooting room.

Below is a list of what’s in the package with a brief description.

Data Reading Functions

link_poi_naics(): Reads in a Core Places file and uses it to create a crosswalk between SafeGraph POI codes and NAICS codes.

read_distancing(): Given a list of dates, reads in and aggregates SafeGraph social-distancing v2 files.

read_many_csvs(): Reads a bunch of CSVs in the same folder and row-binds them all together. Useful for stuff like normalization data.

read_many_patterns() and read_patterns(): Reads a bunch of (or one, respectively) monthly or weekly patterns .csv.gz files all in the same folder, does appropriate processing, and row-binds the results together.

read_many_shop() and read_shop(): Reads a bunch of (or one, respectively) .zip files in the format they come in from the shop and combines the data sets inside them across .zip files as appropriate.

safegraph_aws(): A thin wrapper for aws.s3::s3sync() that downloads data from the SafeGraph AWS COVID response buckets.
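The row-binding idea behind read_many_csvs() can be sketched in base R. This is only an illustration of the technique, not the package's implementation: the helper name bind_csvs_in_folder and the demo file names are hypothetical, and plain data frames are used to keep the example dependency-free.

```r
# Hypothetical helper: read every CSV in a folder and stack the rows.
# Assumes all files share the same columns.
bind_csvs_in_folder <- function(dir) {
  files <- sort(list.files(dir, pattern = "\\.csv$", full.names = TRUE))
  tables <- lapply(files, read.csv, stringsAsFactors = FALSE)
  do.call(rbind, tables)
}

# Demo: write two small files to a temporary folder, then combine them
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(date = "2020-03-01", visits = 10),
          file.path(demo_dir, "part1.csv"), row.names = FALSE)
write.csv(data.frame(date = "2020-03-02", visits = 12),
          file.path(demo_dir, "part2.csv"), row.names = FALSE)

combined <- bind_csvs_in_folder(demo_dir)
```

In practice you would call the package function itself; the sketch just shows what "row-binds them all together" means for files with matching columns.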

Data Processing Functions

expand_cat_json() and expand_integer_json(): Takes SafeGraph data with a column of categorical (named) or numeric (unnamed) JSON data, expands that column, pivots the data to long format, and then aggregates to the desired level.

fips_from_cbg(): Takes a census block group identifier and extracts the state and/or county FIPS codes.

rbind_by_list_pos(): Takes a list of lists of data.tables and row-binds them by their position in the sub-list. For example, rbind_by_list_pos(list(list(A,B),list(C,D))) would return list(rbind(A,C),rbind(B,D)). Can be used after read_ functions, which in some cases return a list of data.tables for each file they read.
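Two of the ideas in this section can be sketched in base R: pulling FIPS codes out of a census block group identifier, and row-binding by list position. Both helpers below (split_cbg and bind_by_position) are hypothetical names written for illustration and are not the package's code. The sketch relies on the standard 12-digit CBG layout (2-digit state, 3-digit county, 6-digit tract, 1-digit block group) and keeps IDs as character strings so that leading zeros are preserved, which is a choice made for this example only.

```r
# Hypothetical helper: split 12-digit census block group IDs into FIPS parts.
# IDs are treated as character strings so leading zeros survive.
split_cbg <- function(cbg) {
  cbg <- as.character(cbg)
  list(state_fips = substr(cbg, 1, 2),
       county_fips = substr(cbg, 3, 5))
}

# Hypothetical helper: row-bind tables that share a position across sub-lists.
# Assumes every sub-list has the same length and compatible columns by position.
bind_by_position <- function(list_of_lists) {
  n <- length(list_of_lists[[1]])
  lapply(seq_len(n), function(i) {
    do.call(rbind, lapply(list_of_lists, function(sub) sub[[i]]))
  })
}

parts <- split_cbg("060371234561")

A <- data.frame(x = 1)
B <- data.frame(y = "a", stringsAsFactors = FALSE)
C <- data.frame(x = 2)
D <- data.frame(y = "b", stringsAsFactors = FALSE)

# Equivalent to list(rbind(A, C), rbind(B, D))
bound <- bind_by_position(list(list(A, B), list(C, D)))
```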

Final-Stages Processing Functions

hb_shrink(): Performs hierarchical Bayesian shrinkage on the CBG-to-county or county-to-state level.

ma(): Calculates a (by default) seven-day moving average on pre-sorted data with no gaps.

sample_size_adjust(): Adjusts data for differences in sampling rates across geographic locations.

scale_to_date(): Adjusts data to be relative to a specific date.

scale_yoy(): Adjusts data to be relative to the same date the previous year.
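To make the final-stage steps more concrete, here is a rough base-R sketch of three of them: a trailing moving average, scaling relative to a baseline date, and shrinkage of proportions. All three function names are hypothetical and none of this is the package's own code. In particular, the shrinkage helper uses a beta-binomial prior fitted by the method of moments, which is one common approach and only an assumption about what shrinkage could look like here; expressing "relative" as a simple ratio to the baseline value is likewise an assumption.

```r
# Trailing n-day moving average.
# Assumes x is already sorted by date and has no gaps; the first n - 1 values are NA.
moving_average <- function(x, n = 7) {
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

# Express each value as a ratio to the value on a chosen baseline date.
relative_to_date <- function(values, dates, baseline) {
  values / values[dates == baseline][1]
}

# Shrink raw proportions toward the overall mean using a beta prior
# fitted by the method of moments (an assumed, illustrative method).
# Requires var(p) < m * (1 - m) so that the prior parameters are positive.
shrink_rates <- function(success, total) {
  p <- success / total
  m <- mean(p)
  v <- stats::var(p)
  k <- m * (1 - m) / v - 1      # implied alpha + beta
  alpha <- m * k
  beta <- (1 - m) * k
  (alpha + success) / (alpha + beta + total)
}

visits <- c(10, 12, 14, 16, 18, 20, 22, 24)
dates <- as.Date("2020-03-01") + 0:7

smoothed <- moving_average(visits)
relative <- relative_to_date(visits, dates, as.Date("2020-03-01"))

success <- c(2, 50, 30)
total <- c(10, 100, 100)
shrunk <- shrink_rates(success, total)
```

Groups with small totals are pulled more strongly toward the overall mean than groups with large totals, which is the point of shrinking noisy small-sample proportions.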

Data Sets

cbg_pop: Population data from the easy census file.

county_pop: Population aggregated to the county level.

fips_to_names: Data set linking state and county FIPS codes to state and county names, for merging in and labeling.

naics_codes: Data set linking NAICS codes to NAICS code titles, for merging in and labeling (or just knowing what you’re looking at).