Read and row-bind many files from the SafeGraph Shop

This function accepts a directory and uses read_shop to load every zip file in that folder, assuming they are all files downloaded from the SafeGraph shop. It then row-binds the matching subfiles together, so you get a list in which one entry is all the normalization data row-bound together, another is all the patterns files, and so on. Note that after reading in the data, if gen_fips = TRUE, state and county names can be merged in using data(fips_to_names).
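The merge mentioned above can be sketched with base R alone. This is a minimal illustration, not the package's own method: both tables are toy stand-ins, the visit counts are invented, and the column names state_fips, county_fips, statename and countyname are assumptions about what gen_fips = TRUE and data(fips_to_names) provide.

```r
# Toy stand-in for aggregated patterns data read with gen_fips = TRUE.
# Column names are assumed and the visit counts are made up.
patterns <- data.frame(
  state_fips = c(36, 53),
  county_fips = c(61, 33),
  raw_visit_counts = c(1200, 800)
)

# Toy stand-in for data(fips_to_names); column names are assumed.
fips_lookup <- data.frame(
  state_fips = c(36, 53),
  county_fips = c(61, 33),
  statename = c("New York", "Washington"),
  countyname = c("New York County", "King County"),
  stringsAsFactors = FALSE
)

# Left join on both FIPS columns so every patterns row keeps its place.
named <- merge(patterns, fips_lookup,
               by = c("state_fips", "county_fips"), all.x = TRUE)
```

With the real data, fips_lookup would be replaced by the fips_to_names object loaded through data(fips_to_names).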

read_many_shop(
  dir = ".",
  recursive = FALSE,
  filelist = NULL,
  start_date = NULL,
  keeplist = c("patterns", "normalization_stats.csv", "home_panel_summary.csv",
    "visit_panel_summary.csv", "brand_info.csv"),
  exdir = dir,
  cleanup = TRUE,
  by = NULL,
  fun = sum,
  na.rm = TRUE,
  filter = NULL,
  expand_int = NULL,
  expand_cat = NULL,
  expand_name = NULL,
  multi = NULL,
  naics_link = NULL,
  select = NULL,
  gen_fips = FALSE,
  silent = FALSE,
  ...
)

Arguments

dir: Name of the directory the files are in.
recursive: Look for files in all subdirectories as well.
filelist: Optionally specify only a subset of the filenames to read in.
start_date: A vector of dates, as date objects, giving the first date present in each zip file, to be passed to read_patterns. When using read_many_shop this **really** should be included, since the patterns file names in the shop files are not in a format read_patterns can pick up on automatically. If left unspecified, an error is produced. To truly go ahead without it, set this to FALSE.
keeplist, exdir, cleanup: Arguments to be passed to read_shop, specified as in help(read_shop).
by, fun, na.rm, filter, expand_int, expand_cat, expand_name, multi, naics_link, select, gen_fips, silent, ...: Other arguments to be passed to read_patterns, specified as in help(read_patterns).
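Because start_date needs one entry per zip file, it can help to build the vector programmatically and check its length before starting a long read. The sketch below uses only base R. The file names are hypothetical, and it assumes that dates are matched to files in the order given by filelist, which is worth confirming against help(read_many_shop).

```r
# Two monthly shop downloads; these file names are hypothetical.
shop_files <- c("shop_2020_03.zip", "shop_2020_04.zip")

# First date covered by each file, in the same order as shop_files.
start_dates <- seq(as.Date("2020-03-01"), by = "month",
                   length.out = length(shop_files))

# Guard against a mismatch between files and dates.
stopifnot(length(start_dates) == length(shop_files))

# Assumed usage (not run here; requires SafeGraphR and the zip files):
# mydata <- SafeGraphR::read_many_shop(filelist = shop_files,
#                                      start_date = start_dates)
```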

Examples


if (FALSE) {
# In the working directory we have two shop ZIP files, one for March and one for April.
mydata <- read_many_shop(# I only want some of the sub-files
                         keeplist = c('patterns','home_panel_summary.csv'),
                         # For patterns, only keep these variables
                         select = c('raw_visit_counts', 'region', 'bucketed_dwell_times', 'location_name'),
                         # I want two aggregations of patterns - one of total visits by state ('region')
                         # and another by location_name that has the dwell times for each brand
                         multi = list(
                           list(name = 'all',
                                by = 'region'),
                           list(name = 'location_dwells',
                                by = 'location_name',
                                expand_cat = 'bucketed_dwell_times',
                                expand_name = 'bucketed_times')
                           ),
                         # Be sure to specify start_date for read_many_shop, one date per ZIP file
                         start_date = c(lubridate::ymd('2020-03-01'),lubridate::ymd('2020-04-01')))

# The result is a list with two items - patterns and home_panel_summary.csv
# patterns itself is a list with two data.tables inside - 'all' and 'location_dwells',
# aggregated as given.

}