Read a ZIP file with patterns and other data as it comes from the SafeGraph Shop

Opens a ZIP file from the SafeGraph shop and reads in all of the data, processing the patterns files with read_patterns.

read_shop(
  filename,
  dir = ".",
  keeplist = c("patterns", "normalization_stats.csv", "home_panel_summary.csv",
    "visit_panel_summary.csv", "brand_info.csv"),
  exdir = dir,
  cleanup = TRUE,
  by = NULL,
  fun = sum,
  na.rm = TRUE,
  filter = NULL,
  expand_int = NULL,
  expand_cat = NULL,
  expand_name = NULL,
  multi = NULL,
  naics_link = NULL,
  select = NULL,
  gen_fips = FALSE,
  silent = FALSE,
  start_date = NULL,
  ...
)

Arguments

filename: The filename of the .zip file from the shop.
dir: The directory the file is in.
keeplist: Character vector of the files in the ZIP to read in. Use 'patterns' to refer to the patterns files.
exdir: Name of the directory to unzip to.
cleanup: Set to TRUE to delete all the unzipped files after they are read in.
by, fun, na.rm, filter, expand_int, expand_cat, expand_name, multi, naics_link, select, gen_fips, silent, ...: Other arguments to be passed to read_patterns, specified as in help(read_patterns). Note that gen_fips is FALSE here by default, rather than TRUE as elsewhere, because files from the shop often do not contain the poi_cbg variable needed to use it. Check which state indicator variables you have access to instead; region may be available.
start_date: An argument to be passed to read_patterns giving the first date present in the file, as a date object. When using read_shop this should usually be included, since the patterns file names in the shop files are not in a format read_patterns can pick up on automatically.
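
As a small sketch of what "a date object" can look like for start_date, base R's as.Date produces a Date just as lubridate::ymd does in the example further down. The date itself is only illustrative.

```r
# Build a Date object for the first date in the file (illustrative value)
start_date <- as.Date("2020-03-01")

# Confirm it is a Date rather than a character string
class(start_date)
```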

Details

The result will be a named list with each of the components of the data.
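
To illustrate that structure, the sketch below builds a stand-in list by hand and pulls its components out by name. It does not call read_shop. The object mock_result, its column names, and all of its values are invented for illustration, and it assumes the list elements are named after the keeplist entries, as the comments in the example suggest.

```r
# Hypothetical stand-in for the list read_shop() returns; all values are invented
mock_result <- list(
  patterns = data.frame(region = c("CA", "WA"),
                        raw_visit_counts = c(120L, 45L)),
  home_panel_summary.csv = data.frame(region = c("CA", "WA"),
                                      devices = c(1000L, 400L))
)

# See which components were read in
names(mock_result)

# Retrieve individual components by name
patterns <- mock_result[["patterns"]]
panel <- mock_result[["home_panel_summary.csv"]]
```

When multi is used, the patterns element would itself be a list, so one further level of [[ ]] indexing by aggregation name would be needed.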

Examples


if (FALSE) {
# In the working directory I have the file 'shop_file.zip' to read in

mydata <- read_shop('shop_file.zip',
                    # I only want some of the files
                    keeplist = c('patterns','home_panel_summary.csv'),
                    # For patterns, only keep these variables
                    select = c('raw_visit_counts', 'region', 'bucketed_dwell_times', 'location_name'),
                    # I want two aggregations of patterns - one of total visits by state ('region')
                    # and another by location_name that has the dwell times for each brand
                    multi = list(
                      list(name = 'all',
                           by = 'region'),
                      list(name = 'location_dwells',
                           by = 'location_name',
                           expand_cat = 'bucketed_dwell_times',
                           expand_name = 'bucketed_times')
                      ),
                    # Be sure to specify start_date for read_shop
                    start_date = lubridate::ymd('2020-03-01'))

# The result is a list with two items: patterns and home_panel_summary.csv
# patterns itself is a list with two data.tables inside, 'all' and 'location_dwells',
# aggregated as given.
}