read_many_shop.Rd
This accepts a directory. It will use read_shop to load every zip file in that folder, assuming they are all files downloaded from the SafeGraph shop. It will then row-bind together each of the subfiles, so you'll get a list where one entry is all the normalization data row-bound together, another is all the patterns files, and so on.
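To make the row-binding step concrete, here is a minimal base-R sketch of the idea, not the package's own implementation. It assumes each zip has already been read into a named list of data.frames (the toy `march` and `april` lists below are made up) and stacks each sub-file across zips by matching on its name.

```r
# Toy stand-ins for two zips read with read_shop (made-up data).
march <- list(
  normalization_stats.csv = data.frame(date = as.Date("2020-03-01"), total_visits = 10),
  patterns = data.frame(region = "NY", raw_visit_counts = 5)
)
april <- list(
  normalization_stats.csv = data.frame(date = as.Date("2020-04-01"), total_visits = 12),
  patterns = data.frame(region = "NY", raw_visit_counts = 7)
)
per_zip <- list(march, april)

# For every sub-file name seen in any zip, row-bind that sub-file across zips.
subfile_names <- unique(unlist(lapply(per_zip, names)))
combined <- lapply(subfile_names, function(nm) {
  pieces <- lapply(per_zip, `[[`, nm)
  pieces <- Filter(Negate(is.null), pieces)  # skip zips that lack this sub-file
  do.call(rbind, pieces)
})
names(combined) <- subfile_names
```

The result has one entry per sub-file, each holding the rows from every zip.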
Note that after reading in data, if gen_fips = TRUE, state and county names can be merged in using data(fips_to_names).
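A sketch of that merge using base R's merge() on toy data. The column names `state_fips`, `county_fips`, `statename` and `countyname` are assumptions for illustration; check the real ones with `data(fips_to_names); names(fips_to_names)` before adapting this.

```r
# Toy patterns output with FIPS columns (assumed names, made-up values).
patterns <- data.frame(
  state_fips = c(36, 36, 6),
  county_fips = c(61, 47, 37),
  raw_visit_counts = c(100, 80, 50)
)
# Toy stand-in for fips_to_names (assumed column names).
lookup <- data.frame(
  state_fips = c(36, 36, 6),
  county_fips = c(61, 47, 37),
  statename = c("New York", "New York", "California"),
  countyname = c("New York County", "Kings County", "Los Angeles County")
)

# Left join on both FIPS columns so every patterns row is kept.
named <- merge(patterns, lookup, by = c("state_fips", "county_fips"), all.x = TRUE)
```

Joining on both columns matters because county FIPS codes are only unique within a state.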
read_many_shop(
  dir = ".",
  recursive = FALSE,
  filelist = NULL,
  start_date = NULL,
  keeplist = c("patterns", "normalization_stats.csv", "home_panel_summary.csv",
    "visit_panel_summary.csv", "brand_info.csv"),
  exdir = dir,
  cleanup = TRUE,
  by = NULL,
  fun = sum,
  na.rm = TRUE,
  filter = NULL,
  expand_int = NULL,
  expand_cat = NULL,
  expand_name = NULL,
  multi = NULL,
  naics_link = NULL,
  select = NULL,
  gen_fips = FALSE,
  silent = FALSE,
  ...
)
dir: Name of the directory the files are in.
recursive: Look for files in all subdirectories as well.
filelist: Optionally specify only a subset of the filenames to read in.
start_date: A vector of dates, as Date objects, giving the first date present in each zip file, to be passed to read_patterns. When using read_many_shop this **really** should be included, since the patterns file names in the shop files are not in a format read_patterns can pick up on automatically. If left unspecified, it will produce an error. To truly go ahead unspecified, set this to FALSE.
keeplist, exdir, cleanup: Arguments to be passed to read_shop, specified as in help(read_shop).
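by, fun, na.rm, filter, expand_int, expand_cat, expand_name, multi, naics_link, select, gen_fips, silent, ...: Other arguments to be passed to read_patterns, specified as in help(read_patterns).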
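The sketch below shows, in base R, which files dir and recursive would pick up and how to build a start_date vector that lines up with them. It assumes files are matched on a `.zip` extension and read in the order given; supplying filelist explicitly is one way to pin that order down. The folder and file names are hypothetical and created in a temp directory.

```r
# Build a throwaway folder with hypothetical shop downloads.
shop_dir <- file.path(tempdir(), "shop_demo")
dir.create(file.path(shop_dir, "sub"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(shop_dir, c("march.zip", "april.zip", "notes.txt")),
            file.path(shop_dir, "sub", "may.zip"))

# recursive = FALSE looks only in shop_dir itself; TRUE also searches subfolders.
top_level <- list.files(shop_dir, pattern = "\\.zip$")
all_zips  <- list.files(shop_dir, pattern = "\\.zip$", recursive = TRUE)

# Listing the files explicitly fixes their order, so start_date can be
# written to match it element by element: one first date per zip.
filelist   <- c("march.zip", "april.zip")
start_date <- as.Date(c("2020-03-01", "2020-04-01"))
stopifnot(length(start_date) == length(filelist))

# The call would then look like this (not run: needs real shop files):
# read_many_shop(dir = shop_dir, filelist = filelist, start_date = start_date)
```

Note that list.files() returns names alphabetically, so without filelist "april.zip" would come before "march.zip".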
if (FALSE) {
  # In the working directory we have two shop ZIP files, one for March and one for April.
  mydata <- read_many_shop(
    # I only want some of the sub-files
    keeplist = c('patterns', 'home_panel_summary.csv'),
    # For patterns, only keep these variables
    select = c('raw_visit_counts', 'region', 'bucketed_dwell_times', 'location_name'),
    # I want two aggregations of patterns: one of total visits by state ('region')
    # and another by location_name that has the dwell times for each brand
    multi = list(
      list(name = 'all',
           by = 'region'),
      list(name = 'location_dwells',
           by = 'location_name',
           expand_cat = 'bucketed_dwell_times',
           expand_name = 'bucketed_times')
    ),
    # Be sure to specify start_date for read_many_shop: one date per zip file
    start_date = c(lubridate::ymd('2020-03-01'), lubridate::ymd('2020-04-01')))
  # The result is a list with two items: patterns and home_panel_summary.csv.
  # patterns itself is a list with two data.tables inside, 'all' and 'location_dwells',
  # aggregated as given.
}
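As a rough analogy for what the 'all' entry of multi does (by = 'region' with the default fun = sum), here is a base-R aggregate() on made-up rows. This is only an illustration of group-and-sum; read_patterns does its own aggregation and may handle other columns differently.

```r
# Made-up patterns rows (not real SafeGraph data).
patterns <- data.frame(
  region = c("NY", "NY", "CA"),
  location_name = c("Store A", "Store B", "Store A"),
  raw_visit_counts = c(100, 40, 60)
)

# One row per region, with visits summed across locations.
by_region <- aggregate(raw_visit_counts ~ region, data = patterns, FUN = sum)
```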