patterns_lookup.Rd
This function, given a date or range of dates, returns a character vector of folder paths in the weekly (new or backfill) data. After downloading the files, run each path through list.files(pattern = '.csv.gz', recursive = TRUE, full.names = TRUE), or just set list_files = TRUE. Folder paths are returned instead of file paths because the subfolder below each one is named after the hour the data was released, which can't be predicted ahead of time for future weeks.
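The following base-R sketch shows why a folder path, rather than a file path, is returned, and why recursive = TRUE matters. It builds a throwaway directory under tempdir() that mimics the layout, with a made-up release-hour subfolder ("22") and a made-up file name. These names are hypothetical stand-ins, not real SafeGraph paths.

```r
# Hypothetical layout: <weekly folder>/<release hour>/<file>.csv.gz
week_folder <- file.path(tempdir(), "patterns", "2021", "07", "14")
hour_folder <- file.path(week_folder, "22")  # hour is only known after release
dir.create(hour_folder, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(hour_folder, "patterns-part1.csv.gz")))

# Without recursive = TRUE, list.files() only sees the hour folder, which does not match
shallow <- list.files(week_folder, pattern = "\\.csv\\.gz$", full.names = TRUE)

# With recursive = TRUE, it descends into whatever hour folder actually exists
files <- list.files(week_folder, pattern = "\\.csv\\.gz$",
                    recursive = TRUE, full.names = TRUE)
print(files)
```

Because the search descends from the weekly folder, the code never needs to know the release hour in advance.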
patterns_lookup(
dates,
dir = NULL,
old_dir = NULL,
new_dir = NULL,
subfolder = "patterns",
silent = FALSE,
add_ma = 0,
patterns_backfill_date = "2021/08/02/22/",
old_date_split = lubridate::ymd("2021-07-11"),
old_bucket = "weekly-backfill",
new_bucket = "weekly",
key = NULL,
secret = NULL,
list_files = FALSE,
...
)
dates: A vector of Date objects to find the associated files for (perhaps a single Date object plus lubridate::days(0:finish)).
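Any Date vector works for dates. This base-R sketch builds the same 101-day range used in the examples without needing lubridate. It is equivalent in spirit to lubridate::ymd('2020-09-01') + lubridate::days(0:100).

```r
# A single start date plus 0:finish days gives a consecutive Date vector
start <- as.Date("2020-09-01")
finish <- 100
dates <- start + 0:finish
range(dates)
```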
dir: If specified, dir is prepended to the file paths to give full file paths. If using both "old" dates (on or before old_date_split) and "new" dates (after it), this only works if the "patterns_backfill" (old) and "patterns" (new) folders sit in the same folder. Superseded by old_dir and new_dir for old and new files, respectively.
old_dir: If specified, old_dir is prepended to the file paths for all "old" files (dates on or before old_date_split). This should be the folder that contains the patterns_backfill folder.
new_dir: If specified, new_dir is prepended to the file paths for all "new" files (dates after old_date_split). This should be the folder that contains the patterns folder.
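The following sketch illustrates how the prefix rule for dir, old_dir, and new_dir plays out per date. pick_prefix() is a hypothetical helper written for this page, not a SafeGraphR function. It assumes that "old" means on or before old_date_split (default 2021-07-11, from the usage above) and that dir is used wherever old_dir or new_dir is missing.

```r
# Hypothetical helper (not part of SafeGraphR): choose the directory prefix
# for each date under the documented old/new rule, with dir as the fallback.
pick_prefix <- function(dates, dir = NULL, old_dir = NULL, new_dir = NULL,
                        old_date_split = as.Date("2021-07-11")) {
  old_dir <- if (is.null(old_dir)) dir else old_dir
  new_dir <- if (is.null(new_dir)) dir else new_dir
  is_old <- dates <= old_date_split  # the split date itself still counts as "old"
  vapply(is_old, function(o) {
    p <- if (o) old_dir else new_dir
    if (is.null(p)) "" else p        # no prefix at all: relative paths
  }, character(1))
}

pick_prefix(as.Date(c("2020-12-01", "2021-08-01")),
            old_dir = "old_data", new_dir = "new_data")
```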
subfolder: Which folder in the AWS bucket to look at; "_backfill" is appended for backfill data. Usually this is "patterns", "normalization_stats", or "home_panel_summary".
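The following one-line sketch shows the "_backfill" naming rule from the subfolder description. folder_name() is hypothetical, and is_old stands in for whatever old/new classification the function applies to each date.

```r
# Hypothetical helper: backfill ("old") data lives in "<subfolder>_backfill",
# new data in "<subfolder>". paste0() recycles, so either argument may be a vector.
folder_name <- function(subfolder, is_old) {
  paste0(subfolder, ifelse(is_old, "_backfill", ""))
}

folder_name(c("patterns", "normalization_stats", "home_panel_summary"), TRUE)
```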
silent: If TRUE, omits the warning for using any dates after the package author last checked the consistency of the SafeGraph file structure.
add_ma: Also looks at the add_ma days before the dates listed in dates, so you can calculate an add_ma-day moving average. Alternatively, you can extend the dates argument yourself.
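This sketch shows the "extend dates yourself" alternative and why the extra history is useful. It prepends add_ma days to the requested range, then computes a trailing add_ma-day mean with stats::filter() on made-up toy counts (not SafeGraph data). Every date in the original range then has a complete window.

```r
add_ma <- 7
dates <- as.Date("2020-09-01") + 0:100

# Prepend add_ma days of history before the first requested date
extended <- seq(min(dates) - add_ma, max(dates), by = "day")

# Toy daily counts (illustration only) and a trailing add_ma-day moving average
counts <- seq_along(extended)
ma <- as.numeric(stats::filter(counts, rep(1 / add_ma, add_ma), sides = 1))

# The first requested date sits at position add_ma + 1 and has a full window
ma[add_ma + 1]
```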
patterns_backfill_date: Character string giving the folder structure for the most recent patterns_backfill pull. That is, the 2018, 2019, and 2020 folders containing backfill data in their subfolders should sit in the paste0(old_dir, '/patterns_backfill/', patterns_backfill_date) folder.
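The following sketch shows where the backfill year folders are expected to sit, using the paste0() expression from the description. old_dir is a hypothetical location, and patterns_backfill_date is the function's default. That default ends in a slash, so year folders can be pasted on directly.

```r
old_dir <- "safegraph_data"                  # hypothetical download location
patterns_backfill_date <- "2021/08/02/22/"   # default from the usage above

# Folder that should directly contain the 2018/, 2019/, and 2020/ folders
backfill_root <- paste0(old_dir, "/patterns_backfill/", patterns_backfill_date)
year_folders <- paste0(backfill_root, c("2018", "2019", "2020"))
year_folders
```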
old_date_split: Date indicating the last day on which "old" data is present, before switching to the "new" data structure.
old_bucket, new_bucket: The safegraph_aws() dataset argument for the buckets containing the old and new data, respectively.
key: A character string containing an AWS Access Key ID. If key and secret are both specified, patterns_lookup will download all the files it finds.
secret: A character string containing an AWS Secret Access Key.
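Downloading only happens when both key and secret are given, so one option is to read them from environment variables instead of typing them into scripts. aws_creds() is a hypothetical helper. AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the conventional AWS variable names, but any names work because the values are passed explicitly. If either variable is unset, both values come back as NULL, so no download is triggered.

```r
# Hypothetical helper: fetch credentials from the environment, returning NULLs
# unless both are set (patterns_lookup only downloads when both are supplied).
aws_creds <- function(key_var = "AWS_ACCESS_KEY_ID",
                      secret_var = "AWS_SECRET_ACCESS_KEY") {
  key <- Sys.getenv(key_var)        # "" when unset
  secret <- Sys.getenv(secret_var)
  if (nzchar(key) && nzchar(secret)) {
    list(key = key, secret = secret)
  } else {
    list(key = NULL, secret = NULL)
  }
}

creds <- aws_creds()
# filelist <- patterns_lookup(dates, key = creds$key, secret = creds$secret)
```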
list_files: If TRUE, after creating the folder paths (and, possibly, downloading files), runs each of them through list.files(pattern = '.csv', recursive = TRUE, full.names = TRUE) to get a usable list of files. This only works if all the files have already been downloaded.
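The following sketch approximates what the list_files step amounts to. list_folder_files() is a hypothetical helper, not the package's code. It expands each folder path into the files beneath it and warns, instead of failing, when a folder is empty or missing, since that usually means the files were not downloaded. The demo folders are throwaway paths under tempdir().

```r
# Hypothetical sketch of the list_files step over a vector of folder paths
list_folder_files <- function(folders) {
  out <- lapply(folders, function(f) {
    found <- list.files(f, pattern = ".csv", recursive = TRUE, full.names = TRUE)
    if (length(found) == 0) warning("Found no files in ", f, ".")
    found
  })
  unlist(out)
}

# One downloaded week (with an hour subfolder) and one week that was never downloaded
root <- file.path(tempdir(), "lf_demo")
hour_dir <- file.path(root, "2020", "08", "31", "22")
dir.create(hour_dir, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(hour_dir, "part1.csv.gz")))
missing_week <- file.path(root, "2020", "09", "07")

files <- list_folder_files(c(file.path(root, "2020", "08", "31"), missing_week))
```

The call above returns the one file it found and emits a warning for the missing week.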
...: Arguments to be passed to safegraph_aws().
# We have already downloaded all of the AWS data into the working directory and just need to locate and load it.
# (To download as well, we could leave off list_files and pass the result to safegraph_aws(),
# or add our key and secret here and patterns_lookup() would download the files.)
filelist <- patterns_lookup(lubridate::ymd('2020-9-01') + lubridate::days(0:100),
list_files = TRUE)
#> Warning: The safegraph C19 AWS server will be shut down as of January 31, 2022.
#> Warning: This function has defaults set to still use that server,
#> Warning: but generally will be intentioned for use with enterprise customers.
dt <- read_many_patterns(filelist = filelist, by = 'brands', expand_int = 'visits_by_day')
# Now let's get the normalization files
normlist <- patterns_lookup(lubridate::ymd('2020-9-01') + lubridate::days(0:100),
subfolder = 'normalization_stats',
list_files = TRUE)
#> Warning: The safegraph C19 AWS server will be shut down as of January 31, 2022.
#> Warning: This function has defaults set to still use that server,
#> Warning: but generally will be intentioned for use with enterprise customers.
norm <- read_many_csvs(filelist = normlist, makedate = TRUE)