Given a date or range of dates, this function returns a character vector of folder paths in the weekly (new or backfill) data. After downloading the files, run these paths through list.files(pattern = '.csv.gz', full.names = TRUE), or just set list_files = TRUE. Folder paths are returned instead of file paths because the subfolder below each one is named for the hour the data was released, which can't be predicted ahead of time for future weeks.
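
The warnings in the Examples section name folders such as 2020/08/31/ for a date range beginning 2020-09-01, which suggests that weekly folders are keyed by the Monday that starts each week. A rough base-R sketch of that mapping, under that assumption (this is not the package's own code):

```r
# Assumption: each weekly folder is named for the Monday starting that week,
# as inferred from the folder names printed in the Examples section.
dates <- as.Date("2020-09-01") + 0:100

# "%u" gives the ISO weekday number (1 = Monday, ..., 7 = Sunday),
# so subtracting (weekday - 1) days steps back to that week's Monday.
week_start <- dates - (as.integer(format(dates, "%u")) - 1)

# One folder suffix per distinct week, in year/month/day/ form.
folders <- unique(format(week_start, "%Y/%m/%d/"))
folders
```

For this date range the sketch yields 15 distinct weekly folders, from 2020/08/31/ through 2020/12/07/.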

patterns_lookup(
  dates,
  dir = NULL,
  old_dir = NULL,
  new_dir = NULL,
  subfolder = "patterns",
  silent = FALSE,
  add_ma = 0,
  patterns_backfill_date = "2021/08/02/22/",
  old_date_split = lubridate::ymd("2021-07-11"),
  old_bucket = "weekly-backfill",
  new_bucket = "weekly",
  key = NULL,
  secret = NULL,
  list_files = FALSE,
  ...
)

Arguments

dates

A vector of Date objects to find the associated files for (for example, a single Date object plus lubridate::days(0:finish)).
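
As a minimal sketch of building such a vector, the snippet below uses base R's as.Date() in place of lubridate::ymd(), purely to keep the example free of dependencies. The start date and the value of finish are illustrative.

```r
# Illustrative start date and span; both are assumptions for this sketch.
start <- as.Date("2020-09-01")
finish <- 100

# Adding an integer vector to a Date yields one Date per day offset,
# analogous to lubridate::ymd("2020-09-01") + lubridate::days(0:finish).
dates <- start + 0:finish

length(dates)  # finish + 1 dates, including the start date itself
range(dates)   # first and last date covered
```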

dir

If specified, will prepend dir to the filepaths, to get full filepaths. If using both "old" (on or before old_date_split) and "new" (after old_date_split) dates, this will only work if both the "patterns_backfill" (old) and "patterns" (new) folders are in the same folder. Superseded by old_dir and new_dir for old and new files, respectively.

old_dir

If specified, will prepend old_dir to the filepaths for all "old" (on or before old_date_split) files. This should be the folder that contains the patterns_backfill folder.

new_dir

If specified, will prepend new_dir to the filepaths for all "new" (after old_date_split) files. This should be the folder that contains the patterns folder.

subfolder

Which folder in the AWS bucket to look at. Will append "_backfill" for backfill data. Usually this is "patterns", "normalization_stats", or "home_panel_summary".

silent

If TRUE, will omit the warning for using any dates after the package author last checked the consistency of the SafeGraph file structure.

add_ma

Also looks at the add_ma days before the dates listed in dates, so you can calculate an add_ma-day moving average. Alternatively, you could extend the dates argument yourself to the same effect.
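
Extending dates by hand might look like the sketch below. It illustrates the manual alternative only and is not necessarily how patterns_lookup handles add_ma internally. The date range and window length are illustrative.

```r
# Illustrative inputs for this sketch.
dates <- as.Date("2020-09-01") + 0:13
add_ma <- 7

# Prepend the add_ma days that come just before the earliest requested date.
# `:` binds tighter than `-`, so this subtracts 7, 6, ..., 1 days.
lead_in <- min(dates) - add_ma:1

# Combine, drop any duplicates, and keep chronological order.
extended <- sort(unique(c(lead_in, dates)))
range(extended)
```

The extended vector can then be passed as dates with add_ma left at its default of 0.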

patterns_backfill_date

Character variable with the folder structure for the most recent patterns_backfill pull. That is, the 2018, 2019, and 2020 folders containing backfill data in their subfolders should sit in the paste0(old_dir,'/patterns_backfill/',patterns_backfill_date) folder.
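
To make the expected layout concrete, here is a small sketch of how that path is assembled. The value of old_dir is hypothetical, and the weekly suffix is taken from the folder names shown in the Examples section.

```r
# Hypothetical local folder that contains patterns_backfill/.
old_dir <- "downloads"

# Default value from the function signature.
patterns_backfill_date <- "2021/08/02/22/"

# Root of the most recent backfill pull; the year folders sit directly below it.
backfill_root <- paste0(old_dir, "/patterns_backfill/", patterns_backfill_date)

# A single weekly folder beneath that root (year/month/day/).
week_folder <- paste0(backfill_root, "2020/08/31/")
week_folder
```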

old_date_split

Date indicating the last day on which "old" data is present, before switching to the "new" data structure.
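
A sketch of what this split means for a set of dates. It assumes the comparison is inclusive, following the "last day" wording above; the exact comparison used inside patterns_lookup is not confirmed here.

```r
# Default split date from the function signature.
old_date_split <- as.Date("2021-07-11")

# Illustrative dates on either side of the split.
dates <- as.Date(c("2021-07-05", "2021-07-11", "2021-07-12"))

# Assumption: dates on or before the split count as "old" (backfill) data.
is_old <- dates <= old_date_split

old_dates <- dates[is_old]   # looked up in old_bucket / old_dir
new_dates <- dates[!is_old]  # looked up in new_bucket / new_dir
```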

old_bucket, new_bucket

The safegraph_aws() dataset argument for the buckets containing the old and new data, respectively.

key

A character string containing an AWS Access Key ID. If key and secret are both specified, patterns_lookup will download all the files it finds.

secret

A character string containing an AWS Secret Access Key.

list_files

If TRUE, after creating folder paths (and, possibly, downloading files), run each of them through list.files(pattern = '.csv', recursive = TRUE, full.names = TRUE) to get a usable list of files. This only works if all the files have already been downloaded.
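
The sketch below mimics that step on a throwaway directory so it can be run without any SafeGraph data. The folder layout, the hour subfolder "18", and the file name are all made up for illustration.

```r
# Build a temporary folder tree standing in for one downloaded weekly folder.
root <- tempfile("sg_demo_")
folder <- file.path(root, "patterns", "2021", "07", "14")
hour_dir <- file.path(folder, "18")  # hypothetical release-hour subfolder
dir.create(hour_dir, recursive = TRUE)

# Create an empty placeholder file with a hypothetical name.
file.create(file.path(hour_dir, "patterns-part1.csv.gz"))

# Search recursively so the hour subfolder does not need to be known in advance.
files <- list.files(folder, pattern = ".csv", recursive = TRUE, full.names = TRUE)
basename(files)
```

Because recursive = TRUE descends into every subfolder, the unpredictable hour-level folder never has to be spelled out in the path.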

...

Arguments to be passed to safegraph_aws().

Examples


# We have already downloaded all of the AWS data into the working directory and just need to locate and load it
# (if we also wanted to download, we could leave off list_files and pass the result to safegraph_aws,
# or add our key and secret here and it would download)
filelist <- patterns_lookup(lubridate::ymd('2020-9-01') + lubridate::days(0:100),
                             list_files = TRUE)
#> Warning: The safegraph C19 AWS server will be shut down as of January 31, 2022.
#> Warning: This function has defaults set to still use that server,
#> Warning: but generally will be intentioned for use with enterprise customers.
#> Warning: Found no files in patterns_backfill/2021/08/02/22/2020/08/31/. list_files requires files be downloaded first.
#> Warning: Found no files in patterns_backfill/2021/08/02/22/2020/09/07/. list_files requires files be downloaded first.
#> Warning: Found no files in patterns_backfill/2021/08/02/22/2020/09/14/. list_files requires files be downloaded first.
#> Warning: Found no files in patterns_backfill/2021/08/02/22/2020/09/21/. list_files requires files be downloaded first.
#> Warning: Found no files in patterns_backfill/2021/08/02/22/2020/09/28/. list_files requires files be downloaded first.
#> Warning: Found no files in patterns_backfill/2021/08/02/22/2020/10/05/. list_files requires files be downloaded first.
#> Warning: Found no files in patterns_backfill/2021/08/02/22/2020/10/12/. list_files requires files be downloaded first.
#> Warning: Found no files in patterns_backfill/2021/08/02/22/2020/10/19/. list_files requires files be downloaded first.
#> Warning: Found no files in patterns_backfill/2021/08/02/22/2020/10/26/. list_files requires files be downloaded first.
#> Warning: Found no files in patterns_backfill/2021/08/02/22/2020/11/02/. list_files requires files be downloaded first.
#> Warning: Found no files in patterns_backfill/2021/08/02/22/2020/11/09/. list_files requires files be downloaded first.
#> Warning: Found no files in patterns_backfill/2021/08/02/22/2020/11/16/. list_files requires files be downloaded first.
#> Warning: Found no files in patterns_backfill/2021/08/02/22/2020/11/23/. list_files requires files be downloaded first.
#> Warning: Found no files in patterns_backfill/2021/08/02/22/2020/11/30/. list_files requires files be downloaded first.
#> Warning: Found no files in patterns_backfill/2021/08/02/22/2020/12/07/. list_files requires files be downloaded first.

dt <- read_many_patterns(filelist = filelist, by = 'brands', expand_int = 'visits_by_day')
#> Starting to read NA at 2022-02-04 23:07:42
#> Error in data.table::fread(file = f, ...): File 'NA' does not exist or is non-readable. getwd()=='C:/Users/nhuntington-klein/OneDrive - Seattle University/Documents/GitHub/SafeGraphR/docs/reference'

# Now let's get the normalization files

normlist <- patterns_lookup(lubridate::ymd('2020-9-01') + lubridate::days(0:100),
                            subfolder = 'normalization_stats',
                            list_files = TRUE)
#> Warning: The safegraph C19 AWS server will be shut down as of January 31, 2022.
#> Warning: This function has defaults set to still use that server,
#> Warning: but generally will be intentioned for use with enterprise customers.
#> Warning: Found no files in normalization_stats_backfill/2021/08/02/22/2020/08/31/. list_files requires files be downloaded first.
#> Warning: Found no files in normalization_stats_backfill/2021/08/02/22/2020/09/07/. list_files requires files be downloaded first.
#> Warning: Found no files in normalization_stats_backfill/2021/08/02/22/2020/09/14/. list_files requires files be downloaded first.
#> Warning: Found no files in normalization_stats_backfill/2021/08/02/22/2020/09/21/. list_files requires files be downloaded first.
#> Warning: Found no files in normalization_stats_backfill/2021/08/02/22/2020/09/28/. list_files requires files be downloaded first.
#> Warning: Found no files in normalization_stats_backfill/2021/08/02/22/2020/10/05/. list_files requires files be downloaded first.
#> Warning: Found no files in normalization_stats_backfill/2021/08/02/22/2020/10/12/. list_files requires files be downloaded first.
#> Warning: Found no files in normalization_stats_backfill/2021/08/02/22/2020/10/19/. list_files requires files be downloaded first.
#> Warning: Found no files in normalization_stats_backfill/2021/08/02/22/2020/10/26/. list_files requires files be downloaded first.
#> Warning: Found no files in normalization_stats_backfill/2021/08/02/22/2020/11/02/. list_files requires files be downloaded first.
#> Warning: Found no files in normalization_stats_backfill/2021/08/02/22/2020/11/09/. list_files requires files be downloaded first.
#> Warning: Found no files in normalization_stats_backfill/2021/08/02/22/2020/11/16/. list_files requires files be downloaded first.
#> Warning: Found no files in normalization_stats_backfill/2021/08/02/22/2020/11/23/. list_files requires files be downloaded first.
#> Warning: Found no files in normalization_stats_backfill/2021/08/02/22/2020/11/30/. list_files requires files be downloaded first.
#> Warning: Found no files in normalization_stats_backfill/2021/08/02/22/2020/12/07/. list_files requires files be downloaded first.
norm <- read_many_csvs(filelist = normlist, makedate = TRUE)