Download SafeGraph data from AWS COVID Response — safegraph

This is a thin wrapper for aws.s3::s3sync that points you at the right bucket and directory to synchronize.

safegraph_aws(
  path = ".",
  dataset,
  bucket_only = FALSE,
  base_url = "s3.wasabisys.com",
  key,
  secret,
  region = "",
  prefix = "",
  prefix_is_dir = FALSE,
  s3 = "s3://sg-c19-response/",
  max_print = 1,
  ...
)

Arguments

path: The local directory to synchronize.
dataset: The SafeGraph bucket to get data from. Can be "weekly" (new method since July 2021), "weekly-backfill" (the new method for times before July 2021; note that as of August 2021 this gives the same result as "weekly", but I've kept "weekly-backfill" here in case it switches back to being different later), "monthly" (method since July 2021; also contains backfill folders as *_backfill/), "neighborhood" (June 2021 and forward), "neighborhood-backfill" (May 2021 and previous), "distancing", "core", "core-canada", "geo-supplement", or, to get the baseline bucket, "none".
bucket_only: Instead of doing an aws.s3::s3sync call, just return the correct bucket as a string. Then you can use that to do your own aws.s3::s3sync call, or work with the AWS CLI.
base_url: The base URL to pull the data from.
key: A character string containing an AWS Access Key ID.
secret: A character string containing an AWS Secret Access Key.
region: A character string containing the AWS region.
prefix: Only objects in the bucket whose names begin with this prefix are downloaded. For example, to download social distancing data only from 2020, set this to "2020/". Some of the backfill buckets can be tricky because the folder structure also includes the release date. For example, for "weekly-backfill", if you want patterns data, you want "patterns_backfill/2021/07/15/15/" followed by the time period you want, like "2021/". If you want backfill data from "monthly", for example patterns, it's "patterns_backfill/2021/07/15/16/" followed by the year/month. The "neighborhood" buckets use "y=2021/m=06/" etc. instead of "2021/06".
prefix_is_dir: If FALSE, the files matching prefix will be downloaded directly to path, which may not be the desired behavior if prefix contains a directory (you probably want the directory structure to match!). Set to TRUE to, in effect, replace path with paste0(path, prefix) and so download files to the appropriate folder. Don't use this if prefix also contains part of the file name, such as an extension. This is prefix_IS_dir, not prefix_CONTAINS_dir.
s3: The S3 server that stores the data.
max_print: Temporarily set options(max.print) to this value. This will massively speed up the function, as aws.s3::s3sync likes to print the full list of files on the server before moving on. The option will be returned to its original value afterwards. Set to NULL to not alter any options.
...: Additional parameters to be sent to aws.s3::s3sync and from there on to aws.s3::s3HTTP. "direction" will be ignored.
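
The temporary change to options(max.print) described under max_print follows a common save-and-restore idiom in R. The sketch below illustrates that idiom only; it is not the package's internal code, and with_max_print is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical helper (not part of SafeGraphR): evaluate an expression with
# options(max.print) temporarily changed, then put the old value back.
with_max_print <- function(expr, max_print = 1) {
  if (!is.null(max_print)) {
    # options() returns the previous values, so they can be restored later
    old <- options(max.print = max_print)
    # on.exit() runs even if evaluating expr raises an error
    on.exit(options(old), add = TRUE)
  }
  # expr is lazily evaluated, so forcing it here runs it under the new option
  force(expr)
}

before <- getOption("max.print")
inside <- with_max_print(getOption("max.print"), max_print = 1)
after <- getOption("max.print")
```

Here inside holds the temporary value, while before and after should match because the original option is restored when the helper exits.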

Details

NOTE THE BREAKING CHANGE WITH SafeGraphR 0.4.2: bucket names have changed, and access to outdated versions of the data has been removed.

This function doesn't add too much, but it does make the default behavior you probably want a bit easier. If you plan to specify the aws.s3::s3sync "bucket" option yourself, this function is largely useless.
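
If you do want to make the aws.s3::s3sync call yourself, one possible workflow is sketched below. It assumes the string returned with bucket_only = TRUE can be passed directly as the "bucket" argument of aws.s3::s3sync, and that credentials and endpoint settings are forwarded through ... to aws.s3::s3HTTP. The local folder name, the key, and the secret are placeholders.

```r
# Settings for a manual download. The endpoint and region mirror the defaults
# of safegraph_aws; path, key, and secret are placeholders.
sync_args <- list(
  path = 'distancing_2020/',
  prefix = '2020/',
  direction = 'download',
  key = 'MYINFO',
  secret = 'MYOTHERINFO',
  region = '',
  base_url = 's3.wasabisys.com'
)

if (FALSE) {

# Ask only for the bucket string instead of synchronizing
sync_args$bucket <- SafeGraphR::safegraph_aws(
  dataset = 'distancing',
  bucket_only = TRUE,
  key = 'MYINFO',
  secret = 'MYOTHERINFO'
)

# Run the synchronization with the collected settings
do.call(aws.s3::s3sync, sync_args)

}
```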

See catalog.safegraph.io for more description of the various buckets.

Examples


if (FALSE) {

# Download all the recent weekly-patterns files to the working directory
safegraph_aws(dataset = 'weekly', key = 'MYINFO', secret = 'MYOTHERINFO')

}
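
As a further sketch, the backfill prefix described under the prefix argument can be assembled from its release-date folder and the time period you want. The local folder 'data/' is an assumption for illustration, and target only mirrors the paste0(path, prefix) behavior that the prefix_is_dir documentation describes; it is not taken from the package's code.

```r
# Release-date folder for "weekly-backfill" patterns data, then the period
release <- 'patterns_backfill/2021/07/15/15/'
period <- '2021/'
prefix <- paste0(release, period)

# With prefix_is_dir = TRUE, files are in effect downloaded to paste0(path, prefix)
path <- 'data/'  # assumed local folder; note the trailing slash
target <- paste0(path, prefix)

if (FALSE) {

# Download 2021 backfill patterns files, keeping the bucket's folder structure
safegraph_aws(path = path, dataset = 'weekly-backfill',
              prefix = prefix, prefix_is_dir = TRUE,
              key = 'MYINFO', secret = 'MYOTHERINFO')

}
```

Because the pieces are joined with paste0 and no separator is added, path should end in a slash when prefix_is_dir = TRUE.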