This function takes a folder of Core files and reads them in. The output is a data.table. Be aware that the files this is designed to work with are large, so this function may take a while to execute.

read_core(
  dir = "core_poi/",
  filter = NULL,
  select = NULL,
  key = NULL,
  secret = NULL,
  silent = FALSE,
  ...
)

Arguments

dir

The directory that the Core files are in. If this folder contains multiple months of Core files, read_core will use the most recent month (this only works if you are using the standard AWS file structure).

filter

A character string describing a logical statement for filtering the data, for example filter = 'naics_code == 512131' would give you only movie theater POIs. Will be used as an i argument in a data.table, see help(data.table). Filtering here instead of afterwards can cut down on time and memory demands.
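To illustrate how a filter string behaves, the sketch below applies one to a small made-up data.table by parsing the string and evaluating it as the i argument. This shows the general data.table idiom under assumed conditions and is not a copy of read_core's internals; the toy table `poi` and its values are hypothetical.

```r
library(data.table)

# Hypothetical toy data standing in for a Core file
poi <- data.table(
  placekey   = c("a", "b", "c"),
  naics_code = c(512131, 445110, 512131),
  region     = c("CA", "CA", "NY")
)

# The same kind of string you would pass as filter
filter <- 'naics_code == 512131'

# Parse the string and evaluate it as the i argument,
# so column names in the string refer to columns of poi
movie_theaters <- poi[eval(parse(text = filter))]
```

Because the string is evaluated against the columns of the table, any expression that is valid in the i position of a data.table can be used.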

select

Character vector of variables to get from the file. Set to NULL to get all variables. If you plan to link the results to a patterns file, you will probably want to include 'placekey' in this vector. Note that any variables mentioned in filter MUST be in select unless select = NULL.
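One way to check the rule about filter and select before calling read_core is sketched below in base R. It is a hypothetical pre-check and not part of the package: it pulls the variable names out of the filter string with all.vars() and lists any that are missing from select.

```r
# Example filter and a select vector that forgets one filter variable
filter <- 'region == "CA" & floor(naics_code/10000) %in% 44:45'
select <- c("placekey", "region")

# Variable names used in the filter (function names are not included)
filter_vars <- all.vars(parse(text = filter))

# Anything listed here would need to be added to select
missing_vars <- setdiff(filter_vars, select)
```

Here missing_vars contains naics_code, so select would need to be extended to c("placekey", "region", "naics_code") for this filter to work.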

key

A character string containing an AWS Access Key ID. If key and secret are both specified, read_core will download the most recent Core files and process them. This process assumes your system date is set correctly, and will only check this month's Core and last month's Core, since one of those should exist.
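The "this month, then last month" logic can be pictured with the base R sketch below. The helper candidate_months() is hypothetical and the year/month path format is an assumption for illustration; the actual lookup inside read_core may differ.

```r
# Hypothetical helper: year/month folders to check, newest first
candidate_months <- function(today = Sys.Date()) {
  # Anchor on the first of the month so stepping back is unambiguous
  first_of_month <- as.Date(format(today, "%Y-%m-01"))
  # by = "-1 month" steps back exactly one calendar month
  months <- seq(first_of_month, by = "-1 month", length.out = 2)
  format(months, "%Y/%m")
}

jan_candidates <- candidate_months(as.Date("2021-01-15"))
# "2021/01" "2020/12"
```

This also shows why a wrong system date matters: the two candidate months are derived entirely from the current date.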

secret

A character string containing an AWS Secret Access Key.

silent

Suppress timing messages.

...

Other arguments to be passed to data.table::fread when reading in the CSV files inside of the ZIP. For example, nrows to only read in a certain number of rows.
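As a rough picture of how extra arguments reach data.table::fread, the sketch below uses a hypothetical wrapper, read_sketch(), that forwards ... unchanged. The inline CSV lines are made-up data, and the wrapper is an illustration and not read_core itself.

```r
library(data.table)

# Hypothetical wrapper that forwards extra arguments to fread
read_sketch <- function(csv_lines, ...) {
  fread(text = csv_lines, ...)
}

# Made-up CSV content, one element per line
csv_lines <- c(
  "placekey,naics_code",
  "a,512131",
  "b,445110",
  "c,722511"
)

# nrows is passed through ... and limits how many rows are read
first_two <- read_sketch(csv_lines, nrows = 2)
```

Reading only a few rows this way is a quick way to inspect column names and types before committing to a full read.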

Details

AS OF SafeGraphR VERSION 0.3.0 THIS FUNCTION ONLY WORKS WITH NEW CORE FILE FORMATS. For old-format Core files, you can still use the less-flexible and otherwise deprecated link_poi_naics() function.

Examples


if (FALSE) {
# Location of our CORE file
# Note we probably don't have to specify 2020/10 if that's the most recent one
dir <- '../SafeGraph/core_poi/2020/10/'

# Let's only get retail POIs in California
# (retail trade is NAICS sectors 44 and 45, the first two digits of naics_code)
locations <- read_core(dir = dir,
                       filter = 'region == "CA" & floor(naics_code/10000) %in% 44:45')
}