read_core.Rd
Be aware that the files this is designed to work with are large and this function may take a while to execute. This function takes a folder of Core files and reads them in. The output is a data.table.
read_core(
dir = "core_poi/",
filter = NULL,
select = NULL,
key = NULL,
secret = NULL,
silent = FALSE,
...
)
The directory that the CORE files are in. If this folder contains multiple months of Core files, it will use the most recent (this only works if you are using the standard AWS file structure).
A character string describing a logical statement for filtering the data. For example, filter = 'naics_code == 512131' would give you only movie theater POIs. Will be used as an i argument in a data.table, see help(data.table). Filtering here instead of afterwards can cut down on time and memory demands.
Character vector of variables to get from the file. Set to NULL to get all variables. If you plan to link the results to a patterns file, you will probably want to include 'placekey' in this vector. Note that any variables mentioned in filter MUST be in select unless select = NULL.
A character string containing an AWS Access Key ID. If key and secret are both specified, read_core will download the most recent Core files and process them. This process assumes your system date is set correctly, and will only check this month's Core and last month's Core, since one of those should exist.
A character string containing an AWS Secret Access Key.
Suppress timing messages.
Other arguments to be passed to data.table::fread when reading in the CSV files inside of the ZIP. For example, nrows to only read in a certain number of rows.
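The interaction between filter and select can be illustrated with a small sketch. It is not the code inside read_core; read_filtered is a hypothetical helper and the toy CSV stands in for a real Core file. It assumes the filter string is parsed and evaluated as the i argument of a data.table after fread has applied select, which is why a column named in filter has to survive the select step.

```r
library(data.table)

# Toy stand-in for a Core file; column names follow the examples on this page
csv_path <- tempfile(fileext = ".csv")
fwrite(data.table(
  placekey   = c("a", "b", "c"),
  region     = c("CA", "CA", "WA"),
  naics_code = c(512131, 445110, 512131)
), csv_path)

# Hypothetical helper, NOT part of SafeGraphR
read_filtered <- function(path, filter = NULL, select = NULL, ...) {
  # select = NULL keeps every column; ... is forwarded to fread (e.g. nrows)
  dt <- fread(path, select = select, ...)
  if (!is.null(filter)) {
    # Parse the string and evaluate it as the i argument of the data.table
    dt <- dt[eval(parse(text = filter))]
  }
  dt
}

# naics_code appears in filter, so it is also listed in select
theaters <- read_filtered(csv_path,
                          filter = 'naics_code == 512131',
                          select = c('placekey', 'naics_code'))
```

In this sketch, dropping 'naics_code' from select would make the filter fail, because the column would no longer exist when the filter is evaluated.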
AS OF SafeGraphR VERSION 0.3.0 THIS FUNCTION ONLY WORKS WITH NEW CORE FILE FORMATS. For old-format Core files, you can still use the less-flexible and otherwise deprecated link_poi_naics() function.
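The description of key and secret says that only this month's and last month's Core are checked, based on the system date. A minimal sketch of that lookup order follows. candidate_core_months is a hypothetical helper, not a SafeGraphR function, and the year/month folder layout is assumed from the example path on this page rather than taken from the package source.

```r
# Hypothetical helper, NOT part of SafeGraphR: list the two folders that a
# "this month, then last month" search would try, newest first.
candidate_core_months <- function(today = Sys.Date(), root = "core_poi/") {
  # Snap to the first of the month so stepping back a month is well defined
  first_of_month <- as.Date(format(today, "%Y-%m-01"))
  months <- seq(first_of_month, by = "-1 month", length.out = 2)
  # Assumed layout: <root>/<year>/<month>/, as in '../SafeGraph/core_poi/2020/10/'
  paste0(root, format(months, "%Y/%m/"))
}

mid_october <- candidate_core_months(as.Date("2020-10-15"))
early_january <- candidate_core_months(as.Date("2021-01-03"))
```

A wrong system date shifts both candidates, which is why the documentation asks that it be set correctly.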
if (FALSE) {
# Location of our CORE file
# Note we probably don't have to specify 2020/10 if that's the most recent one
dir <- '../SafeGraph/core_poi/2020/10/'
# Let's only get retail POIs (NAICS sectors 44-45) in California
locations <- read_core(dir = dir,
filter = 'region == "CA" & floor(naics_code/10000) %in% 44:45')
}