processing_template.Rd
This function takes data read in from SafeGraph patterns files that has had expand_integer_json() already applied to its visits_by_day variable (or that was read using the expand_int = 'visits_by_day' option in read_patterns() or read_many_patterns()). It aggregates the data to the date-by level, normalizes according to the size of the sample, calculates a moving average, and also calculates growth since first_date for each by category. The resulting data.table, with one row per date per combination of by, can be used for results and insight, or passed to graph_template() for a quick graph.
processing_template(
  dt,
  norm = NULL,
  by = NULL,
  date = "date",
  visits_by_day = "visits_by_day",
  origin = 0,
  filter = NULL,
  single_by = NULL,
  ma = 7,
  drop_ma = TRUE,
  first_date = NULL,
  silent = FALSE
)
dt: A data.table (or something that can be coerced to data.table).
norm: A data.table containing columns for date, any number of the elements of by, and a final column containing a normalization factor. The visits_by_day values will be divided by that normalization factor after merging. growth_over_time will generate this internally for you, but you can easily make a standard version of it yourself: use read_many_csvs(makedate = TRUE) to load all of the files in the normalization_stats or normalization_stats_backfill folders from AWS, limit the result to just the all-state rows, and pass in just the date and total_devices_seen columns. If NULL, applies no normalization (if your analysis covers a reasonably long time span, you want normalization).
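The steps for building a standard norm table can be sketched roughly as follows. This is only an illustration in base R: the stand-in data, the region column, and its "ALL_STATES" label are assumptions made for the example, and the directory in the commented-out call is hypothetical. Check the actual column names in your normalization files before relying on it.

```r
# In practice the raw table might come from something like
#   norm_raw <- read_many_csvs(dir = "normalization_stats/", makedate = TRUE)
# (hypothetical directory). A small hand-built stand-in is used here instead.
norm_raw <- data.frame(
  date = as.Date("2020-01-01") + rep(0:2, each = 2),
  region = rep(c("ALL_STATES", "ca"), times = 3),  # assumed column and labels
  total_devices_seen = c(1000, 100, 1100, 110, 1200, 120)
)

# Keep only the all-state rows, and only the two columns norm needs
norm <- norm_raw[norm_raw$region == "ALL_STATES", c("date", "total_devices_seen")]
```

The resulting norm has one row per date, so it would merge onto the aggregated data by date alone.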
by: A character vector of the variable names that indicate groups to calculate growth separately by.
date: Character variable indicating the date variable.
visits_by_day: Character variable indicating the variable containing the visits_by_day numbers.
origin: The value indicating no growth/initial value. The first date for each group will have this value. Usually 0 (for "0 percent growth") or 1 ("100 percent of initial value").
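To make the two common origin choices concrete, here is a minimal sketch of growth relative to the first value, shifted by origin. The helper growth_from_first() is hypothetical and written only for this illustration. It expresses growth as a proportion and is not taken from the package's internals.

```r
# Hypothetical helper: growth relative to the first element, shifted by origin
growth_from_first <- function(x, origin = 0) {
  x / x[1] - 1 + origin
}

adj_visits <- c(200, 220, 180)

# origin = 0: the first date shows 0 (no growth); later dates show change from it
g0 <- growth_from_first(adj_visits, origin = 0)  # approximately 0, 0.1, -0.1

# origin = 1: the first date shows 1 (100 percent of the initial value)
g1 <- growth_from_first(adj_visits, origin = 1)  # approximately 1, 1.1, 0.9
```

Either way the series has the same shape. origin only moves the baseline that the first date is pinned to.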
filter: A character variable describing a subset of the data to include, for example filter = 'state_fips == 6' to only include California.
single_by: A character variable for the name of a new variable that combines all the different variables in by into one variable, handy for passing to graph_template().
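A rough base R sketch of what filter and single_by describe: a filter string evaluated as a row condition, and the by variables pasted into a single grouping column. The naics column, the ", " separator, and the evaluation approach are assumptions for illustration. The function itself may build the combined label differently.

```r
dt <- data.frame(
  state_fips = c(6, 6, 36),
  naics = c(72, 44, 72),          # hypothetical second grouping variable
  visits_by_day = c(10, 20, 30)
)

# Evaluate the filter string against the columns of dt and keep matching rows
filter_string <- "state_fips == 6"
keep <- eval(parse(text = filter_string), envir = dt)
dt_sub <- dt[keep, ]

# Combine all `by` variables into one column (named "group" here)
by <- c("state_fips", "naics")
dt_sub$group <- do.call(paste, c(unname(as.list(dt_sub[by])), sep = ", "))
```

With a single combined column, a plotting function only needs one variable to distinguish the groups.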
ma: Number of days over which to take the moving average.
drop_ma: Drop observations for which adj_visits is missing because of the moving-average adjustment.
first_date: After implementing the moving average, drop all values before this date and calculate growth starting from this date. If NULL, uses the first date that's not missing after the moving average.
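The interaction between ma, drop_ma, and first_date can be sketched with a trailing moving average in base R. The helper trailing_ma() is hypothetical, and the choice of a trailing (rather than centered) window is an assumption made for this example.

```r
# Hypothetical helper: trailing k-day moving average; the first k - 1 values are NA
trailing_ma <- function(x, k = 7) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

dates <- as.Date("2020-01-01") + 0:4
visits <- c(10, 12, 14, 16, 18)

adj <- trailing_ma(visits, k = 3)  # NA, NA, then approximately 12, 14, 16

# Analogue of drop_ma = TRUE: drop rows left missing by the moving average
keep <- !is.na(adj)

# Analogue of first_date = NULL: start from the first non-missing date
first_date <- min(dates[keep])
```

In this sketch a 3-day window leaves the first two dates missing, so growth would be measured from 2020-01-03 onward.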
silent: Omit the warning and detailed report that occurs for values of dt that find no match in norm, as well as the warning issued if you choose not to normalize at all.
The result is the same data.table that was passed in, with some modifications: the data will be aggregated (using sum) to the date-by level, with visits_by_day as the only other surviving column. Three new columns are added: the normalization variable (from norm, or just a variable norm equal to 1 if norm = NULL), adj_visits, which is visits_by_day adjusted for sample size and with a moving average applied, and growth, which tracks the percentage change relative to the earliest value of adj_visits that is not missing.
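The shape of the returned table can be illustrated with a simplified base R sketch of the aggregation and normalization steps. It uses made-up numbers, skips the moving average entirely, and expresses growth as a proportion. It is meant to show the idea under those assumptions, not to reproduce the function's actual implementation.

```r
dt <- data.frame(
  date = as.Date("2020-01-01") + c(0, 0, 1, 1),
  state_fips = c(6, 6, 6, 6),
  visits_by_day = c(5, 7, 9, 3)
)
norm <- data.frame(
  date = as.Date("2020-01-01") + 0:1,
  total_devices_seen = c(100, 200)
)

# Aggregate (using sum) to the date-by level
agg <- aggregate(visits_by_day ~ date + state_fips, data = dt, FUN = sum)

# Merge in the normalization factor and divide by it
agg <- merge(agg, norm, by = "date", all.x = TRUE)
agg <- agg[order(agg$state_fips, agg$date), ]
agg$adj_visits <- agg$visits_by_day / agg$total_devices_seen

# Growth relative to the earliest adj_visits within each group (origin = 0)
agg$growth <- ave(agg$adj_visits, agg$state_fips,
                  FUN = function(x) x / x[1] - 1)
```

Here raw visits are flat at 12 per day, but the sample doubles in size, so normalized visits fall by half. This is the kind of distortion normalization is meant to correct.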
# Generally you'd be doing this with data that comes from read_many_patterns()
# But here's an example using randomly generated data
dt <- data.table::data.table(
  date = rep(lubridate::ymd('2020-01-01') + lubridate::days(0:300), 2),
  state_fips = c(rep(6, 301), rep(7, 301)),
  visits_by_day = rpois(602, lambda = 10)
)
norm <- data.table::data.table(
  date = rep(lubridate::ymd('2020-01-01') + lubridate::days(0:300), 2),
  state_fips = c(rep(6, 301), rep(7, 301)),
  total_devices_seen = rpois(602, lambda = 10000)
)
processed_data <- processing_template(dt, norm = norm, by = 'state_fips')