This function takes data read in from SafeGraph patterns files that have already had expand_integer_json() applied to the visits_by_day variable (or that were read with the expand_int = 'visits_by_day' option in read_many_patterns()). It aggregates the data to the date-by level, normalizes by the size of the sample, calculates a moving average, and calculates growth since first_date for each by category. The resulting data.table, with one row per date per combination of by, can be used directly for results and insight, or passed to graph_template() for a quick graph.
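The four steps described above (aggregate, normalize, smooth, compute growth) can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data, the 3-day window, the trailing alignment of the moving average, and the helper trailing_mean() are all assumptions made for illustration.

```r
# Toy daily visits for two groups (hypothetical data, not SafeGraph output)
dates <- as.Date("2020-01-01") + 0:9
dt <- data.frame(
  date = rep(dates, 2),
  state_fips = rep(c(6, 8), each = 10),
  visits_by_day = c(seq(100, 190, by = 10), rep(50, 10))
)
norm <- data.frame(
  date = rep(dates, 2),
  state_fips = rep(c(6, 8), each = 10),
  total_devices_seen = 1000
)

# 1. Aggregate to the date-by level
agg <- aggregate(visits_by_day ~ date + state_fips, data = dt, FUN = sum)

# 2. Merge in the normalization factor and divide by it
agg <- merge(agg, norm, by = c("date", "state_fips"))
agg$adj_visits <- agg$visits_by_day / agg$total_devices_seen

# 3. Trailing moving average within each group (assumed alignment; window of 3)
trailing_mean <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}
agg <- agg[order(agg$state_fips, agg$date), ]
agg$adj_visits <- ave(agg$adj_visits, agg$state_fips,
                      FUN = function(x) trailing_mean(x, 3))

# 4. Drop rows lost to the moving average, then compute growth
#    relative to the first remaining date in each group
agg <- agg[!is.na(agg$adj_visits), ]
agg$growth <- ave(agg$adj_visits, agg$state_fips,
                  FUN = function(x) x / x[1] - 1)
```

In this sketch the group with constant visits ends up with zero growth throughout, while the steadily rising group shows positive growth from its first non-missing date.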
processing_template(
  dt,
  norm = NULL,
  by = NULL,
  date = "date",
  visits_by_day = "visits_by_day",
  origin = 0,
  filter = NULL,
  single_by = NULL,
  ma = 7,
  drop_ma = TRUE,
  first_date = NULL,
  silent = FALSE
)
dt: A data.table (or something that can be coerced to data.table).
norm: A data.table containing columns for date, any number of the elements of by, and a final column containing a normalization factor. The visits_by_day values will be divided by that normalization factor after merging. growth_over_time will generate this internally for you, but you can easily make a standard version of it yourself: use read_many_csvs(makedate = TRUE) to load all of the files in the normalization_stats_backfill folders from AWS, limit it to just the all-state rows, and then pass in just the total_devices_seen columns. If NULL, no normalization is applied (if your analysis covers a reasonably long time span, you want normalization).
by: A character vector of the variable names that indicate groups to calculate growth separately by.
date: Character variable indicating the date variable.
visits_by_day: Character variable indicating the variable containing the visits_by_day values.
origin: The value indicating no growth/initial value. The first date for each group will have this value. Usually 0 (for "0 percent growth") or 1 ("100 percent of initial value").
filter: A character variable describing a subset of the data to include, for example filter = 'state_fips == 6' to only include California.
single_by: A character variable for the name of a new variable that combines all the different variables in by into one variable, handy for passing to graph_template().
ma: Number of days over which to take the moving average.
drop_ma: Drop observations for which adj_visits is missing because of the moving-average adjustment.
first_date: After applying the moving average, drop all values before this date and calculate growth starting from this date. If NULL, uses the first date that's not missing after the moving average.
silent: Omit the warning and detailed report that occurs for values of dt that find no match in norm, as well as the warning issued if you choose not to normalize at all.
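To see why a moving average leaves missing values that drop_ma then removes, consider the following sketch with a 7-day window. It assumes a trailing window (each value averages the current day and the six before it); the package's actual alignment may differ, and the toy series is made up for illustration.

```r
# Hypothetical normalized series for a single group over ten days
dates <- as.Date("2020-01-01") + 0:9
adj_visits <- c(10, 12, 14, 16, 18, 20, 22, 24, 26, 28)

# Trailing 7-day moving average: the first ma - 1 values cannot be computed
ma <- 7
smoothed <- as.numeric(stats::filter(adj_visits, rep(1 / ma, ma), sides = 1))
n_missing <- sum(is.na(smoothed))

# Equivalent of drop_ma = TRUE: discard the rows with no moving average
df <- data.frame(date = dates, adj_visits = smoothed)
kept <- df[!is.na(df$adj_visits), ]

# Equivalent of first_date = NULL: growth starts at the first non-missing date
first_usable <- min(kept$date)
```

With ten days of data and ma = 7, only the last four days survive, and growth would be measured from the seventh day onward.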
The result is the same data.table that was passed in, with some modifications: the data will be aggregated (using sum) to the date-by level, with visits_by_day as the only other surviving column. Three new columns are added: the normalization variable (from norm, or just a variable norm equal to 1 if norm = NULL); adj_visits, which is visits_by_day adjusted for sample size and with a moving average applied; and growth, which tracks the percentage change relative to the earliest value of adj_visits that is not missing.
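The two common origin conventions can be compared with a small sketch. The formula below is an assumption chosen so that the first value equals origin, as the argument description states; growth_index() is a hypothetical helper, not a function from the package, and growth is expressed here as a proportion rather than multiplied by 100.

```r
# Hypothetical smoothed, normalized visits for one group
adj_visits <- c(0.10, 0.11, 0.12, 0.09)

# Growth relative to the first value, shifted so the first value equals origin
growth_index <- function(x, origin = 0) x / x[1] - 1 + origin

growth_change <- growth_index(adj_visits, origin = 0)  # change since the start
growth_level  <- growth_index(adj_visits, origin = 1)  # share of initial value
```

Under this assumption the two versions differ only by a constant shift of 1, so the choice of origin affects how the series is read and not its shape.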
# Generally you'd be doing this with data that comes from read_many_patterns()
# But here's an example using randomly generated data
dt <- data.table::data.table(
  date = rep(lubridate::ymd('2020-01-01') + lubridate::days(0:300), 2),
  state_fips = c(rep(6, 301), rep(7, 301)),
  visits_by_day = rpois(602, lambda = 10)
)
norm <- data.table::data.table(
  date = rep(lubridate::ymd('2020-01-01') + lubridate::days(0:300), 2),
  state_fips = c(rep(6, 301), rep(7, 301)),
  total_devices_seen = rpois(602, lambda = 10000)
)
processed_data <- processing_template(dt, norm = norm, by = 'state_fips')