This function takes data read in from SafeGraph patterns files after expand_integer_json() has been applied to its visits_by_day variable (or after reading with the expand_int = 'visits_by_day' option in read_patterns() or read_many_patterns()). It aggregates the data to the date-by level, normalizes by the size of the sample, calculates a moving average, and calculates growth since first_date for each by category. The resulting data.table, with one row per date per combination of by, can be used directly for analysis or passed to graph_template() for a quick graph.

processing_template(
  dt,
  norm = NULL,
  by = NULL,
  date = "date",
  visits_by_day = "visits_by_day",
  origin = 0,
  filter = NULL,
  single_by = NULL,
  ma = 7,
  drop_ma = TRUE,
  first_date = NULL,
  silent = FALSE
)

Arguments

dt

A data.table (or something that can be coerced to data.table).

norm

A data.table containing columns for date, any number of the elements of by, and a final column containing a normalization factor. After merging, the visits_by_day values are divided by that normalization factor. growth_over_time will generate this internally for you, but you can easily make a standard version yourself: use read_many_csvs(makedate = TRUE) to load all of the files in the normalization_stats or normalization_stats_backfill folders from AWS, limit the result to the all-state rows, and pass in only the date and total_devices_seen columns. If NULL, no normalization is applied (if your analysis covers a reasonably long time span, you want normalization).

by

A character vector of the variable names that indicate groups to calculate growth separately by.

date

Character string giving the name of the date variable.

visits_by_day

Character string giving the name of the variable that contains the visits_by_day numbers.

origin

The value indicating no growth/initial value. The first date for each group will have this value. Usually 0 (for "0 percent growth") or 1 ("100 percent of initial value").

filter

A character variable describing a subset of the data to include, for example filter = 'state_fips == 6' to only include California.

single_by

A character variable for the name of a new variable that combines all the different variables in by into one variable, handy for passing to graph_template().

ma

Number of days over which to take the moving average.

drop_ma

Drop observations for which adj_visits is missing because of the moving-average adjustment.

first_date

After applying the moving average, drop all values before this date and calculate growth starting from this date. If NULL, uses the first date that is not missing after the moving average.

silent

Suppress the warning and detailed report issued when values of dt find no match in norm, as well as the warning issued when no normalization is applied (norm = NULL).
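
The recipe given under norm (load the normalization files, keep the all-state rows, pass only date and total_devices_seen) can be sketched as follows. This is a minimal illustration, not the package's own code: norm_raw is a hypothetical stand-in for what read_many_csvs(makedate = TRUE) would return, and both the region column name and the "ALL_STATES" label are assumptions about the layout of the normalization files.

```r
library(data.table)

# Hypothetical stand-in for the output of read_many_csvs(makedate = TRUE) on the
# normalization_stats folder. The `region` column and the "ALL_STATES" label are
# assumptions about the file layout; the numbers are made up.
norm_raw <- data.table(
  date = rep(as.Date("2020-01-01") + 0:2, each = 2),
  region = rep(c("ALL_STATES", "ca"), times = 3),
  total_devices_seen = c(18000000, 2100000, 18100000, 2150000, 17900000, 2050000)
)

# Keep only the all-state rows, then only the date and normalization columns,
# with the normalization factor as the final column.
norm <- norm_raw[region == "ALL_STATES", .(date, total_devices_seen)]
```

A table built this way has no by columns, so every group in dt would be normalized by the same national device count on each date.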

Details

The result is the same data.table that was passed in, with some modifications: the data will be aggregated (using sum) to the date-by level, with visits_by_day as the only other surviving column. Three new columns are added: the normalization variable (from norm, or a variable named norm equal to 1 if norm = NULL); adj_visits, which is visits_by_day adjusted for sample size and with a moving average applied; and growth, which tracks the percentage change relative to the earliest value of adj_visits that is not missing.
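
The steps described above can be sketched by hand on a toy table. This is an illustration under stated assumptions, not the function's actual implementation: the trailing (right-aligned) window and the growth formula adj_visits / first value - 1 + origin are guesses at the behaviour, and a window of 3 is used instead of the default 7 only to keep the example small.

```r
library(data.table)

# Toy input: two rows per date, to show the aggregation step.
dt <- data.table(
  date = rep(as.Date("2020-01-01") + 0:9, each = 2),
  state_fips = 6,
  visits_by_day = rep(c(40, 60), times = 10)  # sums to 100 per date
)
norm <- data.table(
  date = as.Date("2020-01-01") + 0:9,
  total_devices_seen = c(rep(1000, 5), rep(500, 5))
)

ma <- 3      # window length (assumed shorter than the default for the toy data)
origin <- 0  # value assigned to the first date in each group

# 1. Aggregate (sum) to the date-by level.
agg <- dt[, .(visits_by_day = sum(visits_by_day)), by = .(date, state_fips)]

# 2. Merge in the normalization factor and divide by it.
agg <- merge(agg, norm, by = "date", all.x = TRUE)
agg[, adj_visits := visits_by_day / total_devices_seen]

# 3. Trailing moving average within each group (alignment is an assumption);
#    the first ma - 1 rows of each group become NA.
setorder(agg, state_fips, date)
agg[, adj_visits := frollmean(adj_visits, ma, align = "right"), by = state_fips]

# 4. Drop the rows lost to the moving average, as drop_ma = TRUE describes.
agg <- agg[!is.na(adj_visits)]

# 5. Growth relative to the first remaining value (assumed formula).
agg[, growth := adj_visits / adj_visits[1] - 1 + origin, by = state_fips]
```

In this toy data the device count halves partway through while raw visits stay flat, so the normalized series doubles. That is the kind of sample-size shift normalization is meant to account for.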

Examples


# Generally you'd be doing this with data that comes from read_many_patterns()
# But here's an example using randomly generated data

dt <- data.table::data.table(date = rep(lubridate::ymd('2020-01-01') + lubridate::days(0:300), 2),
                             state_fips = c(rep(6, 301), rep(7, 301)),
                             visits_by_day = rpois(602, lambda = 10))

norm <- data.table::data.table(date = rep(lubridate::ymd('2020-01-01') + lubridate::days(0:300),2),
                               state_fips = c(rep(6, 301), rep(7,301)),
                               total_devices_seen = rpois(602, lambda = 10000))

processed_data <- processing_template(dt, norm = norm, by = 'state_fips')