Expand and collapse a categorical JSON column — expand_cat

This function accepts a data.table along with a set of grouping variables and a character-format category-style JSON column (i.e. starts with curly brackets, not square).

expand_cat_json(
  dt,
  expand,
  index = "index",
  by = NULL,
  fast = FALSE,
  fun = sum,
  na.rm = TRUE,
  set_key = TRUE
)

Arguments

dt: data.table object (or something that can be coerced to data.table)
expand: String indicating the JSON column to be expanded.
index: String indicating the name of the new index column
by: Character vector indicating the variables to group by after expanding. Set to NULL to aggregate across all initial rows, or set to FALSE to not aggregate at all (this will also add an initial_rowno column showing the original row number).
fast: Assumes that all the JSON vectors are of the exact same categories, and adds the values together rather than using whatever is in fun.
fun: Function that takes a vector and returns a single value to use when collapsing to the by level. Requires fast = FALSE.
na.rm: Ignore missing values of expand
set_key: Set the key of dt to by. Set to FALSE if you have already set the key or want it returned without key.

Details

It expands that JSON column into long format, with one row per observation per value of the JSON column, and then collapses everything according to the set of grouping variables.

Examples


# Raw example data for expanding/collapsing
patterns <- data.table::data.table(state_fips = c(1,1,2,2),
                                   cat_origin = c('{"a": "2", "b": "3"}',
                                                '{"a": "3", "b": "4"}',
                                                '{"a": "4", "b": "5"}',
                                                '{"a": "5", "b": "6"}'))

expand_cat_json(patterns, 'cat_origin', by = 'state_fips')[]
#>    index state_fips cat_origin
#> 1:     a          1          5
#> 2:     b          1          7
#> 3:     a          2          9
#> 4:     b          2         11