This function accepts a data.table along with a set of grouping variables and a character-format category-style JSON column (i.e. starts with curly brackets, not square).

expand_cat_json(
  dt,
  expand,
  index = "index",
  by = NULL,
  fast = FALSE,
  fun = sum,
  na.rm = TRUE,
  set_key = TRUE
)

Arguments

dt

data.table object (or something that can be coerced to data.table)

expand

String indicating the JSON column to be expanded.

index

String indicating the name of the new index column

by

Character vector indicating the variables to group by after expanding. Set to NULL to aggregate across all initial rows, or set to FALSE to not aggregate at all (this will also add an initial_rowno column showing the original row number).

fast

Assumes that all the JSON vectors are of the exact same categories, and adds the values together rather than using whatever is in fun.

fun

Function that takes a vector and returns a single value to use when collapsing to the by level. Requires fast = FALSE.

na.rm

Ignore missing values of expand

set_key

Set the key of dt to by. Set to FALSE if you have already set the key or want it returned without key.

Details

It expands that JSON column into long format, with one row per observation per value of the JSON column, and then collapses everything according to the set of grouping variables.

Examples


# Raw example data for expanding/collapsing
patterns <- data.table::data.table(state_fips = c(1,1,2,2),
                                   cat_origin = c('{"a": "2", "b": "3"}',
                                                '{"a": "3", "b": "4"}',
                                                '{"a": "4", "b": "5"}',
                                                '{"a": "5", "b": "6"}'))

expand_cat_json(patterns, 'cat_origin', by = 'state_fips')[]
#>    index state_fips cat_origin
#> 1:     a          1          5
#> 2:     b          1          7
#> 3:     a          2          9
#> 4:     b          2         11