Transform Fold

A fold transformation reshapes the data by unpivoting it.

Suppose a table has four variables: column1, column2, category1, and category2. Folding column1 and column2 by category1 produces a table with the variables key, value, category1, and category2, as shown below.

Before
column1  column2  category1  category2
1        2        ‘A’        ‘a’
3        5        ‘B’        ‘b’
4        6        ‘C’        ‘c’

After
key      value  category1  category2
column1  1      ‘A’        ‘a’
column2  2      ‘A’        ‘a’
column1  3      ‘B’        ‘b’
column2  5      ‘B’        ‘b’
column1  4      ‘C’        ‘c’
column2  6      ‘C’        ‘c’

Fold properties

Property  Type                     Description
fold      Array[String]            (Required) An array of field names to fold.
by        String                   (Required) A nominal field to group by.
exclude   Boolean                  (Optional, default: false) Whether to drop other fields (not specified by fold and by).
as        Array[String, length=2]  (Optional, default: ['key', 'value']) New field names for folded variables.
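
To make the effect of these properties concrete, here is a minimal plain-JavaScript sketch of the fold semantics described above. It is not Erie's implementation: foldRows is a hypothetical helper written for illustration, and it assumes that the folded source fields are always replaced by the key/value pair (as in the Before/After tables) and that exclude only affects the remaining fields.

```javascript
// Hypothetical helper (not part of Erie): unpivots the `fold` fields into
// one output row per (input row, folded field) pair.
function foldRows(rows, { fold, by, exclude = false, as = ["key", "value"] }) {
  const [keyName, valueName] = as;
  const out = [];
  for (const row of rows) {
    for (const field of fold) {
      // The folded field's name goes into the key column, its value into the value column.
      const folded = { [keyName]: field, [valueName]: row[field] };
      if (exclude) {
        // Assumption: with exclude, only the `by` field survives next to key/value.
        folded[by] = row[by];
      } else {
        // Otherwise keep every field that was not folded.
        for (const [name, value] of Object.entries(row)) {
          if (!fold.includes(name)) folded[name] = value;
        }
      }
      out.push(folded);
    }
  }
  return out;
}

// The Before table from the example above.
const table = [
  { column1: 1, column2: 2, category1: "A", category2: "a" },
  { column1: 3, column2: 5, category1: "B", category2: "b" },
  { column1: 4, column2: 6, category1: "C", category2: "c" },
];

// Default options: reproduces the After table (key, value, category1, category2).
const foldedDefault = foldRows(table, { fold: ["column1", "column2"], by: "category1" });

// exclude: true drops category2; `as` renames key to measure.
const foldedExcluded = foldRows(table, {
  fold: ["column1", "column2"],
  by: "category1",
  exclude: true,
  as: ["measure", "value"],
});

console.log(foldedDefault);
console.log(foldedExcluded);
```

Under these assumptions, each input row yields one output row per folded field, so three input rows and two folded fields give six output rows.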

Usage pattern

JSON

{
  ...
  "transform" : [
    {
      "fold": [
        "column1", "column2"
      ],
      "by": "category1",
      "exclude": true,
      "as": [
        "measure", "value"
      ]
    }
  ]
  ...
}

JavaScript

let stream = new Erie.Stream();
...
let fold = new Erie.Fold(["column1", "column2"], "category1"); // fields to fold, field to group by
// alt) let fold = new Erie.Fold();
//      fold.fold(["column1", "column2"]);
//      fold.by("category1");
fold.exclude(true);
fold.as(["measure", "value"]);
stream.transform.add(fold);
...

Extended pattern with a repeat channel

Combining fold with a repeat channel makes it possible to express an interval (here: lower bound, mean, and upper bound) that is repeated for each value of a field.

JSON

{
  ...
  "transform" : [
    {
      "aggregate": [
        {
          "op": "mean",
          "field": "Miles_per_Gallon",
          "as": "Miles_per_Gallon_mean"
        },
        {
          "op": "stdevp",
          "field": "Miles_per_Gallon",
          "as": "Miles_per_Gallon_stdevp"
        }
      ],
      "groupby": [
        "Origin"
      ]
    },
    {
      "calculate": "datum.Miles_per_Gallon_mean - datum.Miles_per_Gallon_stdevp",
      "as": "Miles_per_Gallon_lower"
    },
    {
      "calculate": "datum.Miles_per_Gallon_mean + datum.Miles_per_Gallon_stdevp",
      "as": "Miles_per_Gallon_upper"
    },
    {
      "fold": [
        "Miles_per_Gallon_lower",
        "Miles_per_Gallon_mean",
        "Miles_per_Gallon_upper"
      ],
      "by": "Origin",
      "exclude": true,
      "as": [
        "measure",
        "statistics"
      ]
    }
  ],
  ...
  "encoding": {
    "time": {
      "field": "measure",
      "type": "nominal",
      "scale": {
        "band": 0.5,
        "order": [
          "Miles_per_Gallon_lower",
          "Miles_per_Gallon_mean",
          "Miles_per_Gallon_upper"
        ]
      }
    },
    ...
    "repeat": {
      "field": "Origin",
      ...
    }
    ...
  },
  ...
}

JavaScript

let stream = new Erie.Stream();
...
let aggregate = new Erie.Aggregate();
aggregate.add("mean", "Miles_per_Gallon", "Miles_per_Gallon_mean");
aggregate.add("stdevp", "Miles_per_Gallon", "Miles_per_Gallon_stdevp");
aggregate.groupby(["Origin"]);
stream.transform.add(aggregate);

let calc1 = new Erie.Calculate("datum.Miles_per_Gallon_mean - datum.Miles_per_Gallon_stdevp");
calc1.as("Miles_per_Gallon_lower");
stream.transform.add(calc1);

let calc2 = new Erie.Calculate("datum.Miles_per_Gallon_mean + datum.Miles_per_Gallon_stdevp");
calc2.as("Miles_per_Gallon_upper");
stream.transform.add(calc2);

let fold = new Erie.Fold([
  "Miles_per_Gallon_lower",
  "Miles_per_Gallon_mean",
  "Miles_per_Gallon_upper"
], "Origin");
fold.exclude(true);
fold.as(["measure", "statistics"]);
stream.transform.add(fold);
...

stream.encoding.time.field("measure", "nominal");
stream.encoding.time.scale("timing", "relative");
stream.encoding.time.scale("band", 0.5);
stream.encoding.time.scale("order", [
  "Miles_per_Gallon_lower",
  "Miles_per_Gallon_mean",
  "Miles_per_Gallon_upper"
]);
...
stream.encoding.repeat.field("Origin");
...
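
To see what data the time and repeat channels receive in this pattern, the aggregate, calculate, and fold steps can be traced in plain JavaScript. This is only a sketch of the data flow, not Erie's internals: the input rows are made-up toy values rather than the real cars dataset, and it assumes that stdevp denotes the population standard deviation.

```javascript
// Toy input rows; the values are invented for illustration only.
const cars = [
  { Origin: "USA", Miles_per_Gallon: 10 },
  { Origin: "USA", Miles_per_Gallon: 20 },
  { Origin: "Japan", Miles_per_Gallon: 30 },
  { Origin: "Japan", Miles_per_Gallon: 34 },
];

// Step 1 (aggregate): mean and population standard deviation per Origin.
const groups = new Map();
for (const { Origin, Miles_per_Gallon } of cars) {
  if (!groups.has(Origin)) groups.set(Origin, []);
  groups.get(Origin).push(Miles_per_Gallon);
}
const aggregated = [...groups].map(([Origin, values]) => {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  return {
    Origin,
    Miles_per_Gallon_mean: mean,
    Miles_per_Gallon_stdevp: Math.sqrt(variance),
  };
});

// Step 2 (calculate): lower and upper bounds of the interval.
const withBounds = aggregated.map((d) => ({
  ...d,
  Miles_per_Gallon_lower: d.Miles_per_Gallon_mean - d.Miles_per_Gallon_stdevp,
  Miles_per_Gallon_upper: d.Miles_per_Gallon_mean + d.Miles_per_Gallon_stdevp,
}));

// Step 3 (fold by Origin, exclude: true, as: ["measure", "statistics"]):
// one row per (Origin, folded field) pair, keeping only Origin next to the new pair.
const foldFields = [
  "Miles_per_Gallon_lower",
  "Miles_per_Gallon_mean",
  "Miles_per_Gallon_upper",
];
const intervals = withBounds.flatMap((d) =>
  foldFields.map((field) => ({
    Origin: d.Origin,
    measure: field,
    statistics: d[field],
  }))
);

console.log(intervals);
```

After the fold, every Origin contributes exactly three rows (lower, mean, upper). The time channel can then order them through the measure field, while the repeat channel plays that three-step interval once per Origin.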
© Hyeok Kim