An aggregate transformation aggregates one or more fields.
Because the output keeps only the columns named in aggregate or groupby,
make sure to include every field you need in one of them.
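The column-dropping behavior described above can be illustrated with a minimal plain-JavaScript sketch. The `aggregateRows` helper and the sample rows are hypothetical and are not part of Erie; they only mimic the described semantics for `count` and `mean`.

```javascript
// Hypothetical helper, not part of Erie: groups rows and keeps only
// the groupby fields plus the aggregated outputs.
function aggregateRows(rows, { aggregate, groupby = [] }) {
  const ops = {
    count: (items) => items.length,
    mean: (items) => items.reduce((a, b) => a + b, 0) / items.length,
  };

  // Bucket rows by the combination of their groupby values.
  const groups = new Map();
  for (const row of rows) {
    const key = JSON.stringify(groupby.map((f) => row[f]));
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }

  // Build one output row per group; unlisted columns are not copied.
  const result = [];
  for (const members of groups.values()) {
    const out = {};
    for (const f of groupby) out[f] = members[0][f];
    for (const { op, field, as } of aggregate) {
      const items = op === "count" ? members : members.map((r) => r[field]);
      out[as] = ops[op](items);
    }
    result.push(out);
  }
  return result;
}

const rows = [
  { category: "A", country: "KR", cost: 10, note: "x" },
  { category: "A", country: "KR", cost: 30, note: "y" },
  { category: "B", country: "US", cost: 5, note: "z" },
];

const summary = aggregateRows(rows, {
  aggregate: [
    { op: "count", as: "count" },
    { op: "mean", field: "cost", as: "mean_cost" },
  ],
  groupby: ["category", "country"],
});

console.log(summary);
```

In this sketch the `note` and `cost` columns do not appear in `summary`, because neither is a groupby field nor the output name of an aggregate operation.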
Aggregate properties
| Property | Type | Description |
|---|---|---|
| aggregate | Array[AggregateOp] | One or more data aggregation operations. |
| groupby | Array[String] | Nominal fields to group the aggregates by. |
Aggregate operation definition
| Property | Type | Description |
|---|---|---|
| op | String | An aggregate operation. |
| field | String \| Array[String] | The field or fields to aggregate. Some operations require two fields. |
| p | Number[0-1] | A quantile threshold, used by the quantile operation. |
| as | String | (Optional) A new field name. |
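To make the role of `p` concrete, here is a minimal sketch of one common quantile definition (linear interpolation between sorted values). This is an assumption for illustration only: the `quantile` helper is hypothetical, and the exact method used by Erie or Arquero may differ.

```javascript
// Hypothetical helper for illustration; the library's own computation may differ.
function quantile(values, p) {
  if (!(p >= 0 && p <= 1)) throw new RangeError("p must be between 0 and 1");
  const sorted = [...values].sort((a, b) => a - b);
  if (sorted.length === 0) return undefined;

  // Map p onto a (possibly fractional) index into the sorted values.
  const position = p * (sorted.length - 1);
  const lower = Math.floor(position);
  const upper = Math.ceil(position);
  const fraction = position - lower;

  // Interpolate linearly between the two neighboring values.
  return sorted[lower] + (sorted[upper] - sorted[lower]) * fraction;
}

const costs = [50, 10, 40, 20, 30];
const medianCost = quantile(costs, 0.5); // middle value of the sorted costs
const top30Cost = quantile(costs, 0.7);  // roughly 38: about 70% of the range lies below
console.log(medianCost, top30Cost);
```

Under this definition, `p: 0.7` marks the value below which roughly 70% of the data falls, which is why the usage example names its output `top_30_cost`.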
Supported aggregate operations
For detailed documentation of each operation, refer to the Arquero documentation.
Zero-field operation
No field property is required. If provided, it will be ignored.
- count: the count of elements
Single-field operation
The field should be a string for a single field.
- valid: the number of valid items in a variable
- distinct: the number of distinct items in a variable
- mean/average: the mean of a variable
- mode: the mode of a variable
- median: the median of a variable
- quantile: the p quantile value of a variable
- stdev: the standard deviation of a variable
- stdevp: the population standard deviation of a variable
- variance: the variance of a variable
- variancep: the population variance of a variable
- sum: the sum of a variable
- product: the product (multiplication) of a variable
- max: the maximum value of a variable
- min: the minimum value of a variable
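The difference between the plain and the population (`p`-suffixed) operations can be sketched as follows. This assumes the usual convention, which Arquero also documents, that `variance` and `stdev` are sample statistics (divide by n - 1) while `variancep` and `stdevp` divide by n. The `variance` helper below is hypothetical and not part of Erie.

```javascript
// Hypothetical helper: sample variance by default, population variance on request.
function variance(values, population = false) {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  // Sum of squared deviations from the mean.
  const squares = values.reduce((acc, v) => acc + (v - mean) ** 2, 0);
  return squares / (population ? n : n - 1);
}

const sample = [2, 4, 4, 4, 5, 5, 7, 9];

const sampleVariance = variance(sample);            // 32 / 7, about 4.571
const populationVariance = variance(sample, true);  // 32 / 8 = 4
const populationStdev = Math.sqrt(populationVariance); // 2

console.log(sampleVariance, populationVariance, populationStdev);
```

The sample versions are the usual choice when the rows are a subset of a larger population; the population versions fit when the data covers every item of interest.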
Double-field operation
The field should be an array of two field names.
- corr: the correlation of two variables
- covariance: the covariance of two variables
- covariancep: the population covariance of two variables
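The field requirements of the three operation groups above can be summarized in a small validation sketch. The `checkAggregateOp` function and the set names are hypothetical and not part of Erie's API; the operation names come from the lists above, and the rule that two-field operations take an array of two names is inferred from the usage example.

```javascript
// Operation groups as listed in this section.
const ZERO_FIELD_OPS = new Set(["count"]);
const SINGLE_FIELD_OPS = new Set([
  "valid", "distinct", "mean", "average", "mode", "median", "quantile",
  "stdev", "stdevp", "variance", "variancep", "sum", "product", "max", "min",
]);
const DOUBLE_FIELD_OPS = new Set(["corr", "covariance", "covariancep"]);

// Hypothetical checker: returns a list of problems found in one operation definition.
function checkAggregateOp({ op, field, p }) {
  const errors = [];
  if (ZERO_FIELD_OPS.has(op)) {
    // Any field given here would simply be ignored, so nothing to check.
  } else if (SINGLE_FIELD_OPS.has(op)) {
    if (typeof field !== "string") {
      errors.push(`"${op}" expects field to be a single field name (string).`);
    }
    if (op === "quantile" && !(typeof p === "number" && p >= 0 && p <= 1)) {
      errors.push(`"quantile" expects p to be a number between 0 and 1.`);
    }
  } else if (DOUBLE_FIELD_OPS.has(op)) {
    const ok = Array.isArray(field) && field.length === 2 &&
      field.every((f) => typeof f === "string");
    if (!ok) errors.push(`"${op}" expects field to be an array of two field names.`);
  } else {
    errors.push(`Unknown aggregate operation "${op}".`);
  }
  return errors;
}

const okErrors = checkAggregateOp({ op: "covariance", field: ["cost", "expense"] });
const badErrors = checkAggregateOp({ op: "quantile", field: "cost" });
console.log(okErrors, badErrors);
```

Here `okErrors` is empty, while `badErrors` contains one message because the quantile definition is missing `p`.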
Usage pattern
JSON
{
  ...
  "transform": [
    {
      "aggregate": [
        { "op": "count", "as": "count" },
        { "op": "mean", "field": "cost", "as": "mean_cost" },
        { "op": "quantile", "field": "cost", "p": 0.7, "as": "top_30_cost" },
        { "op": "covariance", "field": ["cost", "expense"], "as": "cost_expense_covariance" }
      ],
      "groupby": [
        "category", "country"
      ]
    }
  ]
  ...
}
JavaScript
let stream = new Erie.Stream();
...
let aggregate = new Erie.Aggregate();
aggregate.add("count", "count"); // op, as
aggregate.add("mean", "cost", "mean_cost"); // op, field, as
aggregate.add("quantile", "cost", "top_30_cost", 0.7); // op, field, as, p
aggregate.add("covariance", ["cost", "expense"], "cost_expense_covariance");
aggregate.groupby(["category", "country"]);
stream.transform.add(aggregate);
...