Aggregation functions

The AI & Analytics Engine provides a comprehensive set of aggregation functions for data wrangling, listed below.

Function | Description
Count | Number of values. Can be configured to count non-null values, null values, or all rows
Approximate Count Distinct | Number of distinct values, approximated using HyperLogLog++. Null values are ignored
Minimum (Numeric) | Minimum value
Maximum (Numeric) | Maximum value
Sum (Numeric) | Sum of values
Mean/Average (Numeric) | Average of values
Standard Deviation (Numeric) | Unbiased sample standard deviation of values
Variance (Numeric) | Unbiased sample variance of values
Skewness (Numeric) | Skewness of values
Kurtosis (Numeric) | Kurtosis of values
Approximate Median (Numeric) | Approximate median of values
Approximate Quantile (Numeric) | Approximate k-th quantile of values for a given k
Mode (Numeric) | Mode of the estimated probability distribution
First value | Value in the first row, ignoring leading null values. Returns null if all values are null
Last value | Value in the last row, ignoring trailing null values. Returns null if all values are null
Most frequent value | Most frequent value, approximated using an algorithm that is fast and efficient for large data
Top K frequent values | Top k values for a given k, sorted in descending order of frequency, approximated using an algorithm that is fast and efficient for large data
Earliest (DateTime) | Earliest datetime (timestamp) value
Latest (DateTime) | Latest datetime (timestamp) value
All (Boolean) | Whether all values are true
Any (Boolean) | Whether any value is true
Not all (Boolean) | Whether any value is false
Not any (Boolean) | Whether all values are false
Top K elements (JSONArray) | Top k array elements for a given k, sorted in descending order of frequency, approximated using an algorithm that is fast and efficient for large data
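
To make some of the definitions above concrete, here is a minimal Python sketch of what several of these functions compute. It is an illustration only, not the Engine's implementation: the sample data and variable names are hypothetical, and the distinct count and top-k results are computed exactly here, whereas the Engine describes its versions as approximate.

```python
import statistics
from collections import Counter

# Hypothetical sample column; None stands in for a null value.
values = [None, 4.0, 2.0, None, 4.0, 8.0, 4.0, 2.0, None]
non_null = [v for v in values if v is not None]

# Count: the three variants (non-null values, null values, all rows).
count_non_null = len(non_null)
count_null = len(values) - len(non_null)
count_rows = len(values)

# Distinct count, ignoring nulls. Exact stand-in for an approximate
# (HyperLogLog++-style) estimate.
distinct_count = len(set(non_null))

# Unbiased sample statistics divide by (n - 1) rather than n.
sample_variance = statistics.variance(non_null)
sample_std = statistics.stdev(non_null)

# First / last value: skip leading / trailing nulls; None if all are null.
first_value = next((v for v in values if v is not None), None)
last_value = next((v for v in reversed(values) if v is not None), None)

# Top k frequent values for k = 2, in descending order of frequency.
# Exact stand-in for an approximate frequent-items algorithm.
top_2 = Counter(non_null).most_common(2)

# Boolean aggregations over a hypothetical boolean column.
flags = [True, False, True]
all_true = all(flags)        # All: every value is true
any_true = any(flags)        # Any: at least one value is true
not_all = not all(flags)     # Not all: at least one value is false
not_any = not any(flags)     # Not any: every value is false
```

With this sample column, the sample variance is 4.8 (sum of squared deviations 24 divided by n - 1 = 5), whereas a population variance would divide by 6 and give 4.0.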

These functions can be used in the following actions, which operate over groups or partitions of rows (for details, see the Action Catalogue).

  • Aggregate columns within groups
  • Look up aggregated columns from another dataset
  • Compute window functions
  • Reshape dataset into a pivot table
  • Resample data into a regular time series
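
The difference between aggregating within groups and computing a window function can be sketched in plain Python. This assumes the Engine follows the usual convention, in which a group aggregation returns one row per group while a window function keeps every input row and attaches the aggregate of that row's partition; the data and names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (group key, numeric value).
rows = [("a", 1.0), ("b", 10.0), ("a", 3.0), ("b", 30.0), ("a", 5.0)]

# Bucket the values by group key.
groups = defaultdict(list)
for key, value in rows:
    groups[key].append(value)

# Group aggregation (Mean): one output row per group.
group_means = {key: sum(vals) / len(vals) for key, vals in groups.items()}

# Window-style aggregation (Mean): every input row is kept and receives
# the aggregate computed over its own partition.
windowed = [(key, value, group_means[key]) for key, value in rows]
```

Here `group_means` has two entries, one per group, while `windowed` has the same five rows as the input, each extended with its group's mean.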