Aggregation Functions

The AI & Analytics Engine provides the following aggregation functions for data wrangling.

| # | Title | Description |
|---|---|---|
| 1 | Count | Count of values. Can count non-null values, null values, or all rows |
| 2 | Approximate Count Distinct | Number of distinct values, approximated using HyperLogLog++. Null values are ignored |
| 3 | Minimum (Numeric) | Minimum value |
| 4 | Maximum (Numeric) | Maximum value |
| 5 | Sum (Numeric) | Sum of values |
| 6 | Mean/Average (Numeric) | Arithmetic mean of values |
| 7 | Standard Deviation (Numeric) | Sample standard deviation of values (square root of the unbiased sample variance) |
| 8 | Variance (Numeric) | Unbiased sample variance of values |
| 9 | Skewness (Numeric) | Skewness of values |
| 10 | Kurtosis (Numeric) | Kurtosis of values |
| 11 | Approximate Median (Numeric) | Approximate median of values |
| 12 | Approximate Quantile (Numeric) | Approximate k-th quantile of values for a given k |
| 13 | Mode (Numeric) | Mode of the estimated probability distribution of values |
| 14 | First value | Value in the first row, ignoring leading null values. Returns null if all values are null |
| 15 | Last value | Value in the last row, ignoring trailing null values. Returns null if all values are null |
| 16 | Most frequent value | Most frequent value, approximated using an algorithm that is fast and efficient for large data |
| 17 | Top K frequent values | Top k values for a given k, sorted by frequency in descending order. Approximated using an algorithm that is fast and efficient for large data |
| 18 | Earliest (DateTime) | Earliest datetime (timestamp) value |
| 19 | Latest (DateTime) | Latest datetime (timestamp) value |
| 20 | All (Boolean) | Whether all values are true |
| 21 | Any (Boolean) | Whether any value is true |
| 22 | Not all (Boolean) | Whether any value is false |
| 23 | Not any (Boolean) | Whether all values are false |
| 24 | Top K elements (JSONArray) | Top k array elements for a given k, sorted by frequency in descending order. Approximated using an algorithm that is fast and efficient for large data |
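The counting and moment-based statistics in the table can be sketched in plain Python as below. This is a minimal illustration, not the Engine's implementation: it assumes nulls are represented as `None` and skipped, and the function names and `count` mode strings are hypothetical. The table does not state which estimators are used for skewness and kurtosis, so the sketch assumes the moment-based population skewness and excess kurtosis.

```python
import math
from typing import Optional, Sequence


def _non_null(values: Sequence[Optional[float]]) -> list[float]:
    """Drop nulls (represented here as None) before numeric statistics."""
    return [v for v in values if v is not None]


def count(values: Sequence[object], mode: str = "non_null") -> int:
    """Count non-null values, null values, or all rows (hypothetical modes)."""
    if mode == "non_null":
        return sum(v is not None for v in values)
    if mode == "null":
        return sum(v is None for v in values)
    if mode == "rows":
        return len(values)
    raise ValueError(f"unknown mode: {mode!r}")


def mean(values) -> Optional[float]:
    xs = _non_null(values)
    return sum(xs) / len(xs) if xs else None


def sample_variance(values) -> Optional[float]:
    """Unbiased sample variance: squared deviations divided by n - 1."""
    xs = _non_null(values)
    if len(xs) < 2:
        return None
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def sample_std(values) -> Optional[float]:
    """Square root of the unbiased sample variance."""
    var = sample_variance(values)
    return None if var is None else math.sqrt(var)


def _central_moment(xs: list[float], k: int) -> float:
    """k-th central moment, normalized by n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def skewness(values) -> Optional[float]:
    """Assumed definition: population skewness g1 = m3 / m2**1.5."""
    xs = _non_null(values)
    if not xs or _central_moment(xs, 2) == 0:
        return None
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5


def kurtosis(values) -> Optional[float]:
    """Assumed definition: excess kurtosis g2 = m4 / m2**2 - 3."""
    xs = _non_null(values)
    if not xs or _central_moment(xs, 2) == 0:
        return None
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3
```

For example, on `[2, 4, 4, 4, 5, 5, 7, 9, None]` this gives a non-null count of 8, a mean of 5.0, and a sample variance of 32/7 (about 4.571), whereas dividing by n instead of n - 1 would give 4.0.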
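The null-skipping rules for First value and Last value, and the four Boolean aggregations, reduce to a few lines each. The sketch below uses hypothetical function names. It also assumes the Boolean aggregations ignore nulls, which the table does not state, and on empty input it follows Python's `all`/`any` conventions, which may differ from the Engine's behavior.

```python
from typing import Iterable, Optional, Sequence


def first_value(values: Sequence[object]) -> Optional[object]:
    """Value in the first row, skipping leading nulls; None if all are null."""
    return next((v for v in values if v is not None), None)


def last_value(values: Sequence[object]) -> Optional[object]:
    """Value in the last row, skipping trailing nulls; None if all are null."""
    return next((v for v in reversed(values) if v is not None), None)


def _bools(values: Iterable[Optional[bool]]) -> list[bool]:
    # Assumption: nulls (None) are ignored by the Boolean aggregations.
    return [v for v in values if v is not None]


def all_true(values) -> bool:
    """All: whether all values are true."""
    return all(_bools(values))


def any_true(values) -> bool:
    """Any: whether any value is true."""
    return any(_bools(values))


def not_all(values) -> bool:
    """Not all: whether any value is false."""
    return not all_true(values)


def not_any(values) -> bool:
    """Not any: whether all values are false."""
    return not any_true(values)
```

Writing Not all and Not any as negations of All and Any makes the equivalences in the table explicit: "not all true" is the same as "some value is false", and "none true" is the same as "all values are false".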
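The table says Most frequent value, Top K frequent values, and Top K elements are approximated with a fast algorithm for large data, but it does not name that algorithm. The Space-Saving algorithm is one well-known technique of this kind. The sketch below uses it purely as an illustration, not as the Engine's method; `space_saving_top_k` and its `capacity` parameter are hypothetical.

```python
from typing import Hashable, Iterable


def space_saving_top_k(
    stream: Iterable[Hashable], k: int, capacity: int
) -> list[tuple[Hashable, int]]:
    """Approximate the k most frequent items with the Space-Saving algorithm.

    At most `capacity` counters are kept. A counter can overestimate an
    item's true frequency, but never underestimates it.
    """
    if k > capacity:
        raise ValueError("capacity must be at least k")
    counters: dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # Evict the item with the smallest count; the newcomer inherits
            # that count plus one, an upper bound on its true frequency.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    # Sort by estimated frequency, highest first, and keep the top k.
    ranked = sorted(counters.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]
```

Memory stays bounded by `capacity` no matter how many distinct values the column holds, which is the property that matters for large data. The linear `min` scan keeps the sketch short, whereas production versions use a specialized structure to find the minimum counter quickly. For an array column, the same routine could be fed the flattened elements, for example via `itertools.chain.from_iterable(arrays)`.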
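Approximate Count Distinct is documented as using HyperLogLog++. The sketch below implements only the basic HyperLogLog estimator, with a linear-counting correction for small cardinalities. HyperLogLog++ adds further refinements, such as empirical bias correction and a sparse representation, which are omitted here. The BLAKE2b-based hash and the default precision `p = 12` are illustrative choices, and the `HyperLogLog` class is hypothetical.

```python
import hashlib
import math
from typing import Hashable


class HyperLogLog:
    """Minimal HyperLogLog cardinality sketch (not the full HLL++ variant)."""

    def __init__(self, p: int = 12) -> None:
        # The alpha constant used below is valid for m = 2**p >= 128.
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p  # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item: Hashable) -> int:
        # Illustrative 64-bit hash: BLAKE2b over the item's repr.
        digest = hashlib.blake2b(repr(item).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, item: Hashable) -> None:
        if item is None:
            return  # null values are ignored
        x = self._hash64(item)
        width = 64 - self.p
        idx = x >> width  # the first p bits select a register
        rest = x & ((1 << width) - 1)  # the remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: linear counting on empty registers.
            return m * math.log(m / zeros)
        return raw
```

Each register keeps only a running maximum, so adding duplicates never changes the sketch. Two sketches built with the same `p` can also be merged by taking the element-wise maximum of their registers, which fits aggregation over groups or partitions. With `p = 12`, the typical relative error is about 1.04 / sqrt(4096), or roughly 1.6%.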

These functions can be used in the following actions, which operate over groups or partitions (for details, see the Action Catalogue):

  • Aggregate Columns within groups
  • Look up aggregated columns from another dataset
  • Compute window functions
  • Reshape dataset into a pivot table
  • Resample data into a regular time series
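The first and third actions differ in what they produce. Aggregating within groups collapses each group to one value, whereas a window function attaches the partition's aggregate to every row. The rough sketch below assumes rows are plain Python dicts and a window frame that spans the whole partition. The function names are hypothetical, and window functions in general can also use ordered or sliding frames, which this sketch omits.

```python
from collections import defaultdict
from typing import Callable, Hashable, Sequence

Row = dict[str, object]


def aggregate_within_groups(
    rows: Sequence[Row], key: str, column: str, agg: Callable[[list], object]
) -> dict[Hashable, object]:
    """One output value per group: the group's rows are collapsed."""
    groups: dict[Hashable, list] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    return {k: agg(vs) for k, vs in groups.items()}


def window_aggregate(
    rows: Sequence[Row], key: str, column: str, agg: Callable[[list], object], out: str
) -> list[Row]:
    """Every row is kept and gains its partition's aggregate in column `out`."""
    per_group = aggregate_within_groups(rows, key, column, agg)
    return [{**row, out: per_group[row[key]]} for row in rows]
```

Any of the aggregation functions above could be passed as `agg`, for example `sum` for Sum or a top-k routine for Top K frequent values.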
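Resampling applies an aggregation function once per fixed-width time bin. The sketch below assumes half-open bins from `start + i*step` up to, but not including, `start + (i+1)*step`. It leaves empty bins as `None`. How the Engine labels bins or fills gaps is not specified here, and `resample` is a hypothetical name.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Callable, Optional


def resample(
    points: list[tuple[datetime, float]],
    start: datetime,
    end: datetime,
    step: timedelta,
    agg: Callable[[list[float]], float],
) -> list[tuple[datetime, Optional[float]]]:
    """Aggregate irregular (timestamp, value) points into fixed-width bins."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in points:
        if start <= ts < end:
            # timedelta // timedelta yields the integer bin index.
            buckets[(ts - start) // step].append(value)
    n_bins = math.ceil((end - start) / step)
    # Each bin is labeled by its start time; empty bins become None.
    return [
        (start + i * step, agg(buckets[i]) if buckets[i] else None)
        for i in range(n_bins)
    ]
```

Passing a mean, sum, or last-value function as `agg` gives the common resampling variants. Points outside `[start, end)` are dropped.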