Aggregation functions
The AI & Analytics Engine provides the following aggregation functions for data wrangling.
Title | Description |
---|---|
Count | Number of values. Can be configured to count non-null values, null values, or all rows |
Approximate Count Distinct | Approximate number of distinct values, estimated using HyperLogLog++. Null values are ignored |
Minimum (Numeric) | Minimum value |
Maximum (Numeric) | Maximum value |
Sum (Numeric) | Sum of values |
Mean/Average (Numeric) | Average of values |
Standard Deviation (Numeric) | Unbiased sample standard deviation of values |
Variance (Numeric) | Unbiased sample variance of values |
Skewness (Numeric) | Skewness of values |
Kurtosis (Numeric) | Kurtosis of values |
Approximate Median (Numeric) | Approximate median of values |
Approximate Quantile (Numeric) | Approximate k-th quantile of values for a given k |
Mode (Numeric) | Mode of the estimated probability distribution |
First value | Value in the first row, ignoring leading null values. Returns null if all values are null |
Last value | Value in the last row, ignoring trailing null values. Returns null if all values are null |
Most frequent value | Most frequent value, approximated using an algorithm that is fast and efficient on large datasets |
Top K frequent values | The k most frequent values for a given k, sorted by descending frequency and approximated using an algorithm that is fast and efficient on large datasets |
Earliest (DateTime) | Earliest datetime (timestamp) value |
Latest (DateTime) | Latest datetime (timestamp) value |
All (Boolean) | Whether all values are true |
Any (Boolean) | Whether any value is true |
Not all (Boolean) | Whether any value is false |
Not any (Boolean) | Whether all values are false |
Top K elements (JSONArray) | The k most frequent array elements for a given k, sorted by descending frequency and approximated using an algorithm that is fast and efficient on large datasets |
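
The null-handling and Boolean rules in the table can be illustrated with a small Python sketch. It is not the Engine's implementation: the helper names (`count`, `first_value`, `last_value`, `not_all`, `not_any`) are hypothetical, `None` stands in for a null value, and skipping nulls before computing the variance and standard deviation is an assumption, since the table does not state how the numeric functions treat nulls.

```python
import statistics
from typing import Any, List, Optional


def count(values: List[Any], mode: str = "rows") -> int:
    """Count all rows, only non-null values, or only null values."""
    if mode == "rows":
        return len(values)
    if mode == "non_null":
        return sum(v is not None for v in values)
    if mode == "null":
        return sum(v is None for v in values)
    raise ValueError(f"unknown mode: {mode}")


def first_value(values: List[Any]) -> Optional[Any]:
    """First non-null value; None if every value is null."""
    return next((v for v in values if v is not None), None)


def last_value(values: List[Any]) -> Optional[Any]:
    """Last non-null value; None if every value is null."""
    return next((v for v in reversed(values) if v is not None), None)


def not_all(flags: List[bool]) -> bool:
    """True if at least one value is false."""
    return any(not f for f in flags)


def not_any(flags: List[bool]) -> bool:
    """True if every value is false."""
    return all(not f for f in flags)


column = [None, 4.0, 8.0, None, 6.0, None]
non_null = [v for v in column if v is not None]  # assumption: nulls are skipped

row_count = count(column)                    # 6
non_null_count = count(column, "non_null")   # 3
null_count = count(column, "null")           # 3

# statistics.variance and statistics.stdev divide by n - 1,
# which matches the "unbiased sample" wording in the table.
sample_variance = statistics.variance(non_null)
sample_std = statistics.stdev(non_null)

print(row_count, non_null_count, null_count)
print(first_value(column), last_value(column))
print(sample_variance, sample_std)
print(not_all([True, False, True]), not_any([False, False]))
```

Note that "Not all" and "Not any" are the logical negations of "All" and "Any" respectively, which is why they reduce to "any value is false" and "all values are false".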
These functions can be used in the following actions, which operate over groups or partitions (for details, see the Action Catalogue).
- Aggregate columns within groups
- Look up aggregated columns from another dataset
- Compute window functions
- Reshape dataset into a pivot table
- Resample data into a regular time series
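
As a rough illustration of how the same aggregation function can serve different actions, the Python sketch below applies a sum first as a group aggregation and then as a window function. It assumes the usual semantics of these operations (a group aggregation returns one result per group, while a window function keeps every row and attaches the partition's result to it); the Engine's actions may differ in detail. The function names and the sample `rows` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical sample data: one row per sale.
rows = [
    {"store": "A", "sales": 10},
    {"store": "A", "sales": 30},
    {"store": "B", "sales": 5},
]


def aggregate_within_groups(
    rows: List[Dict[str, Any]], key: str, column: str
) -> Dict[Any, int]:
    """Group rows by `key` and return one summed value per group."""
    groups: Dict[Any, List[int]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[column])
    return {group: sum(values) for group, values in groups.items()}


def window_sum(
    rows: List[Dict[str, Any]], key: str, column: str
) -> List[Dict[str, Any]]:
    """Keep every row and attach the sum over that row's partition."""
    totals = aggregate_within_groups(rows, key, column)
    return [{**row, f"{column}_sum": totals[row[key]]} for row in rows]


grouped = aggregate_within_groups(rows, "store", "sales")
windowed = window_sum(rows, "store", "sales")

print(grouped)   # one entry per store
print(windowed)  # same number of rows as the input, with an extra column
```

The grouped result collapses the three input rows into two entries, whereas the windowed result still has three rows, each carrying its store's total.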