Skip to content

Actions and Insights Catalogue

Actions Catalogue

Casting

Cast columns to categorical type - cast_categorical

Cast the selected columns to categorical type.

Parameters Parameter Description Default Parameter Type
categories_specified If set to 'true', categories must be specified explicitly. If set to 'false', every unique value in column will become a category False boolean
categories Categories in the column string-list

Cast columns to datetime type - cast_datetime

Cast the selected columns to datetime type with specified format string

Parameters Parameter Description Default Parameter Type
datetime_format Datetime format string in the strftime format. %Y-%m-%dT%H:%M:%S.%fZ string

Cast columns to numeric type - cast_numeric

Cast the selected columns to numeric type.

Column Group

Take - take

Extracts specified components from a column group. If the component is not found, returns a null column.

Parameters Parameter Description Default Parameter Type
suffix The component suffix to take from the group. E.g.If the group columns are ["__group_grp_a", "__group_grp_b"] and suffix=="a", the taken column will be "__group_grp_a". string
index The component index (zero based) to take from the group. E.g.If the group columns are ["__group_grp_a", "__group_grp_b"] and index==1, the taken column will be "__group_grp_b". 0 integer
address_type Method of selection of group column component to take single-select

Conversion

Convert numeric column to datetime - numeric_to_datetime

Converts the given numeric columns into datetime, interpreting them as offset values in the given unit since the specified UTC time.

Parameters Parameter Description Default Parameter Type
rel_to Specifies the "origin" UTC datetime. The conversion from numeric to datetime shall be relative to this "origin". 1970-01-01 string
units Specifies the unit of time that the values in the numeric column represent. single-select

Data Cleaning/Processing

Impute columns - impute_column

Impute the missing values in your data

Parameters Parameter Description Default Parameter Type
numeric_fill_value Filler value for Numeric columns when strategy is constant float
numeric_strategy Strategy for imputing Numeric columns mean single-select
categorical_fill_value Filler value for Categoric columns when strategy is constant string
categorical_strategy Strategy for imputing Categorical columns most_frequent single-select

Copy - copy

Creates a copy of the specified columns OR column groups

Concatenate with delimiter - concat_cols_delim

Concatenates one or more text columns / column groups with a specified delimiter.

Parameters Parameter Description Default Parameter Type
delimiter Delimiter to be appended between concatenated values string

Drop - drop

Drop columns and/or column groups

Strip HTML tags - strip_html

Strip HTML tags from text to convert webpages to text documents, sans HTML formatting

Examples:

Strip html tags from text e.g. ""<emp> Text </emp>"" becomes ""Text""

Date/DateTime

Extract day of week name - extract_day_of_week_name

Extract the name of the day in week ('Monday' to 'Sunday')

Examples:

Example: 2020-04-15T15:53:44 -> will return: Wednesday

Extract day of month - extract_day_of_month_number

Extract the day of month as a number from 1 to 31

Extract AM/PM - extract_am_pm

Extracts the AM/PM value for rows of a datetime column

Examples:

Example: 2020-04-15T15:53:44 -> will return: PM

Extract month number - extract_month_number

Extract the month as a number from 1 to 12

Extract minutes - extract_minutes

Extract minutes after the hour as a number from 0 to 59

Examples:

Example: 2020-04-15T15:53:44 -> will return: 53

Extract month name - extract_month_name

Extracts the months names of a datetime column

Examples:

Example: 2020-04-15T15:53:44 -> will return: April

Extract quarter - extract_quarter

Extract the quarter (3-month period) of the year in which the dates fall as an integer from 1 to 4.

Examples:

Example: 2020-04-15T15:53:44 -> will return: 02

Extract seconds - extract_seconds

Extract seconds after the minute as a number from 0 to 59

Examples:

Example: 2020-04-15T15:53:44 -> will return: 44

Extract timestamp - extract_timestamp

Extracts the POSIX timestamp. (Number of seconds since January 1st, 1970

Examples:

Returns a POSIX timestamp as float. (Number of seconds since January 1st, 1970).

Example: 2020-04-15T15:53:44 -> will return: 1586930024.0.

Extract week number in year - extract_week_number_in_year

Extracts the week number (between 1-52)

Extract year - extract_year

Extract the year as a number

Examples:

Example: 2020-04-15T15:53:44 -> will return: 2020

Round datetime - round_datetime

Rounds a DateTime column (in place) to 15, 30 or 60 minutes

Parameters Parameter Description Default Parameter Type
round_to Time rounding resolution. E.g. for "half_hour", 14:39:11 --> 14:30:00 single-select
Examples:

Rounds the time component of a datetime column to 15, 30 or 60 minutes.

Example: ""2020-04-15T15:53:44"" with rounding to 30 minutes. Output ""2020-04-15T16:00:00""

Extract hours - extract_hour

Extract hour of the day as a number from 0 to 23

Examples:

Example: 2020-04-15T15:53:44 -> will return: 15

Compute time elapsed - diff_datetime

Calculates the time difference in seconds between two datetime columns

Parameters Parameter Description Default Parameter Type
units Specifies the units of the time difference result Seconds single-select

Add time delta - add_timedelta

Adds a time delta (in seconds) to one or more datetime columns. Changes the columns in place.

Parameters Parameter Description Default Parameter Type
time_delta_sec This is the time-duration in seconds to be added to one or more datetime columns. 0 integer

Extract day of week number - extract_day_of_week_number

Extract the day of week as a number ('Sunday' = 1, 'Saturday' = 7)

Examples:

Example: 2020-04-15T15:53:44, Days coding is: Monday=0, Tuesday=1, ..., Sunday=6 -> will return: 02.

Create cyclic encoding features - cyclic_encode

A directive that receives one or more columns that are cyclical in nature. denote the column as 'x' and the period of the column as 'p'. For each such columns, creates 2 new columns which are: x_sin = sin(frac{2\pix}p), x_cos = cos(frac{2\pix}p)

Parameters Parameter Description Default Parameter Type
period The period of the cyclic feature 1 float

Feature Engineering

One hot encode categorical columns - one_hot_encode

This action converts categorical columns by one-hot encoding. Missing values remain as null. Unseen values are encoded as all 0s, when a finalized recipe with this action is applied on new data

Create features using autoencoder neural network - autoencoder

Create new numerical features by combining existing numerical features in novel and non-linear ways using an autoencoder

Parameters Parameter Description Default Parameter Type
num_epochs The number of iterations or epochs for optimizing the loss function over the training set. 100 range
batch_size The size of a batch of examples used to compute the Stochastic Gradient updates. 128 range
impute Strategy for imputing missing values mean single-select
create_missing_flag Create a new feature based on missing values True boolean
standardize Standardize, if true, will transform each feature value by subtracting the mean-value and dividing the result by the standard deviation of the feature True boolean
learning_rate Learning rates control the size of the update to parameters during each step. 0.001 range
Examples:

Using autoencoder to generate novel and non-linear features based on existing numerical features

Label encode categorical columns - label_encode

Encode each input categorical column into a new column of non-negative integers, following either alphabetical (by default) or customised order. Null values and unseen labels (when a finalized recipe fitted by a dataset is applied on new data) remain as null.

Parameters Parameter Description Default Parameter Type
order_of_categories Category list that determines the order of labels [] string-list
Examples:

E.g. convert the Categories ""A"", ""B"", ""C"", NA to their numerical encoding 0, 1, 2, -1

Create historical summary aggregation features - historical_summ_agg

Create historical summary aggregation features, e.g. the average purchase amount over the last 6 months

Parameters Parameter Description Default Parameter Type
timeframe Time frame for windowing 3 integer
aggregation_functions Aggregation methods such as "min" or "max" are functions that reduce a column of values to a single value [sum] multi-select
Examples:

For example, creating columns like average spending last 6 months for each customer, by using the date column, monthly spending and customer ID

Bin columns - bin_columns

This action bins one or more numeric columns by specifying the number of equally-spaced bins or a list of bin edges

Parameters Parameter Description Default Parameter Type
method Whether to use equally-spaced bins or custom cut positions single-select
n_bins Number of equally-spaced bins. This creates n equally-spaced bins from the minimum to maximum in the column. integer
output_format Format for output labels [lower, upper) single-select
cut_positions Custom cut positions in ascending order float-list
Examples:

Discretise a numeric column into categorical ranges e..g [100, 200, 150] can be binned as [""1-100"", ""101-200"", ""101-200""]

Create missing flag - create_missing_flag

For every input column, create a boolean column flagging the locations of missing values.

Calculate math function (2) - calculate_math_function2

Calculate a math function that takes two inputs. If invalid inputs are encountered (such as division by zero), the outputs will be null.

Parameters Parameter Description Default Parameter Type
math_function Math function applied to input columns single-select
Examples:

For example compute the difference (c = a -b) or computing the ratio (c = a/b)

Calculate math function - calculate_math_function

Calculate a math function on selected columns. Invalid values such as negative input for a logarithmic function result in null outputs.

Parameters Parameter Description Default Parameter Type
math_function Math function applied to columns single-select

Bin datetime - bin_datetime

Bins a DateTime column (creates 2 new columns) to 15, 30 or 60 minutes

Parameters Parameter Description Default Parameter Type
bin_res Upper/lower bounds to round towards in order to bin the time. E.g.: 14:53:15 --> [14:45:00, 15:00:00] when using "quarter_hour" resolution single-select
Examples:

Creates 2 datetime output columns per each input column which ""bin"" the input column between them. The binning resolution is either: hourly, half-hourly, or 15 minutes. Example: ""2020-04-15T15:53:44"" with half-hourly resolution. Output columns: [""2020-04-15T15:30:00"", ""2020-04-15T16:00:00""]

Bin columns by quantiles - bin_columns_by_quantiles

This action bins one or more numeric columns by specifying the number of uniformly-split quantiles or a list of custom quantiles.

Parameters Parameter Description Default Parameter Type
split_quantiles Custom split quantiles in ascending order float-list
n_bins Number of uniformly-split quantiles. This creates n bins based on uniformly-split quantiles from 0 to 1 integer
method Whether to use uniformly-split quantiles or custom split quantiles single-select
output_format Format for output labels [lower, upper) single-select

Text

Substitute using regex - regex_sub

Finds and replaces string patterns in a column via Regex. Can optionally specify additional pattern to filter by, in which case the scope of the find/replace will be restricted to records that match the specified pattern in the column.

Parameters Parameter Description Default Parameter Type
sub_string String used to replace values in places where sub_pattern matched the values. string
sub_pattern The pattern that needs to be substituted string
match_pattern Specify this to restrict the scope of replacement .* string

Strip whitespaces - strip_whitespaces

Strips whitespaces from rows of chosen Text columns.

Parameters Parameter Description Default Parameter Type
mode The method for stripping the whitespaces. Stripping can be done at the beginning, end, both of the strings, and can even strip ALL of the whitespaces. single-select

Split text column on delimiter - split_on_delimiter

For a given column, extracts text components between the delimiters into a column group. The size of the column group corresponds to the largest number of components found in the column or by the limit on the number of splits. Trailing null values are created for records containing fewer components.

Parameters Parameter Description Default Parameter Type
num_of_splits An integer which controls the number of times the delimiter regex is applied. 2 integer
limit_num_splits Boolean flag indicating wether you wish to limit the number of splits or not. False boolean
delimiter_type Type of delimiter Verbatim single-select
delimiter Delimiter to be used for splitting the string. string

Remove prefix - remove_prefix

Removes a string prefix from the rows of a chosen string column, if the string is found.

Parameters Parameter Description Default Parameter Type
prefix_string The prefix string to be stripped string

Remove punctuation - remove_punctuation

Removes the punctuation from the rows of a chosen string column. The punctuation symbols to remove are given by the user as a string

Parameters Parameter Description Default Parameter Type
punc_symbols The punctuation symbols to strip !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ string

Find all matches of a regular expression - regex_findall

Creates an output column group containing all the strings that match the regex pattern.

Parameters Parameter Description Default Parameter Type
pattern A regular expression pattern for the matching string

Extract capture groups - extract_capture_groups

Extracts one or more regex capture groups into a column group

Parameters Parameter Description Default Parameter Type
pattern A regular expression pattern containing one or more capture groups string

Change text case - change_text_case

Changes text columns to selected text case (uppercase, lowercase, etc.)

Parameters Parameter Description Default Parameter Type
case The required change in case of values where "Upper" converts all to uppercase, "Lower" converts all to lowercase and "Capitalize" converts first character to uppercase and remaining to lowercase. single-select

Create flag for regex match - regex_match

Creates an output boolean column which is the result of matching the input column against a regex pattern

Parameters Parameter Description Default Parameter Type
pattern A regular expression pattern for the matching string

Remove suffix - remove_suffix

Removes a string suffix from the rows of a chosen string column, if the string is found.

Parameters Parameter Description Default Parameter Type
suffix_string The suffix string to strip string

Windowing

Count the number of values over partition - window_aggregate_count

This action counts the number of values in one or more input column(s) partitioned by other selected column(s).

Parameters Parameter Description Default Parameter Type
value_type What to count. The default is to count only non-null values. non_null single-select

Aggregate numeric column(s) over partition - window_aggregate_numeric

This action applies a window aggregation function on one or more numeric columns partitioned by other selected column(s).

Parameters Parameter Description Default Parameter Type
relative_error Relative error for approximating median or a specific quantile. Must be no less than 0.001 for computational efficiency. float
agg_function Aggregation function applied to input column(s) single-select
quantile_to_compute Specify which quantile to compute for the quantile aggregation function. Must be a number between 0 and 1. float

Make a boolean indicator over partition - window_aggregate_indicate_boolean

This action applies a window aggregation function as an overall indicator on one or more boolean columns partitioned by other selected column(s).

Parameters Parameter Description Default Parameter Type
indicator_function Indicator function applied to input column(s) single-select

Get datetime value by partition - window_aggregate_datetime

This action applies a window aggregation function on one or more datetime column(s) partitioned by other selected column(s).

Parameters Parameter Description Default Parameter Type
agg_function Aggregation function applied to input column(s) single-select

Miscellaneous

Extract UTC offset - extract_utc_offset

Extracts the UTC offset (in hours) from timezone aware string timestamps

Parameters Parameter Description Default Parameter Type
date_format The date_format of the column %Y-%m-%dT%H:%M:%S%z string

Insights Catalogue

There are three types of insights the AI & Analytics Engine (the Engine) can provide

Insight Type Description List of Insights
Insights regarding data types and schema If the user uploads a dataset, the AI & Analytics Engine will analyze the column types and recommend casting actions to cast the features into the right type.
  • Identify string columns that should be stored as date and infer its ISO format
  • Identify string columns that should be stored categorical columns
  • Identify string columns that should be stored numerical columns
  • Target-less insights If no target column is selected, then the Engine can recommend actions that are independent of target column
  • Date time extraction (e.g. year, month, day) from datetime columns
  • Impute missing values
  • Identify columns with a constant value
  • Identify pair of columns with a one-to-one mapping
  • One-hot encoding of categorical features
  • Identify html content and recommend stripping of html tags
  • Target-based insights If a target column is selected then the Engine can make AI-assisted recommendations that take into account the target column.
  • Identify columns with low correlation and/or low canonical correlation with target
  • Identify columns with suspiciously high correlation and/or high canonical correlation with target; indicating potential target leakage issues
  • Identify columns whose missingness is predictive of target
  • Identify datetime components that are predictive of target and recommend their creation.
  • Recommend creating new features with autoencoder neural network
  • Identify that summarization of numeric columns over historical periods is possible (e.g. average spending over last 6 months)