Actions and Insights Catalogue

Actions Catalogue

| Action | Type | Input Type | Output Type | Comments |
|---|---|---|---|---|
| Cast a column as numeric | Column replacement | String | Numerical | |
| Cast a column as categorical | Column replacement | String | Categorical | |
| Cast a column as datetime | Column replacement | String | DateTime | |
| Drop columns | Deletion | Any | NA | |
| Extract from datetime column: year | Add Columns | DateTime | Numerical | Example: "2020-04-15T15:53:44" -> 2020 |
| Extract from datetime column: month | Add Columns | DateTime | Numerical | Example: "2020-04-15T15:53:44" -> 4 |
| Extract from datetime column: day | Add Columns | DateTime | Numerical | Example: "2020-04-15T15:53:44" -> 15 |
| Extract from datetime column: hour | Add Columns | DateTime | Numerical | Example: "2020-04-15T15:53:44" -> 15 |
| Extract from datetime column: minutes | Add Columns | DateTime | Numerical | Example: "2020-04-15T15:53:44" -> 53 |
| Extract from datetime column: seconds | Add Columns | DateTime | Numerical | Example: "2020-04-15T15:53:44" -> 44 |
| Extract from datetime column: name of day | Add Columns | DateTime | Text | Example: "2020-04-15T15:53:44" -> "Wednesday" |
| Extract from datetime column: day of week | Add Columns | DateTime | Numerical | Days are coded Monday=0, Tuesday=1, ..., Sunday=6. Example: "2020-04-15T15:53:44" -> 2 |
| Extract from datetime column: month name | Add Columns | DateTime | Text | Example: "2020-04-15T15:53:44" -> "April" |
| Extract from datetime column: quarter | Add Columns | DateTime | Numerical | Example: "2020-04-15T15:53:44" -> 2 |
| Extract from datetime column: AM/PM | Add Columns | DateTime | Text | Example: "2020-04-15T15:53:44" -> "PM" |
| Extract from datetime column: timestamp | Add Columns | DateTime | Numerical | Returns a POSIX timestamp as a float (number of seconds since January 1st, 1970). Example: "2020-04-15T15:53:44" -> 1586930024.0 |
| Add time delta (in seconds) to datetime | Column replacement | DateTime | DateTime | |
| Compute time difference between datetimes (in seconds) | Add Columns | DateTime | Numerical | |
| Impute missing values | Column replacement | Numerical | Numerical | |
| Create missing value indicator/flag column | Add Columns | Numerical | Boolean | |
| Create historical summarizations by ID | Add Columns | ID col: Any type; Date col: DateTime; Numerical | Numerical | For example, create columns such as average spending over the last 6 months for each customer, using the date column, the monthly spending column, and the customer ID column |
| Perform label encoding for a categorical column | Add Columns | Categorical | Categorical | E.g. convert the categories "A", "B", "C", NA to their numerical encodings 0, 1, 2, -1 |
| Perform one-hot (or dummy) encoding for a categorical column | Add Columns | Categorical | Multiple Numerical | |
| Binning numerical columns by value | Add Columns | Numerical | Categorical | Discretise a numeric column into categorical ranges, e.g. [100, 200, 150] can be binned as ["1-100", "101-200", "101-200"] |
| Binning numerical columns by quantiles | Add Columns | Numerical | Categorical | |
| Binning datetime columns | Add Columns | DateTime | DateTime | Creates two datetime output columns for each input column; together they give the lower and upper bounds of the bin that contains the input value. The binning resolution is hourly, half-hourly, or 15 minutes. Example: "2020-04-15T15:53:44" with half-hourly resolution. Output columns: ["2020-04-15T15:30:00", "2020-04-15T16:00:00"] |
| Round datetime column | Column replacement | DateTime | DateTime | Rounds the time component of a datetime column to 15, 30 or 60 minutes. Example: "2020-04-15T15:53:44" with rounding to 30 minutes. Output: "2020-04-15T16:00:00" |
| Create numeric column from operations between numeric columns | Add Columns | Numerical | Numerical | For example, compute the difference (c = a - b) or the ratio (c = a/b) |
| Use autoencoder to generate features | Add Columns | Many Numerical | 2 to 8 Numerical columns | Uses an autoencoder to generate novel, non-linear features based on existing numerical features |
| Strip HTML | Column replacement | Text | Text | Strips HTML tags from text, e.g. "<emp> Text </emp>" becomes "Text" |
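
The datetime actions in the catalogue can be illustrated with a short sketch in plain Python. This is not the Engine's implementation or API: the function names (`extract_datetime_features`, `bin_datetime`, `round_datetime`) and the `utc_offset_hours` parameter are hypothetical, chosen only for illustration. Two assumptions are worth flagging. First, the catalogue's timestamp example (1586930024.0) matches the input read at a UTC+10 offset, so the usage below passes that offset explicitly; the catalogue itself does not state which time zone is used. Second, rounding ties are resolved upward here, which the catalogue does not specify.

```python
from datetime import datetime, timedelta, timezone

# Explicit English names avoid locale-dependent strftime output.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]


def extract_datetime_features(value: str, utc_offset_hours: float = 0.0) -> dict:
    """Derive the catalogue's datetime components from an ISO 8601 string."""
    dt = datetime.fromisoformat(value)
    # A POSIX timestamp needs a time zone; a fixed offset is assumed here.
    aware = dt.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "hour": dt.hour,
        "minutes": dt.minute,
        "seconds": dt.second,
        "name_of_day": DAY_NAMES[dt.weekday()],
        "day_of_week": dt.weekday(),          # Monday=0, ..., Sunday=6
        "month_name": MONTH_NAMES[dt.month - 1],
        "quarter": (dt.month - 1) // 3 + 1,
        "am_pm": "AM" if dt.hour < 12 else "PM",
        "timestamp": aware.timestamp(),
    }


def bin_datetime(dt: datetime, minutes: int) -> tuple:
    """Return the (lower, upper) bounds of the bin that contains dt."""
    day_start = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = (dt - day_start).total_seconds()
    width = minutes * 60
    lower = day_start + timedelta(seconds=(elapsed // width) * width)
    return lower, lower + timedelta(minutes=minutes)


def round_datetime(dt: datetime, minutes: int) -> datetime:
    """Round dt to the nearest bin boundary (ties go up; an assumption)."""
    lower, upper = bin_datetime(dt, minutes)
    return lower if (dt - lower) < (upper - dt) else upper


if __name__ == "__main__":
    example = "2020-04-15T15:53:44"
    print(extract_datetime_features(example, utc_offset_hours=10))
    parsed = datetime.fromisoformat(example)
    print(bin_datetime(parsed, 30))    # bounds 15:30:00 and 16:00:00
    print(round_datetime(parsed, 30))  # 2020-04-15 16:00:00
```

Deriving rounding from the bin bounds keeps the two actions consistent: a rounded value is always one of the two bin edges that the binning action would produce at the same resolution.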

Insights Catalogue

There are three types of insights that the AI & Analytics Engine (the Engine) can provide:

Insight Type: Insights regarding data types and schema
Description: When a user uploads a dataset, the Engine analyzes the column types and recommends casting actions that cast the features into the right type.
List of Insights:
  • Identify string columns that should be stored as dates and infer their ISO format
  • Identify string columns that should be stored as categorical columns
  • Identify string columns that should be stored as numerical columns

Insight Type: Target-less insights
Description: If no target column is selected, the Engine can recommend actions that are independent of the target column.
List of Insights:
  • Datetime extraction (e.g. year, month, day) from datetime columns
  • Impute missing values
  • Identify columns with a constant value
  • Identify pairs of columns with a one-to-one mapping
  • One-hot encoding of categorical features
  • Identify HTML content and recommend stripping of HTML tags

Insight Type: Target-based insights
Description: If a target column is selected, the Engine can make AI-assisted recommendations that take the target column into account.
List of Insights:
  • Identify columns with low correlation and/or low canonical correlation with the target
  • Identify columns with suspiciously high correlation and/or high canonical correlation with the target, indicating potential target leakage issues
  • Identify columns whose missingness is predictive of the target
  • Identify datetime components that are predictive of the target and recommend their creation
  • Recommend creating new features with an autoencoder neural network
  • Identify that summarization of numeric columns over historical periods is possible (e.g. average spending over the last 6 months)
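
To make a few of the schema and target-less insights concrete, here is a minimal sketch in plain Python. It is not how the Engine computes its insights; the function names and the dict-of-lists table layout are hypothetical, and the checks are deliberately crude (for instance, a string column is treated as numeric if every non-missing value parses with `float`, and missing values are represented as `None`).

```python
from itertools import combinations


def constant_columns(table: dict) -> list:
    """Columns that hold a single distinct value."""
    return [name for name, values in table.items() if len(set(values)) <= 1]


def one_to_one_pairs(table: dict) -> list:
    """Pairs of non-constant columns whose values map one-to-one."""
    pairs = []
    for a, b in combinations(table, 2):
        n_a, n_b = len(set(table[a])), len(set(table[b]))
        n_pairs = len(set(zip(table[a], table[b])))
        # One-to-one holds when every distinct value of a pairs with
        # exactly one distinct value of b, and vice versa.
        if n_a > 1 and n_a == n_b == n_pairs:
            pairs.append((a, b))
    return pairs


def numeric_string_columns(table: dict) -> list:
    """String columns whose non-missing values all parse as numbers."""
    def parses(value: str) -> bool:
        try:
            float(value)
            return True
        except ValueError:
            return False

    result = []
    for name, values in table.items():
        present = [v for v in values if v is not None]
        if present and all(isinstance(v, str) and parses(v) for v in present):
            result.append(name)
    return result


def columns_with_missing(table: dict) -> list:
    """Columns that are candidates for imputation or an indicator flag."""
    return [name for name, values in table.items() if None in values]


if __name__ == "__main__":
    table = {
        "country": ["AU", "NZ", "AU", "US"],
        "country_code": ["61", "64", "61", "1"],
        "source": ["web", "web", "web", "web"],
        "spend": ["10.5", None, "3", "7.25"],
    }
    print(constant_columns(table))        # ['source']
    print(one_to_one_pairs(table))        # [('country', 'country_code')]
    print(numeric_string_columns(table))  # ['country_code', 'spend']
    print(columns_with_missing(table))    # ['spend']
```

Note that `country_code` is flagged as numeric by this crude check even though it is arguably categorical, which illustrates why such recommendations are best treated as suggestions for the user to confirm.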