Data Types and Schema

The AI & Analytics Engine (the Engine) uses a simplified data type system. The Engine's column types are decidedly simpler than the types that exist in other languages.

DataFrames, Columns, and Types

Each dataset on the Engine is represented by a DataFrame. A DataFrame consists of a collection of Columns, where each Column is a vector and all Columns have the same length. Each Column also belongs to one of these types:
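
The structure described above can be illustrated with a minimal Python sketch. The `Column` and `DataFrame` classes here are hypothetical names chosen for illustration and are not the Engine's API; the sketch only shows the two constraints stated in the text: every column has one of the five types, and all columns have the same length.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# The five column types named in this document.
COLUMN_TYPES = {"numeric", "boolean", "categorical", "text", "datetime"}


@dataclass
class Column:
    """Hypothetical column: a typed vector of values."""
    type: str
    values: List[Any]

    def __post_init__(self):
        if self.type not in COLUMN_TYPES:
            raise ValueError(f"unknown column type: {self.type!r}")


class DataFrame:
    """Hypothetical DataFrame: named columns that all share one length."""

    def __init__(self, columns: Dict[str, Column]):
        lengths = {len(col.values) for col in columns.values()}
        if len(lengths) > 1:
            raise ValueError(f"columns differ in length: {sorted(lengths)}")
        self.columns = columns

    def __len__(self) -> int:
        # Number of rows, or 0 for a DataFrame with no columns.
        for col in self.columns.values():
            return len(col.values)
        return 0


df = DataFrame({
    "age": Column("numeric", [31, 45, 27]),
    "subscribed": Column("boolean", [True, False, True]),
})
```

Constructing a `DataFrame` from columns of different lengths, or a `Column` with a type outside the five listed, raises a `ValueError` in this sketch.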

Column Type | Description | Corresponding Types
Numeric Columns | Integers and continuous real numbers | Julia: Float32, Float64, Int, Int32, Int64, BigInt; Python: double, int; R: numeric, double, int, bigint
Boolean Columns | True or False | Julia: Bool; Python: bool, numpy.bool_; R: logical
Categorical Columns | Discrete categories | Julia: CategoricalVector{Any, Int}; Python: NA; R: factor
Text Columns | Free form text | Julia: String; Python: str, object (pandas); R: character
DateTime Columns | Date time | Julia: Date, DateTime; Python: str, object (pandas); R: Date, POSIXt
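
As a rough illustration of how the correspondence above can be used, the following Python sketch encodes the table as a dictionary and looks up which Engine column types a given language type may correspond to. The names `CORRESPONDING_TYPES` and `engine_types_for` are hypothetical and introduced only for this example; the entries are copied from the table, with the Python "NA" entry written as an empty list.

```python
# Engine column type -> language -> corresponding types, as listed in the table.
CORRESPONDING_TYPES = {
    "Numeric": {
        "Julia": ["Float32", "Float64", "Int", "Int32", "Int64", "BigInt"],
        "Python": ["double", "int"],
        "R": ["numeric", "double", "int", "bigint"],
    },
    "Boolean": {
        "Julia": ["Bool"],
        "Python": ["bool", "numpy.bool_"],
        "R": ["logical"],
    },
    "Categorical": {
        "Julia": ["CategoricalVector{Any, Int}"],
        "Python": [],  # listed as NA in the table
        "R": ["factor"],
    },
    "Text": {
        "Julia": ["String"],
        "Python": ["str", "object"],
        "R": ["character"],
    },
    "DateTime": {
        "Julia": ["Date", "DateTime"],
        "Python": ["str", "object"],
        "R": ["Date", "POSIXt"],
    },
}


def engine_types_for(language, type_name):
    """Return every Engine column type that lists `type_name` for `language`."""
    return [
        engine_type
        for engine_type, by_language in CORRESPONDING_TYPES.items()
        if type_name in by_language.get(language, [])
    ]


print(engine_types_for("R", "factor"))    # ['Categorical']
print(engine_types_for("Python", "str"))  # ['Text', 'DateTime']
```

The second lookup returns two candidates: according to the table, a Python `str` may correspond to either a Text or a DateTime column, so the language type alone does not always determine the Engine column type.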

Schema

A schema is attached to a dataset and contains the column names and the type of each column. A dataset must have a schema before actions can be applied to it. Therefore, every dataset on the AI & Analytics Engine platform has an associated schema, unless it has just been uploaded.
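
The relationship between a dataset, its schema, and actions might be modelled as in the sketch below. `Dataset` and `apply_action` are hypothetical names, not part of the Engine; the schema is assumed to be a simple mapping from column name to column type, and the only behaviour shown is that an action is refused while the schema is missing.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Dataset:
    """Hypothetical dataset record; `schema` maps column name -> column type."""
    name: str
    schema: Optional[Dict[str, str]] = None  # None: just uploaded, no schema yet


def apply_action(dataset: Dataset, action_name: str) -> str:
    """Refuse to run an action on a dataset that has no schema."""
    if dataset.schema is None:
        raise RuntimeError(f"dataset {dataset.name!r} has no schema yet")
    return f"applied {action_name} to {dataset.name}"


fresh = Dataset("sales.csv")
ready = Dataset("sales", schema={"amount": "numeric", "region": "categorical"})
result = apply_action(ready, "drop_columns")  # "drop_columns" is a made-up action name
```

Calling `apply_action(fresh, ...)` raises a `RuntimeError` in this sketch, because `fresh` has not been given a schema.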

If the user uploads a dataset without a schema (e.g. a CSV or Parquet file), the platform infers a schema for the dataset and recommends the appropriate casting actions to convert the columns into the right types.
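
To give a feel for what schema inference and cast recommendation involve, here is a toy Python sketch for CSV input. It is not the Engine's inference algorithm: the heuristics (try boolean, then numeric, then ISO-format datetime, else text), the function names, and the "cast to ..." labels are all assumptions made for illustration.

```python
import csv
import io
from datetime import datetime


def _all(values, parse):
    """Return True if `parse` succeeds on every value."""
    try:
        for v in values:
            parse(v)
    except ValueError:
        return False
    return True


def _parse_bool(v):
    if v.strip().lower() not in {"true", "false"}:
        raise ValueError(v)


def infer_type(values):
    """Guess a column type from raw CSV strings (toy heuristic)."""
    if _all(values, _parse_bool):
        return "boolean"
    if _all(values, float):
        return "numeric"
    if _all(values, datetime.fromisoformat):
        return "datetime"
    return "text"


def infer_schema(csv_text):
    """Map each column name in the CSV header to an inferred column type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {name: infer_type([r[name] for r in rows]) for name in rows[0]}


def recommend_casts(schema):
    # CSV fields arrive as text, so every non-text column needs a cast.
    return [(name, f"cast to {t}") for name, t in schema.items() if t != "text"]


sample = "id,signup,active,note\n1,2021-03-01,true,hello\n2,2021-04-15,false,world\n"
schema = infer_schema(sample)
casts = recommend_casts(schema)
```

For the sample above, `id` is inferred as numeric, `signup` as datetime, `active` as boolean, and `note` as text, so the sketch recommends casts for the first three columns only.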