Data Types and Schema
The AI & Analytics Engine (the Engine) uses a simplified data type system. Its column types are deliberately simpler than the types that exist in programming languages.
DataFrames, Columns, and Types
Each dataset on the Engine is represented by a DataFrame. A DataFrame consists of a collection of Columns, where each Column is a vector of the same length. Each Column also belongs to one of these types:
Column Type | Description | Corresponding Types |
---|---|---|
Numeric Columns | Integers and continuous real numbers | |
Boolean Columns | True or False | |
Categorical Columns | Discrete categories | |
Text Columns | Free-form text | |
DateTime Columns | Date and time values | |
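The data model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ColumnType`, `Column`, and `SketchDataFrame` are hypothetical names invented for this example and are not part of the Engine's API. The sketch shows the two properties stated in the text: every Column has exactly one of the five types, and all Columns in a DataFrame have the same length.

```python
from dataclasses import dataclass
from enum import Enum


class ColumnType(Enum):
    """The five column types listed in the table (names are illustrative)."""
    NUMERIC = "numeric"
    BOOLEAN = "boolean"
    CATEGORICAL = "categorical"
    TEXT = "text"
    DATETIME = "datetime"


@dataclass
class Column:
    """A named vector of values with exactly one column type."""
    name: str
    type: ColumnType
    values: list


class SketchDataFrame:
    """Hypothetical stand-in for a DataFrame: a collection of equal-length Columns."""

    def __init__(self, columns):
        # Every Column must be a vector of the same length.
        lengths = {len(c.values) for c in columns}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = {c.name: c for c in columns}

    def schema(self):
        # Column names mapped to their column types.
        return {name: col.type for name, col in self.columns.items()}


df = SketchDataFrame([
    Column("age", ColumnType.NUMERIC, [34, 51]),
    Column("plan", ColumnType.CATEGORICAL, ["basic", "pro"]),
])
```

Calling `df.schema()` returns a mapping from each column name to its `ColumnType`, and constructing a `SketchDataFrame` from Columns of different lengths raises a `ValueError`.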
Schema
A schema is attached to a dataset and contains the column names and the type of each column. A dataset must have a schema before actions can be applied to it. Therefore, every dataset on the AI & Analytics Engine platform has an associated schema, unless it has just been uploaded.
If a user uploads a dataset without a schema (e.g. a CSV or Parquet file), the platform infers a schema for the dataset and recommends the appropriate casting actions to convert the columns to the right types.
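To make the idea of schema inference concrete, here is a minimal sketch using only the Python standard library. The Engine's actual inference rules are not described here, so every heuristic below is an assumption made for illustration: the function names (`infer_column_type`, `infer_schema`, `recommend_casts`), the `max_categories` threshold, and the order in which types are tried are all hypothetical.

```python
import csv
import io
from datetime import datetime


def _all(values, predicate):
    # True only for a non-empty list where every value passes the predicate.
    return bool(values) and all(predicate(v) for v in values)


def _is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False


def _is_datetime(s):
    try:
        datetime.fromisoformat(s)  # assumption: ISO 8601 strings only
        return True
    except ValueError:
        return False


def infer_column_type(values, max_categories=10):
    """Guess a column type from raw string values (assumed heuristics)."""
    if _all(values, lambda v: v.lower() in {"true", "false"}):
        return "boolean"
    if _all(values, _is_number):
        return "numeric"
    if _all(values, _is_datetime):
        return "datetime"
    distinct = len(set(values))
    # Assumption: few distinct values that repeat suggest discrete categories.
    if distinct <= max_categories and distinct < len(values):
        return "categorical"
    return "text"


def infer_schema(csv_text):
    """Map each CSV column name to an inferred column type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    names = list(rows[0].keys()) if rows else []
    return {name: infer_column_type([r[name] for r in rows]) for name in names}


def recommend_casts(schema):
    # CSV values arrive as strings, so every non-text column needs a cast.
    return [(name, col_type) for name, col_type in schema.items() if col_type != "text"]


sample = (
    "age,member,plan,signup,comment\n"
    "34,true,basic,2021-01-05,Great service\n"
    "51,false,pro,2021-02-11,Too slow at peak times\n"
    "29,true,basic,2021-03-02,Would recommend\n"
)
schema = infer_schema(sample)
casts = recommend_casts(schema)
```

In this sketch, `schema` plays the role of the inferred schema and `casts` plays the role of the recommended casting actions. A real system would also need to handle missing values, mixed formats, and larger samples.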