Using the Python SDK

This page walks through the task introduced on the Quick Start Introduction page, this time using the aiaengine Python SDK, which provides programmatic access to the AI & Analytics Engine API. The code snippets are written to be reusable.

SDK and System Requirements

API access to the AI & Analytics Engine is provided through an SDK. We strongly recommend using this SDK rather than composing request payloads by hand and posting them with curl or another command-line tool.

System requirements: you need a Linux operating system (for example, the latest Ubuntu release) and Python 3 (version 3.8 or later recommended). A few examples in the documentation also use additional Python packages such as numpy, pandas, scipy, seaborn and plotly, but these are not required to run the SDK or access the API.
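To confirm that your interpreter meets the recommended version, a short standard-library check is enough. This is a minimal sketch: the `(3, 8)` threshold simply mirrors the recommendation above, and `meets_python_requirement` is a hypothetical helper name, not part of the SDK.

```python
import sys

# Recommended minimum (major, minor) version, taken from the requirements above.
MIN_VERSION = (3, 8)


def meets_python_requirement(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of version_info is >= minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_python_requirement():
        print(f"Python {current} meets the recommended version.")
    else:
        print(f"Python {current} is older than the recommended 3.8.")
```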

Installing the Python SDK

Download the Python SDK (a wheel file), then install the package with:

$ pip install /path/to/downloaded_whl_file
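After installation, you can check that the `aiaengine` package is visible to the interpreter you intend to use, without importing it. A minimal sketch using `importlib.util.find_spec` from the standard library; `is_importable` is a hypothetical helper, and the module name `aiaengine` is the one imported by the example code on this page.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the top-level module can be found on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_importable("aiaengine"):
        print("aiaengine is installed for this interpreter.")
    else:
        print("aiaengine was not found; check that pip and python refer to the same installation.")
```

If the check fails even though pip reported success, the package was most likely installed into a different Python environment than the one running the script.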

Setting up your environment for API access

First, set an environment variable called AIA_ENGINE_CONFIG_FILE that holds the path to a configuration file. A good practice is to create a .aiaengine folder in your home directory to hold that file, and to define the variable with an export line in your ~/.bashrc. The grep command below shows what that line looks like:

$ grep -n -H "AIA_ENGINE_CONFIG_FILE" ~/.bashrc
/home/new-user/.bashrc:122:export AIA_ENGINE_CONFIG_FILE="/home/new-user/.aiaengine/config.json"
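Because the SDK locates its configuration through `AIA_ENGINE_CONFIG_FILE`, it can help to verify that the variable is set and points to an existing file before running any scripts. A minimal sketch; `resolve_config_path` is a hypothetical helper, not part of the SDK. It deliberately does not expand `~`, since the example above uses an absolute path and it is not documented here whether the SDK expands it.

```python
import os
from pathlib import Path

ENV_VAR = "AIA_ENGINE_CONFIG_FILE"


def resolve_config_path(environ=os.environ):
    """Return the config file path named by AIA_ENGINE_CONFIG_FILE.

    Raises RuntimeError if the variable is unset or the file does not exist.
    """
    value = environ.get(ENV_VAR)
    if not value:
        raise RuntimeError(f"{ENV_VAR} is not set; add an export line to ~/.bashrc")
    path = Path(value)  # used as-is: no '~' or variable expansion
    if not path.is_file():
        raise RuntimeError(f"{ENV_VAR} points to {path}, which is not an existing file")
    return path


if __name__ == "__main__":
    try:
        print(f"Using config file: {resolve_config_path()}")
    except RuntimeError as err:
        print(f"Configuration problem: {err}")
```

Remember that a new export line in ~/.bashrc only takes effect in new shells, or after running `source ~/.bashrc`.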

The config.json file must follow the template below. Fill in the email address you registered with and your AI & Analytics Engine password:

{
  "target": "grpc.aiaengine.com:443",
  "secure": true,
  "auth": {
    "provider": "email_password",
    "data": {
        "email": "abcd.tuvw@example.com",
        "password": "qwerty123"
    }
  }
}
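Since config.json stores your password in plain text, it is sensible to make it readable only by your own user. The sketch below writes the template above to a path you choose and restricts its permissions to the owner (mode 0o600 on Linux). `write_config` is a hypothetical helper, and owner-only permissions are general good hygiene rather than a documented SDK requirement; the dictionary mirrors the template exactly.

```python
import json
import os
from pathlib import Path


def write_config(path, email, password):
    """Write an AI & Analytics Engine config file readable only by its owner."""
    config = {
        "target": "grpc.aiaengine.com:443",
        "secure": True,
        "auth": {
            "provider": "email_password",
            "data": {"email": email, "password": password},
        },
    }
    path = Path(path)
    # Create the parent folder (e.g. ~/.aiaengine) if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with owner-only permissions so the password is not world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        json.dump(config, handle, indent=2)
    # os.open only applies the mode to newly created files; enforce it for existing ones too.
    os.chmod(path, 0o600)
    return path
```

For example, `write_config(Path.home() / ".aiaengine" / "config.json", "you@example.com", "your-password")` would create the file at the location used in the .bashrc example, overwriting any existing config there.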

Using the SDK

The following example code shows how to use the Python SDK to:

- import a dataset from a local CSV file,
- create a binary classification app,
- train a model using the recommended feature set, and
- run predictions using the trained model.

import pandas as pd
from aiaengine import *

org = Org(id='b6240512-cd17-43a0-8297-84c51c1bc5a0') # replace with your org ID

## Creating a new project in the org
# An organisation contains projects. Create a project with a `name` and a
# `description`; the Engine then assigns it a unique project ID, which is
# used by the tasks associated with that project.
project = org.create_project(name="Demo project using Python SDK", description="Your demo project")

# Now upload your data. This example uses the German Credit dataset for a simple illustration.
data_file = 'examples/datasets/german-credit.csv'
# You can use the `print_schema` utility function to print the auto-inferred schema
# print_schema(pd.read_csv(data_file, header=0))
dataset = project.create_dataset(
    name="German Credit Data",
    data_source=FileSource(
        file_urls=[data_file],
        schema=[
            Column('checking_status', DataType.Text),
            Column('duration', DataType.Numeric),
            Column('credit_history', DataType.Text),
            Column('purpose', DataType.Text),
            Column('credit_amount', DataType.Numeric),
            Column('savings_status', DataType.Text),
            Column('employment', DataType.Text),
            Column('installment_commitment', DataType.Numeric),
            Column('personal_status', DataType.Text),
            Column('other_parties', DataType.Text),
            Column('residence_since', DataType.Numeric),
            Column('property_magnitude', DataType.Text),
            Column('age', DataType.Numeric),
            Column('other_payment_plans', DataType.Text),
            Column('housing', DataType.Text),
            Column('existing_credits', DataType.Numeric),
            Column('job', DataType.Text),
            Column('num_dependents', DataType.Numeric),
            Column('own_telephone', DataType.Text),
            Column('foreign_worker', DataType.Text),
            Column('class', DataType.Text)
        ]
    )
)

## Creating a new app
# Next, create an "App" from this dataset.
# An app is a container on the Engine that holds multiple models, all
# trained and evaluated on the same train and test sets. Within an app you
# can train, evaluate and compare models that use different machine learning
# algorithms (called "model templates" on the Engine). To create an app you
# specify the dataset ID, the problem type (binary classification for the
# German Credit data), the target column and the positive/negative class
# labels. The proportion of data assigned to training can also be specified;
# this example does not set it explicitly.
app = project.create_app(
    name="German Credit Risk Prediction Task",
    dataset_id=dataset.id,
    config=ClassificationConfig(
        sub_type=ClassificationSubType.BINARY,
        target_column="class",
        positive_class_label="good",
        negative_class_label="bad"
    )
)

# The processing that occurs during app creation involves:
# 1. splitting your data into train and test sets,
# 2. computing additional statistics on your data,
# 3. using the model recommender to estimate, before any training, how fast
#    models from the available templates can be trained and how accurate
#    their predictions are likely to be.

## Feature sets
# By default, the Engine creates a feature set called `Recommended features`,
# which contains the features it recommends for this data.
feature_set = app.get_recommended_feature_set()
print(feature_set.feature_names)

# Or you can create a new feature set
# feature_set = app.create_feature_set(
#     name="Selected features",
#     feature_names=(
#         "credit_amount",
#         "installment_commitment",
#         "residence_since",
#         "age",
#         "existing_credits",
#         "num_dependents"
#     )
# )

## Selecting recommended models
# Once the app has been processed, the Engine provides model recommendations
# with predicted performance on a range of metrics, such as accuracy and
# F1-macro score (for classification), as well as the estimated time cost of
# training and prediction. Here we print the top 5 recommended models ranked
# by F1-macro score.
print(feature_set.select_recommended_models(n=5, by_metric='f1_macro'))

## Training models
# Once you have decided which models to train, create one model per chosen
# template. This example trains a single XGBoost classifier on the feature set.
model = app.create_model(
    name="XGBoost Classifier",
    template_id=Classifiers.XGBoost,
    feature_set_id=feature_set.id
)

## Evaluating your model performance
# You can get the evaluation of the trained model to see how it performs on the
# test portion of the input dataset.
evaluation = model.evaluate()
print("Evaluation summary")
print(evaluation.result.summary)
print("Evaluation metrics")
print(evaluation.result.details['threshold_independent']['metrics'])

## Using your model to predict on new data
# Run a batch prediction on a separate CSV file of new records.
prediction_data_file = "examples/datasets/german-credit-predict.csv"
prediction = model.run_batch_prediction(
    data_source=FileSource(file_urls=[prediction_data_file])
)

# Download the prediction result dataset into the current folder.
prediction.result.download('./')

# Get the predicted data as a pandas DataFrame.
df = prediction.result.to_pandas()

print(df.head(100))
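The example declares each column's type by hand as either `DataType.Numeric` or `DataType.Text`. For a new CSV, a rough starting point is to treat a column as numeric when every non-empty value parses as a number. The sketch below uses only the standard library and returns plain `(name, "Numeric" | "Text")` pairs; `infer_column_types` is hypothetical, it is not the SDK's `print_schema`, and its guesses should be reviewed before you turn them into `Column(...)` entries. The sample rows are made up for illustration.

```python
import csv
import io


def _is_number(value):
    """Return True if value parses as a float (e.g. '42', '3.5', '-1e3')."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def infer_column_types(csv_file):
    """Guess 'Numeric' or 'Text' for each column of a CSV that has a header row."""
    reader = csv.reader(csv_file)
    header = next(reader)
    numeric = [True] * len(header)
    seen_value = [False] * len(header)
    for row in reader:
        for i, value in enumerate(row[: len(header)]):
            value = value.strip()
            if not value:
                continue  # treat empty cells as missing values
            seen_value[i] = True
            if numeric[i] and not _is_number(value):
                numeric[i] = False
    # Columns with no non-empty values default to Text.
    return [
        (name, "Numeric" if is_num and seen else "Text")
        for name, is_num, seen in zip(header, numeric, seen_value)
    ]


if __name__ == "__main__":
    sample = io.StringIO("duration,purpose,age\n6,car,67\n48,education,22\n")
    for name, kind in infer_column_types(sample):
        print(name, kind)
```

To run it on a real file, open the CSV with `open(data_file, newline="")` and pass the file object to `infer_column_types`.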
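The example ranks recommended models by F1-macro score. For one class, F1 is the harmonic mean of precision and recall, which simplifies to 2·TP / (2·TP + FP + FN); the macro score is the unweighted mean of the per-class F1 values. The standard-library sketch below computes this standard definition for label lists such as `"good"`/`"bad"`. It is an illustration only: `f1_macro` is a hypothetical helper, and the Engine may handle edge cases differently.

```python
def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over every label that appears."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is positive because
        # every label in `labels` occurs in y_true or y_pred.
        per_class.append(2 * tp / (2 * tp + fp + fn))
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    actual = ["good", "good", "bad", "bad"]
    predicted = ["good", "bad", "bad", "bad"]
    print(f1_macro(actual, predicted))
```

For these four labels, the per-class F1 is 2/3 for "good" and 4/5 for "bad", so the macro average is 11/15 (about 0.733). Because every class counts equally, macro F1 penalises a model that does well on the majority class but poorly on the minority class.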