Using API access with the SDK

This page walks through the task introduced in the Quick Start Introduction page using the aiaengine SDK, which provides API access to the AI & Analytics Engine. Reusable snippets of code are provided.

SDK and System Requirements

The AI & Analytics Engine's API access is made available in the form of an SDK. We strongly recommend using this SDK rather than manually composing payloads and posting them with curl or another command-line tool.

System Requirements: You will need a Linux operating system (such as the latest Ubuntu distribution) and Python 3 (version 3.6 or later recommended). Additional Python packages such as numpy, pandas, scipy, seaborn, and plotly are used in a few examples in the documentation pages, but are not required for running the SDK and API access.

Installing the SDK

Download the Python .whl file here, then install the package with:

$ pip install /path/to/downloaded_whl_file
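
To verify that the package was installed correctly, you can check that it imports without errors (aiaengine is the module name used throughout this guide):

$ python3 -c "import aiaengine"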

Setting up your local environment for API access

You will first need to set up an environment variable called AIA_ENGINE_CONFIG_FILE, which stores the path of a configuration file. A good practice is to create a .aiaengine folder in your home directory and define the environment variable in your .bashrc file:

$ grep -n -H "AIA_ENGINE_CONFIG_FILE" ~/.bashrc
/home/new-user/.bashrc:122:export AIA_ENGINE_CONFIG_FILE="/home/new-user/.aiaengine/config.json"

Here is the template to follow for the config.json file. Simply fill in the email address you used for registration and your AI & Analytics Engine password:

{
  "target": "grpc.aiaengine.com:443",
  "secure": true,
  "auth": {
    "provider": "email_password",
    "data": {
        "email": "abcd.tuvw@example.com",
        "password": "qwerty123"
    }
  }
}
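
Before authenticating with the SDK, you can quickly check that the environment variable is set and points to a valid JSON file. The following is a minimal sanity check using only the Python standard library:

import json
import os

# Path of the config file, read from the same environment variable the SDK uses
config_path = os.environ["AIA_ENGINE_CONFIG_FILE"]

# Load the file to confirm it is valid JSON and contains the expected keys
with open(config_path) as f:
    config = json.load(f)

print(config["target"])            # e.g. grpc.aiaengine.com:443
print(config["auth"]["provider"])  # e.g. email_password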

Authenticating into the Engine using the SDK

You will first need to authenticate into the Engine, using the credentials you saved in the configuration file. To do so, use the following two-line snippet:

from aiaengine import api

client = api.Client()

The client is authenticated using the credentials in the config.json file, whose path the SDK reads from the AIA_ENGINE_CONFIG_FILE environment variable.

To confirm that the right credentials have been loaded, inspect

client.config

Using the SDK

Before you begin

Most requests to the Engine are made according to the following syntax:

do_thing_response = client.____.DoThing(
    ____.DoThingRequest(
        params='values'
    )
)

The Engine takes time to process each instruction. Immediately sending a new request that depends on the previous request being complete will raise an error. Either wait a moment, or wrap the request in a retry loop such as the following when running requests consecutively:

import time

TIMEOUT = 60

repeat = True
t0 = time.time()
while repeat:
    try:
        # your request goes here, for example:
        # response = client.____.DoThing(
        #     ____.DoThingRequest(
        #         params='values')
        # )
        repeat = False
    except Exception:
        time.sleep(0.5)
        t1 = time.time()
        if t1-t0 > TIMEOUT:
            repeat = False
            raise Exception('Timeout')

Creating a new organization

To start with the platform, you first need to create an organization by providing some basic details, including a name and description. In this process, a unique id (org_id) is generated for the created organization.

from aiaengine.api import org

create_org_response = client.orgs.CreateOrg(
    org.CreateOrgRequest(
        name='AIA Engine Demos',
        description='A separate org for demo projects'
    )
)

org_id = create_org_response.id

Creating a new project

Within an organization are various projects. To continue, the Engine requires a project with a given name, description, and org_id. Once the project is created, a unique id (project_id) is generated, which is needed for all associated tasks.

from aiaengine.api import project

create_project_response = client.projects.CreateProject(
    project.CreateProjectRequest(
        name='Demo Project 1',
        description='First set of demos',
        org_id=org_id
    )
)

project_id = create_project_response.id

Creating a new dataset

Now it is time to upload your data. As mentioned in the Quick Start Introduction page, we use the German Credit dataset (download) here for a simple illustration. To create a dataset, you need to specify project_id, name, description, data_files, and content_type if the dataset is uploaded from your local machine.

from aiaengine import util

new_dataset = util.create_dataset(
    client,
    project_id=project_id,
    name='German Credit Data',
    description='',
    data_files=['./path/to/german_credit_dataset.csv'],
    content_type='text/csv'
)

dataset_id = new_dataset.id

Preparing your dataset

Creating a new recipe

Once the dataset is created, it must be preprocessed prior to modeling. In the Engine, the steps used to preprocess a dataset are saved in the form of a recipe, which can be reused on new data. This eliminates the hassle of preparing new data for prediction when using your model. When creating a new recipe, the target columns need to be specified; for our example, the target column is 'class'.

from aiaengine.api import recipe

create_recipe_response = client.recipes.CreateRecipe(
    recipe.CreateRecipeRequest(
        name='Process German Credit Risk Dataset',
        description='',
        datasets=[
            recipe.InputDataset(
                id=dataset_id,
                target_columns=['class']
            )
        ]
    )
)

recipe_id = create_recipe_response.id

Once you issue the command to create a recipe, you need to wait for the recipe creation to complete:

import time

# wait for creation of recipe to complete
def get_recipe_and_wait(client, recipe_id, iteration, step, expected_status, timeout=120, verbose=True):
    """Wait until the iteration is in expected status or timeout

    Return the recipe object
    """
    last_step = ""
    last_status = ""
    while True:
        get_recipe_response = client.recipes.GetRecipe(recipe.GetRecipeRequest(id=recipe_id))
        current_step = get_recipe_response.iterations[iteration - 1].step
        status = get_recipe_response.iterations[iteration - 1].status
        if current_step == step and status == expected_status:
            return get_recipe_response
        if timeout == 0:
            raise Exception('Timeout when waiting for iteration {} to reach status {}'.format(iteration, expected_status))
        timeout -= 1
        if verbose and (current_step != last_step or last_status != status):
            print("")
            print(f'step={current_step}, status={status}', end='')

        if verbose:
            print(".", end="")
        last_step = current_step
        last_status = status

        time.sleep(1) # wait for 1 second

get_recipe_response = get_recipe_and_wait(client, recipe_id=recipe_id, iteration=1, step='recommendation', expected_status='success')
get_recipe_response

When a recipe is created, agents in the Engine automatically detect problems that may exist in the dataset and propose solutions to handle them, in an iterative process. These solutions are presented as a list of recommended actions, which you can choose whether or not to commit. To request either recommended or committed actions, the iteration number needs to be specified. For simplicity, only the recommended actions are committed in this tutorial; refer to the Using the Data Preparation Module page of the documentation for committing custom actions.

import json
import requests

def get_current_iteration(client, recipe_id):
    """
        Get the number of the current (latest) iteration of the recipe
    """
    get_recipe_response = client.recipes.GetRecipe(recipe.GetRecipeRequest(id=recipe_id))
    return len(get_recipe_response.iterations)

def get_recommendations(client, recipe_id, iteration=None):
    """
        Get recommended actions from agents after a certain iteration
    """
    if iteration is None:
        iteration = get_current_iteration(client, recipe_id)

    get_recommended_actions_request = recipe.GetRecommendedActionsUrlRequest(id=recipe_id, iteration=iteration)
    get_recommended_actions_response = client.recipes.GetRecommendedActionsUrl(get_recommended_actions_request)

    print('Get recommendations from', get_recommended_actions_response.url)
    recommendedActions = json.loads(requests.get(get_recommended_actions_response.url).content)
    return recommendedActions


iteration = 1

recommended_actions = get_recommendations(
    client,
    recipe_id=recipe_id,
    iteration=iteration
)

Queueing Actions

You must queue an action before you can apply it to a dataset. Queueing an action validates that it can be applied, using an efficient check that does not require applying the action to the whole dataset.

To queue an action:

def queue_actions(client, recipe_id, actions, iteration=None, verbose=True):
    if iteration is None:
        iteration = get_current_iteration(client, recipe_id)

    if verbose:
        print("queueing actions")
    add_actions_request = recipe.AddActionsRequest(
        id=recipe_id, # id of the recipe
        iteration=iteration, # remember that iteration number starts from '1', not '0'
        actions=actions)

    add_actions_response = client.recipes.AddActions(add_actions_request)

    if verbose:
        print("checking validity of queueing actions")
    if add_actions_response.invalid_index != -1:
        raise Exception(add_actions_response.error)

    print("checks successful")

    return add_actions_response

actions = []
for ra in recommended_actions:
    actions = actions + ra['solution']['actions']


queue_actions(client, recipe_id, actions, iteration)

Commit Actions

The queued actions can be committed, which means that the actions will be applied to the whole dataset. This will also start a new iteration.

from typing import Optional

def commit_actions(client, recipe_id, iteration: Optional[int] = None, target_columns=[], verbose=True):
    if iteration is None:
        iteration = get_current_iteration(client, recipe_id)

    if verbose:
        print("commit actions")
    commit_actions_request = recipe.CommitActionsRequest(
        id=recipe_id, # id of the recipe
        iteration=iteration, # remember that iteration number starts from '1', not '0'
        target_columns=target_columns)
    commit_actions_response = client.recipes.CommitActions(commit_actions_request)

    if verbose:
        print("you can wait for commit actions to complete with `wait_for_commit_actions(client, recipe_id)`")
    return commit_actions_response

# Commit recommended actions
commit_actions(client, recipe_id, iteration)

# wait for committed actions to complete
wait_for_commit_actions(client, recipe_id, iteration)
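
The wait_for_commit_actions helper used above is not defined elsewhere on this page. Below is a minimal sketch of such a helper; it assumes that committing the actions of iteration N produces iteration N + 1, and that the new iteration is ready once it reaches the 'recommendation' step with status 'success'. These assumptions are ours (reusing the get_recipe_and_wait helper defined earlier) and are not taken from the official SDK documentation:

def wait_for_commit_actions(client, recipe_id, iteration, timeout=300):
    # Assumption: committing iteration N creates iteration N + 1, which is ready
    # once it reaches the 'recommendation' step with status 'success'
    return get_recipe_and_wait(
        client,
        recipe_id=recipe_id,
        iteration=iteration + 1,
        step='recommendation',
        expected_status='success',
        timeout=timeout
    )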

Iterating the data preparation process

Once the first iteration in the recipe is complete (in which the Engine casts each column in the dataset to a specific type), you may choose to continue processing the dataset. Once actions are committed, a new iteration is automatically created. You can check whether the next iteration is ready by requesting its recommended actions.

# get the list of recommended actions for the second iteration
iteration = get_current_iteration(client, recipe_id)
recommended_actions = get_recommendations(client, recipe_id=recipe_id, iteration=iteration)
print("Number of recommendations", len(recommended_actions))

actions = []
for ra in recommended_actions:
    actions = actions + ra['solution']['actions']

queue_actions(client, recipe_id, actions, iteration)
commit_actions(client, recipe_id, iteration)
wait_for_commit_actions(client, recipe_id, iteration)

When a new iteration is created, the Engine once again analyses the dataset and provides recommendations. Once the Engine has finished processing, simply repeat the code under Queueing Actions and Commit Actions to apply them.

Finalizing a recipe & creating a processed dataset

You can finalize the recipe once you deem the data preparation complete. Doing so creates a new, processed dataset for further analysis and modeling.

complete_recipe_response = client.recipes.CompleteRecipe(
    recipe.CompleteRecipeRequest(
        id=recipe_id,
        dataset_name='German Credit Risk - Processed'
    )
)

processed_dataset_id = complete_recipe_response.dataset_id

Creating a new app

To move on, you will next need to create an "App" from this dataset. An app is a special container on the Engine that holds multiple models trained and evaluated on the same train set and test set. An app enables you to train, evaluate, and compare models using different machine learning algorithms (called "templates" on the Engine). To create an app, you need to specify the dataset id, problem type ("classification" for the German Credit Data), target columns, and the proportion of data assigned for training.

from aiaengine.api import app

create_app_response = client.apps.CreateApp(
    app.CreateAppRequest(
        name='German Credit Risk Prediction Task',
        description='',
        dataset_id=processed_dataset_id,
        problem_type='classification',
        target_columns=['class'],
        training_data_proportion=0.8
    )
)

app_id = create_app_response.id

The processing that occurs during app creation involves:

  1. Splitting your data into train and test,
  2. Computing additional stats on your data,
  3. Using the model recommender to estimate beforehand how quickly models from the available templates can be trained, as well as their expected quality (in terms of how accurate their predictions are).
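
App processing is asynchronous, so you may need to wait for it to finish before requesting model recommendations. The exact status fields of an app are not shown on this page, so the polling sketch below is only an assumption: it presumes that a GetApp call and GetAppRequest message exist (mirroring GetRecipe and GetModel) and that the returned app exposes a status field that becomes 'success' when processing is done.

import time

from aiaengine.api import app

def wait_for_app(client, app_id, timeout=300):
    # Hypothetical polling helper: GetApp, GetAppRequest and the 'status' field
    # are assumed by analogy with GetRecipe/GetModel, not confirmed by the docs
    t0 = time.time()
    while True:
        get_app_response = client.apps.GetApp(app.GetAppRequest(id=app_id))
        if get_app_response.status == 'success':
            return get_app_response
        if time.time() - t0 > timeout:
            raise Exception('Timeout waiting for app processing to complete')
        time.sleep(1)

wait_for_app(client, app_id)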

Feature sets

A default feature set containing all the features in the data is created automatically. You can create a new feature set as follows:

from aiaengine.api import featureset

create_feature_set_request = featureset.CreateFeatureSetRequest(
    app_id=app_id,
    name="new_feature_set",
    description="description",
    selected_features=["checking_status", "duration", "credit_amount"]
)

# carry out the feature set creation
create_feature_set_response = client.featuresets.CreateFeatureSet(create_feature_set_request)

You can customize the list of features considered with selected_features.

The create_feature_set_response contains the model recommendations used in the next section.

Once the app is processed successfully, model recommendations are provided with predicted performance over a range of metrics such as accuracy and F1 macro score (for classification), as well as estimated training and prediction time. In this example, we select the top 5 models based on F1 macro score.

import pandas as pd

def select_recommended_models(model_recommendation, n, by_metric):
    """
        Get recommended models ranked by a given metric
    """
    # collect one row per template that has a value for the chosen metric
    rows = [
        {'template': model['template_id'], by_metric: model['metrics'][by_metric]}
        for model in model_recommendation
        if model['metrics'][by_metric] is not None
    ]
    df_model_rank = pd.DataFrame(rows, columns=['template', by_metric])
    # rank the templates by the metric and keep the top n
    selected_models = (
        df_model_rank.sort_values(by=by_metric, ascending=False)['template']
        .head(n)
        .tolist()
    )
    return selected_models

# select top n models recommended by a given metric
model_recommendation = json.loads(create_feature_set_response.recommendations)['recommendation']

# use the helper function to select the models
selected_models = select_recommended_models(
    model_recommendation, n=5, by_metric='f1_macro'
)

selected_models

output:

>>> ['extra_trees_clf', 'lightgbm_clf', 'svm_clf', 'random_forest_clf', 'linear_svm_clf']

Training models

Once the decision on which models to train is made, you can start to train the selected models.

from aiaengine.api import model

for mlt in selected_models:
    client.models.CreateModel(
        model.CreateModelRequest(
            app_id=app_id,
            feature_set_id=create_feature_set_response.id,
            name=mlt,
            template_id=mlt,
            hyperparameters='{}',
            evaluation=model.ModelEvaluation(
                metric='f1_macro',
                threshold=0.9,
                min_feedback_count=0.0,
                auto_retrain=False
            )
        )
    )

Selecting the model with the best performance

Once successfully trained, models are automatically evaluated on out-of-sample test data over a range of metrics. The best model based on F1 macro score is selected for deployment.

by_metric = 'f1_macro'

list_models_response = client.models.ListModels(
    model.ListModelsRequest(
        app_id=app_id
    )
)

# collect the evaluation score of each successfully trained model
trained_model_rows = []
for mlt in list_models_response.models:
    if mlt.status == 'success':
        get_model_response = client.models.GetModel(
            model.GetModelRequest(
                id=mlt.id
            )
        )
        training_result = json.loads(get_model_response.last_success_training.result)
        trained_model_rows.append(
            {
                'model_id': get_model_response.id,
                by_metric: training_result['evaluation_scores'][by_metric]
            }
        )

df_trained_models = pd.DataFrame(trained_model_rows, columns=['model_id', by_metric])

best_model_id = (
    df_trained_models.sort_values(
        by=by_metric, ascending=False
    )
    ['model_id']
    .iloc[0]
)

Deploy the model of choice

The model selected after comparison can now be deployed; once deployed, an endpoint URL is available for prediction on new data.

training_id = client.models.GetModel(
    model.GetModelRequest(id=best_model_id)
).last_success_training.id

deploy_model_response = client.models.DeployModel(
    model.DeployModelRequest(
        id=best_model_id,
        training_id=training_id
    )
)

Using your model to predict on new data

To predict on new data, you will first need to:

  1. Fill up the target column with empty values.
  2. Process it using the data preparation recipe you built.
  3. Save it in csv, jsonlines, or parquet format.
  4. Drop the target column after obtaining the prepared data.

How to do this on the Engine is covered in the guide page for the Data Preparation Module. For now, since our recipe is simple, let us proceed with our example by mimicking the recipe with simple offline processing.

Let us assume that we have the file data/predict.csv containing the data for which we need predictions.

Let us first import the necessary packages:

import json
import requests
import numpy as np
import pandas as pd

Next, let us load the data, prepare, and preview it:

# Load "raw" version of dataset
data = pd.read_csv('data/predict.csv', keep_default_na=False, dtype='str')

# Name of the target column in the data
target_col = 'class'

# Make empty target column
data[target_col] = np.nan

# Show column names in the data
print('Columns in raw dataset:')
print('\n')
print(json.dumps(data.columns.to_list(), indent=4))

# Process your data to fit into the recipe's output format, using the recipe built previously
# In this example, we just mimic the recipe

# Iteration 1: Cast columns to numeric type and categorical type:
numeric_cols = ['age', 'credit_amount', 'duration']
cat_cols = [x for x in data.columns if x not in numeric_cols]
data[numeric_cols] = data[numeric_cols].apply(pd.to_numeric, errors='coerce')
data[cat_cols] = data[cat_cols].astype('category')

# Iteration 2: Drop feature with low correlation:
names_of_columns_to_drop = ['num_dependents']
data.drop(columns=names_of_columns_to_drop, inplace=True)

# Finally, drop the target column
data.drop(columns=[target_col], inplace=True)

print('\n')
print('Data types of columns after preparing it:')
print('\n')
print(data.dtypes)

This produces the output:

Columns in raw dataset:


[
    "checking_status",
    "duration",
    "credit_history",
    "purpose",
    "credit_amount",
    "savings_status",
    "employment",
    "installment_commitment",
    "personal_status",
    "other_parties",
    "residence_since",
    "property_magnitude",
    "age",
    "other_payment_plans",
    "housing",
    "existing_credits",
    "job",
    "num_dependents",
    "own_telephone",
    "foreign_worker",
    "class"
]


Data types of columns after preparing it:


checking_status           category
duration                     int64
credit_history            category
purpose                   category
credit_amount                int64
savings_status            category
employment                category
installment_commitment    category
personal_status           category
other_parties             category
residence_since           category
property_magnitude        category
age                          int64
other_payment_plans       category
housing                   category
existing_credits          category
job                       category
own_telephone             category
foreign_worker            category
dtype: object

With this data, we can now invoke the prediction endpoint. Note that /invocations must be appended to the end of the URL.

# Convert the prepared data into a CSV string
data_to_call = data.to_csv(index=False, header=False)

# Invoking deployed model

url = 'https://ep-45416f6a-5569-4930-b835-583163e53bd9.aia-engine.pi.exchange/invocations'

headers = {'Content-Type': 'text/csv'}

res = requests.post(url, data=data_to_call.encode(), headers=headers)

if res.status_code != 200:
    print(res.content.decode())
else:
    result = [json.loads(line) for line in res.content.decode().split('\n') if line]
    print(json.dumps(result[:4], indent=4))

Sample output:

[
    {
        "class": {
            "prediction": "good",
            "prediction_probs": {
                "good": 0.940889835357666,
                "bad": 0.05911014974117279
            }
        }
    },
    {
        "class": {
            "prediction": "bad",
            "prediction_probs": {
                "good": 0.14075148105621338,
                "bad": 0.8592485189437866
            }
        }
    },
    {
        "class": {
            "prediction": "good",
            "prediction_probs": {
                "good": 0.9796953201293945,
                "bad": 0.020304657518863678
            }
        }
    },
    {
        "class": {
            "prediction": "good",
            "prediction_probs": {
                "good": 0.7446432113647461,
                "bad": 0.2553568184375763
            }
        }
    }
]

Note that the prediction for the target variable class is shown, along with the probabilities of the classes.
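
If you want to analyse these predictions further, you can flatten the parsed result into a pandas DataFrame. The snippet below is a small illustrative example, assuming result holds the list of dictionaries shown above; the column names prob_good and prob_bad are our own choice:

# Flatten each prediction record into one row with the predicted label
# and the class probabilities as separate columns
df_predictions = pd.DataFrame(
    [
        {
            'prediction': row['class']['prediction'],
            'prob_good': row['class']['prediction_probs']['good'],
            'prob_bad': row['class']['prediction_probs']['bad']
        }
        for row in result
    ]
)

print(df_predictions.head())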