Using API access with the SDK

This page walks through the task introduced in the Quick Start Introduction page using the aiaengine SDK, which provides API access to the AI & Analytics Engine. Reusable snippets of code are provided.

SDK and System Requirements

The AI & Analytics Engine's API access is made available in the form of an SDK. We strongly recommend using this SDK rather than manually composing payloads and posting them with curl or another command-line tool.

System Requirements: You will need a Linux operating system (such as the latest Ubuntu distribution) and Python 3 (version 3.6 or later recommended). Additional Python packages such as numpy, pandas, scipy, seaborn, and plotly are used in a few examples in the documentation pages, but are not required for running the SDK and API access.

Setting up your local environment for API access

You will first need to set up an environment variable called AIA_ENGINE_CONFIG_FILE, which stores the path of a configuration file. A good practice is to create a .aiaengine folder in your home directory, and put a command defining the environment variable in your .bashrc file:

$ grep -n -H "AIA_ENGINE_CONFIG_FILE" ~/.bashrc
/home/new-user/.bashrc:122:export AIA_ENGINE_CONFIG_FILE="/home/new-user/.aiaengine/config.json"

Here is the template to follow for the config.json file. Simply fill in the email address you used for registration and your AI & Analytics Engine password:

{
  "target": "grpc.aiaengine.com:443",
  "secure": true,
  "auth": {
    "provider": "email_password",
    "data": {
        "email": "abcd.tuvw@example.com",
        "password": "qwerty123"
    }
  }
}
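
Since config.json contains your password in plain text, it is a good idea to restrict its permissions so that only your user can read it:

$ chmod 600 ~/.aiaengine/config.json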

Installing the SDK

Download the aiaengine_sdk.whl file from the URL provided to you. Then install the Python package by issuing the command:

$ pip install /path/to/aiaengine_sdk.whl
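
You can verify that the installation succeeded by importing the package:

$ python3 -c "import aiaengine; print('SDK import OK')"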

Authenticating into the Engine using the SDK

You will first need to authenticate into the Engine, using the credentials you saved in the configuration file. To do so, use the following two-line snippet:

from aiaengine import api

client = api.Client()

The client is authenticated using the credentials in the config.json file, whose path the SDK reads from the AIA_ENGINE_CONFIG_FILE environment variable.
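
If authentication fails, a quick sanity check is to confirm that the environment variable and the configuration file are visible to your Python process. A minimal sketch (the error messages are just illustrative):

import os

from aiaengine import api

# Check that the SDK will be able to locate the configuration file
config_path = os.environ.get('AIA_ENGINE_CONFIG_FILE')
assert config_path is not None, 'AIA_ENGINE_CONFIG_FILE is not set'
assert os.path.isfile(config_path), 'No config file at ' + config_path

client = api.Client()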

Creating a new organization

To get started on the platform, an organization needs to be created by providing some basic details, including a name and description. In this process, a unique id org_id is generated for the created organization.

from aiaengine.api import org

create_org_response = client.orgs.CreateOrg(
    org.CreateOrgRequest(
        name='AIA Engine Demos',
        description='A separate org for demo projects'
    )
)

org_id = create_org_response.id

Creating a new project in an organization

To continue, the Engine requires a project with a given name, description, and org_id. Once the project is created, a unique id project_id is generated, which is needed in subsequent tasks.

from aiaengine.api import project

create_project_response = client.projects.CreateProject(
    project.CreateProjectRequest(
        name='Demo Project 1',
        description='First set of demos',
        org_id=org_id
    )
)

project_id = create_project_response.id

Creating a new dataset from a local CSV file

Now it is time to upload your data. As mentioned in the Quick Start Introduction page, we use the German Credit dataset (download) here for a simple illustration. To create a dataset from a file on your local machine, you need to specify project_id, name, description, data_files, and content_type.

from aiaengine import util

new_dataset = util.create_dataset(
    client,
    project_id=project_id,
    name='German Credit Data',
    description='',
    data_files=['./path/to/german_credit_dataset.csv'],
    content_type='text/csv'
)

dataset_id = new_dataset.id

Creating a new recipe to preprocess a dataset

Once the dataset is created, you can create a recipe to preprocess it, facilitating modelling at a later stage. The target column needs to be specified, which is 'class' in our example.

from aiaengine.api import recipe

create_recipe_response = client.recipes.CreateRecipe(
    recipe.CreateRecipeRequest(
        name='Process German Credit Risk Dataset',
        description='',
        datasets=[
            recipe.InputDataset(
                id=dataset_id,
                target_columns=['class']
            )
        ]
    )
)

recipe_id = create_recipe_response.id

When a recipe is created, agents in the Engine automatically detect problems that may exist in the dataset and propose solutions to handle them, in an iterative process. These solutions are presented as a list of recommended actions, which you can choose whether or not to commit. To request either recommended or committed actions, the iteration number needs to be specified. For a simple illustration, we only look at the first iteration here.

import json
import requests

def get_recommendations(recipe_id, iteration):
    """
        Get recommended actions from agents after a given iteration
    """
    get_recommended_actions_response = client.recipes.GetRecommendedActionsUrl(
        recipe.GetRecommendedActionsUrlRequest(
            id=recipe_id,
            iteration=iteration
        )
    )
    recommended_actions = json.loads(
        requests.get(get_recommended_actions_response.url).content
    )
    return recommended_actions

recommended_actions = get_recommendations(
    recipe_id=recipe_id,
    iteration=1
)
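
Before committing, you may want to review what the agents have proposed. The exact structure of the actions depends on the problems detected in your dataset; a simple way to inspect them is to pretty-print the parsed JSON:

# Review the proposed actions before deciding whether to commit them
print(json.dumps(recommended_actions, indent=2))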

# Commit recommended actions
client.recipes.CommitActions(
    recipe.CommitActionsRequest(
        id=recipe_id,
        iteration=1,
        committed_actions=json.dumps(recommended_actions)
    )
)

Finalizing a recipe & creating a processed dataset

Once you decide that the data preprocessing is sufficient, you can finalize the recipe and create a new dataset prepared for further analysis and modelling.

complete_recipe_response = client.recipes.CompleteRecipe(
    recipe.CompleteRecipeRequest(
        id=recipe_id,
        dataset_name='German Credit Risk - Processed'
    )
)

processed_dataset_id = complete_recipe_response.dataset_id

Creating an app for a dataset

Before building models for your dataset, you need to create an app, which provides useful advice on potential models. You need to specify the dataset id, the problem type (classification for the German Credit Data), the target column, and the proportion of data to assign for training.

from aiaengine.api import app

create_app_response = client.apps.CreateApp(
    app.CreateAppRequest(
        name='German Credit Risk Prediction Task',
        description='',
        dataset_id=processed_dataset_id,
        problem_type='classification',
        target_columns=['class'],
        training_data_proportion=0.8
    )
)

app_id = create_app_response.id

Recommending models with top performance

Once the app is processed successfully, a model recommendation is provided, with predicted performance over a range of metrics such as accuracy and F1 macro score (for classification), as well as estimated training and prediction time. Here, the top 5 models are selected based on F1 macro score.
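
Note that app analysis runs asynchronously, so the recommendation may not be available immediately after the app is created. A minimal polling sketch, reusing the GetApp call from the snippet below (the 10-second interval is arbitrary):

import time

from aiaengine.api import app

# Poll until the app reaches the 'success' status checked below
while client.apps.GetApp(app.GetAppRequest(id=app_id)).status != 'success':
    time.sleep(10)  # wait before checking again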

import pandas as pd

def select_recommended_models(model_recommendation, n, by_metric):
    """
        Get recommended models based on a given metric
    """
    # Keep one row per recommended model that has the metric available
    rows = [
        {
            'template': model['template_id'],
            by_metric: model['metrics'][by_metric]
        }
        for model in model_recommendation
        if model['metrics'][by_metric] is not None
    ]
    df_model_rank = pd.DataFrame(rows, columns=['template', by_metric])
    selected_models = (
        df_model_rank.sort_values(by=by_metric, ascending=False)['template']
        .head(n)
        .tolist()
    )
    return selected_models

# select top n models recommended by a given metric
get_app_response = client.apps.GetApp(
    app.GetAppRequest(id=app_id)
)
if get_app_response.status == 'success':
    model_recommendation = json.loads(get_app_response.metadata)['recommendation']
    selected_models = select_recommended_models(
        model_recommendation, n=5, by_metric='f1_macro'
    )

>>> selected_models
['extra_trees_clf', 'lightgbm_clf', 'svm_clf', 'random_forest_clf', 'linear_svm_clf']

Once you have decided which models to train, you can start training the selected models.

from aiaengine.api import model

for mlt in selected_models:
    client.models.CreateModel(
        model.CreateModelRequest(
            app_id=app_id,
            name=mlt,
            template_id=mlt,
            hyperparameters='{}',
            evaluation=model.ModelEvaluation(
                metric='f1_macro',
                threshold=0.9,
                min_feedback_count=0.0,
                auto_retrain=False
            )
        )
    )

Selecting the model with the best performance

Once successfully trained, models are automatically evaluated on out-of-sample tests over a range of metrics. The best model based on F1 macro score is selected for deployment.
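
Model training is likewise asynchronous. A minimal sketch for waiting until all models have finished, assuming every training run eventually reaches the 'success' status used below:

import time

from aiaengine.api import model

def all_models_trained(client, app_id):
    # A model counts as trained once its status is 'success'; this simple
    # sketch assumes no training run fails outright
    models = client.models.ListModels(
        model.ListModelsRequest(app_id=app_id)
    ).models
    return all(m.status == 'success' for m in models)

while not all_models_trained(client, app_id):
    time.sleep(30)  # training can take a while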

by_metric = 'f1_macro'

list_models_response = client.models.ListModels(
    model.ListModelsRequest(
        app_id=app_id
    )
)

rows = []
for trained_model in list_models_response.models:
    if trained_model.status == 'success':
        get_model_response = client.models.GetModel(
            model.GetModelRequest(
                id=trained_model.id
            )
        )
        training_result = json.loads(
            get_model_response.last_success_training.result
        )
        rows.append(
            {
                'model_id': get_model_response.id,
                by_metric: training_result['evaluation_scores'][by_metric]
            }
        )

df_trained_models = pd.DataFrame(rows, columns=['model_id', by_metric])

best_model_id = (
    df_trained_models.sort_values(
        by=by_metric, ascending=False
    )
    ['model_id']
    .iloc[0]
)
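
It can be useful to inspect the full ranking before committing to a deployment:

# Show all trained models ranked by the chosen metric
print(df_trained_models.sort_values(by=by_metric, ascending=False))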

Deploying the model of choice

The model selected after comparison is deployed, and an endpoint URL becomes available for prediction. The following snippet gives an example of how to use the endpoint URL of a deployed model to make predictions on a local dataset.

deploy_model_response = client.models.DeployModel(
    model.DeployModelRequest(
        id=best_model_id,
        training_id=(
            client.models.GetModel(
                model.GetModelRequest(id=best_model_id)
            )
            .last_success_training.id
        )
    )
)

# Read the local data to be sent for prediction
with open('/path/to/data_for_prediction') as file:
    data_for_prediction = file.readlines()

# Save predicted output into file after the deployment endpoint is generated
if deploy_model_response.status == 'active':
    get_endpoint_response = client.apps.GetEndpoint(
        app.GetEndpointRequest(
            id=deploy_model_response.app.id,
            endpoint_id=deploy_model_response.endpoint.id
        )
    )
    response = requests.post(
        get_endpoint_response.url + '/invocations',
        data=''.join(data_for_prediction).encode(),
        headers={'Content-Type': 'text/csv'}
    )
    if response.status_code == 200:
        with open('/path/to/model_prediction', 'w') as file:
            file.write(response.content.decode())
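
The saved output can then be loaded for inspection. This assumes the endpoint returns CSV content, matching the text/csv content type of the request:

import pandas as pd

# Load the saved predictions, assuming the endpoint returned CSV content
predictions = pd.read_csv('/path/to/model_prediction')
print(predictions.head())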