Using API access with the SDK
This page walks through the task introduced in the Quick Start Introduction page using the aiaengine SDK, which provides API access to the AI & Analytics Engine. Reusable snippets of code are provided.
SDK and System Requirements
The AI & Analytics Engine's API access is made available in the form of an SDK. We strongly recommend that you use this SDK rather than manually composing payloads and posting them with curl or another command-line tool.
System Requirements: You will need a Linux operating system (such as the latest Ubuntu distribution) along with Python 3 (version 3.6 or later recommended). Additional Python packages such as numpy, pandas, scipy, seaborn, and plotly may be used in a few examples in the documentation pages, but are not required for running the SDK and API access.
Installing the SDK
Download the Python .whl file here. Then install the Python package by issuing the command:
$ pip install /path/to/downloaded_whl_file
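To check that the installation succeeded, you can try importing the package (the module name aiaengine matches the imports used later on this page):
$ python3 -c "import aiaengine; print('aiaengine imported successfully')"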
Setting up your local environment for API access
You will first need to set up an environment variable called AIA_ENGINE_CONFIG_FILE, which stores the path of a configuration file. A good practice is to make a .aiaengine folder in your home directory, and put a command defining the environment variable in your .bashrc file:
$ grep -n -H "AIA_ENGINE_CONFIG_FILE" ~/.bashrc
/home/new-user/.bashrc:122:export AIA_ENGINE_CONFIG_FILE="/home/new-user/.aiaengine/config.json"
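If you are setting this up for the first time, one way to create the folder and add the export line is shown below; the config path here is just an example, so adjust it to your own setup:
$ mkdir -p ~/.aiaengine
$ echo 'export AIA_ENGINE_CONFIG_FILE="$HOME/.aiaengine/config.json"' >> ~/.bashrc
$ source ~/.bashrc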
Here is the template you need to follow for the config.json file. Simply fill in the email address you used for registration, and your AI & Analytics Engine password:
{
    "target": "grpc.aiaengine.com:443",
    "secure": true,
    "auth": {
        "provider": "email_password",
        "data": {
            "email": "abcd.tuvw@example.com",
            "password": "qwerty123"
        }
    }
}
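As a quick sanity check, you can confirm that the environment variable points to a valid JSON file before moving on (this assumes you have reloaded your shell so that AIA_ENGINE_CONFIG_FILE is set):
$ python3 -c "import json, os; json.load(open(os.environ['AIA_ENGINE_CONFIG_FILE'])); print('config OK')"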
Authenticating into the Engine using the SDK
You will first need to authenticate into the Engine, using the credentials you saved in the configuration file. To do so, use the following two-line snippet:
from aiaengine import api
client = api.Client()
The client is authenticated using the credentials provided in the config.json file, whose path the SDK reads from the environment variable AIA_ENGINE_CONFIG_FILE.
To confirm that the right credentials have been loaded, inspect:
client.config
Using the SDK
Before you begin
Most requests to the Engine are made according to the following syntax:
do_thing_response = client.____.DoThing(
    ____.DoThingRequest(
        params='values'
    )
)
The Engine takes time to process each instruction. Immediately sending a new request that requires the previous request to be complete will raise an error. Either wait a bit, or wrap the request in the following code when running requests consecutively:
import time

TIMEOUT = 60
repeat = True
t0 = time.time()
while repeat:
    try:
        '''response = client.____.DoThing(
            ____.DoThingRequest(
                params='values')
        )'''
        repeat = False
    except:
        time.sleep(0.5)
        t1 = time.time()
        if t1 - t0 > TIMEOUT:
            repeat = False
            raise Exception('Timeout')
Creating a new organization
To start with the platform, an organization needs to be created by providing some basic details, including a name and a description.
In this process, a unique id org_id is generated for the created organization.
from aiaengine.api import org

create_org_response = client.orgs.CreateOrg(
    org.CreateOrgRequest(
        name='AIA Engine Demos',
        description='A separate org for demo projects'
    )
)
org_id = create_org_response.id
Creating a new project
Within an organization are various projects. To continue, the Engine requires a project with a given name, description and org_id. Once the project is created, a unique id project_id is generated and needs to be used in associated tasks.
from aiaengine.api import project

create_project_response = client.projects.CreateProject(
    project.CreateProjectRequest(
        name='Demo Project 1',
        description='First set of demos',
        org_id=org_id
    )
)
project_id = create_project_response.id
Creating a new dataset
Now it is time to upload your data. As mentioned in the Quick Start Introduction page, we use the German Credit dataset (download) here for a simple illustration. To create a dataset uploaded from your local machine, you need to specify project_id, name, description, data_files and content_type.
from aiaengine import util

new_dataset = util.create_dataset(
    client,
    project_id=project_id,
    name='German Credit Data',
    description='',
    data_files=['./path/to/german_credit_dataset.csv'],
    content_type='text/csv'
)
dataset_id = new_dataset.id
Preparing your dataset
For your convenience, extra code for working with data preparation is provided in the Code for data preparation page. Save that code in a file so that it can be imported as aia_util.
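The helpers used in the rest of this walk-through come from that page. As a rough sketch, the file you save (here assumed to be named aia_util.py and placed next to your script) exposes functions with the following signatures; the bodies are omitted and should be taken from the Code for data preparation page:
# aia_util.py -- sketch only; implementations live in the
# "Code for data preparation" page. Signatures mirror how the
# helpers are called later on this page.

def get_recipe_and_wait(client, recipe_id, iteration, step, expected_status):
    """Poll the Engine until the given recipe step reaches expected_status."""
    ...

def get_recommendations(client, recipe_id, iteration):
    """Return the list of recommended actions for an iteration."""
    ...

def queue_actions(client, recipe_id, actions_json, iteration):
    """Queue a JSON-encoded list of actions on the recipe."""
    ...

def commit_actions(client, recipe_id, iteration):
    """Commit the queued actions, starting a new iteration."""
    ...

def wait_for_commit_actions(client, recipe_id, iteration):
    """Block until the committed actions have been applied."""
    ...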
Creating a new recipe
Once the dataset is created, it must be preprocessed prior to modeling.
In the Engine, the steps used to preprocess a dataset are saved in the form of a recipe, which can be reused on new data.
This eliminates the hassle of preparing new data for prediction when using the model.
When creating a new recipe, the target columns need to be specified; in our example the target column is 'class'.
from aiaengine.api import recipe

create_recipe_response = client.recipes.CreateRecipe(
    recipe.CreateRecipeRequest(
        name='Process German Credit Risk Dataset',
        description='',
        datasets=[
            recipe.InputDataset(
                id=dataset_id,
                target_columns=['class']
            )
        ]
    )
)
recipe_id = create_recipe_response.id
Once you issue the command to create a recipe, you need to wait for the recipe creation to complete:
import json
import requests

from aia_util import (
    get_recipe_and_wait,
    get_recommendations,
    queue_actions,
    commit_actions,
    wait_for_commit_actions,
)

# wait for creation of recipe to complete
get_recipe_response = get_recipe_and_wait(
    client, recipe_id=recipe_id, iteration=1,
    step='recommendation', expected_status='success'
)
print(get_recipe_response)
Obtaining recommended actions
When a recipe is created, agents in the Engine automatically detect problems that may exist in the dataset and propose solutions to handle them, in an iterative process. These solutions are presented as a list of recommended actions, which you can choose whether or not to commit. To request either recommended or committed actions, the iteration number needs to be specified. For simplicity, only the recommended actions are committed in this tutorial. Refer to the Using the Data Preparation Module page of the documentation for committing custom actions.
iteration = 1
recommended_actions = get_recommendations(
    client,
    recipe_id=recipe_id,
    iteration=iteration
)
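Each entry of recommended_actions describes a detected problem together with a proposed solution; the queueing step below relies on its 'solution' and 'actions' fields. Before committing anything, you may want to inspect what the Engine recommends, for example:
# Inspect the recommendations returned by the Engine
print("Number of recommendations:", len(recommended_actions))
if recommended_actions:
    # Pretty-print the first recommendation to see its structure
    print(json.dumps(recommended_actions[0], indent=4))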
Queueing Actions
You must queue an action before you can apply it to a dataset. Queueing an action validates that it can be applied, using an efficient algorithm that does not need to run the action on the whole dataset.
To queue an action:
actions = []
for ra in recommended_actions:
    actions = actions + ra['solution']['actions']

queue_actions(client, recipe_id, json.dumps(actions), iteration)
Commit Actions
The queued actions can be committed, which means that the actions will be applied to the whole dataset. This will also start a new iteration.
# Commit recommended actions
commit_actions(client, recipe_id, iteration)
# wait for committed actions to complete
wait_for_commit_actions(client, recipe_id, iteration)
Iterating the data preparation process
Once the first iteration in the recipe is complete, in which the Engine casts each column in the dataset to a specific type, you may choose to continue processing the dataset. Once actions are committed, a new iteration is automatically created. You can assess whether the next iteration is ready by obtaining recommended actions from the iteration.
# move on to the second iteration, which was created by the commit above
iteration += 1

# get list of recommended actions of the second iteration
recommended_actions = get_recommendations(client, recipe_id=recipe_id, iteration=iteration)
print("Number of recommendations", len(recommended_actions))

actions = []
for ra in recommended_actions:
    actions = actions + ra['solution']['actions']

queue_actions(client, recipe_id, json.dumps(actions), iteration)
commit_actions(client, recipe_id, iteration)
wait_for_commit_actions(client, recipe_id, iteration)
When a new iteration is created, the Engine once again analyses the dataset and provides recommendations. Once the Engine has finished processing, simply repeat the code under Queueing Actions and Commit Actions above (incrementing the iteration number each time) to apply them.
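If you prefer to automate the remaining iterations, a loop along the following lines repeats that cycle; note that stopping when the recommendation list is empty, and the MAX_ITERATIONS safety limit, are assumptions made for this sketch rather than Engine requirements:
MAX_ITERATIONS = 10  # safety limit for this sketch, not an Engine constraint

while iteration < MAX_ITERATIONS:
    iteration += 1
    recommended_actions = get_recommendations(
        client, recipe_id=recipe_id, iteration=iteration
    )
    if not recommended_actions:
        # assume an empty recommendation list means preparation is done
        break
    actions = []
    for ra in recommended_actions:
        actions = actions + ra['solution']['actions']
    queue_actions(client, recipe_id, json.dumps(actions), iteration)
    commit_actions(client, recipe_id, iteration)
    wait_for_commit_actions(client, recipe_id, iteration)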
Finalizing a recipe & creating a processed dataset
You can finalize the recipe once you deem the data preparation to be complete. Finalizing creates a new, processed dataset for further analysis and modeling.
complete_recipe_response = client.recipes.CompleteRecipe(
    recipe.CompleteRecipeRequest(
        id=recipe_id,
        dataset_name='German Credit Risk - Processed'
    )
)
processed_dataset_id = complete_recipe_response.dataset_id
Creating a new app
To move on, you will next need to create an "App" from this dataset. An app is a special container on the Engine that holds multiple models trained and evaluated on the same train set and test set. An app enables you to train, evaluate, and compare models using different machine learning algorithms (called "templates" on the Engine). To create an app you need to specify the dataset id, problem type ("classification" for the German Credit Data), target columns and the proportion of data you assign for training.
from aiaengine.api import app

create_app_response = client.apps.CreateApp(
    app.CreateAppRequest(
        name='German Credit Risk Prediction Task',
        description='',
        dataset_id=processed_dataset_id,
        problem_type='classification',
        target_columns=['class'],
        training_data_proportion=0.8
    )
)
app_id = create_app_response.id
The processing that occurs during app creation involves:
- Splitting your data into train and test sets,
- Computing additional statistics on your data,
- Using the model recommender to predict beforehand how fast models can be trained from the available templates, as well as their estimated quality (in terms of how accurate their predictions are).
Feature sets
A feature set named default is created automatically and contains all the features in the data. You can create a new feature set as follows:
from aiaengine.api import featureset

create_feature_set_request = featureset.CreateFeatureSetRequest(
    app_id=app_id,
    name="new_feature_set",
    description="description",
    # choose features that are present in the processed German Credit dataset
    selected_features=["checking_status", "duration", "credit_history", "credit_amount", "savings_status", "age"]
)

# carry out the feature set creation
create_feature_set_response = client.featuresets.CreateFeatureSet(create_feature_set_request)
You can customize the list of features considered with selected_features.
The create_feature_set_response contains the model recommendations used in the next section.
Selecting recommended models
Once the app is processed successfully, model recommendations are provided with predicted performance over a range of metrics such as accuracy and F1 macro score (for classification), as well as estimated time cost in training and prediction. In this example, we select the top 5 models based on F1 macro score.
import pandas as pd

def select_recommended_models(model_recommendation, n, by_metric):
    """
    Get recommended models based on a given metric
    """
    df_model_rank = pd.DataFrame(
        columns=['template', by_metric]
    )
    for model in model_recommendation:
        if model['metrics'][by_metric] is not None:
            df_model_rank = df_model_rank.append(
                {
                    'template': model['template_id'],
                    by_metric: model['metrics'][by_metric]
                },
                ignore_index=True
            )
    selected_models = (
        df_model_rank.sort_values(by=by_metric, ascending=False)['template']
        .head(n)
        .tolist()
    )
    return selected_models
# Retrieve information of model recommendations when feature set is ready
get_feature_set_response = client.featuresets.GetFeatureSet(
    featureset.GetFeatureSetRequest(
        id=create_feature_set_response.id
    )
)

# select top n models recommended by a given metric
if get_feature_set_response.status == 'success':
    model_recommendation = json.loads(get_feature_set_response.recommendations)['recommendation']
    selected_models = select_recommended_models(
        model_recommendation, n=5, by_metric='f1_macro'
    )

selected_models
output:
>>> ['extra_trees_clf', 'lightgbm_clf', 'svm_clf', 'random_forest_clf', 'linear_svm_clf']
Training models
Once the decision on which models to train is made, you can start to train the selected models.
from aiaengine.api import model

for mlt in selected_models:
    client.models.CreateModel(
        model.CreateModelRequest(
            app_id=app_id,
            feature_set_id=create_feature_set_response.id,
            name=mlt,
            template_id=mlt,
            hyperparameters='{}',
            evaluation=model.ModelEvaluation(
                metric='f1_macro',
                threshold=0.9,
                min_feedback_count=0.0
            )
        )
    )
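Model training runs asynchronously, and the comparison in the next section only considers models whose status is 'success'. If you want to block until all submitted models have finished, a simple polling sketch such as the following can be used; it reuses the ListModels call shown below and assumes every submitted model eventually reaches the 'success' status (failed trainings are not handled here):
import time

TRAINING_TIMEOUT = 3600  # seconds; adjust to your dataset and templates
t0 = time.time()
while True:
    # list all models in the app and count those that finished successfully
    models = client.models.ListModels(
        model.ListModelsRequest(app_id=app_id)
    ).models
    n_done = sum(1 for m in models if m.status == 'success')
    if n_done >= len(selected_models):
        break
    if time.time() - t0 > TRAINING_TIMEOUT:
        raise Exception('Timed out waiting for model training')
    time.sleep(10)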
Selecting the model with best performance
Once successfully trained, models are automatically evaluated based on out-of-sample tests over a range of metrics. The best model based on F1 macro score is selected for further deployment.
by_metric = 'f1_macro'

list_models_response = client.models.ListModels(
    model.ListModelsRequest(
        app_id=app_id
    )
)

df_trained_models = pd.DataFrame(
    columns=['model_id', by_metric]
)
for mlt in list_models_response.models:
    if mlt.status == 'success':
        get_model_response = client.models.GetModel(
            model.GetModelRequest(
                id=mlt.id
            )
        )
        df_trained_models = df_trained_models.append(
            {
                'model_id': get_model_response.id,
                by_metric: (
                    json.loads(
                        get_model_response.last_success_training.result
                    )
                    ['evaluation_scores']
                    [by_metric]
                )
            },
            ignore_index=True
        )

best_model_id = (
    df_trained_models.sort_values(
        by=by_metric, ascending=False
    )
    ['model_id']
    .iloc[0]
)
Deploy the model of choice
The model selected after comparison can now be deployed, which makes an endpoint URL available for prediction on new data.
training_id = client.models.GetModel(
    model.GetModelRequest(id=best_model_id)
).last_success_training.id

deploy_model_response = client.models.DeployModel(
    model.DeployModelRequest(
        id=best_model_id,
        training_id=training_id
    )
)
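The deployment response contains the details of the new endpoint, including the URL used in the prediction step below; since the exact attribute layout may differ between SDK versions, the simplest way to locate the URL is to print the response:
# Inspect the deployment response to find the endpoint URL for your model
print(deploy_model_response)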
Using your model to predict on new data
To predict on new data, you will first need to:
- Fill the target column with empty values,
- Process the data using the data preparation recipe you built,
- Save it in csv, jsonlines, or parquet format,
- Drop the target column after obtaining the prepared data.
The guide page for the Data Preparation Module explains how to do this with the Engine; here, we proceed with our example by mimicking the recipe with simple offline processing, which is possible because our recipe is simple.
Let us assume that we have the file data/predict.csv containing the data for which we need predictions.
Let us first import the necessary packages:
import json
import requests
import numpy as np
import pandas as pd
Next, let us load the data, prepare it, and preview it:
# Load "raw" version of dataset
data = pd.read_csv('data/predict.csv', keep_default_na=False, dtype='str')
# Name of the target column in the data
target_col = 'class'
# Make empty target column
data[target_col] = np.nan
# Show column names in the data
print('Columns in raw dataset:')
print('\n')
print(json.dumps(data.columns.to_list(), indent=4))
# Process your data to fit into the recipe's output format, using the recipe built previously
# In this example, we just mimic the recipe
# Iteration 1: Cast columns to numeric type and categorical type:
numeric_cols = ['age', 'credit_amount', 'duration']
cat_cols = [x for x in data.columns if x not in numeric_cols]
data[numeric_cols] = data[numeric_cols].apply(pd.to_numeric, errors='coerce')
data[cat_cols] = data[cat_cols].astype('category')
# Iteration 2: Drop feature with low correlation:
names_of_columns_to_drop = ['num_dependents']
data.drop(columns=names_of_columns_to_drop, inplace=True)
# Finally, drop the target column
data.drop(columns=[target_col], inplace=True)
print('\n')
print('Data types of columns after preparing it:')
print('\n')
print(data.dtypes)
This produces the output:
Columns in raw dataset:
[
    "checking_status",
    "duration",
    "credit_history",
    "purpose",
    "credit_amount",
    "savings_status",
    "employment",
    "installment_commitment",
    "personal_status",
    "other_parties",
    "residence_since",
    "property_magnitude",
    "age",
    "other_payment_plans",
    "housing",
    "existing_credits",
    "job",
    "num_dependents",
    "own_telephone",
    "foreign_worker",
    "class"
]
Data types of columns after preparing it:
checking_status category
duration int64
credit_history category
purpose category
credit_amount int64
savings_status category
employment category
installment_commitment category
personal_status category
other_parties category
residence_since category
property_magnitude category
age int64
other_payment_plans category
housing category
existing_credits category
job category
own_telephone category
foreign_worker category
dtype: object
With this data, we can now invoke the prediction endpoint. Note that /invocations must be appended to the end of the endpoint URL.
# Convert prepared data into csv character chunk
data_to_call = data.to_csv(index=False, header=False)

# Invoking deployed model
url = 'https://ep-45416f6a-5569-4930-b835-583163e53bd9.aia-engine.pi.exchange/invocations'
headers = {'Content-Type': 'text/csv'}
res = requests.post(url, data=data_to_call.encode(), headers=headers)
if res.status_code != 200:
    print(res.content.decode())
else:
    result = [json.loads(line) for line in res.content.decode().split('\n') if line]
    print(json.dumps(result[:4], indent=4))
Sample output:
[
    {
        "class": {
            "prediction": "good",
            "prediction_probs": {
                "good": 0.940889835357666,
                "bad": 0.05911014974117279
            }
        }
    },
    {
        "class": {
            "prediction": "bad",
            "prediction_probs": {
                "good": 0.14075148105621338,
                "bad": 0.8592485189437866
            }
        }
    },
    {
        "class": {
            "prediction": "good",
            "prediction_probs": {
                "good": 0.9796953201293945,
                "bad": 0.020304657518863678
            }
        }
    },
    {
        "class": {
            "prediction": "good",
            "prediction_probs": {
                "good": 0.7446432113647461,
                "bad": 0.2553568184375763
            }
        }
    }
]
Note that the prediction for the target variable class is shown, along with the probabilities of the classes.