
Using the GUI

This page gives a walk-through of performing the task introduced in the Quick Start Introduction page using the web GUI. Screenshots are provided where appropriate.

Dashboard Home Page

Upon logging in to the web GUI of the Engine, you will be shown the dashboard with a collapsed view of all organizations you have created. In this demo, we are logging in for the first time, so we see the default organization for this account, created at the time of registration.

To create a new organization, simply click on any of the existing organizations in the collapsed view. You will then find the "New Organization" button underneath.

Creating a new organization

Upon clicking "New Organization", a dialog-box wizard opens to ask for details of the new organization you want to create. You first need to enter basic details such as a name and description:

In the next step, you can optionally add other users to the new organization to collaborate with you. In this instance, we do not need to do this, so we finalize the organization creation by clicking on the "Create Organization" button:

Creating a new project

Upon creation of a new organization, you are taken to its "details" page for exploration. We want to create a new project under it. To do so, hover over the floating-action button marked by the "plus" sign at the bottom right of the page. These floating-action buttons are available on all details pages to guide you through the next step. For now, click on the "New Project" floating button option:

This opens up another dialog wizard where you will:

  1. Provide a name for your project. We will call it "Demo Project 1",
  2. Choose the organization under which you want to create the project, and
  3. Optionally provide a detailed description for your project.

In the next step of the new project wizard, you have the option to add other users to your project and assign a role such as "Owner", "Editor", or "Viewer" to them. In this instance, simply finish by clicking on "Create Project":

You will then be taken to the details page of the project. As before, hover over the floating-action buttons at the bottom right to choose the next step. We now want to create a new dataset using the file we downloaded from OpenML.

Creating a new dataset

The dataset creation process involves two steps:

  1. Importing raw data by connecting to a data source or uploading a file, and
  2. Preparing your data source into a dataset using multiple options, one of which is to create a new data preparation recipe.

On the "new dataset" dialog, you will be required to enter the name of the dataset first.

Importing from local file

The dialog then takes you to the dataset importing options. For our purpose, let us choose the "File Upload" option.

We can drag and drop a file into the upload area, then click on "import" to start importing the file into the Engine.

Preparing your dataset

When the file has been uploaded, you will be required to choose whether you want to prepare the dataset using a new recipe, or to prepare it using an existing recipe. Choose the option to "Create a new data wrangling recipe".

Upon clicking "Done" on the above dialog, you will be taken into the recipe building session, where data can be prepared into the desired form using various types of "actions" such as fixing column types, importing missing values, and computing aggregates.

At the end of the session, the user will have:

  1. The prepared dataset that can next be used to build models, and
  2. A re-usable "recipe" that remembers the pipeline of "actions" that were applied in the building process, so that it can be used on batches of new data in the same format.

The session is split into "iterations". In every iteration:

  1. Data is analyzed, and a list of recommended actions, along with justifications, is presented.
  2. You will be requested to choose which solutions you want to accept.
  3. These actions are then applied.

At the end of each iteration, the user can:

  1. Choose to continue with another iteration. At this point, they can choose a target column for their dataset. This will enable the Engine to provide recommendations tailored to the target.
  2. Finalize their dataset.

The first 100 rows of the tabular dataset are shown to the user, to help them make informed decisions about the "actions" they want to apply next. This view is refreshed at every iteration to take into account the actions that have been applied.

Continuing with our task, we first see that "Iteration 1" has a spinner in the left panel. This indicates that the Engine is in the process of analyzing your dataset and generating recommendations for the actions to be applied in this iteration:

You will need to wait until the spinner completes; this typically takes less than a minute, depending on the number of variables or columns your tabular dataset contains. Once finished, you can expand the panel on the right-hand side to view what the Engine has suggested for your dataset:

Upon expansion, you can see the "Problems" panel. On this panel, click on any "problem" tab to see its details:

You will next see:

  1. A summary of the issue that your dataset currently has, and
  2. A list of alternative "solutions", each of which is a sequence of recommended actions.

The "Problem" panel details what issue has been found using text, data, and visaulizations.

Now click on any action in the solution to customize it:

By applying this action, the set of column(s) suggested in the "problems" panel will be remembered as categorical columns when you use the dataset for building your model:

The expanded action also allows you to modify the inputs to the action, specify them in various ways, and edit the action's parameters. In the above example, you can set the categories in two ways: either ask the Engine to infer them automatically (the default), or specify them explicitly.
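In pandas terms, these two ways of setting categories look roughly as follows. This is a sketch of the concept, not the Engine's internal code; the offline example at the end of this page uses the same astype('category') cast.

import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series(['good', 'bad', 'good'])

# Option 1 (default): infer the categories automatically from the data
inferred = s.astype('category')
print(list(inferred.cat.categories))  # ['bad', 'good']

# Option 2: specify the categories explicitly
explicit = s.astype(CategoricalDtype(categories=['good', 'bad']))
print(list(explicit.cat.categories))  # ['good', 'bad']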

Clicking on the next action, you can see that it parses a selection of columns into numeric data:
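In pandas terms, this parse corresponds to a coercing numeric cast, where unparseable entries become missing values; again a sketch, not the Engine's internal code:

import pandas as pd

s = pd.Series(['1169', '5951', 'n/a'])
print(pd.to_numeric(s, errors='coerce'))  # 1169.0, 5951.0, NaN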

Once you have customized these actions to your satisfaction, you can scroll up and click on "select solution".

Next click on "review" to see the list of solutions that have been added to the queue:

You will next click the "Commit Actions" button in the review panel:

By clicking this button, you confirm to the Engine that you want to apply the selected actions to your data. The spinner on the iteration is seen again, this time indicating that the actions you requested are being applied:

When these actions have been applied, the iteration is shown as complete. At this point, you can choose to:

  1. Continue, if you want to enter into another iteration, thereby requesting the Engine to look for more issues in the new data, or
  2. Finalize, if you want to complete the recipe building process.

For now, let us enter another iteration by clicking on the "Continue" option. You will now see a warning that you should inform the Engine at this point about any target column(s) in your data, so that it can generate recommendations tailored to the target.

We choose the class column as the target column and proceed further. This starts the next iteration. Upon its completion, you can review the problems found and the recommended actions as before:

We see that some columns have been detected as "not predictive".

Entering the details, you can see what the reasons are, and that the recommended action is to drop such column(s), since the presence of such features is unlikely to improve the predictiveness of the model:
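As a rough offline analogue of this check, you could inspect how the target distribution varies across a feature's values. The crosstab below is a hypothetical illustration with made-up data, not the Engine's actual criterion:

import pandas as pd

# Made-up sample: 'num_dependents' versus the target 'class'
df = pd.DataFrame({
    'num_dependents': [1, 1, 2, 2, 1, 2],
    'class': ['good', 'bad', 'good', 'bad', 'good', 'good'],
})

# The target distribution is identical across feature values here,
# suggesting the feature carries little predictive signal
print(pd.crosstab(df['num_dependents'], df['class'], normalize='index'))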

As before, we select this solution, review, and commit these actions. The actions are now applied to the data resulting from the previous iteration, updating it. At the end of this iteration, let us finalize the recipe and create the final dataset, prepared in the desired way:

This opens up a dialog asking you to name the prepared dataset. Modify the name as desired and click on "Yes" to confirm completion:

You will then be taken to the dataset's details page. At this stage, your prepared dataset is analyzed one final time to generate important statistics and visualizations:

On this screen, you can see:

  1. The name of the dataset you have prepared,
  2. A spinner indicating that your dataset is being analyzed, and
  3. A progress bar showing the percentage of completion of the analysis task:

Once this is complete, results of analysis on your dataset are stored behind the scenes as metadata. The dataset details page then fills up as follows:

In the above view, you will notice the following information:

  1. Basic information such as size, number of rows, and number of columns,
  2. A tab showing summary stats and visualizations of all numeric columns in your data (if they exist),
  3. Another tab showing similar information for categorical and other types of columns in your data (if they exist),
  4. The stats of each column,
  5. Density estimates of numeric columns, and
  6. Histograms of numeric columns.

Creating a new app

To move on, you will next need to create an "App" from this dataset. An app is a special container on the Engine that holds multiple models trained and evaluated on the same train set and test set. An app enables you to train, evaluate, and compare models using different machine learning algorithms (called "templates" on the Engine).

Choose the "New App" option in the floating buttons at the bottom of this screen:

This opens up a dialog as shown below:

You will need to enter/confirm the following details before creating the app:

  1. The name of the app,
  2. The dataset to build your app from (already populated with the dataset you are currently viewing, for your convenience), and
  3. The target column for your app. Here, we choose the column class.

Upon clicking "next" after entering these details, you will be asked to choose a configuration for your app. Let us continue for now with the default configuration:

When the app creation dialog is completed by clicking on "Create Application", you will be taken to the app's details page, where the progress on the processing task is shown. Note that the Engine automatically detects that your prediction task is a classification problem:

The processing required for your app involves:

  1. Splitting your data into train and test (see the sketch after this list),
  2. Computing additional stats on your data, and
  3. Using the model recommender to predict beforehand how fast models can be trained from the available templates, as well as their estimated quality (in terms of how accurate their predictions are).
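The split in step 1 is a standard train/test split. Here is a minimal scikit-learn sketch of the idea; the Engine performs this automatically, and the split ratio and stratification below are assumptions:

import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny stand-in for a prepared dataset with a 'class' target column
df = pd.DataFrame({
    'duration': [6, 48, 24, 12, 36, 18],
    'credit_amount': [1169, 5951, 2096, 4870, 9055, 2835],
    'class': ['good', 'bad', 'good', 'bad', 'good', 'good'],
})

# Hold out 25% of rows for evaluation; stratify to preserve class balance
train_df, test_df = train_test_split(
    df, test_size=0.25, random_state=42, stratify=df['class'])
print(len(train_df), 'train rows /', len(test_df), 'test rows')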

Upon completion, you can navigate to several tabs such as the dataset tab, which shows the details page of the dataset associated with this app:

You can also see feature importances for categorical, numerical, and text columns computed using a classifier trained on a small sample:

At this juncture, we can explore the dashboard a little more to understand where we are. Click on the Apps summary dashboard card on the left, as indicated in the illustration above. This will take you back to the dashboard view:

On this screen:

  1. The summary card of the current app is highlighted,
  2. The project that the app belongs to is shown, and
  3. The organization that the project belongs to is shown.

To see more of the dashboard, click on the vertical tab on the right-hand side. You will see the following screen:

On this screen, you can see:

  1. The summary card of the current app, and the details of the dataset associated with the app, and
  2. The option to create a new model.

Training Models

Click on the "New Model" option from the dashboard screen shown above. This will take you to the model creation dialog. Alternatively, you can go back to the app's details page and use the floating-action button to do the same:

In the "New Model" dialog, you will be shown a list of model templates (machine learning algorithm) to choose from. In this view, the output of the model recommender is shown to you, to aid you in choose the model template that suits your needs. The recommender has estimated performance of each of these templates on your data in terms of:

  1. The "predictive performance", a percentage value that measures how close your model's predictions will be to the true values, on new data that was not available when the model was trained. A 100% predictive performance indicates that the model is able to predict without any errors,
  2. The time taken to train the model, and
  3. The time taken by the model to predict on new data.
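The exact definition used by the recommender is internal to the Engine, but for a classification task the "predictive performance" idea is analogous to accuracy expressed as a percentage, as in this minimal sketch:

from sklearn.metrics import accuracy_score

# Made-up true labels and predictions for illustration
y_true = ['good', 'bad', 'good', 'good']
y_pred = ['good', 'bad', 'bad', 'good']

# 3 of 4 predictions match the true labels -> 75%
print(f'{100 * accuracy_score(y_true, y_pred):.0f}%')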

The "Choose a model" step of the dialog shows this information in a visual manner, as seen below:

On this screen you can:

  1. Sort the models by each of the three metrics estimated by the recommender, from best to worst.
  2. See, for every model, the rating out of five for each of the three estimated metrics. A higher rating implies:
    1. Better predictions for the "predictive performance" metric,
    2. Faster training for the "train time" metric, and
    3. Faster predictions for the "prediction time" metric.
  3. View a bubble plot showing estimated training time on the x-axis, estimated prediction time on the y-axis, and the estimated predictive performance as text inside the bubbles.
  4. View the details of the model template.

Using the check-boxes, choose the model template(s) you want to train, then click on "Next":

Leave the option as "Default Configuration" and choose "Next" again. Then click on the "Train Models" button to start training the models:

You will then be taken to the models listing page of your App:

On this screen, you can see the model templates' names and progress bars for the training jobs. Two of them have finished training, while two more training jobs are about to commence.

Model evaluation

Models are trained on the training portion and evaluated on the test portion. The evaluation results are available on the individual models' details pages. You can also compare the evaluation results of all trained models in an app by clicking on the "Comparison" tab (1) in the above screen. This takes us to the model comparison page of the app, as seen below:

On this page, you can see:

  1. The actual performance of the trained models on the three metrics of importance, and
  2. A comparison of ROC curves and precision/recall curves for classification models (a sketch of how such curves can be computed offline follows this list).
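If you want to reproduce such curves offline from a model's predicted probabilities, the standard computations are available in scikit-learn. The labels and scores below are made up for illustration:

from sklearn.metrics import precision_recall_curve, roc_curve

# Made-up true labels (1 = positive class) and predicted probabilities
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

fpr, tpr, _ = roc_curve(y_true, y_score)
precision, recall, _ = precision_recall_curve(y_true, y_score)
print('ROC points (FPR, TPR):', list(zip(fpr, tpr)))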

Below the plots on the same page, there is more information about the evaluation metrics:

Here, you will see:

  1. The rating out of five for each of the metrics,
  2. A table of evaluation metrics, where rows are models and columns are metric names. You can sort models by these metrics.

Your next step is to go back to the models listing page:

Then click on one of the models to view its details page:

You can see some basic information about the model, evaluation metrics, and more visualizations, such as the confusion matrix, in addition to the ROC and precision/recall curves.

Deploying a model

Once a model is trained and you are convinced that it performs well on your data, you will next need to deploy it to be able to use it for predictions.

To deploy a model, click on the floating-action button as seen in the previous figure. This will take you to the deployment setup dialog:

Here, you want to choose the option "Deploy to PI.EXCHANGE cloud" and proceed by clicking "Next":

On this screen, simply choose the "New endpoint" option (you have no endpoints created previously, as this is your first model), then click "Deploy".

This makes your model available for online predictions through a simple API call. After this step, you are taken to the details page of the endpoint:

From this page, you will need to copy the URI of the endpoint to make online predictions.

Using your model to predict on new data

To predict on new data, you will first need to:

  1. Fill the target column with empty values,
  2. Process the data using the data preparation recipe you built,
  3. Save it in csv, jsonlines, or parquet format, and
  4. Drop the target column after obtaining the prepared data.

Before we cover how to do this in the guide page for the Data Preparation Module, let us proceed with our example by mimicking the recipe with simple offline processing, which is possible because our recipe is simple.

Let us assume that we have the file data/predict.csv containing the data for which we need predictions.

Let us first import the necessary packages:

import json
import requests
import numpy as np
import pandas as pd

Next, let us load the data, prepare, and preview it:

# Load "raw" version of dataset
data = pd.read_csv('data/predict.csv', keep_default_na=False, dtype='str')

# Name of the target column in the data
target_col = 'class'

# Make empty target column
data[target_col] = np.nan

# Show column names in the data
print('Columns in raw dataset:')
print('\n')
print(json.dumps(data.columns.to_list(), indent=4))

# Process your data to fit into the recipe's output format, using the recipe built previously
# In this example, we just mimic the recipe

# Iteration 1: Cast columns to numeric type and categorical type:
numeric_cols = ['age', 'credit_amount', 'duration']
cat_cols = [x for x in data.columns if x not in numeric_cols]
data[numeric_cols] = data[numeric_cols].apply(pd.to_numeric, errors='coerce')
data[cat_cols] = data[cat_cols].astype('category')

# Iteration 2: Drop feature with low correlation:
names_of_columns_to_drop = ['num_dependents']
data.drop(columns=names_of_columns_to_drop, inplace=True)

# Finally, drop the target column
data.drop(columns=[target_col], inplace=True)

print('\n')
print('Data types of columns after preparing it:')
print('\n')
print(data.dtypes)

This produces the output:

Columns in raw dataset:


[
    "checking_status",
    "duration",
    "credit_history",
    "purpose",
    "credit_amount",
    "savings_status",
    "employment",
    "installment_commitment",
    "personal_status",
    "other_parties",
    "residence_since",
    "property_magnitude",
    "age",
    "other_payment_plans",
    "housing",
    "existing_credits",
    "job",
    "num_dependents",
    "own_telephone",
    "foreign_worker",
    "class"
]


Data types of columns after preparing it:


checking_status           category
duration                     int64
credit_history            category
purpose                   category
credit_amount                int64
savings_status            category
employment                category
installment_commitment    category
personal_status           category
other_parties             category
residence_since           category
property_magnitude        category
age                          int64
other_payment_plans       category
housing                   category
existing_credits          category
job                       category
own_telephone             category
foreign_worker            category
dtype: object

With this data, we can now invoke the prediction endpoint:

# Convert prepared data into csv character chunk
data_to_call = data.to_csv(index=False, header=False)

# Invoking deployed model

url = 'https://ep-45416f6a-5569-4930-b835-583163e53bd9.aia-engine.pi.exchange/invocations'

headers = {'Content-Type': 'text/csv'}

res = requests.post(url, data=data_to_call.encode(), headers=headers)

if res.status_code != 200:
    print(res.content.decode())
else:
    result = [json.loads(line) for line in res.content.decode().split('\n') if line]
    print(json.dumps(result[:4], indent=4))

Sample output:

[
    {
        "class": {
            "prediction": "good",
            "prediction_probs": {
                "good": 0.940889835357666,
                "bad": 0.05911014974117279
            }
        }
    },
    {
        "class": {
            "prediction": "bad",
            "prediction_probs": {
                "good": 0.14075148105621338,
                "bad": 0.8592485189437866
            }
        }
    },
    {
        "class": {
            "prediction": "good",
            "prediction_probs": {
                "good": 0.9796953201293945,
                "bad": 0.020304657518863678
            }
        }
    },
    {
        "class": {
            "prediction": "good",
            "prediction_probs": {
                "good": 0.7446432113647461,
                "bad": 0.2553568184375763
            }
        }
    }
]

Note that the prediction for the target variable class is shown, along with the probabilities of the classes.
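If you prefer a tabular view, the parsed result list from the code above can be flattened into a DataFrame (this continues from the earlier code, reusing the pandas import and the result variable):

# Flatten the per-row JSON into columns for the prediction and class probabilities
preds = pd.json_normalize([r['class'] for r in result])
print(preds.head())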