
Prediction Tasks - Regression

Introduction

Each app is dedicated to a single prediction task and is trained from one or more datasets. An app provides a space where you define the task, then train and compare several models to achieve the goal. The "regression" app, as its name suggests, performs a regression task.

Data Preparation for Regression

In addition to the steps performed in earlier stages of the "data preparation module", your dataset must contain a target column, and that column must be of numeric type.
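
The numeric-target requirement can be checked locally before uploading. The following is a minimal sketch using only the Python standard library; the helper name find_bad_targets and the inline sample are illustrative assumptions and are not part of the platform or its SDK.

```python
import csv
import io
import math


def find_bad_targets(csv_text, target_column):
    """Return (row_number, raw_value) pairs whose target is not a finite number."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if target_column not in (reader.fieldnames or []):
        raise KeyError(f"target column {target_column!r} not found")
    bad = []
    for row_number, row in enumerate(reader, start=1):
        raw = row[target_column]
        try:
            value = float(raw)
        except (TypeError, ValueError):
            # Empty cells, missing cells and text such as "n/a" end up here.
            bad.append((row_number, raw))
            continue
        if not math.isfinite(value):
            # "nan" and "inf" parse as floats but are not usable targets.
            bad.append((row_number, raw))
    return bad


# Hypothetical sample: the third row's target is deliberately corrupted.
sample = "RM,LSTAT,MEDV\n6.575,4.98,24\n6.421,9.14,21.6\n7.185,4.03,n/a\n"
print(find_bad_targets(sample, "MEDV"))  # [(3, 'n/a')]
```

An empty result suggests the column can be read as numeric; how the platform itself infers column types is not covered by this sketch.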

Example:

Predict the house price of a suburb from demographic, land use, environmental, infrastructure, and other related information. The target column is MEDV (the rightmost column), which indicates the median house price. A sample of this dataset is shown below:

   CRIM     ZN  INDUS  CHAS  NOX    RM     AGE   DIS     RAD  TAX  PTRATIO  B       LSTAT  MEDV
0  0.00632  18  2.31   0     0.538  6.575  65.2  4.09    1    296  15.3     396.9   4.98   24
1  0.02731  0   7.07   0     0.469  6.421  78.9  4.9671  2    242  17.8     396.9   9.14   21.6
2  0.02729  0   7.07   0     0.469  7.185  61.1  4.9671  2    242  17.8     392.83  4.03   34.7
3  0.03237  0   2.18   0     0.458  6.998  45.8  6.0622  3    222  18.7     394.63  2.94   33.4
4  0.06905  0   2.18   0     0.458  7.147  54.2  6.0622  3    222  18.7     396.9   5.33   36.2

The Engine's data preparation, combined with the automated feature engineering embedded within model templates, handles many tasks for you, but some tasks must be handled by you before uploading the data to the platform:

Task                                              Handled by the Engine  Handled by user
Categorical columns "most frequent" imputation    Yes                    -
Categorical columns one-hot encoding              Yes                    -
Numerical columns scaling                         Yes                    -
Text columns "constant" imputation                Yes                    -
Text columns TF-IDF vectorizer                    Yes                    -
Outliers removal                                  -                      Yes
Cleanup of bad target values                      -                      Yes
Cleanup of duplicated values                      -                      Yes
Domain knowledge enabled feature transformations  -                      Yes
Cleanup of data leakages                          -                      Yes
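
Two of the user-side tasks, cleanup of duplicated values and cleanup of bad target values, can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions (rows already loaded as dictionaries of strings, "duplicate" meaning an exact match on every column); the helper clean_rows is hypothetical and not part of the platform.

```python
import math


def clean_rows(rows, target_column):
    """Drop exact duplicate rows and rows whose target is missing or non-numeric."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        try:
            target = float(row[target_column])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric target
        if math.isfinite(target):
            cleaned.append(row)
    return cleaned


# Hypothetical rows for illustration only.
rows = [
    {"RM": "6.575", "MEDV": "24"},
    {"RM": "6.575", "MEDV": "24"},    # duplicate of the first row
    {"RM": "6.421", "MEDV": ""},      # missing target value
    {"RM": "7.185", "MEDV": "34.7"},
]
print(clean_rows(rows, "MEDV"))
# [{'RM': '6.575', 'MEDV': '24'}, {'RM': '7.185', 'MEDV': '34.7'}]
```

Outlier removal, domain-specific transformations, and leakage cleanup depend on the dataset and are deliberately left out of this sketch.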

This section shows how to build an app of problem type regression in the AI & Analytics Engine in one of two ways:

  • Using the GUI
  • Using API access through SDK

Using the GUI

You can create an app for a dataset in one of two ways. Either click the "+ New App" rectangle on the dashboard, or, from within the project or dataset page, hover the mouse over the '+' icon in the bottom right corner of the screen and then click the "New App" button:

[Screenshots: first method, second method]

Once either option is selected, the "New App" menu appears. In the first step, you assign an app name, select the dataset for the app, and enter the name of the target column. The type of the app (classification/regression) is determined automatically from the type of the target column, unlike in the API, where it is stated explicitly.

In the second step, you can create the app with the default configuration, which uses an 80/20 train/test split, or with an advanced configuration that lets you change the train/test split size. In the advanced configuration, a value of 90 represents a 90:10 train/test split.
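
To make the split setting concrete, the arithmetic can be sketched as below. The function split_counts is a hypothetical helper, and the rounding shown is an assumption; the platform may assign boundary rows differently.

```python
def split_counts(n_rows, train_percent=80):
    """Return (train_rows, test_rows) for a percentage-style split setting."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be strictly between 0 and 100")
    # Rounding to the nearest row is an assumption made for this sketch.
    train_rows = round(n_rows * train_percent / 100)
    return train_rows, n_rows - train_rows


print(split_counts(1000))      # default 80/20 split -> (800, 200)
print(split_counts(1000, 90))  # advanced value of 90 -> (900, 100)
```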

[Screenshots: step 1, step 2]

Once finished, the app is created and the app ID can be found in the browser's address bar.

Using API access through SDK

To access the API functions, you must first authenticate to the platform:

from aiaengine import api

client = api.Client()

Importing app

Next, import the app module so that you can call the functions used in this section.

from aiaengine.api import app

Creating an app for a dataset

Now you can add a new app by specifying the required parameter values as follows.

create_app_response = client.apps.CreateApp(
    app.CreateAppRequest(
        name='App Name',
        description='What is this app about',
        dataset_id='id_of_dataset_app_is_created_for',
        problem_type='regression',
        target_columns=['target_column'],
        extra_columns={},
        training_data_proportion=0.8
    )
)

As with a classification app, you need a name, a description, and the dataset id to build a regression app. You must also specify the problem type ('regression' for this task) and a single target column ('target_column' in the above example). For a regression task, you can leave extra_columns (only used in a forecasting task) as an empty dictionary {}. Finally, you can set the train/test split ratio with training_data_proportion, which is the proportion of the whole dataset used for training.

app_id = create_app_response.id

Once created, an application is assigned a unique id, which is frequently used in the related functions.
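
The request parameters described above can be validated locally before any call is made. The sketch below builds a plain dictionary and does not touch the SDK; build_regression_app_params is a hypothetical helper, and the requirement that training_data_proportion lie strictly between 0 and 1 is an assumption rather than documented platform behaviour.

```python
def build_regression_app_params(name, description, dataset_id, target_column,
                                training_data_proportion=0.8):
    """Validate inputs and return keyword arguments for a regression app request."""
    if not name or not dataset_id or not target_column:
        raise ValueError("name, dataset_id and target_column are required")
    # Assumed valid range; the platform's actual limits may differ.
    if not 0.0 < training_data_proportion < 1.0:
        raise ValueError("training_data_proportion must be between 0 and 1")
    return {
        "name": name,
        "description": description,
        "dataset_id": dataset_id,
        "problem_type": "regression",       # fixed for this task
        "target_columns": [target_column],  # exactly one target column
        "extra_columns": {},                # only used in forecasting tasks
        "training_data_proportion": training_data_proportion,
    }


params = build_regression_app_params(
    name="House prices",
    description="Predict MEDV",
    dataset_id="id_of_dataset_app_is_created_for",
    target_column="MEDV",
    training_data_proportion=0.9,
)
print(params["target_columns"], params["training_data_proportion"])
```

Assuming the request accepts these keyword arguments as in the earlier example, the dictionary could then be unpacked as app.CreateAppRequest(**params).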