Prediction Tasks - Regression

Introduction

Each app is a specialized intelligence that performs a single prediction task and is trained on one or more datasets. An app gives you a space to define the task and to train and compare several models for it. The "regression" app, as its name suggests, performs a regression task: it predicts a numeric target.

Data Preparation for Regression

In addition to the steps performed in the earlier stages of the "data preparation module", the dataset must contain a target column. The target column must be of numeric type.
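The numeric-target requirement can be checked locally before uploading. The following is a minimal sketch using only the Python standard library; the helper name non_numeric_targets and the tiny CSV are illustrative assumptions and are not part of the platform's SDK.

```python
import csv
import io
import math


def non_numeric_targets(rows, target):
    """Return (row_number, raw_value) pairs whose target is missing or not a finite number."""
    bad = []
    for i, row in enumerate(rows):
        raw = row.get(target)
        try:
            value = float(raw)
        except (TypeError, ValueError):
            # None (missing column) or text that cannot be parsed as a number
            bad.append((i, raw))
            continue
        if not math.isfinite(value):
            # "nan" and "inf" parse as floats but are not usable targets
            bad.append((i, raw))
    return bad


# Illustrative data only: the third row carries a made-up non-numeric target.
SAMPLE_CSV = """RM,LSTAT,MEDV
6.575,4.98,24
6.421,9.14,21.6
7.185,4.03,unknown
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(non_numeric_targets(rows, "MEDV"))  # [(2, 'unknown')]
```

An empty result suggests that every value in the target column can be read as a number.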

Example:

Predict the median house price of a suburb from demographic, land use, environmental, infrastructure, and other related information. The target column is MEDV (the last column), which holds the median house price. A sample of this dataset is shown below:

Row  CRIM     ZN  INDUS  CHAS  NOX    RM     AGE   DIS     RAD  TAX  PTRATIO  B       LSTAT  MEDV
0    0.00632  18  2.31   0     0.538  6.575  65.2  4.09    1    296  15.3     396.9   4.98   24
1    0.02731  0   7.07   0     0.469  6.421  78.9  4.9671  2    242  17.8     396.9   9.14   21.6
2    0.02729  0   7.07   0     0.469  7.185  61.1  4.9671  2    242  17.8     392.83  4.03   34.7
3    0.03237  0   2.18   0     0.458  6.998  45.8  6.0622  3    222  18.7     394.63  2.94   33.4
4    0.06905  0   2.18   0     0.458  7.147  54.2  6.0622  3    222  18.7     396.9   5.33   36.2

The Engine's data preparation, combined with the automated feature engineering embedded in the model templates, handles many tasks for you, but some tasks must be handled by you before uploading the data to the platform. See:

Task                                              Handled by the Engine  Handled by user
Categorical columns "most frequent" imputation    Yes                    -
Categorical columns one-hot encoding              Yes                    -
Numerical columns scaling                         Yes                    -
Text columns "constant" imputation                Yes                    -
Text columns TF-IDF vectorizer                    Yes                    -
Outlier removal                                   -                      Yes
Cleanup of bad target values                      -                      Yes
Cleanup of duplicated values                      -                      Yes
Domain-knowledge-enabled feature transformations  -                      Yes
Cleanup of data leakage                           -                      Yes
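Two of the cleanup tasks left to the user, duplicate removal and outlier removal, can be sketched with the Python standard library. This is only one possible approach: the 1.5 x IQR rule, the helper names drop_duplicates and drop_iqr_outliers, and the extreme value of 900.0 are assumptions made for illustration, not something the platform prescribes.

```python
import statistics


def drop_duplicates(rows):
    """Keep the first occurrence of each fully identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def drop_iqr_outliers(rows, column, k=1.5):
    """Drop rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [row[column] for row in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if low <= row[column] <= high]


rows = [
    {"RM": 6.575, "MEDV": 24.0},
    {"RM": 6.575, "MEDV": 24.0},   # exact duplicate of the first row
    {"RM": 6.421, "MEDV": 21.6},
    {"RM": 7.185, "MEDV": 34.7},
    {"RM": 6.998, "MEDV": 33.4},
    {"RM": 7.147, "MEDV": 36.2},
    {"RM": 6.100, "MEDV": 900.0},  # made-up extreme value
]

cleaned = drop_iqr_outliers(drop_duplicates(rows), "MEDV")
print(len(rows), "rows before cleanup,", len(cleaned), "rows after")
```

Whether a value really is an outlier depends on the domain, so the threshold k is best treated as a starting point rather than a rule.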

Importing necessary modules

Next, import the app module so that you can call the functions used in this section.

from aiaengine.api import app

Creating a regression app

Now you can add a new app by specifying the required parameter values as follows.

create_app_response = client.apps.CreateApp(
    app.CreateAppRequest(
        name='App Name',
        description='What is this app about',
        dataset_id='id_of_dataset_app_is_created_for',
        problem_type='regression',
        target_columns=['target_column'],
        extra_columns={},
        training_data_proportion=0.8
    )
)

As with a classification app, you need a name, a description, and a dataset ID to create a regression app. You also specify the problem type, 'regression' for this task, and a single target column ('target_column' in the above example). For a regression task, leave extra_columns (used only in forecasting tasks) as an empty dictionary {}. Finally, training_data_proportion sets the train-test split: it is the proportion of the whole dataset used for training, so 0.8 trains on 80% of the data and holds out the remaining 20% for testing.

Store the ID of the new app from the response:

app_id = create_app_response.id
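The parameter rules described above can be checked before any request is sent. The sketch below assembles the keyword arguments as a plain dictionary; regression_app_kwargs is a hypothetical helper, not part of the SDK, and the requirement that training_data_proportion lie strictly between 0 and 1 is an assumption rather than a documented constraint.

```python
def regression_app_kwargs(name, description, dataset_id, target_column,
                          training_data_proportion=0.8):
    """Assemble keyword arguments for a regression app (hypothetical helper)."""
    if not isinstance(target_column, str) or not target_column:
        raise ValueError("a regression app needs exactly one target column name")
    # Assumed constraint: a proportion of 0 or 1 would leave no train or test data.
    if not 0.0 < training_data_proportion < 1.0:
        raise ValueError("training_data_proportion must be strictly between 0 and 1")
    return {
        "name": name,
        "description": description,
        "dataset_id": dataset_id,
        "problem_type": "regression",
        "target_columns": [target_column],  # a single target for regression
        "extra_columns": {},                # only used in forecasting tasks
        "training_data_proportion": training_data_proportion,
    }


kwargs = regression_app_kwargs(
    name="House prices",
    description="Predict the median house price (MEDV)",
    dataset_id="id_of_dataset_app_is_created_for",
    target_column="MEDV",
)
print(kwargs["target_columns"], kwargs["training_data_proportion"])
```

The keys mirror the keyword arguments used in the CreateAppRequest call shown earlier, so the dictionary could be unpacked into that call with app.CreateAppRequest(**kwargs).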