
Continuous Learning

Introduction

Continuous learning is the ability to update a trained prediction model so that it takes newly available data into account. It is particularly useful when additional data only becomes available over time, since updating an already trained model with that data helps it make better predictions.

Continuous learning involves the following parts of the platform:

  • Dataset versioning: Updates an existing dataset with additional available data.
  • Continuous learning setting for Trained Models: When enabled, automatically updates the trained model once a new dataset version is available.
  • Auto-deploy setting for Deployments: When enabled, automatically deploys newer instances of the trained model once they are available.

Dataset versioning

Users can update an existing RAW dataset via the:

  1. Quick access icon; or
  2. Contextual menu

Note: Users cannot update a PROCESSED dataset because, by definition, a PROCESSED dataset is the output obtained by applying a recipe to a source dataset.


The update dataset process is similar to the create new dataset process, with the added requirement that the new data has the same schema (number of columns and column headers) as the original RAW dataset.
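The schema requirement above can be checked locally before uploading. The following is a minimal sketch, not the platform's own validation: the function names and sample data are hypothetical, and it assumes CSV input and that column order matters, which the platform may or may not enforce.

```python
import csv
import io


def read_header(csv_text: str) -> list[str]:
    """Return the column headers (first row) of CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


def schema_matches(original_csv: str, new_csv: str) -> bool:
    """True when both inputs have the same number of columns and the same headers.

    Assumption: the comparison is order-sensitive.
    """
    return read_header(original_csv) == read_header(new_csv)


# Hypothetical sample data.
original = "date,store,sales\n2023-01-01,A,100\n"
compatible = "date,store,sales\n2023-02-01,A,120\n"
renamed = "date,shop,sales\n2023-02-01,A,120\n"
extra_col = "date,store,sales,region\n2023-02-01,A,120,N\n"

print(schema_matches(original, compatible))  # same headers
print(schema_matches(original, renamed))     # a header was renamed
print(schema_matches(original, extra_col))   # an extra column was added
```

Running a check like this first avoids discovering a schema mismatch only after the ingestion step.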


Once the RAW dataset has been successfully updated, users will receive an in-platform notification. From the dataset table and detail page, users can see that the initial dataset V0 has been updated to V1.

Note: The dataset is updated by appending the newly added data to the existing dataset (e.g. Dataset V1 = Dataset V0 + data newly added by the user).
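The append-only versioning described in the note can be illustrated with a small sketch. The `VersionedDataset` class and its methods are hypothetical and are not part of the platform; they only model the rule that each version equals the previous version plus the newly added rows.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedDataset:
    """Hypothetical stand-in for a RAW dataset; not the platform's API."""

    rows: list = field(default_factory=list)
    # version_sizes[i] is the total row count at version i.
    version_sizes: list = field(default_factory=list)

    def ingest(self, new_rows):
        """Append rows, record a new version, and return its version number."""
        self.rows.extend(new_rows)
        self.version_sizes.append(len(self.rows))
        return len(self.version_sizes) - 1

    def snapshot(self, version):
        """Return the rows as they were at the given version."""
        return self.rows[: self.version_sizes[version]]


ds = VersionedDataset()
v0 = ds.ingest([("2023-01-01", 100), ("2023-01-02", 110)])  # initial data
v1 = ds.ingest([("2023-02-01", 120)])                       # data added later

print(f"V{v0} rows: {len(ds.snapshot(v0))}")
print(f"V{v1} rows: {len(ds.snapshot(v1))}")
```

Because data is only ever appended, every earlier version remains a prefix of the latest one.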


Although the RAW dataset has been updated to a newer version, a PROCESSED dataset associated with it (through the application of a recipe) does not automatically update to a newer version unless an existing model trained on that PROCESSED dataset has continuous learning enabled.
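The condition above can be expressed as a single rule. This is only an illustrative sketch: the function name and the `continuous_learning` key are hypothetical and do not come from the platform.

```python
def processed_dataset_should_update(models_trained_on_it):
    """Decide whether a PROCESSED dataset follows its RAW dataset to a new version.

    models_trained_on_it: list of dicts, each with a boolean
    'continuous_learning' key (hypothetical representation).
    """
    return any(model["continuous_learning"] for model in models_trained_on_it)


no_models = []
all_disabled = [{"continuous_learning": False}]
one_enabled = [{"continuous_learning": False}, {"continuous_learning": True}]

print(processed_dataset_should_update(no_models))
print(processed_dataset_should_update(all_disabled))
print(processed_dataset_should_update(one_enabled))
```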

Continuous learning setting for Trained Models

To enable continuous learning, navigate to the Settings tab within the trained model’s detail page. Toggle “Enable Continuous Learning” on, and click SAVE.


When continuous learning is enabled:

  • The PROCESSED dataset is automatically updated when the user updates the associated RAW dataset.
  • The trained model is automatically updated, through partial fitting or refitting, on the latest version of the dataset it was trained on, once that version is available.
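The difference between partial fitting and refitting can be shown with a toy example. The `MeanModel` class below is hypothetical and deliberately trivial (it predicts the mean of the targets it has seen); it is not how the platform trains models. Partial fitting updates the existing state using only the newly added rows, while refitting retrains from scratch on the full updated dataset.

```python
class MeanModel:
    """Toy model that predicts the mean of the targets it has seen."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def fit(self, targets):
        """Refit: discard any previous state and train on the given data."""
        self.count = 0
        self.mean = 0.0
        return self.partial_fit(targets)

    def partial_fit(self, targets):
        """Partial fit: update the existing state using only the given rows."""
        for y in targets:
            self.count += 1
            self.mean += (y - self.mean) / self.count  # running-mean update
        return self


v0_targets = [10.0, 20.0, 30.0]   # targets in dataset V0
added_targets = [40.0, 60.0]      # targets appended to form V1

incremental = MeanModel().fit(v0_targets).partial_fit(added_targets)
refitted = MeanModel().fit(v0_targets + added_targets)

print(incremental.mean, refitted.mean)
```

For this toy model the two approaches give the same result. For many real models they need not, and partial fitting is only possible when the algorithm supports incremental updates.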


Note: Every time the model is updated on a new version of its training dataset, a separate instance of the trained model is created. Each instance is then evaluated against the hold-out test sets (from the train/test split) of the different dataset versions.
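Evaluating every model instance against the hold-out test set of every dataset version produces a grid of scores. The sketch below illustrates that idea only: the model instances, test sets, and the choice of mean absolute error as the metric are all hypothetical.

```python
def mean_absolute_error(predict, test_rows):
    """Average absolute difference between predictions and targets."""
    return sum(abs(predict(x) - y) for x, y in test_rows) / len(test_rows)


# Hypothetical model instances, one per dataset version they were trained on.
model_instances = {
    "model-v0": lambda x: 2.0 * x,
    "model-v1": lambda x: 2.0 * x + 1.0,
}

# Hypothetical hold-out test sets per dataset version: (feature, target) pairs.
test_sets = {
    "V0": [(1.0, 2.0), (2.0, 4.0)],
    "V1": [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)],
}

# One score for each (model instance, test-set version) combination.
scores = {
    (model_name, version): mean_absolute_error(predict, rows)
    for model_name, predict in model_instances.items()
    for version, rows in test_sets.items()
}

for (model_name, version), score in sorted(scores.items()):
    print(f"{model_name} on test set {version}: MAE = {score:.2f}")
```

Comparing scores along a single test-set version shows whether a newer instance actually improves on an older one for the same data.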


Auto-deploy setting for Deployments

To enable auto-deploy, navigate to the Settings tab within the Deployment’s detail page. Toggle “Auto-deploy updated model” on, and click SAVE.


When auto-deploy is enabled:

  • Newer instances of the same trained model will be automatically deployed to replace the existing deployment, and assigned to the same endpoint

Auto-deploy lets users easily update their deployments with the updated trained model while keeping the endpoint intact with minimal downtime.
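The behaviour described above, where the model instance changes but the endpoint does not, can be sketched as follows. The `Deployment` class, its fields, and the endpoint path are hypothetical and serve only to illustrate the idea; they are not the platform's API.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical deployment record; field names are illustrative only."""

    endpoint: str
    model_instance: str
    auto_deploy: bool = False

    def on_new_model_instance(self, new_instance: str) -> bool:
        """Swap in the new instance if auto-deploy is on.

        The endpoint is never modified. Returns True if a swap happened.
        """
        if not self.auto_deploy:
            return False
        self.model_instance = new_instance
        return True


auto = Deployment(endpoint="/predict/sales", model_instance="model-v0", auto_deploy=True)
manual = Deployment(endpoint="/predict/churn", model_instance="model-v0")

auto_swapped = auto.on_new_model_instance("model-v1")
manual_swapped = manual.on_new_model_instance("model-v1")

print(auto.endpoint, auto.model_instance, auto_swapped)
print(manual.endpoint, manual.model_instance, manual_swapped)
```

Clients that call the endpoint do not need to change anything when a newer model instance is deployed, because the endpoint stays the same.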
