
Dataset

This section shows how to create, inspect, update, delete, and list datasets within a project in AI & Analytics Engine. There are two ways to do this:

  • Using the GUI
  • Using API access through SDK

Using the GUI

Adding a dataset to a project is covered in its own section, so here we focus on what you can do once a dataset exists.

Retrieving information of a dataset

The information you can retrieve about a dataset in the UI depends on its current stage of processing and on the data source. For example, assume we have uploaded a small CSV file to the platform. Its initial information can be accessed from the sample project by choosing either the top "DATASETS" button or the lower one within the expanded datasets list.

If you have not completed the initial "recipe" generation stage, selecting either button takes you to the "Datasets" menu, which lists the datasets available in this project.

When you select a specific dataset, you can view its summary or its raw columns by switching between the "SUMMARY" and "RAW DATA" tabs in the tabs ribbon.

However, if you have completed the initial "recipe", a processed version of the data is generated. It appears in the "Datasets" menu, flagged with a green "READY" sign.

More information is now available for the dataset. In addition to the summary and raw data, you can view the schema, column visualizations, and details such as the size on disk and the number of rows and columns.

Updating a dataset

As with organizations and projects, you update a dataset by opening it from the relevant project and selecting the "SETTINGS" tab in the tabs ribbon. In the GUI, the only field that can be changed is the dataset name.

Deleting a dataset

In the dataset menu, select the garbage bin icon on the right side of the tabs ribbon. A deletion menu then opens, where you must confirm the deletion.

Listing datasets in a project

Listing the datasets in a project was already shown above. Once inside a project, selecting "Datasets" from either the top or the bottom part of the menu brings you to the datasets list.

Using API access through SDK

To access the API functions, you must first authenticate with the platform by creating a client:

from aiaengine import api

client = api.Client()

Importing modules

Next, import the following modules to use the dataset-related functions.

from aiaengine.api import dataset, util

Creating a dataset

Now you can create a new dataset in a project. Here we show an example of adding a dataset by uploading a CSV file from the local file system.

new_dataset = util.create_dataset(
    client,
    project_id=my_project_id, # You can obtain this using the ListUserProjects API call
    name='Dataset Name',
    description='What is your uploaded data about',
    data_files=['/path/to/dataset_file.csv'],
    content_type='text/csv'    # file format
)
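
Before uploading, it can help to confirm locally that the file exists and has a header row, so that an obviously broken file is caught before the upload starts. The sketch below is a minimal pre-upload check using only the Python standard library. The helper name check_csv_before_upload is hypothetical and is not part of the aiaengine SDK, and the check assumes the file is UTF-8 encoded.

```python
import csv
from pathlib import Path


def check_csv_before_upload(path):
    """Return the header row of a CSV file, or raise if the file is unusable.

    Hypothetical local helper (not part of the aiaengine SDK).
    Assumes the file is UTF-8 encoded.
    """
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"No such file: {file_path}")
    # Read only the first row; the rest of the file is left untouched.
    with file_path.open(newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), None)
    if not header:
        raise ValueError(f"{file_path} is empty or has no header row")
    return header


# Possible usage before the create_dataset call shown above:
# header = check_csv_before_upload('/path/to/dataset_file.csv')
# print('Columns to be uploaded:', header)
```

This check only reads the first row, so it stays fast for large files, but it does not validate the remaining rows.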

Retrieving information of a dataset

Once a dataset has been imported into the platform, you can retrieve its information by passing the dataset id.

client.datasets.GetDataset(
    dataset.GetDatasetRequest(
        id='id_of_dataset'
    )
)

Updating a dataset

You can also modify the name and description of an existing dataset in the platform.

client.datasets.UpdateDataset(
    dataset.UpdateDatasetRequest(
        id='id_of_updated_dataset',
        name='Updated Dataset',
        description='This dataset has been updated'
    )
)

Deleting a dataset

If a particular dataset is no longer in use, you can remove it by specifying the dataset id.

client.datasets.DeleteDataset(
    dataset.DeleteDatasetRequest(
        id='id_of_deleted_dataset'
    )
)
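
The GUI asks for confirmation before deleting, whereas the API call above deletes immediately. If you want a similar safeguard in a script, one option is a small wrapper that prompts before calling the delete function. This is only a sketch: delete_with_confirmation is a hypothetical helper, not part of the aiaengine SDK, and the actual delete call is passed in as a function.

```python
def delete_with_confirmation(delete_fn, dataset_id, confirm=input):
    """Call delete_fn(dataset_id) only if the user types 'yes'.

    Hypothetical helper (not part of the aiaengine SDK).
    `confirm` is any callable that takes a prompt and returns the answer;
    it defaults to the built-in input().
    Returns True if the delete function was called, False otherwise.
    """
    answer = confirm(f"Delete dataset {dataset_id}? Type 'yes' to confirm: ")
    if answer.strip().lower() != "yes":
        return False
    delete_fn(dataset_id)
    return True


# Possible usage with the DeleteDataset call shown above:
# delete_with_confirmation(
#     lambda dataset_id: client.datasets.DeleteDataset(
#         dataset.DeleteDatasetRequest(id=dataset_id)
#     ),
#     'id_of_deleted_dataset',
# )
```

Passing the delete call in as a function keeps the wrapper independent of the SDK, so it can be exercised with a stub.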

Listing datasets in a project

For a particular project, you can list all datasets by giving the project id.

client.datasets.ListDatasets(
    dataset.ListDatasetsRequest(
        project_id='id_of_project_where_datasets_are_included'
    )
)
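
A common follow-up is to look up a dataset id by its name from the listing. The sketch below filters a collection of dataset entries by name. It assumes each entry exposes a name attribute and that the ListDatasets response exposes the entries as a datasets field; neither detail is confirmed by this page, so check the actual response object before relying on it. The helper find_datasets_by_name is hypothetical.

```python
def find_datasets_by_name(datasets, name):
    """Return every entry whose `name` attribute equals `name`.

    Hypothetical helper (not part of the aiaengine SDK).
    Entries without a `name` attribute are skipped.
    """
    return [entry for entry in datasets if getattr(entry, "name", None) == name]


# Possible usage with the ListDatasets call shown above.
# ASSUMPTION: the response has a `datasets` field and each entry has `id` and `name`.
# response = client.datasets.ListDatasets(
#     dataset.ListDatasetsRequest(project_id=my_project_id)
# )
# for match in find_datasets_by_name(response.datasets, 'Dataset Name'):
#     print(match.id)
```

The helper returns a list rather than a single entry because nothing on this page says that dataset names are unique within a project.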