Dataset
This section shows how to create, retrieve, update, delete, and list datasets within a project in AI & Analytics Engine in two ways:
- Using the GUI
- Using API access through SDK
Using the GUI
Since there is a specialized section dealing with adding a dataset to the project, we shall focus on what we can do once we have a dataset.
Retrieving information of a dataset
The information we can retrieve regarding a dataset in the UI depends on the current stage of processing and the data source.
For example, assume we have uploaded a small CSV file to the platform. Its initial information can be accessed from the sample project by choosing either the top "DATASETS" button or the lower one within the expanded datasets list.
If you have not yet completed the initial "recipe" generation stage, selecting either button takes you to the "Datasets" menu, which lists the datasets available in this project.
When selecting a specific dataset, you can see its summary or the raw columns by choosing between tabs in the tabs ribbon.
(Screenshots: the SUMMARY tab and the RAW DATA tab.)
However, once you have completed the initial "recipe", a processed version of the data is generated. This is shown in the "Datasets" menu by the green "READY" flag.
At this point the dataset view shows considerably more: the schema, column visualizations, and extra details such as size on disk and the number of rows and columns.
Updating a dataset
As with organizations and projects, you update a dataset by opening its menu from the relevant project and selecting the "SETTINGS" tab in the tabs ribbon. Only the dataset name can be changed.
Deleting a dataset
In the dataset menu, select the garbage bin icon on the right side of the tabs ribbon. A deletion menu then opens, where you must confirm the deletion.
Listing datasets in a project
Listing the datasets in a project has already been shown above. Once inside a project, selecting "Datasets" from either the top or the bottom part of the menu brings you to the datasets list.
Using API access through SDK
To access the API functions, you must first authenticate to the platform by creating a client:
from aiaengine import api
client = api.Client()
Importing modules
Then import the following modules, which provide the request classes and helper functions used in the rest of this section.
from aiaengine.api import dataset
from aiaengine import util
Creating a dataset
Now you can create a new dataset in a project. Here we show an example of adding a dataset by uploading a CSV file from the local file system.
new_dataset = util.create_dataset(
    client,
    project_id=my_project_id,  # You can obtain this using the ListUserProjects API call
    name='Dataset Name',
    description='What is your uploaded data about',
    data_files=['/path/to/dataset_file.csv'],
    content_type='text/csv'  # file format
)
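Before uploading, it can help to check the file locally so that obvious problems surface before the upload starts. The following is a minimal sketch, not part of the SDK: `check_csv_before_upload` is a hypothetical helper name, and it assumes your CSV is UTF-8 encoded and starts with a header row, neither of which this section states as a platform requirement.

```python
import csv
from pathlib import Path


def check_csv_before_upload(path):
    """Return the header columns of a CSV file, or raise ValueError.

    Local sanity check only; it is not part of the aiaengine SDK.
    Assumes a UTF-8 file whose first row is the header.
    """
    file_path = Path(path)
    if not file_path.is_file():
        raise ValueError(f"Not a file: {file_path}")

    # newline="" is what the csv module expects for files it parses.
    with file_path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)      # first row, or None if the file is empty
        first_row = next(reader, None)   # second row, or None if there is no data

    if not header:
        raise ValueError("The file has no header row")
    if len(set(header)) != len(header):
        raise ValueError("Column names in the header are not unique")
    if first_row is None:
        raise ValueError("The file has a header but no data rows")
    return header
```

You could call `check_csv_before_upload('/path/to/dataset_file.csv')` just before `util.create_dataset(...)` and only proceed if it returns without raising.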
Retrieving information of a dataset
Once a dataset is imported into the platform, you can retrieve its information by passing the dataset ID.
client.datasets.GetDataset(
    dataset.GetDatasetRequest(
        id='id_of_dataset'
    )
)
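In the GUI, a processed dataset is flagged as "READY". A script may want to wait for a similar condition by calling `GetDataset` repeatedly. The sketch below is a generic polling loop, not an SDK feature: `wait_until` is a hypothetical helper, and because this section does not document the fields of the `GetDataset` response, the readiness check is left as a predicate that you write yourself against the response you actually receive.

```python
import time


def wait_until(fetch, is_ready, timeout_s=300.0, interval_s=5.0, sleep=time.sleep):
    """Call fetch() repeatedly until is_ready(result) is true.

    fetch     -- zero-argument callable, e.g. a lambda wrapping GetDataset
    is_ready  -- predicate you write against the response you actually get
    sleep     -- injectable so the loop can be tested without real delays
    Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if is_ready(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("dataset was not ready before the timeout")
        sleep(interval_s)


# Hypothetical wiring (my_ready_check is a placeholder you must supply;
# it is not an SDK function or field):
# info = wait_until(
#     lambda: client.datasets.GetDataset(
#         dataset.GetDatasetRequest(id='id_of_dataset')
#     ),
#     is_ready=my_ready_check,
# )
```

Passing the fetch call and the predicate as arguments keeps the loop independent of the response format, so it does not need to change if the fields you check turn out to be named differently.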
Updating a dataset
You can also modify the name and description of an existing dataset in the platform.
client.datasets.UpdateDataset(
    dataset.UpdateDatasetRequest(
        id='id_of_updated_dataset',
        name='Updated Dataset',
        description='This dataset has been updated'
    )
)
Deleting a dataset
If a particular dataset is no longer in use, you can remove it by specifying the dataset id.
client.datasets.DeleteDataset(
    dataset.DeleteDatasetRequest(
        id='id_of_deleted_dataset'
    )
)
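If you need to remove several datasets, one failed call should not stop the rest. The following is a small sketch of that pattern under stated assumptions: `delete_many` is a hypothetical helper, not an SDK function, and it assumes a failed deletion raises a Python exception, which this section does not confirm.

```python
def delete_many(dataset_ids, delete_one):
    """Call delete_one(dataset_id) for each ID and collect the outcomes.

    Returns (deleted, failed): a list of IDs that were deleted and a
    dict mapping each failed ID to the exception that was raised.
    """
    deleted, failed = [], {}
    for dataset_id in dataset_ids:
        try:
            delete_one(dataset_id)
        except Exception as error:  # keep going; report the failure afterwards
            failed[dataset_id] = error
        else:
            deleted.append(dataset_id)
    return deleted, failed


# Wiring with the DeleteDataset call (requires the SDK and an authenticated
# client; 'id_a' and 'id_b' are placeholder IDs):
# deleted, failed = delete_many(
#     ['id_a', 'id_b'],
#     lambda dataset_id: client.datasets.DeleteDataset(
#         dataset.DeleteDatasetRequest(id=dataset_id)
#     ),
# )
```

Unlike the GUI, nothing here asks for confirmation, so review the list of IDs before running it.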
Listing datasets in a project
For a particular project, you can list all datasets by giving the project id.
client.datasets.ListDatasets(
    dataset.ListDatasetsRequest(
        project_id='id_of_project_where_datasets_are_included'
    )
)
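A common follow-up to listing is looking up a dataset ID by name. The sketch below shows one way to do that. It assumes the listing can be turned into a sequence of entries that each expose `name` and `id` attributes; this section does not document the response format, so treat those attribute names as assumptions and adjust them to what the call actually returns. The `SimpleNamespace` objects are stand-ins so that the example runs without the platform.

```python
from types import SimpleNamespace


def find_by_name(entries, name):
    """Return all entries whose .name equals name (case-sensitive)."""
    return [entry for entry in entries if getattr(entry, "name", None) == name]


# Stand-in objects in place of a real listing (assumed shape: .id and .name).
listed = [
    SimpleNamespace(id="d1", name="Sales"),
    SimpleNamespace(id="d2", name="Churn"),
    SimpleNamespace(id="d3", name="Sales"),
]

matches = find_by_name(listed, "Sales")
print([entry.id for entry in matches])  # prints ['d1', 'd3']
```

The helper returns a list rather than a single entry because nothing in this section says that dataset names are unique within a project.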