Using the GUI
This page gives a walk-through of performing the task introduced in the Quick Start Introduction page using the web GUI. Screenshots are provided where appropriate.
Dashboard Home Page
Upon logging in to the web GUI of the Engine, you are shown the dashboard with a collapsed view of all organizations you have created. In this demo, we are logging in for the first time, so we see only the default organization for this account, named at the time of registration.
To create a new organization, simply click on any of the existing organizations in the collapsed view. You will then find the "New Organization" button underneath.
Creating an Organization
Clicking "New Organization" opens a dialog wizard that asks for details of the new organization you want to create. First, enter basic details such as the name and description. Next, select the subscription plan to use for this organization. In this guide, we select the individual plan.
In the next step, you can optionally add other users to the new organization to collaborate with you. We do not need to do this here, so we finalize the organization creation by clicking the "Create Organization" button.
Creating a Project
Once the new organization is created, you are taken to its details page. We want to create a new project under it. To do so, hover over the floating action button marked by the plus sign at the bottom-right of the page (1). Floating action buttons are available on all details pages to guide you through the next step. For now, click on the "New Project" floating action button:
This opens up another dialog wizard where you will:
- Provide a name for your project. We will call it "Demo Project 1",
- Optionally provide a detailed description for your project, and
- Choose the organization under which you want to create the project.
Assign a name and description to your project and click "Next". In the next step of the wizard, you can optionally add other users to your project and assign each of them a role such as "Owner", "Editor", or "Viewer". In this instance, simply finish by clicking "Create Project". You will then be taken to the project's details page, where you can proceed to create a dataset.
Creating a Dataset
The dataset creation process involves two steps:
- Importing raw data by connecting to a data source or uploading a file, and
- Preparing the imported data into a dataset, either by creating a new data preparation recipe or by reusing an existing one.
Importing from Local File
Click on "New Dataset" in the floating action button at the bottom right to begin. In the "New Dataset" dialog, first select a dataset source. In this example, we import the German Credit data mentioned in the introduction from a local CSV file.
Drag and drop the file into the upload area, then click "Create" to start importing it into the Engine. Once the file has been uploaded, the next step is to prepare the dataset for training models using a recipe. You can either create a new recipe or prepare the dataset using an existing recipe that you made previously. In this example, we choose "Create a new data wrangling recipe".
Preparing your Dataset
Upon clicking "Done" on the above dialog, you will be taken into the recipe building session, where data can be prepared into the desired form using various types of actions such as fixing column types, imputing missing values, and computing aggregates.
At the end of the session, you will have:
- The prepared dataset that can next be used to build models, and
- A re-usable recipe that remembers the pipeline of actions that were applied in the building process, so that it can be used on batches of new data in the same format.
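Conceptually, a recipe is an ordered list of actions that can be replayed on any new batch with the same columns. The sketch below illustrates only that idea in plain Python, assuming rows are dictionaries; the `Recipe` class and the two example actions are hypothetical and are not the Engine's API.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Action = Callable[[List[Row]], List[Row]]


class Recipe:
    """Hypothetical recipe: an ordered pipeline of actions (not the Engine's API)."""

    def __init__(self) -> None:
        self.actions: List[Action] = []

    def commit(self, action: Action) -> None:
        # Remember the action so the same pipeline can be replayed later.
        self.actions.append(action)

    def apply(self, rows: List[Row]) -> List[Row]:
        # Replay every committed action in the order it was committed.
        for action in self.actions:
            rows = action(rows)
        return rows


def cast_duration_to_int(rows: List[Row]) -> List[Row]:
    # Example action: cast the "duration" column from text to an integer.
    return [{**r, "duration": int(r["duration"])} for r in rows]


def drop_id_column(rows: List[Row]) -> List[Row]:
    # Example action: drop a hypothetical "id" column.
    return [{k: v for k, v in r.items() if k != "id"} for r in rows]


recipe = Recipe()
recipe.commit(cast_duration_to_int)
recipe.commit(drop_id_column)

# A new batch in the same format is prepared by replaying the same recipe.
new_batch = [{"id": "7", "duration": "24"}, {"id": "8", "duration": "6"}]
prepared = recipe.apply(new_batch)
print(prepared)  # [{'duration': 24}, {'duration': 6}]
```

Storing the actions themselves, rather than the prepared output, is what makes the recipe reusable on batches that did not exist when it was built.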
The recipe building session is an iterative process. In each iteration:
- Data is analyzed, and a list of recommended actions is presented along with justifications,
- You choose which suggested actions you want to accept,
- You may edit the suggested actions, or add new ones,
- These actions are then applied.
Whenever actions are committed, the Engine analyzes the dataset again to provide new recommendations, and the process repeats. If a target column is chosen, the Engine can tailor its recommendations to that target.
Continuing with our task, the dataset is analyzed when the recipe is first created. This typically takes less than a minute; the duration depends on the number of columns (variables) in your tabular dataset. Once complete, we see the following screen:
This contains the following:
- Search bar. This can be used to find and display particular columns of your choice in the dataset.
- Dataset preview. The first 1000 rows of the tabular dataset are shown so that you can make informed decisions about which actions to apply next. This view is refreshed whenever actions are queued.
- Actions dialog. In the currently open "Suggestions" tab, we see:
  - A field to enter the target column (if provided, better recommendations are given),
  - Insights generated by the Engine (click to expand or collapse),
  - Recommended actions to address each insight, and
  - A "See analysis" button, which shows a detailed explanation of the recommendations.
The first recommended action will always be to cast each column to a particular type, unless the dataset's schema already matches the schema inferred by the Engine. Clicking "See analysis" shows the following:
This dialog contains the Engine's justifications for its recommendations. In this case, it shows an analysis of the values in each column and whether the Engine judged each column to be more numeric or categorical in nature.
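The Engine's actual type-inference rules are not described here. As a rough illustration of the kind of decision involved, the sketch below uses a common heuristic: a column is treated as numeric if every non-empty value parses as a number and it has more than a threshold number of distinct values. The function name and the threshold of 10 are assumptions.

```python
from typing import List


def infer_column_type(values: List[str], max_levels: int = 10) -> str:
    """Hypothetical heuristic (not the Engine's): classify a text column."""
    non_empty = [v.strip() for v in values if v.strip() != ""]
    if not non_empty:
        return "categorical"

    def parses_as_float(v: str) -> bool:
        try:
            float(v)
            return True
        except ValueError:
            return False

    all_numeric = all(parses_as_float(v) for v in non_empty)
    distinct = len(set(non_empty))
    # Numbers with many distinct values look continuous; a few repeated
    # levels, or any non-numeric text, look categorical.
    if all_numeric and distinct > max_levels:
        return "numeric"
    return "categorical"


print(infer_column_type([str(m) for m in range(6, 48, 3)]))  # numeric
print(infer_column_type(["good", "bad", "good"]))            # categorical
print(infer_column_type(["1", "2", "1", "2"]))               # categorical
```

The last case shows why a simple "does it parse as a number" test is not enough: integer codes with only a few levels usually behave like categories.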
Before proceeding, let us provide a target column to the Engine so that it may generate better recommendations in the next iteration. In this example we will choose "class" as our target.
The next step in the data wrangling process is to add the recommended action to the actions queue. Clicking the plus icon takes us to the "Recipe" tab of the actions dialog, where the selected actions appear in the list of queued actions.
When we first queue an action, the Engine attempts to generate a preview of what our dataset will look like when we commit our actions. Once processed, we should see the icons in each column change from text to numeric or categorical, as below:
From the "Recipe" tab, we can commit the queued actions, or edit or delete them. In this step, we simply accept the Engine's recommendations and commit the actions. Once we click commit, the actions appear in the "Committed Actions" drop-down menu with a spinner, indicating that the Engine is committing them and analyzing the dataset to generate the next set of recommendations. This typically takes 1-2 minutes.
When these actions have been applied, they appear in the "Committed Actions" drop-down menu with a tick next to them to indicate completion. In addition, once an action has been committed, the option to finalize the dataset is no longer greyed out.
Moving back to the "Suggestions" tab, if "class" was chosen as the target column prior to committing the previous actions, you should see the following recommendations:
We see that some columns have been detected as "not predictive" (1). We will ignore the autoencoder action for now.
Clicking "See analysis" (2) shows the reasons for this insight. The recommended action is to drop such columns, since their presence is unlikely to improve the predictive power of the model.
Let us accept this recommendation and add it to the action queue by clicking the plus button. Next, we will add actions that were not suggested by the Engine. Click the "Add Action" tab to bring up the list of actions. Suppose we wish to bin the "duration" column into 5 equally spaced bins. At the top of the "Add Action" dialog, use the search bar to find the "Bin Columns" action, then select it from the menu:
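The Engine's criteria for "not predictive" are not documented on this page. A simple heuristic that captures the idea flags columns that are constant, and categorical columns whose value differs in every row (identifier-like), since neither can help a model generalize to new rows. The functions and the toy columns `customer_id` and `country` below are hypothetical and are not from the German Credit data.

```python
from typing import Dict, List, Set

Row = Dict[str, str]


def non_predictive_columns(rows: List[Row], target: str, categorical: Set[str]) -> List[str]:
    """Hypothetical heuristic, not the Engine's actual criteria."""
    flagged = []
    n_rows = len(rows)
    for column in rows[0]:
        if column == target:
            continue  # the target column is never dropped
        distinct = len({row[column] for row in rows})
        is_constant = distinct <= 1
        is_identifier = column in categorical and n_rows > 1 and distinct == n_rows
        if is_constant or is_identifier:
            flagged.append(column)
    return flagged


def drop_columns(rows: List[Row], columns: List[str]) -> List[Row]:
    # A "Drop"-style action: remove the flagged columns from every row.
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


rows = [
    {"customer_id": "c1", "country": "DE", "age": "35", "class": "good"},
    {"customer_id": "c2", "country": "DE", "age": "22", "class": "bad"},
    {"customer_id": "c3", "country": "DE", "age": "47", "class": "good"},
]
flagged = non_predictive_columns(rows, target="class", categorical={"customer_id", "country"})
print(flagged)                         # ['customer_id', 'country']
print(drop_columns(rows, flagged)[0])  # {'age': '35', 'class': 'good'}
```

The identifier check is restricted to categorical columns because a continuous numeric column (such as an amount) can legitimately have a unique value in every row.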
See the action catalogue for a full list of the data wrangling actions supported by the Engine.
After selecting an action, you can modify its parameters. For this example, we:
- choose the input column "duration",
- name the output column "duration_bin",
- select "Equally Spaced" as our binning method, and
- set the number of bins to "5".
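Equally spaced (equal-width) binning splits the range between the column's minimum and maximum into bins of the same width. The sketch below shows that computation for a "duration" column; the sample durations are made up, and the 0-based bin labels and left-closed bin edges are assumptions that may differ from what the Engine produces in "duration_bin".

```python
from typing import List


def equal_width_bins(values: List[float], n_bins: int = 5) -> List[int]:
    """Assign each value to one of n_bins equally spaced bins, labelled 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)  # all values identical: everything in one bin
    # The maximum lands exactly on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


durations = [4, 12, 24, 36, 48, 60, 72]  # illustrative values, in months
duration_bin = equal_width_bins(durations, n_bins=5)
print(duration_bin)  # [0, 0, 1, 2, 3, 4, 4]
```

Here the range 4 to 72 is split into five bins of width 13.6, with edges at 4, 17.6, 31.2, 44.8, 58.4, and 72.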
Click "Add" to add this action to the queue. The same parameter dialog can be reopened by clicking the edit icon next to an action in the queue. If you are satisfied with the "Drop" and "Bin Columns" actions, click commit.
Once the dataset has finished processing, click "Finalize & End" to complete the recipe and move on to the next stage. This opens a dialog asking you to name the prepared dataset. Modify the name as desired and click "Yes" to confirm completion.
You will then be taken to the dataset's details page. At this stage, the prepared dataset is analyzed one final time to generate important statistics and visualizations, and the results are stored behind the scenes as metadata. The dataset details page then fills up as follows:
In the above view of the summary tab, you will notice the following information:
- The name of the dataset,
- Basic information such as size, number of rows, and number of columns,
- A tab with summary stats and visualizations of all numeric columns in your data (if any),
- A tab with similar information for categorical and other types of columns in your data (if any),
- The stats of each column,
- Density estimates of numeric columns, and
- Histograms of numeric columns.
You can also use the data tab to see a 1000-row subsample of your dataset, the schema tab to view which type of data each column contains, and the settings tab to change the name of the dataset.
Creating an App
To move on, you will next need to create an "App" from this dataset. An app is a special container on the Engine that holds multiple models trained and evaluated on the same train set and test set. An app enables you to train, evaluate, and compare models using different machine learning algorithms (called "templates" on the Engine).
Choose the "New App" option in the floating action button at the bottom-right of the screen. In the dialog that follows, first name your app and select the dataset from which to create it:
You can perform either a prediction task or a forecasting task with your data. In this example, we perform a prediction task: click "prediction task", then select the target column, "class" in this case. After clicking "Next", we can either split the dataset into train and test sets using the standard 80:20 ratio or provide a custom train-test split. Choose the default configuration for this example.
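An 80:20 split holds out a fifth of the rows so that models are evaluated on data they were not trained on. The sketch below shows a seeded random split in plain Python; whether the Engine shuffles, stratifies by the target, or uses a seed is not stated on this page, so those details are assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(rows: Sequence[T], test_ratio: float = 0.2, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly split rows into (train, test); 80:20 by default."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(1000))
print(len(train), len(test))  # 800 200
```

Fixing the split once per app is what makes comparisons between models in the same app fair: every model is scored on the same held-out rows.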
When the app creation dialog is completed by clicking on "Create Application", you will be taken to the app's details page, where the progress on the processing task is shown. The processing required for your app involves:
- Splitting your data into train and test sets,
- Computing additional stats on your data,
- Using the model recommender to predict, before any training, how fast models can be trained from the available templates and how accurate their predictions are likely to be.
Upon completion, you can navigate to several tabs such as the "dataset" tab, which takes us back to the details page of the dataset associated with this app. There are also the "feature sets", "models", and "deployments" tabs which we will come to soon.
At this point, we can explore the dashboard a little more to understand where we are. Click on the app's summary card in the dashboard to the left, as illustrated in the animation below:
On this screen, the summary card of the current app is highlighted, and we see the project the app belongs to and the organization that project belongs to. Each card in the dashboard displays summary information about the object it represents. Clicking a different organization or project reveals the projects or apps it contains and closes the current organization or project card.
To see more of the dashboard, click on the vertical tab on the right-hand side while the app card is open. You will see the following screen:
On this screen, you can see:
- The summary card of the current app, and the details of the dataset associated with the app,
- The default feature set (which contains every feature)
- The option to create a new model.
We will now explore the feature set functionality of the Engine.
Creating a Feature Set
Whenever an app is created, a feature set named "Default" containing every feature in the dataset is also created. Optionally, you can create additional feature sets containing a subset of the features, to save model training time or to test how predictive certain features are of the target. As usual, click the "feature sets" tab to enter the feature sets details page, then click the floating action button and select "add new feature set". You can also create a new feature set from the app's dashboard:
Once you have given the feature set a name, click next. You should see the following screen:
On this screen, we see:
- Feature selection area. Click a feature to select/deselect it.
- Selected features area. Lists currently selected features for the feature set. The target column is always included.
- Search bar. Use to search for particular features, which are then displayed in the feature selection area.
- "Select all" button. Click to select every feature currently displayed in the feature selection area.
- "Deselect all" button.
For this example, we select all categorical features and leave out all numeric features: "duration", "credit_amount", and "age". We can then compare the quality of models trained with and without the numeric features.
With our feature sets created, navigate to the "models" tab and click the floating action button followed by "Create New Model". This will take you to the model creation dialog.
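A feature set is essentially a column selection in which the target is always kept. The sketch below builds the categorical-only feature set described above; the helper is hypothetical, and apart from "duration", "credit_amount", "age", and "class", the column names are illustrative placeholders.

```python
from typing import List, Set


def build_feature_set(all_columns: List[str], excluded: Set[str], target: str) -> List[str]:
    """Keep every column except the excluded ones; the target is always kept."""
    return [c for c in all_columns if c not in excluded or c == target]


columns = ["checking_status", "duration", "credit_history", "credit_amount", "age", "class"]
categorical_only = build_feature_set(
    columns, excluded={"duration", "credit_amount", "age"}, target="class"
)
print(categorical_only)  # ['checking_status', 'credit_history', 'class']
```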
In the "New Model" dialog, first select a feature set on which you wish to train a model.
Click "Next" and you will be shown a list of model templates (machine learning algorithms) to choose from. This view shows the output of the model recommender to help you choose the template that suits your needs. The recommender estimates the performance of each template on your data in terms of:
- The "predictive performance", a percentage that measures how close the model's predictions are expected to be to the true values on new data that was not available during training. A predictive performance of 100% indicates that the model predicts without any errors,
- The time taken to train the model, and
- The time taken by the model to predict on new data.
The "Choose a model" step of the dialog shows this information in a visual manner, as seen below:
On this screen you can:
- Sort the models with respect to each of the three metrics estimated by the recommender, from best to worst.
- See the details of the model template.
- See, for every model, a rating out of five for each of the three estimated metrics. A higher rating implies:
  - Better predictions, for the "predictive performance" metric,
  - Faster training, for the "train time" metric, and
  - Faster predictions, for the "prediction time" metric.
- View a bubble plot showing estimated training time on the x-axis, estimated prediction time on the y-axis, and the estimated predictive performance as text inside the bubbles.
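Sorting "from best to worst" means opposite directions for different metrics: higher is better for predictive performance, lower is better for the two time metrics. The sketch below shows that logic; the `Estimate` record, template names, and numbers are placeholders, not actual recommender output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Estimate:
    """Hypothetical recommender output for one model template."""
    template: str
    predictive_performance: float  # percent; higher is better
    train_time_s: float            # seconds; lower is better
    predict_time_s: float          # seconds; lower is better


def sort_best_to_worst(estimates: List[Estimate], metric: str) -> List[Estimate]:
    # Performance sorts descending; the two time metrics sort ascending.
    if metric == "predictive_performance":
        return sorted(estimates, key=lambda e: e.predictive_performance, reverse=True)
    if metric in ("train_time_s", "predict_time_s"):
        return sorted(estimates, key=lambda e: getattr(e, metric))
    raise ValueError(f"unknown metric: {metric}")


estimates = [  # placeholder values for illustration only
    Estimate("template_a", 74.0, 12.0, 0.05),
    Estimate("template_b", 71.5, 2.0, 0.01),
    Estimate("template_c", 76.2, 95.0, 0.40),
]
print([e.template for e in sort_best_to_worst(estimates, "predictive_performance")])
# ['template_c', 'template_a', 'template_b']
print([e.template for e in sort_best_to_worst(estimates, "train_time_s")])
# ['template_b', 'template_a', 'template_c']
```

In this toy data the most accurate template is also the slowest to train, which is the kind of trade-off the bubble plot is meant to expose.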
Using the check-boxes, choose the model template(s) you want to train, then click "Next". Leave the option as "Default Configuration" and click "Next" again. Then click the "Train Models" button to start training the models:
You will then be taken to the models listing page of your App. On this screen, you can see the model templates' names and progress bars for the training jobs.
Models are trained on the training set and evaluated on the test set. The evaluation results are available on each model's details page. You can also compare the evaluation results of all trained models in an app by clicking the "Comparison" tab (1) in the above screen. This takes us to the model comparison page of the app, as seen below:
On this page, you can see:
- The actual performance of the trained models on the three metrics of importance, and
- A comparison of ROC curves and precision/recall curves, for classification models
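A ROC curve plots the true positive rate against the false positive rate as the decision threshold on the model's predicted probability is lowered. The sketch below computes the curve points and the area under the curve (AUC) by the trapezoidal rule for a toy set of labels and scores; it illustrates the standard definition and is not the Engine's implementation. Precision/recall curves are built the same way, recording precision and recall at each threshold instead.

```python
from typing import List, Tuple


def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    labels: 1 for the positive class, 0 for the negative class.
    scores: the model's predicted probability of the positive class.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold through each distinct score, from highest to lowest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    # Trapezoidal area under the piecewise-linear ROC curve.
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
pts = roc_points(labels, scores)
print(round(auc(pts), 3))  # 0.889
```

An AUC of 0.889 here means that in 8 of the 9 (positive, negative) pairs, the positive row receives the higher score.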
Below the plots on the same page, there is more information about the evaluation metrics:
Here, you will see:
- The rating out of five for each of the metrics,
- A table of evaluation metrics, where rows are models and columns are metric names. You can sort models by these metrics.
Your next step is to go back to the models listing page:
Then click on one of the models to view its details page:
You can see basic information about the model, its evaluation metrics, and further visualizations such as the confusion matrix, in addition to the ROC and precision/recall curves.
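A confusion matrix counts how often each actual class was predicted as each class. The sketch below builds one for a toy "good"/"bad" example in plain Python; the rows-as-actual, columns-as-predicted layout is a common convention, and the Engine's plot may orient it differently.

```python
from collections import Counter
from typing import List


def confusion_matrix(actual: List[str], predicted: List[str], labels: List[str]) -> List[List[int]]:
    """Rows are actual classes, columns are predicted classes, both in `labels` order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


actual = ["good", "good", "bad", "good", "bad"]
predicted = ["good", "bad", "bad", "good", "good"]
labels = ["good", "bad"]
cm = confusion_matrix(actual, predicted, labels)
# Correct predictions lie on the diagonal.
accuracy = sum(cm[i][i] for i in range(len(labels))) / len(actual)
print(cm)        # [[2, 1], [1, 1]]
print(accuracy)  # 0.6
```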
Deploying a Model
Once a model is trained and you are convinced that it performs well on your data, you need to deploy it before you can use it for predictions.
To deploy a model, click on the floating action button as seen in the previous figure. This will take you to the deployment setup dialog:
Here, you want to choose the option "Deploy to PI.EXCHANGE cloud" and proceed by clicking "Next":
On this screen, simply choose the "New endpoint" option (you have no endpoints created previously, as this is your first model), then click "Deploy".
This makes your model available for online predictions through a simple API call. After this step, you are taken to the details page of the endpoint:
From this page, you will need to copy the URI of the endpoint to make online predictions.
Using your model to predict on new data
Full GUI support for predictions is planned; currently, it is available only as an API test widget on the model's summary page. To use it, paste CSV data into the left column (without a header row and without the target column) and click the "CALL API" button to see the sample output. Refer to Using the API in Quick Start for how to use the API to predict on new data.
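The input format for the test widget (CSV rows with no header and no target column) can be produced from existing records as sketched below. The `call_endpoint` function is purely hypothetical: the HTTP method, content type, and absence of authentication are assumptions, so consult Using the API in Quick Start for the actual request format before using it with your endpoint URI.

```python
import csv
import io
import urllib.request
from typing import Dict, List


def to_csv_payload(rows: List[Dict[str, str]], columns: List[str], target: str) -> str:
    """Serialize rows as CSV with no header row and without the target column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    feature_columns = [c for c in columns if c != target]
    for row in rows:
        writer.writerow([row[c] for c in feature_columns])
    return buffer.getvalue()


def call_endpoint(uri: str, payload: str) -> str:
    # HYPOTHETICAL request format: a POST of raw CSV with a "text/csv"
    # content type and no authentication. The real API may differ.
    request = urllib.request.Request(
        uri, data=payload.encode("utf-8"), headers={"Content-Type": "text/csv"}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


columns = ["duration", "credit_amount", "age", "class"]
rows = [{"duration": "12", "credit_amount": "2500", "age": "35", "class": "good"}]
payload = to_csv_payload(rows, columns, target="class")
print(payload)  # 12,2500,35
# call_endpoint("<your endpoint URI>", payload)  # only after checking the real API format
```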