Features and Capabilities
The AI & Analytics Engine (the Engine) gives you diverse and flexible options to power your model-building pipeline and to help you solve your unique business needs. Below is a list of the features it provides.
Support for a wide range of Data Sources
The Engine can ingest datasets from several types of sources including:
- local files
- cloud storage (e.g. AWS and Google Cloud Storage)
- HTTP/FTP connections
- relational database systems such as MySQL and PostgreSQL
- NoSQL systems such as MongoDB and Cassandra
Currently, the Engine supports data in tabular format for all associated tasks. The supported file formats are CSV, JSON Lines, and Parquet. For CSV, headers are automatically detected and interpreted as field names.
The supported column data types are:
- Numeric (Integer/Double-precision floating point)
- Date/Time, or
The Engine will try to automatically detect the data types. You can optionally review the automatically detected data types and correct them if necessary.
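As an illustration of what automatic type detection involves, here is a minimal sketch in Python. It is not the Engine's actual detection logic: the function name `infer_column_type`, the order of the checks, and the fallback label `"text"` are assumptions made for this example only.

```python
from datetime import datetime


def infer_column_type(values):
    """Guess a column type from raw string values (illustrative only)."""
    # Ignore missing entries so that blanks do not affect the guess.
    non_empty = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not non_empty:
        return "unknown"

    def all_parse(parser):
        # True only if every value is accepted by the given parser.
        for v in non_empty:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    # Try the strictest type first, then fall back to looser ones.
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "double"
    if all_parse(datetime.fromisoformat):
        return "datetime"
    return "text"  # assumed fallback label, not taken from the Engine


print(infer_column_type(["1", "2", "3"]))
print(infer_column_type(["1", "2.5", ""]))
print(infer_column_type(["2021-03-01", "2021-03-02 12:30:00"]))
```

A heuristic like this can misclassify columns, for example numeric identifiers that should be treated as labels, which is why being able to review and correct the detected types matters.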
Projects/Users/Datasets management system
You can manage several ML projects, datasets, models, and deployments in a centralized manner. The Engine enables this through a logical hierarchy of "assets" with namespace-like separation that is also easy to navigate and search.
Graphical User Interface (GUI) and Data Visualization
The Engine lets you visually inspect many facets of a typical workflow on the platform. The displays include, among others:
- Assets management
- Summary and statistics over data columns
- Model comparisons
- Data previews
Model Performance Prediction and Recommender
Our Model Recommender can predict how a machine learning model will perform on a particular dataset. This means you can avoid the brute-force approach when choosing the best model for your data. Instead of training hundreds of models, you can focus on the top few recommended by the Model Recommender. We also provide a cost-time-performance trade-off view of the recommendations so that you can choose the model best suited to your business needs. For every machine learning model template available for a given task, the Model Recommender provides the actual figures as well as star ratings for the following metrics:
- Model performance (prediction error rate, accuracy, confidence interval, etc.)
- One-time effort (in terms of computation time) required for training and its cost
- Ongoing deployment cost (per 1000 API calls or per hour, depending upon the type of the model)
- Response time of deployed API
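One way to reason about a cost-time-performance trade-off is to keep only the candidates that are not beaten on every metric by some other candidate (a Pareto front). The sketch below is an assumption about how such a shortlist could be computed, not a description of the Model Recommender itself. The template names, metric keys, and all figures are made up for illustration.

```python
def pareto_front(models):
    """Keep models that no other model beats on every metric (lower is better)."""
    keys = ("error", "train_minutes", "cost_per_1000_calls")

    def dominates(a, b):
        # a dominates b if it is no worse everywhere and strictly better somewhere.
        no_worse = all(a[k] <= b[k] for k in keys)
        better = any(a[k] < b[k] for k in keys)
        return no_worse and better

    return [m for m in models if not any(dominates(o, m) for o in models)]


# Hypothetical candidates; the numbers are invented for this example.
candidates = [
    {"name": "template_a", "error": 0.08, "train_minutes": 30, "cost_per_1000_calls": 0.40},
    {"name": "template_b", "error": 0.10, "train_minutes": 5, "cost_per_1000_calls": 0.10},
    {"name": "template_c", "error": 0.12, "train_minutes": 20, "cost_per_1000_calls": 0.30},
]

shortlist = [m["name"] for m in pareto_front(candidates)]
print(shortlist)
```

Here `template_c` drops out because `template_b` is better on all three metrics, while `template_a` and `template_b` both remain because each wins on at least one metric.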
Automated Hyperparameter Tuning and Evaluation
After you select models for training, the Engine performs automated hyperparameter tuning within a parameter search space. The results are evaluated and the best model (in terms of accuracy or R²) is chosen.
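The general idea of searching a parameter space and keeping the best-scoring configuration can be sketched as a small grid search. This is a minimal illustration, not the Engine's tuning procedure: the toy one-feature ridge model, the helper names, the candidate values for `alpha`, and the data are all assumptions chosen to keep the example self-contained.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit for y ≈ w * x (no intercept); alpha is the hyperparameter."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def r2_score(ys, preds):
    """Coefficient of determination (R²) of predictions against true values."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def grid_search(train, valid, alphas):
    """Fit one model per alpha and keep the one with the highest validation R²."""
    best = None
    for alpha in alphas:
        w = fit_ridge_1d(train[0], train[1], alpha)
        score = r2_score(valid[1], [w * x for x in valid[0]])
        if best is None or score > best["r2"]:
            best = {"alpha": alpha, "w": w, "r2": score}
    return best


# Invented toy data: (inputs, targets) for training and validation.
train = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
valid = ([5.0, 6.0], [10.1, 11.9])

best = grid_search(train, valid, alphas=[0.0, 0.1, 1.0, 10.0])
print(best["alpha"], round(best["r2"], 3))
```

Scoring on held-out validation data rather than on the training data is the important design choice here, because it rewards configurations that generalize.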
Data preparation in the traditional way is an expensive process that is time-consuming and difficult to reproduce. The AI & Analytics Engine gives you the ability to discover and apply complex statistical data transformations by providing AI-powered recommendations tailored to your datasets and problem statements.
The result of the data wrangling process is a data processing pipeline referred to as a recipe. A recipe produced by the Engine is scalable, tractable, reproducible, and reusable. It lets you create, replicate, recalibrate, and customize data preparation processes according to your business needs in a unified framework.
The data wrangling recommendations are generated by AI-powered algorithms that make use of advanced statistical methods, developed through PI.EXCHANGE’s R&D efforts. These algorithms can recommend a set of actions, or data transformations, that will be useful for machine learning purposes; some of them can also detect issues with the data and suggest cleaning steps. Some example uses of the AI-powered algorithms include:
- Fixing missing values and records
- Detecting non-predictive features
- Detecting redundant features
- Detecting target leakage (i.e. features created using the target)
- Feature engineering through auto-encoder neural networks
- Feature engineering through clustering, dimensionality reduction, and applying Natural Language Processing (NLP) techniques to text features
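To make a few of the checks listed above concrete, the sketch below flags constant (non-predictive) features, highly correlated (redundant) feature pairs, and features that track the target suspiciously closely (possible leakage). It is a simple correlation-based heuristic written for this example and is not the Engine's algorithm. The function names, the 0.95 threshold, and the sample data are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def flag_features(features, target, threshold=0.95):
    """Return (redundant_pairs, leakage_suspects, constant_features)."""
    names = list(features)
    # A feature with a single distinct value cannot help prediction.
    constant = [n for n in names if len(set(features[n])) == 1]
    # A near-perfect correlation with the target is a leakage warning sign.
    leaks = [n for n in names if abs(pearson(features[n], target)) >= threshold]
    # Two features that move together almost perfectly carry the same signal.
    redundant = [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(pearson(features[a], features[b])) >= threshold
    ]
    return redundant, leaks, constant


# Invented sample data for illustration.
features = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "store_id": [7, 7, 7, 7, 7],
    "target_x2": [2, 8, 4, 10, 6],
}
target = [1, 4, 2, 5, 3]

redundant, leaks, constant = flag_features(features, target)
print(redundant, leaks, constant)
```

Linear correlation only catches the simplest cases; a flagged feature is a prompt for review rather than proof of a problem.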
You can customize the AI-powered recommendations through the GUI.
Model Performance Comparison
The Engine can compare the performance of multiple models in an intuitive and visual way.
The image shows the TP/FP curves and prediction time vs training time for two models.
"One-click" Model deployment
Deploying models is simple. Once a model has been trained, you can deploy an API endpoint with a few clicks of the mouse. Example code demonstrating usage is supplied to make it easier for you to start consuming the predictions from the machine learning models.
Monitoring Deployed Models
Today's large-scale ML production environments are constantly fed with data. The characteristics of incoming data tend to shift over time, and the adage "garbage in, garbage out" is more apt than ever. These characteristics include the distribution of key features, the distribution of actual target outcomes, and the relationships (e.g. correlation) between features and targets. Once the characteristics of incoming data change, so can the model's behavior. This might lead to a range of undesirable repercussions, for example:
Google engineer apologizes after Photos app tags two black people as gorillas -- The Verge - 2015
In order to avoid these events, it is important to monitor a deployed model's performance over time. Thus, for every deployed model, the Engine automatically computes and keeps track of several monitoring metrics. These monitoring metrics can be used to alert the user if the incoming data and the predicted performance of the model have changed considerably since model development.
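The monitoring metrics themselves are not listed here. As one example of how a shift in a feature's distribution can be quantified, the sketch below computes the Population Stability Index (PSI) between data seen at development time and data seen after deployment. Choosing PSI, the equal-width binning, the number of bins, and the function name `psi` are assumptions made for illustration, not a description of what the Engine computes.

```python
import math


def psi(expected, actual, bins=5, eps=1e-6):
    """Population Stability Index of `actual` relative to `expected`."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the reference range; guard against zero width.
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range fall into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor at eps so that empty bins do not cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Invented data: the live values are the training values shifted upward.
training_values = [float(v) for v in range(100)]
live_values = [v + 40.0 for v in training_values]

print(psi(training_values, training_values))
print(psi(training_values, live_values))
```

Identical distributions give a PSI of zero, and larger values indicate a larger shift. A commonly quoted rule of thumb treats values above roughly 0.2 as a shift worth investigating, although the right alert threshold depends on the application.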