How to release a model across your organization in one morning (and stop spending excessive amounts of time on a model that will never go into production)
What is this guide about?
This guide is a walkthrough for delivering a complete Machine Learning Project by using the Prevision.io platform in a very short amount of time.
The guide details each step of a Machine Learning project, from data to model usage across the organization, and shows how to accomplish this within the platform.
What’s in this guide ?
This guide assumes that :
You want to create a weekly sales forecast
You have a free version of the Prevision.io platform
Your IT Team (or data team) has historical sales data in a database and has given you access (if not, CSV files are provided below for the sake of this guide)
An objective has been defined by the Line of Business (LoB)
The steps of our guide are defined in the chart below:
Steps of a Machine Learning Project
For each step, the guide explains what to expect and how we accomplish it on the Prevision.io platform.
At the end of this guide :
LoB will receive a weekly sales forecast each Monday at 09:00 am
LoB will receive a simulator for testing hypotheses on the model
The application team will get an API for calling the model from their own applications
IT Team will get a dashboard to monitor model Quality of Service
You can open a free account to practice the following steps. When your account is ready, create a Project to host the assets
Create a new project
The first step to any project is getting historical data in order to train our algorithm. As the name implies, Machine Learning is all about reading historical data and letting a computer model learn to predict a target, at least for supervised use cases.
The data should have been loaded into a database by the IT Team and they have generated credentials for you. Once you have created your project, and selected it:
Go to the data section ( sidebar on the left )
Create a new connector and provide the credentials
Create a new datasource from the db and table of past sales
Import it as a dataset
Create a new connector
Create a new datasource
If available, you could import recent sales as a holdout dataset in order to validate and check the stability of your model.
You have two datasets.
Data acquisition is done, you can now start to model.
If you don’t have database credentials, you can use the following files. Just import the file instead of using a datasource when importing the dataset.
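If you start from a single CSV file rather than two database tables, the holdout can be carved out locally before import. The following is a minimal sketch, assuming a hypothetical `Date` column in ISO format and made-up sample values; only the `Weekly Sales` target name comes from this guide.

```python
import csv
import io
from datetime import date


def split_by_year(rows, holdout_year, date_key="Date"):
    """Split rows into (train, holdout); holdout gets every row dated in holdout_year."""
    train, holdout = [], []
    for row in rows:
        year = date.fromisoformat(row[date_key]).year
        (holdout if year == holdout_year else train).append(row)
    return train, holdout


# Tiny inline sample standing in for a sales CSV (column names and values are assumptions).
SAMPLE_CSV = """Date,Store,Weekly Sales
2010-02-05,1,24924.5
2011-02-04,1,21665.8
2012-02-03,1,23510.5
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
train_rows, holdout_rows = split_by_year(rows, holdout_year=2012)
print(len(train_rows), "train rows,", len(holdout_rows), "holdout rows")
```

Keeping a whole recent year out of the trainset matches the validation strategy used later in this guide.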
Feature engineering is the addition or transformation of one or more features to create new features from the original dataset. In the Prevision Platform, as in most modern tools, feature engineering is done with components and pipelines, yet in most cases you don’t need to add features, as the AutoML engine performs all of the standard feature engineering by itself.
Here we are going to add a fold column on the date features in order to properly evaluate our model stability. A specific component has been developed by the data science team starting from the Prevision Boilerplate and pushed on a private repo.
The component may now be integrated into the component library of the project.
Go to the pipelines section of your project and under the Pipeline Components tab, click New pipeline Component
Create a new component
And select your repo and branch.
Import component from your repo
Once the component is built, its status will be ok and we can use it in a pipeline. Create a new pipeline template with three nodes:
An import dataset, to read the trainset
The newly created component ( “build fold” )
A save dataset node, to save the feature-engineered dataset into your Data section
Create a new scheduled run
Set your trainset as the input dataset
Configure your fold component parameters
Once you have done the configuration, select “Manual” as the trigger and run your scheduled run. In a few seconds, a new dataset with a new fold column should be available in your data section as a pipeline output.
Pipeline output dataset
You now have a dataset with features for training a model and a holdout to validate your models.
For the sake of this guide, we built a very basic feature engineering pipeline, but you can add as many transformations as you want and build a very complex pipeline.
Here we only have one component that adds a fold column, which is the year modulo 4. The same feature engineering can also be done on your local machine. Yet, if you want to build your own component you can follow this guide or some others.
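A local version of the fold column can be sketched in a few lines. This is an illustration, not the code of the “build fold” component itself; the `Date` column name, the `fold` column name, and the sample values are assumptions.

```python
from datetime import date


def add_fold_column(rows, date_key="Date", n_folds=4, fold_key="fold"):
    """Return copies of rows with an extra fold column equal to year % n_folds."""
    out = []
    for row in rows:
        year = date.fromisoformat(row[date_key]).year
        out.append({**row, fold_key: year % n_folds})
    return out


# Made-up rows; only the "Weekly Sales" target name comes from the guide.
sample = [
    {"Date": "2010-02-05", "Weekly Sales": 24924.5},
    {"Date": "2011-02-04", "Weekly Sales": 21665.8},
    {"Date": "2012-02-03", "Weekly Sales": 23510.5},
]
with_folds = add_fold_column(sample)
print([r["fold"] for r in with_folds])  # [2, 3, 0]
```

Because every row of a given year lands in the same fold, cross-validation never mixes weeks of one year between training and validation.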
Define the problem
This is the most important part and the one that should be allocated the most time.
In this step, you’re going to define with the Line of Business what makes the project a success and you, as a data scientist, are going to translate this into data science metrics.
Choosing the best metrics is out of the scope of this document but you must spend time with your business teams and ask these kinds of questions:
Imagine that I have the perfect model, does it make me gain something?
How much money do I lose if I forecast 110 sales instead of 100?
How much money do I lose if I forecast 90 sales instead of 100?
Are all the predicted products equal?
Should I forecast the total number of items sold, the total amount of sales (in € or $), the total weight of my items or the total volume?
How far in advance should I forecast?
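The questions about forecasting 110 or 90 instead of 100 amount to asking whether over-forecasting and under-forecasting cost the same. A minimal sketch of such an asymmetric cost, assuming hypothetical unit costs (`cost_over` and `cost_under` are illustrative values to be replaced by figures agreed with the Line of Business):

```python
def business_cost(actual, forecast, cost_over=2.0, cost_under=5.0):
    """Cost of one forecast: each unit over costs cost_over, each unit under costs cost_under."""
    error = forecast - actual
    if error >= 0:
        return error * cost_over   # e.g. unsold stock
    return -error * cost_under     # e.g. missed sales


print(business_cost(100, 110))  # 10 units over  -> 20.0
print(business_cost(100, 90))   # 10 units under -> 50.0
```

If the two costs differ a lot, a symmetric metric may not reflect the business objective well, which is worth knowing before choosing the metric of an experiment.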
As a data scientist using an AutoML platform, your role is not to code in Python or write Dockerfiles, but to translate business problems into Machine Learning parameters.
In the Prevision Platform, you can build what is called an Experiment to help refine your objectives.
An experiment is a series of model-building runs (versions) that share a common target but use slightly different parameters from one version to the next. For each version, many models are automatically built and evaluated in cross-validation, and on the holdout dataset if you provide one.
In our case, the models will be trained on our engineered dataset with a fold column and evaluated on a holdout provided by the IT Team.
It is very important to have a good validation strategy to guarantee that the model built in the experiment phase will stay stable in production. Here we chose to:
Build a fold column on the year number modulo 4, so that each fold groups whole years and we can be confident that the model learned trends that hold from one year to the next
Validate on a holdout with sales from a year that was not in the trainset
Hence, if the holdout score is close to the cross-validation score, we know that our model is going to hold up when launched in production and shared across the company.
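The check described above can be sketched as follows. This is an illustration under stated assumptions: the model is replaced by a baseline that predicts the training mean, the metric is Mean Absolute Error, and “close” is arbitrarily defined as within two standard deviations of the per-fold scores. None of this is the platform's internal procedure.

```python
from statistics import mean, pstdev


def mae(actuals, preds):
    """Mean Absolute Error between two equally long sequences."""
    return mean(abs(a - p) for a, p in zip(actuals, preds))


def cross_val_mae(rows, target="Weekly Sales", fold_key="fold"):
    """Leave-one-fold-out MAE of a baseline that predicts the mean of the other folds."""
    scores = {}
    for fold in sorted({r[fold_key] for r in rows}):
        train = [r[target] for r in rows if r[fold_key] != fold]
        valid = [r[target] for r in rows if r[fold_key] == fold]
        baseline = mean(train)
        scores[fold] = mae(valid, [baseline] * len(valid))
    return scores


def holdout_is_consistent(cv_scores, holdout_score, n_std=2.0):
    """True if the holdout score lies within n_std standard deviations of the CV mean."""
    values = list(cv_scores.values())
    return abs(holdout_score - mean(values)) <= n_std * pstdev(values)


# Made-up rows with a fold column already built.
rows = [
    {"fold": 0, "Weekly Sales": 100.0}, {"fold": 0, "Weekly Sales": 120.0},
    {"fold": 1, "Weekly Sales": 110.0}, {"fold": 1, "Weekly Sales": 90.0},
    {"fold": 2, "Weekly Sales": 105.0}, {"fold": 2, "Weekly Sales": 95.0},
]
cv_scores = cross_val_mae(rows)
print(cv_scores)
print(holdout_is_consistent(cv_scores, holdout_score=9.0))
```

The spread of the per-fold scores is also what a stability error bar summarizes: a model whose score varies widely from fold to fold is less likely to behave the same on unseen years.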
To create a new experiment, go to the Experiments section of your project and click New Experiment. You could choose to import some external models if you have some, but here we are using the AutoML Prevision Engine. As we want to forecast sales, choose “Tabular” and “Regression”. Give a name to your experiment and click “Create experiment”.
Setting the experiment up
When you create a new experiment, there is no version of the experiment existing so you will be prompted to create a new version. The next screen is where you set up all of your experiment parameters:
The train dataset : use the output of the scheduled run from step 2, with engineered features
The holdout dataset : use a dataset with the same target as the trainset but with data that are not in the trainset
The metric : use the metric that best reflects the business objectives defined in step 3. You can change it on each version of your experiment, so run as many versions as you need if you are not sure
set your target ( here we choose “Weekly Sales” )
and set the fold column up, using the column built during the feature engineering phase.
Note that you may go to the models and feature engineering tabs to change some AutoML configuration, but in most cases the default configuration is fine.
Once done, click on train to launch the training. The platform will immediately start to build and select models with the best hyperparameters. The models will stack up in the “models” tab of your experiment:
The experiment dashboard
Note that you can launch another version of your experiment as soon as you want, for testing other metrics for example, by using the new version button in the top right corner.
If you have several versions, the experiment dashboard will always display the last version, but you can change to another version with the version dropdown menu or the versions list tabs.
The experiment dashboard
You can launch as many versions as you want and they will run in parallel. You can now grab a coffee and wait till models are built! Depending on the size of your dataset and the plan you subscribed to, expect to wait from 10 minutes to 2 hours before having enough models to evaluate your experiment. In our case, we got our model in approximately 20 minutes.
After a few minutes, you should have between 15 and 40 models for each version, depending on the options you chose.
List of experiment versions
This step is all about evaluating all the models produced and selecting 2 to 4 models to deploy for testing in real conditions.
First, have a quick look at the list of versions below (the versions tab of your experiment). There is a small 3-star evaluation that gives you information about each version’s quality. In this instance, Version 3, which has been trained on Mean Absolute Error, looks the most promising. Click on the specific version to enter the version dashboard for a deeper analysis.
On the Version dashboard, you have several indicators, but the most important is the models comparator :
List of models of a version
You can quickly see :
performance of each model, evaluated on the metric you chose for this version
stability of each model ( represented with a small error bar ) computed on a cross validation of the trainset using the fold column provided
The Prevision Platform always produces what we sometimes call “simple models”: a linear regression and a decision tree limited to a depth of 5, called simple-LR and simple-DT. It is always a good idea to compare the performance of these models against the most complex ones and ask yourself if using them could be good enough for your problem.
Indeed, as they are very simple :
they can be implemented in SQL ( auto-generated code is even provided on the model analysis page )
they are often more explainable and better accepted by the Business teams, as they are easier to understand and use.
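To see why a simple model fits in SQL, note that a linear regression is just an intercept plus a weighted sum of columns. The sketch below renders such a model as a query; it is not the SQL that the platform generates, and the coefficients, column names and table name are made up for illustration.

```python
def linear_model_to_sql(intercept, coefficients, table, alias="predicted_sales"):
    """Render a linear regression as a SQL SELECT.

    coefficients maps column name -> weight. Identifiers are inserted as-is,
    so this sketch must only be used with trusted column and table names.
    """
    terms = [repr(float(intercept))]
    for column, weight in coefficients.items():
        terms.append(f'{float(weight)!r} * "{column}"')
    expression = " + ".join(terms)
    return f"SELECT {expression} AS {alias} FROM {table};"


# Hypothetical coefficients and names.
query = linear_model_to_sql(
    1200.0, {"temperature": -3.5, "is_holiday": 800.0}, "sales_to_predict"
)
print(query)
```

A model expressed this way runs directly in the database, with no extra service to deploy or monitor.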
As a data scientist, deciding to use a simple if-else instead of a complex Blend of Gradient Boosting if it solves the issue is within your purview!
On the experiment above, LGB-3, XGB-4, and CB-2 look promising, so we are going to have a closer look. Click on a model’s bar in the barplot to enter the detailed model analysis, CB-2 for example.
Here you have more detail about the models you select, like various metrics and the actual vs predicted Scatterplot.
All the metrics of the model
Predicted vs actual
You can download the cross-validation file if you want to run your own evaluation. CB-2 is quite good, but if we look at the scatterplot, we see that performance falls in the range from 40k to 80k. If we go to the LGB-3 page, we see a more stable performance.
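Once the cross-validation file is downloaded, a drop in performance over a range of values can be quantified by computing the error per range of actual sales. The sketch below assumes you have already extracted (actual, predicted) pairs from the file; the layout of that file is not described here, and the pairs shown are made up.

```python
from statistics import mean


def mae_by_range(pairs, edges):
    """Group (actual, predicted) pairs by range of actual value and return the MAE per range.

    Ranges are [low, high); pairs whose actual value falls outside all ranges are ignored.
    """
    buckets = {}
    for actual, predicted in pairs:
        for low, high in zip(edges, edges[1:]):
            if low <= actual < high:
                buckets.setdefault((low, high), []).append(abs(actual - predicted))
                break
    return {rng: mean(errs) for rng, errs in sorted(buckets.items())}


# Made-up pairs standing in for rows of the cross-validation file.
pairs = [
    (10_000, 11_000), (30_000, 29_000),
    (50_000, 58_000), (70_000, 60_000),
    (90_000, 91_000),
]
report = mae_by_range(pairs, edges=[0, 40_000, 80_000, 120_000])
for (low, high), score in report.items():
    print(f"{low:>7} to {high:>7}: MAE = {score}")
```

A range whose error is much larger than the others, like the middle one in this made-up sample, is the kind of weakness the scatterplot reveals visually.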
Predicted vs actual ( LGB-3 )
Evaluating a model is out of the scope of this guide, but be aware that it is another step where you MUST involve your business team and explain each metric and chart to them, so you can choose the model that best solves their problem through group consensus.
The model analysis page is full of metrics to parse and you can run as many experiments as you want in order to find the model that fits the business problem the best.
After discussions with the LoB, we decided to keep the LGB-3 and the XGB-4, one because it performs well and the other because its performance is stable when evaluated on the holdout.
In order to refine this, we are now going to deploy both models and see how they perform in the real world.
In this step, two models will be deployed in order to test them on real data and real usage. While deployed, their performance will be closely monitored to decide whether they are fit for production-grade use.
Go to the “Deployment” section of your project and click on deploy a new experiment. Select LGB-3 as the main model and XGB-4 as a challenger in order to see which one performs best on real data.
Set your main and challenger
The Main model will be used for prediction but each time you call it, a prediction will be done with the challenger model too and a chart will be generated so you can compare them.
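The main/challenger behaviour described above follows a general pattern that can be sketched in plain Python. This illustrates the idea only, not how the platform implements it; both models are stand-in functions with made-up coefficients.

```python
def predict_with_challenger(main_model, challenger_model, features, log):
    """Serve the main model's prediction and record the challenger's for later comparison."""
    main_pred = main_model(features)
    challenger_pred = challenger_model(features)
    log.append({"features": features, "main": main_pred, "challenger": challenger_pred})
    return main_pred  # only the main model's answer is returned to the caller


# Stand-in models (hypothetical formulas).
def main_model(features):
    return 100.0 + 2.0 * features["promo"]


def challenger_model(features):
    return 98.0 + 2.5 * features["promo"]


log = []
served = predict_with_challenger(main_model, challenger_model, {"promo": 2}, log)
print(served, log[0]["challenger"])
```

Because both predictions are stored for the same inputs, the two models can later be compared on exactly the same production traffic.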
Wait a few minutes to get :
a standalone webapp for a human user to test ( “Application link” url )
a batch predictor available for scheduling prediction
a REST API for calling the model from other software ( “Documentation API” link )
Set your main and challenger
That’s all. Your model can now be called from any client of your company and all its requests will be logged for further monitoring. Yet, in order to send predictions each week to the sales team, you need to schedule them.
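From a client application, calling a deployed model over REST usually means sending the features as JSON in an HTTP POST. The sketch below only builds such a request without sending it. The URL, the `/predict` path, the bearer-token authentication and the payload shape are all assumptions; the actual contract is the one given in the “Documentation API” link of your deployment.

```python
import json
import urllib.request


def build_prediction_request(base_url, token, features):
    """Build (without sending) a JSON POST request for one prediction."""
    return urllib.request.Request(
        url=f"{base_url}/predict",  # hypothetical path
        data=json.dumps(features).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # hypothetical auth scheme
        },
        method="POST",
    )


# Placeholder URL, token and feature names.
request = build_prediction_request(
    "https://example.invalid/my-deployment",
    "MY_TOKEN",
    {"Store": 1, "Date": "2012-11-02"},
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```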
Once a model is deployed, it can be used to schedule predictions. The first step is to insert it into a pipeline template and then create a new Schedule using this template.
Note that you need help from your IT team in this step, in order to define the name of the table from which you will read the features each week. You can use the same tables, overwritten each week, for example “sales to predict” to read and “Sales predicted” to write, or a more complex naming scheme.
First you need to create two new assets :
a new datasource that will link to the Table where the IT team is going to put the features for prediction each week
a new exporter to push the result
Create an exporter to push data to your CRM
So you can use them in a new pipeline template with 3 nodes again:
Import from the datasource, where the datasource is the table with all the weekly features
a deployment prediction regression node
an export dataset node, that uses the exporter above
Once you have your template, create a new Schedule based on it.
Use your template in a schedule run
Choose the Name of your deployment as the experiment deployment ID
Use your template in a schedule run
And then, instead of the manual Trigger, use a periodic one, putting the configuration that fits your need the best ( here, a weekly prediction each Monday at 7:00 AM )
Scheduling a prediction each Monday morning
Click run and wait a few seconds. Your Prediction is now scheduled to run every Monday, from the table of “sales to predict” to the “Sales predicted” table of your databases.
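To reason about when such a periodic trigger fires, the next run time can be computed by hand. In standard cron notation, “every Monday at 07:00” is written `0 7 * * 1`; whether the platform's trigger form accepts cron syntax is not stated in this guide, so the sketch below is independent of it.

```python
from datetime import datetime, timedelta


def next_weekly_run(now, weekday=0, hour=7, minute=0):
    """Return the next run strictly after `now`; weekday 0 is Monday."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)  # this week's slot has already passed
    return candidate


# 2024-01-03 is a Wednesday, so the next run is Monday 2024-01-08 at 07:00.
print(next_weekly_run(datetime(2024, 1, 3, 12, 0)))
```

This also makes clear that the table of features must be filled by the IT team before that time each week, otherwise the run will predict on stale data.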
Once a model is deployed, each call to it is logged, whether it is a unit call or a scheduled batch. You can track your model in the Deployments section of your project: click on a deployed experiment name in the list to access the deployment dashboard.
Train and production distribution
You can compare the feature distributions of the trainset with the feature distributions seen in production and check for drift. The target distributions of the Main model and the Challenger model are shown side by side with those seen in production in order to evaluate performance in a real application.
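One common way to put a number on the drift between a training distribution and a production distribution is the Population Stability Index (PSI). This guide does not say which drift measure the platform uses, so the sketch below is a generic illustration with made-up values and bin edges.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples over shared bin edges.

    Bins are [low, high); values outside all bins are ignored. Empty bins are
    floored at eps to keep the logarithm defined.
    """
    def shares(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i, (low, high) in enumerate(zip(edges, edges[1:])):
                if low <= v < high:
                    counts[i] += 1
                    break
        total = sum(counts) or 1
        return [max(c / total, eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up feature values: production has shifted towards higher values.
train_values = [10, 20, 30, 40, 50, 60, 70, 80]
prod_values = [60, 70, 80, 90, 60, 70, 80, 90]
edges = [0, 25, 50, 75, 100]
print(psi(train_values, train_values, edges))  # identical samples give 0.0
print(psi(train_values, prod_values, edges))
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the right threshold depends on the feature and should be agreed with the teams who own the data.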
Under the monitoring/usage tab, you will find SLA statistics: number of calls, average response time, and errors.
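If you also keep your own call logs on the client side, the same three indicators are easy to recompute. The sketch below assumes a hypothetical log format in which each call is a dict with an HTTP `status` code and a response time `ms`; the values are made up.

```python
from statistics import mean


def summarize_calls(calls):
    """Summarize call logs into number of calls, average response time and error rate."""
    errors = [c for c in calls if c["status"] >= 400]
    return {
        "n_calls": len(calls),
        "avg_response_ms": mean(c["ms"] for c in calls) if calls else 0.0,
        "error_rate": len(errors) / len(calls) if calls else 0.0,
    }


# Hypothetical log records.
calls = [
    {"status": 200, "ms": 120},
    {"status": 200, "ms": 80},
    {"status": 500, "ms": 400},
    {"status": 200, "ms": 100},
]
stats = summarize_calls(calls)
print(stats)
```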
By tracking all these indicators for a month or more, you can evaluate how your model lives in production and check that it behaves the way you expected while evaluating it in the experiment step.
In this guide, you saw how to complete the whole data science process in less than a morning and went from data to fully deployed model, shared across the company with full monitoring.
Using a tool to handle the technical tasks of data science, such as finding the best model, deploying a model, or importing data, allows you to spend more time on what truly matters: talking with the Line of Business team to translate their problem into data science configuration and metrics.
Prevision.io brings powerful AI management capabilities to data science users so more AI projects make it into production and stay in production. Our purpose-built AI Management platform was designed by data scientists for data scientists and citizen data scientists to scale their value, domain expertise, and impact. The platform manages the hidden complexities and burdensome tasks that get in the way of realizing the tremendous productivity and performance gains AI can deliver across your business.