This is the fifth in a six-part series on how to use the Prevision.io Python SDK to build production-ready, fully monitored AI models using your real-world business data. If you already have a Prevision.io project called “Electricity Forecast” containing an experiment deployment called “my-deployed-experiment”, including an accessible generated app, then you are ready to go. Otherwise, head over to the fourth blog post, follow the instructions and come back!
What Are We Doing Today?
In this blog post, we are going to give an overview of Prevision.io “Pipelines”, starting with their definition and concluding by showing how to interact with them! But before starting, let us raise a toast to those who made it to PRODUCTION!
Leonardo DiCaprio Raising A Toast
Some Context Before Starting?
Are you wondering what a ML Pipeline is, why it is important and what to consider when building one? If you already have the answers, feel free to skip this section. Otherwise, stick with me! You’ll end up with answers to your questions.
1. What Is A ML Pipeline?
A machine learning pipeline is a way to codify and automate the workflow it takes to produce and use a machine learning model. ML pipelines consist of multiple sequential steps that cover everything from data extraction and preprocessing, through model training and deployment, to producing and analyzing predictions.
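To make "multiple sequential steps" concrete, here is a minimal sketch in plain Python. It has nothing to do with the Prevision.io SDK: `run_pipeline` and the three toy steps are hypothetical names invented for illustration, and each step simply receives the output of the previous one.

```python
from typing import Any, Callable, List


def run_pipeline(data: Any, steps: List[Callable[[Any], Any]]) -> Any:
    """Apply each step to the output of the previous one, in order."""
    for step in steps:
        data = step(data)
    return data


# Hypothetical stand-ins for real pipeline stages.
def drop_missing(rows: list) -> list:
    # Data cleaning: remove missing values.
    return [r for r in rows if r is not None]


def scale(rows: list) -> list:
    # Preprocessing: rescale values by the maximum.
    top = max(rows)
    return [r / top for r in rows]


def predict(rows: list) -> list:
    # Toy "model": flag every value above 0.5.
    return [r > 0.5 for r in rows]


result = run_pipeline([2.0, None, 8.0, 10.0], [drop_missing, scale, predict])
print(result)
```

Because the steps are just entries in a list, reordering, swapping or reusing one of them does not require touching the others.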
2. Why is Pipelining so important?
To understand why pipelining is so important in machine learning performance and design, let’s imagine a typical ML workflow in a few words: Data ingestion, Data cleaning, Data preprocessing, Data Modeling and then deployment.
In a mainstream system design, all of these tasks would be run in a monolith: a single script that goes through the whole workflow. Since ML models usually consist of far less code than software applications, this approach may make sense at first sight. However, once it’s time to scale, three significant problems will surely arise.
I call them the Nightmare V Trio:
- Volume: when deploying multiple versions of the same model, you have to run the whole workflow once per version, even though the first steps of ingestion and preparation are exactly identical.
- Variety: when you expand your model portfolio, you’ll have to copy and paste code from the beginning stages of the workflow, which is inefficient and a bad sign in software development.
- Versioning: when you change the configuration of a data source or another commonly used part of your workflow, you’ll have to manually update all of the scripts, which is time-consuming and creates room for error.
With a ML pipeline, each part of your workflow is abstracted into an independent service. Then, each time you design a new workflow, you can pick the elements you need and use them where you need them, while any change made to a service is made once and applies to every workflow that uses it. In addition, as you’ve probably guessed, pipelining improves both the performance and the organization of the entire workflow, getting models into production more quickly and making them easier to manage.
Isn’t this just AMAZING! Worried about moving on to the technical stuff? Grab a drink and don’t worry! The Prevision.io pipeline editor has got you covered!
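Here is a small sketch of that idea, again in plain Python rather than with any Prevision.io API. Every name (`FAKE_STORAGE`, `make_ingest`, `build_workflow`, the two toy models) is hypothetical. The point is that ingestion and cleaning are written once and shared, so adding a model or changing the data source does not mean copying a script.

```python
from typing import Callable, Dict, List

# Hypothetical storage; in real life this would be a database or a file store.
FAKE_STORAGE: Dict[str, List[float]] = {"meter_readings": [10.0, -1.0, 12.0, 14.0]}


def make_ingest(source: str) -> Callable[[], List[float]]:
    """Ingestion component: the data source is configured in one place only."""
    def ingest() -> List[float]:
        return list(FAKE_STORAGE[source])
    return ingest


def clean(values: List[float]) -> List[float]:
    """Cleaning component shared by every workflow: drop negative readings."""
    return [v for v in values if v >= 0]


def mean_model(values: List[float]) -> float:
    return sum(values) / len(values)


def last_value_model(values: List[float]) -> float:
    return values[-1]


def build_workflow(ingest: Callable[[], List[float]],
                   model: Callable[[List[float]], float]) -> Callable[[], float]:
    """Assemble a workflow from reusable components."""
    def workflow() -> float:
        return model(clean(ingest()))
    return workflow


ingest = make_ingest("meter_readings")  # change the source here, once
workflows = {
    "mean": build_workflow(ingest, mean_model),
    "last": build_workflow(ingest, last_value_model),
}
results = {name: workflow() for name, workflow in workflows.items()}
print(results)
```

Adding a third model is one more entry in `workflows` (Variety), and pointing everything at a new data source is a one-line change (Versioning).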
3. What To Consider When Building A Machine Learning Pipeline?
In the Prevision.io platform you can automate several actions using the pipeline editor. It mainly involves:
- Pipeline components: low level operation applied to a Prevision.io resource
- Pipeline templates: template of multiple chained pipeline components
- Pipeline run: an instance of a pipeline template that can be scheduled and executed (and monitored) multiple times
In order to execute a pipeline, several requirements need to be fulfilled:
- First, you have to create your own template using the pipeline editor. This template includes generic components with no configuration required in this step. This allows you to create a generic template and apply it several times on different experiments by configuring the component.
- Then, you will be able to configure the pipeline run by choosing an already created template and configuring the nodes to your experiment. You also can choose to run the pipeline manually or automatically by using the scheduler.
Optionally, you can create your own components, load them into the platform and use them in pipelines.
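If it helps, the relationship between components, templates and runs can be pictured with a few dataclasses. This is only a mental model under my own assumptions: the class names, the `required_params` field and the `missing_params` method are hypothetical and are not part of the Prevision.io SDK.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Component:
    """A low-level operation; it only declares which parameters it will need."""
    name: str
    required_params: Tuple[str, ...] = ()


@dataclass
class Template:
    """A chain of components with no configuration attached."""
    name: str
    components: List[Component]


@dataclass
class Run:
    """An instance of a template, configured node by node."""
    template: Template
    params: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def missing_params(self) -> List[str]:
        """List every 'node.parameter' that still has to be filled in."""
        missing = []
        for component in self.template.components:
            given = self.params.get(component.name, {})
            for param in component.required_params:
                if param not in given:
                    missing.append(f"{component.name}.{param}")
        return missing


template = Template(
    name="predict-and-save",
    components=[
        Component("import", ("dataset",)),
        Component("predict", ("deployment",)),
        Component("export", ("output_name",)),
    ],
)

run = Run(template, {
    "import": {"dataset": "new_data"},
    "predict": {"deployment": "my-deployed-experiment"},
})
missing = run.missing_params()
print(missing)
```

The template stays generic, and a run is only ready to execute once `missing_params()` comes back empty.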
Let The Fun Begin!
Now that you have a firm grasp on ML Pipelines, their benefits and what to consider once you want to build them, I’ll guide you through the different steps to build a ML pipeline using Prevision.io.
Step 1. Pipeline Components:
Pipeline components can be considered as steps or nodes of the whole pipeline. Even though several categories of components are already built by and available in Prevision.io, you still can create your own custom pipeline components in your favorite language!
Prevision.io Pipeline Components:
Version 11 and higher of the Prevision.io platform includes a library of already built pipeline components. You can access them on the platform UI by following simple steps as showcased below:
Accessing and Listing Prevision.io Pipeline Components
As you may have noticed, most of the components belong to the following categories:
- Retrain of an experiment
- Prediction of a deployed experiment
- Dataset read/write operation
- Dataset basic transformation (filter outliers, sample, …)
- Dataset feature augmentation (add weather feature, add special days information, …)
Each component has a description to help you choose the ones that suit your needs. You can access each component’s description as follows:
This component will add special days depending of a country code and a date column
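To give a feel for what a "special days" augmentation does, here is a minimal sketch of the idea in plain Python. It is not Prevision.io's implementation: the function name, the `is_special_day` output column, the `TS` and `load` columns and the tiny hand-written holiday table (fixed-date holidays only, far from complete) are all assumptions made for illustration.

```python
from datetime import date
from typing import Dict, List, Set, Tuple

# Hand-written table of a few fixed-date holidays as (month, day).
# Assumption for illustration only; a real component would use a full calendar.
FIXED_HOLIDAYS: Dict[str, Set[Tuple[int, int]]] = {
    "FR": {(1, 1), (7, 14), (12, 25)},
    "US": {(1, 1), (7, 4), (12, 25)},
}


def add_special_days(rows: List[dict], date_column: str, country_code: str) -> List[dict]:
    """Return copies of the rows with a boolean 'is_special_day' column added."""
    holidays = FIXED_HOLIDAYS[country_code]
    augmented = []
    for row in rows:
        day = date.fromisoformat(row[date_column])
        augmented.append({**row, "is_special_day": (day.month, day.day) in holidays})
    return augmented


# Hypothetical rows: a date column "TS" and a made-up "load" value.
rows = [
    {"TS": "2021-07-14", "load": 42.0},
    {"TS": "2021-07-15", "load": 40.0},
]
french_flags = [r["is_special_day"] for r in add_special_days(rows, "TS", "FR")]
us_flags = [r["is_special_day"] for r in add_special_days(rows, "TS", "US")]
print(french_flags, us_flags)
```

The same rows get different flags depending on the country code, which is exactly why the component needs both a country code and a date column.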
Custom Pipeline Components:
While pre-built components from Prevision.io are a must-have and will definitely simplify your data science journey, they may be insufficient for more advanced projects. For instance, you may want to do some advanced feature engineering, use your own prebuilt word embeddings based on an external corpus… To do so, you will have to create your own custom components, and we’ve got you covered here!
To help with this, we have a public GitHub repository with resources that make custom component creation easier.
The general idea is that you create a repository (GitHub and GitLab are the two types supported in the current version of Prevision.io) and commit your component code to it alongside the YAML and Docker configuration (please check the readme in our public repository). Then, you can import your custom component from the UI (to date, importing resources connected to an external Git repository isn’t supported by the SDKs).
Importing a new custom component from UI
Once done (and this can take some time depending on the requirements you have), it will be listed and available through the UI.
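The logic you put inside a custom component is ordinary code. As a rough illustration of the "advanced feature engineering" case, here is a sketch of a function that reads a CSV, adds a rolling mean and writes a new CSV. Everything here is an assumption: the file-in/file-out interface, the function name and the column names are hypothetical, and the actual component contract (YAML and Docker configuration) is the one described in the public repository readme, not this sketch.

```python
import csv
import tempfile
from pathlib import Path


def add_rolling_mean(src: Path, dst: Path, column: str, window: int = 3) -> None:
    """Read a CSV, append a rolling mean of `column`, and write the result."""
    with src.open(newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    new_column = f"{column}_rolling_mean"
    values = [float(row[column]) for row in rows]
    for i, row in enumerate(rows):
        # Average over the current value and up to (window - 1) previous ones.
        recent = values[max(0, i - window + 1): i + 1]
        row[new_column] = sum(recent) / len(recent)

    with dst.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames + [new_column])
        writer.writeheader()
        writer.writerows(rows)


# Small demo on a temporary file with made-up values.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "in.csv", Path(tmp) / "out.csv"
    src.write_text("TS,load\n2021-01-01,10\n2021-01-02,20\n2021-01-03,30\n")
    add_rolling_mean(src, dst, column="load", window=2)
    output = dst.read_text()

print(output)
```

Keeping the logic in one function with one input and one output makes it easy to test locally before wrapping it in whatever packaging the platform expects.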
Step 2. Pipeline Templates:
Pipeline templates are a succession of pipeline components chained together. The idea is that you can chain multiple operations, coming either from Prevision.io components or from custom components, in order to make a template that fully meets your needs.
As of today, templates are single-input and single-output, even if custom components are a little more flexible than that. In order to create a new pipeline template, you have to navigate to the “pipeline” menu and follow the steps showcased below:
Create An Empty Pipeline Template
Once your empty pipeline template is created, you can add the nodes that fit your needs, including:
- Import → import datasets, already present in your Prevision.io environment or coming from data sources
- Export → export datasets to your Prevision.io environment or in external data sources
- Prevision.io components → various components provided by Prevision.io (sample, data augmentation, outlier filtering, …)
- Custom components → your own previously imported components
- Predict → prediction on a deployed experiment (so make sure to actually deploy an experiment before using this 🙂 )
- Retrain → retrain an experiment, this will automatically create a new version of it
For today’s tutorial, I chose to go for a simple example. However, you can still go wild and create more complex templates, and even code your own custom component in the middle that retrieves real-time data or performs advanced feature engineering!
My pipeline template will mainly consist of three components:
- Import a dataset already present in Prevision.io
- Launch a prediction on a deployed experiment
- Export results as a dataset directly into my Prevision.io environment
Template importing a dataset, sending it into a deployed experiment and saving results
As showcased above, the template is pretty simple and generic since we haven’t said which dataset to import or which experiment to predict on. That means that it can be used in multiple “pipeline runs” (in which we can instantiate our template).
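As a toy simulation of that import → predict → export template, here is a sketch with in-memory stand-ins. None of it calls Prevision.io: `DATASETS`, `DEPLOYMENTS`, `run_template`, the dataset names and the fake "times ten" model are all hypothetical. What matters is that the same parameter-free template is executed twice with different parameters.

```python
from typing import Callable, Dict, List

# In-memory stand-ins for a project's datasets and deployed experiments.
DATASETS: Dict[str, List[float]] = {"january": [1.0, 2.0], "february": [3.0, 4.0]}
DEPLOYMENTS: Dict[str, Callable[[float], float]] = {
    "my-deployed-experiment": lambda x: x * 10,  # fake model
}

# The template only names the node types, in order; it carries no parameters.
TEMPLATE = ["import", "predict", "export"]


def run_template(template: List[str], params: Dict[str, str]) -> List[float]:
    """Execute the nodes in order, passing the data from one node to the next."""
    data: List[float] = []
    for node in template:
        if node == "import":
            data = list(DATASETS[params["import"]])
        elif node == "predict":
            model = DEPLOYMENTS[params["predict"]]
            data = [model(x) for x in data]
        elif node == "export":
            DATASETS[params["export"]] = data
        else:
            raise ValueError(f"unknown node type: {node}")
    return data


# Two runs instantiating the same template with different parameters.
run_template(TEMPLATE, {"import": "january",
                        "predict": "my-deployed-experiment",
                        "export": "january_predictions"})
run_template(TEMPLATE, {"import": "february",
                        "predict": "my-deployed-experiment",
                        "export": "february_predictions"})
print(DATASETS["january_predictions"], DATASETS["february_predictions"])
```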
Once your pipeline template is finished and saved, you’ll be redirected to the pipeline template list, where you can either create other templates or check the existing ones!
Step 3. Pipeline Runs:
Since we have (at least) one pipeline template ready to go, we can now create a run on top of it.
A pipeline run is just an instance of a template that has been configured (that is, every node requiring parameters has them filled in) and that can be scheduled on a regular basis or just launched manually, at your convenience.
To do so, you need to access the “Scheduled runs” tab available under the “Pipeline” section of your project, as shown below:
Create a new scheduled Run & Input the Template previously created in the Run definition
Now that you have created your scheduled run, all you need to do, as shown below, is to fill in the parameters of nodes (if required):
Filling in the parameters of each node of your Pipeline
Now that you have filled in all the node parameters, you can click on next and choose the trigger type:
- Manual: To run your pipeline once (useful for testing). The pipeline will be run as soon as you click on run (it can be run as many times as you want later).
- Periodic: To run your pipeline at a given period for some duration.
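To picture what a periodic trigger amounts to, here is a small sketch that lists the launch times for a given period and duration. It is only an illustration: the function name is hypothetical, and whether the platform counts the end of the duration as inclusive is an assumption on my side (here the end is excluded).

```python
from datetime import datetime, timedelta
from typing import List


def periodic_run_times(start: datetime, period: timedelta,
                       duration: timedelta) -> List[datetime]:
    """Return every launch time from `start`, one per `period`, within `duration`."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    times = []
    current = start
    end = start + duration  # assumption: the end itself is excluded
    while current < end:
        times.append(current)
        current += period
    return times


# One run per day at 06:00, for three days.
times = periodic_run_times(
    start=datetime(2022, 1, 1, 6, 0),
    period=timedelta(days=1),
    duration=timedelta(days=3),
)
for launch in times:
    print(launch.isoformat())
```

A manual trigger is then just the degenerate case: one launch, whenever you click on run.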
For the sake of simplicity, a manual trigger is sufficient for today’s demo. As illustrated below, once the scheduled run is created, you’ll have access to it and to its logs. You’ll even be able to trigger a new run!
Experimenting First Pipeline Run
What’s Coming Next?
Now that we have a deployed experiment, fed with some data coming out of pipelines, the next step is up to you. You can either code your app and share your applications with us (I don’t doubt your creativity if you’ve made it this far!), or just move on to the next blog post, the last one, which discusses model life cycle management and concludes this series.