
Introduction

This is the third post in a six-part series on how to use the Prevision.io Python SDK to build production-ready and fully monitored AI models from your real-world business data. If you have already created a Prevision.io project called “Electricity Forecast” containing both training and validation datasets, you are ready to go. Otherwise, head over to the second blog post, follow the instructions and come back!

What Are We Doing Today?

In this blog post, we are going to see how we can easily create a Prevision.io Experiment using the Python SDK. Let’s dig in! Launch your code environment or Prevision.io Python notebook and follow the steps!
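If you are working in your own environment rather than in a Prevision.io hosted notebook, the SDK first has to be connected to your instance. Here is a minimal sketch, assuming the standard init_client entry point; the URL and master token are placeholders you should replace with your own:

#connect the SDK to your Prevision.io instance
import previsionio as pio

pio.client.init_client(
    prevision_url='https://cloud.prevision.io',  # your instance URL (placeholder)
    token='YOUR_MASTER_TOKEN',                   # your master token (placeholder)
)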

Some Context Before Starting

An experiment is a way to group several modeling runs under a common target in order to compare them and track progress. An experiment may have one or more versions, and you can change any parameter you want from version to version (trainset, features used, metrics, …). The only constants between the different versions of an experiment are:

  • The target used: once you have selected your target, you cannot change it and must create a new experiment if you want to try a new one.
  • The engine used: the models may come from the Prevision.io AutoML engine or be imported into the project as external models.

A Note for the Curious Folks Out There:

Do you want to know in more detail what experiment tracking is, how to handle it in Prevision.io without writing a single line of code, and what the journey of monitoring an external model on the Prevision.io platform looks like?

Check this three-part series written by Mathurin Ache!

Step 1. Retrieve Your Project & Datasets

In this tutorial, we will create a tabular regression experiment, based on the Prevision.io AutoML engine, that will later host our models. To do so, first retrieve your project and verify that you have uploaded both the training and holdout datasets:

#import the Prevision.io SDK
import previsionio as pio

#retrieve your project
project = pio.Project.from_name(name="Electricity Forecast")

#list the datasets uploaded to your project (name, id)
datasets = project.list_datasets()
for dataset in datasets:
    print(dataset.name, dataset.id)
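
If you would rather not copy ids by hand, a small helper built only on the list_datasets call shown above can pick a dataset by its name. This is just a sketch, and the dataset names in the usage example are assumptions that should match whatever names you used when uploading:

#hypothetical helper: pick a dataset from the project listing by its name
def dataset_by_name(project, name):
    for dataset in project.list_datasets():
        if dataset.name == name:
            return dataset
    raise ValueError(f"no dataset named {name!r} in this project")

#example usage, assuming the datasets were uploaded under these names
#train_pio = dataset_by_name(project, "train")
#test_pio = dataset_by_name(project, "holdout")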

Once you’ve verified that both training and test data are uploaded, type the following lines of code to retrieve both datasets by their id.

#if the datasets are already uploaded, retrieve them by id
train_pio = pio.Dataset.from_id('618144075f8f22001ced1ec6')
test_pio = pio.Dataset.from_id('618145a05f8f22001ced1ecd')

The id of each dataset can be found either via dataset.id or as shown below:

Access the ids of Your Uploaded Datasets

Step 2. Create An Experiment Version

To launch the training of an experiment, which consists of creating and evaluating different types of models, we proceed as follows:

  1. Set the column configuration, which defines at least the target column and, optionally, the weight and id columns.
#the dataset configuration
column_config = pio.ColumnConfig(target_column='TARGET')

  2. Define the training parameters, such as the types of models to try and the feature engineering to apply.

#the training configuration
training_config = pio.TrainingConfig(
    advanced_models=[pio.AdvancedModel.LinReg, pio.AdvancedModel.LightGBM, pio.AdvancedModel.XGBoost, pio.AdvancedModel.CatBoost, pio.AdvancedModel.ExtraTrees],
    normal_models=[pio.NormalModel.LinReg, pio.NormalModel.LightGBM, pio.NormalModel.XGBoost, pio.NormalModel.CatBoost, pio.NormalModel.ExtraTrees],
    simple_models=[pio.SimpleModel.DecisionTree, pio.SimpleModel.LinReg],
    features=[pio.Feature.TargetEncoding, pio.Feature.PolynomialFeatures, pio.Feature.KMeans, pio.Feature.Frequency, pio.Feature.DateTime, pio.Feature.Counts],
    profile=pio.Profile.Advanced,
)

  3. Fit the regression by specifying the training dataset, optionally the holdout dataset, the evaluation metric and the column and training configurations defined above.

experiment_version = project.fit_regression(
    name='regression_turbo',
    dataset=train_pio,
    column_config=column_config,
    metric=pio.metrics.Regression.RMSE,
    training_config=training_config,
    holdout_dataset=None,
)
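
As a side note, if you also want the platform to score every model on your holdout data, the same call accepts the holdout dataset retrieved in Step 1 instead of None. A minimal variant reusing only the parameters shown above (the experiment name is just an example):

#same call, but also scoring the models on the holdout dataset
experiment_version = project.fit_regression(
    name='regression_turbo_holdout',
    dataset=train_pio,
    column_config=column_config,
    metric=pio.metrics.Regression.RMSE,
    training_config=training_config,
    holdout_dataset=test_pio,
)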

For the curious folks, to obtain the full documentation of both configurations and the list of available evaluation metrics, I invite you to check the API reference of ColumnConfig and TrainingConfig under Experiment configuration, and of the available metrics under Metrics.

This will automatically launch your experiment in your workspace. To check that the experiment was launched properly and is progressing, you can go to the online interface and look at the list of experiments (in the “Experiments” tab).

Access Your Launched Experiment

In the Experiments tab, you’ll find the newly launched experiment along with descriptive details about it: its name, the source of its models (AutoML or external models), its latest version, its creation date and time, its creator, its data type (tabular, images or time series), the training type (regression, classification, multi-classification, object detection or text similarity), its score (the chosen metric, its type and a 3-star evaluation), the number of trained models, the number of predictions made and the status (running, paused, failed or done).

Step 3. Inspect Your Experiment and Evaluate Your Models

Once an experiment has at least one version and a few trained models, you can get details about it and even make predictions, either from its dashboard (by clicking on its name in the list of experiments) or by typing the following lines of code:

  • If you want to wait for the end of the experiment version execution before checking:
# wait the end of experiment version
experiment_version.wait_until(lambda ev: ev._status['state'] == 'done', raise_on_error=False, timeout=1080000)
  • If you only want to wait until a certain number of models have been trained:
# if you want to stop before the end: until at least 5 models are trained
experiment_version.wait_until(lambda ev: len(ev.models) > 5)
  • Bonus Code: What about retrieving your experiment version? You can do it simply by its id.
experiment_version= pio.Supervised.from_id('619258d61253c7001c5753ed')

The experiment’s id can be found in the same way as the datasets’ ids, either using the link trick or by simply typing the following lines of code:

experiments= project.list_experiments()
for experiment in experiments:
    print(experiment.name, experiment.id)

Now that you’ve chosen either to wait for the end of the execution or for a given number of trained models, you can access information about:

  1. The experiment status as well as the same information found in the experiment tab:
# check out the experiment status and other info
experiment_version.print_info()

  2. The trained models:

  • The list of trained models (to dump the details of every model at once, see the sketch after this list)
#list all created models
experiment_version.models
  • The list of trained models by profile (advanced, normal and simple). To get more information about each group of models, check the API reference of TrainingConfig under Experiment configuration.
#list of normal models 
experiment_version.normal_models_list
#list of advanced models 
experiment_version.advanced_models_list
#list of simple models
experiment_version.simple_models_list
  • Detailed information about the best model
print('*************************************')
print('***         GET BEST MODEL        ***')
best_model = experiment_version.best_model
print(best_model.__dict__)
  • Detailed information about the fastest model
print('************************************')
print('***       GET FASTEST MODEL      ***')
fastest_model = experiment_version.fastest_model
print(fastest_model.__dict__)

  3. The features used

  • General information about the number of features, their type distribution, the number of samples and the total number of missing values
experiment_version.features
  • Detailed descriptive statistics about the list of features
experiment_version.feature_stats
  • Detailed information about a specific feature
# one specific feature infos
print('************************************')
print('***    GET FEATURE STATISTICS    ***')
print('*************************************')
FI = experiment_version.get_feature_info(feature_name='TEMPERATURE')
print(FI)
  • The most important features of a given model
model = pio.Model.from_id('61930d18bbcbafbe70691442')
model.feature_importance
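
As announced above, if you want a side-by-side look at every trained model rather than only the best or the fastest one, you can simply loop over the model list and dump each model’s attributes, reusing only the calls already shown:

#dump the attributes of every trained model for a quick comparison
for model in experiment_version.models:
    print('---')
    print(model.__dict__)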

You can also make predictions using a chosen model. This can be done in several ways:

  1. Predict from a pandas dataframe (a quick sanity check on these predictions is sketched after this list)
print('***************************************')
print('*** GET PREDICTIONS of an input dataframe ***')
print('***************************************')
df_preds = best_model.predict(test)
print(df_preds.head())

  2. Predict from a dataset of your workspace

print('*******************************************')
print('*** GET PREDICTIONS of registered dataset ***')
print('***************************************')
dataset_preds=experiment_version.predict_from_dataset(test_pio)

dataset_preds.get_result()

  3. Get the cross validation predictions

print('***************************************')
print('***        GET CROSS VALIDATION        ***')
print('***************************************')
cv = best_model.cross_validation
print(cv.head())

  4. Predict a single data point

print('***************************************')
print('***        ONE UNIT PREDICTION         ***')
print('***************************************')
unit = test.iloc[7]
print("single unit :", unit)
print('***************************************')
print("prediction : ", best_model.predict(unit))

For the curious folks out there, if you are interested in trying out more Prevision.io functionality to better inspect and evaluate your experiment, don’t hesitate to take a look at our Reference Manual, give it a try and share your experience with us!

What’s Coming Next?

Now that your first experiment version has been created, inspected and evaluated, you can build new versions with different feature engineering, generate reports with the Prevision.io autoreport feature and share them with us, or simply move on to the next blog post in the series, in which we will deploy the “best” model 🧐.


About the author

Zina Rezgui

Data Scientist