In this post, we are going to answer this question:

  • How can I benefit from the experiment tracking and other advantages included in the Prevision.io platform while continuing to build my experiment outside the platform and/or with third-party solutions?

 

If you train your models in another environment and wish to benefit from the experiment tracking solutions offered by Prevision.io, follow these steps:

 

1.    You load and prepare your data in your own environment, in a Kaggle notebook, or in Google Colab.

 

# An example of binary classification
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder

# The dataset comes from https://www.kaggle.com/kmalit/bank-customer-churn-prediction
df = pd.read_csv("Churn Modeling Bank.csv")

# Transform object/category columns with a label encoder, fill numeric NAs
for c in df.columns:
    col_type = df[c].dtype
    if col_type == 'object' or col_type.name == 'category':
        df[c] = df[c].fillna('Unknown')
        le = LabelEncoder()
        df[c] = le.fit_transform(df[c])
    else:
        df[c] = df[c].fillna(-999)

# Create a 60% / 20% / 20% train, validation, test split
train, validation, test = np.split(df.sample(frac=1, random_state=42),
                                   [int(.6 * len(df)), int(.8 * len(df))])

# Define the target
target = "Exited"
# Define X, y and the feature list
y = train[target].copy()
X = train.drop([target], axis=1)
features = X.columns

2.    You train one or more models in your environment, in Prevision.io notebooks, in a Kaggle notebook, or in Google Colab, and export them in ONNX format.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# Train a RandomForestClassifier model
rf = RandomForestClassifier()
rf.fit(X, y)
pred = rf.predict_proba(validation.drop(target, axis=1))[:, 1]
print(roc_auc_score(validation[target], pred))

# Convert to ONNX and save
initial_type = [('float_input', FloatTensorType([None, X.shape[1]]))]
onx = convert_sklearn(rf, initial_types=initial_type)
with open("rf_churn_bank.onnx", "wb") as f:
    f.write(onx.SerializeToString())
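
As a quick sanity check before uploading, you can load the exported file with onnxruntime and verify that it reproduces the scikit-learn predictions. This is a minimal sketch, assuming onnxruntime is installed; it is not part of the original workflow:

import onnxruntime as rt

# Load the exported model and run it on the validation set
sess = rt.InferenceSession("rf_churn_bank.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
X_val = validation.drop(target, axis=1).to_numpy().astype(np.float32)

# For skl2onnx classifiers, the first output is the predicted label
onnx_labels = sess.run(None, {input_name: X_val})[0]
# Fraction of predictions that match scikit-learn (should be 1.0)
print((onnx_labels == rf.predict(validation.drop(target, axis=1))).mean())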


from lightgbm import LGBMClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from skl2onnx import convert_sklearn, update_registered_converter
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.shape_calculator import calculate_linear_classifier_output_shapes
from onnxmltools.convert.lightgbm.operator_converters.LightGbm import convert_lightgbm

# Train an LGBMClassifier model inside a scikit-learn pipeline
pipe_lgb = Pipeline([('scaler', StandardScaler()),
                     ('lgbm', LGBMClassifier())])
pipe_lgb.fit(X, y)
pred = pipe_lgb.predict_proba(validation.drop(target, axis=1))[:, 1]
print(roc_auc_score(validation[target], pred))

# Register the LightGBM converter, then convert to ONNX
update_registered_converter(
    LGBMClassifier, 'LightGbmLGBMClassifier',
    calculate_linear_classifier_output_shapes, convert_lightgbm,
    options={'nocl': [True, False], 'zipmap': [True, False, 'columns']})

model_onnx = convert_sklearn(
    pipe_lgb, 'pipeline_lightgbm',
    [('input', FloatTensorType([None, X.shape[1]]))],
    target_opset=12)

# And save
with open("lgb_churn_bank.onnx", "wb") as f:
    f.write(model_onnx.SerializeToString())


from xgboost import XGBClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from skl2onnx import convert_sklearn, update_registered_converter
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.shape_calculator import calculate_linear_classifier_output_shapes
from onnxmltools.convert.xgboost.operator_converters.XGBoost import convert_xgboost

# Train an XGBClassifier model inside a scikit-learn pipeline
pipe_xgb = Pipeline([('scaler', StandardScaler()),
                     ('xgb', XGBClassifier(n_estimators=1000))])
pipe_xgb.fit(X, y)
pred = pipe_xgb.predict_proba(validation.drop(target, axis=1))[:, 1]
print(roc_auc_score(validation[target], pred))

# This first conversion fails because the XGBoost converter is not registered yet
try:
    convert_sklearn(pipe_xgb, 'pipeline_xgboost',
                    [('input', FloatTensorType([None, X.shape[1]]))],
                    target_opset=12)
except Exception as e:
    print(e)

# Register the XGBoost converter, then convert to ONNX
update_registered_converter(
    XGBClassifier, 'XGBoostXGBClassifier',
    calculate_linear_classifier_output_shapes, convert_xgboost,
    options={'nocl': [True, False], 'zipmap': [True, False, 'columns']})

model_onnx = convert_sklearn(
    pipe_xgb, 'pipeline_xgboost',
    [('input', FloatTensorType([None, X.shape[1]]))],
    target_opset=12)

# And save
with open("xgb_churn_bank.onnx", "wb") as f:
    f.write(model_onnx.SerializeToString())

from catboost import CatBoostClassifier

# Train a CatBoostClassifier model
cat = CatBoostClassifier()
cat.fit(X, y)
pred = cat.predict_proba(validation.drop(target, axis=1))[:, 1]
print(roc_auc_score(validation[target], pred))

# CatBoost exports to ONNX natively, no skl2onnx converter needed
cat.save_model(
    "cat_churn_bank.onnx",
    format="onnx",
    export_parameters={
        'onnx_domain': 'ai.catboost',
        'onnx_model_version': 1,
        'onnx_doc_string': 'test model for BinaryClassification',
        'onnx_graph_name': 'CatBoostModel_for_BinaryClassification'
    }
)

For each model listed above, you will see a step that converts the model from the scikit-learn format to the ONNX format, which is the format the Prevision.io platform expects (CatBoost is the exception: it exports to ONNX natively through save_model).


# Create the yaml configuration file
f = open("churn_bank.yaml", "w")
f.write("class_names:\n")
for c in df[target].unique():
    f.write(f"- {c}\n")
f.write("input:\n")
for c in df.drop(target, axis=1).columns:
    f.write(f"- {c}\n")
f.close()
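
For reference, the resulting churn_bank.yaml lists the class names followed by the input features. Assuming the standard columns of this Kaggle dataset, it looks roughly like this:

class_names:
- 1
- 0
input:
- RowNumber
- CustomerId
- Surname
- CreditScore
- Geography
- Gender
- Age
- Tenure
- Balance
- NumOfProducts
- HasCrCard
- IsActiveMember
- EstimatedSalary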

# Export datasets
train.to_csv("train_churn_bank.csv", index=False)
validation.to_csv("validation_churn_bank.csv", index=False)
test.to_csv("test_churn_bank.csv", index=False)

3.    You upload data, models, and configuration files using the user interface.

Configuring your external model

Process to import your external model

Configuring your external experiment

For each external model, you need to provide a name, a yaml file with the features configuration, and an ONNX file containing the model.

You can import as many models as you want.

To go further: external model import relies on the standardized ONNX format, and most standard ML libraries have a module for exporting to it.
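
For example, here is a minimal sketch of exporting a PyTorch model to ONNX; the network architecture and file name below are illustrative, not part of the workflow above:

import torch

# An illustrative network with the same 13 input features as our dataset
net = torch.nn.Sequential(
    torch.nn.Linear(13, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 2),
)

# torch.onnx.export traces the model with a dummy input and writes the graph
dummy_input = torch.randn(1, 13)
torch.onnx.export(net, dummy_input, "torch_churn_bank.onnx",
                  input_names=["float_input"], output_names=["output"])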

After a few minutes, you obtain a dashboard with all your models.

Now you can evaluate your experiment.

External model information

External model feature importance

External model confusion matrix

External model metrics

Good news: once imported, your models still benefit from the insightful analytics available for internally trained models.

4.    You upload data, models, and the related configuration files using the SDK (Python or R).


import yaml
import previsionio as pio

p = pio.Project(_id='616d9628d26230001d81b0c9', name='E2E_Telecom_churn')

train_dset = p.create_dataset(name='churn_train', dataframe=train)
holdout_dset = p.create_dataset(name='churn_holdout', dataframe=test)

dset_dict = {'train': train_dset.id, 'test': holdout_dset.id}

with open('e2e_demo/model/artifacts/dset_config.yaml', 'w') as f:
    yaml.dump(dset_dict, f)


# Reuses convert_sklearn, FloatTensorType and yaml imported earlier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

TARGET = 'Exited'  # adapt to your dataset's target column
# For example, the model classes trained earlier in this post
models = [RandomForestClassifier(), LGBMClassifier(), XGBClassifier()]
pio_model_list = []

# A plausible helper matching the yaml configuration written above
def make_yaml_dict(df, target):
    return {'class_names': df[target].unique().tolist(),
            'input': df.drop(target, axis=1).columns.tolist()}

for model in models:
    clr = make_pipeline(OrdinalEncoder(), model)
    clr.fit(train.drop(TARGET, axis=1), train[TARGET])

    initial_type = [('float_input', FloatTensorType([None, train.drop(TARGET, axis=1).shape[1]]))]
    onx = convert_sklearn(clr, initial_types=initial_type)
    model_name = model.__class__.__name__

    onnx_path = 'e2e_demo/model/artifacts/{}_churn.onnx'.format(model_name)
    yaml_path = 'e2e_demo/model/artifacts/{}_churn.yaml'.format(model_name)

    with open(onnx_path, 'wb') as f:
        f.write(onx.SerializeToString())

    with open(yaml_path, 'w') as f:
        yaml.dump(make_yaml_dict(train, TARGET), f)

    pio_model_list.append(
        (model_name, onnx_path, yaml_path)
    )

p = pio.Project(_id='616d28bee192db001c78566d', name='playground')

experiment = pio.Experiment.from_id('616d2d4ee192db001c785683')

experiment_version = experiment.latest_version

experiment_version.new_version(dataset=train_dset,
                               holdout_dataset=holdout_dset,
                               target_column=TARGET,
                               external_models=pio_model_list)



5.    Once your imported model is deployed, you can use it periodically (every hour, every day, every month, …).

 

For deployment, see the paragraph explaining how to deploy an experiment in article 2 of this series, or the documentation here: https://previsionio.readthedocs.io/fr/latest/studio/deployments/index.html
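
As an illustration, here is a minimal sketch of scoring a deployed model on a schedule. The endpoint URL, authentication scheme, and payload format below are hypothetical placeholders; use the values shown on your deployment's page:

import requests

# Hypothetical placeholders: replace with your deployment's URL and key
DEPLOYMENT_URL = "https://<your-deployment>.prevision.io/api/predict"
API_KEY = "<your-api-key>"

def score_batch(rows):
    """Send a batch of feature rows to the deployed model and return predictions."""
    response = requests.post(
        DEPLOYMENT_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"instances": rows},
    )
    response.raise_for_status()
    return response.json()

# Schedule this script with cron to run it periodically, e.g. every hour:
# 0 * * * * python score_churn.py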


Conclusion

 

In this guide, we went through the whole experiment tracking process using Prevision.io.

As we have seen, it is essential for a data scientist to document the different iterations across all the stages of a data science project: from data ingestion to feature engineering, model selection, and hyperparameter tuning, with access to in-depth visual analysis, until the model is deployed and in production.

 

For the iterative phases of production and model monitoring, Prevision.io facilitates this task by providing solutions for documenting and tracing the models in use and the stability of features, as well as continuously evaluating the performance of the models in production.


About the author

Mathurin Aché

Expert Data Science Advisory