This blog post series is made up of three parts:


In this post, we are going to answer these questions:

  • What is experiment tracking in machine learning?
  • Why is experiment tracking essential?
  • How should you manage/organize your machine learning experiments?
  • What is the most common way to organize data, pipeline and models in an experiment?
  • What are the advantages of using as an experiment tracking solution?

What is experiment tracking in machine learning?

In machine learning, experiment tracking is the practice of keeping track of important information throughout the lifecycle of a data science project.

A typical ML model development process involves collecting and preparing training data, selecting a model, and training it on the prepared data. A small change in the training data, the model hyperparameters, the model type, or the code used to run the experiment can dramatically change model performance.

Experiment tracking is the process of saving all experiment-related information that you care about for every experiment you run.

Data scientists typically train many versions of a model, for example by modifying hyperparameters, so finding the best-performing model according to one or more performance metrics is an iterative process. Without tracking the experiments run during model development, it is impossible to compare and reproduce the results of different iterations. Three kinds of versioning are involved: data versioning, to keep track of every dataset tested; pipeline versioning, to keep track of every step followed to build the training dataset; and experiment versioning, to keep track of each model's training, performance, training time, and hyperparameters.
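To make the idea concrete, here is a minimal sketch of tracking by hand with only the Python standard library. Each run appends one JSON record (data version, hyperparameters, metrics, training time) to a JSON Lines file, and the best run can then be selected by a metric. The file name `experiments.jsonl`, the helpers `log_run` and `best_run`, and the record fields are hypothetical choices for illustration, not the API of any tracking tool.

```python
import json
import uuid
from pathlib import Path

# Hypothetical log file: one JSON object per line, one line per run.
LOG_PATH = Path("experiments.jsonl")


def log_run(data_version, params, metrics, train_seconds, path=LOG_PATH):
    """Append one experiment record to the log and return it."""
    record = {
        "run_id": uuid.uuid4().hex,      # unique identifier for this run
        "data_version": data_version,    # which dataset version was used
        "params": params,                # hyperparameters
        "metrics": metrics,              # evaluation metrics (AUC, RMSE, ...)
        "train_seconds": train_seconds,  # time to train
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def best_run(metric, higher_is_better=True, path=LOG_PATH):
    """Reload every logged run and return the best one for `metric`."""
    with Path(path).open(encoding="utf-8") as f:
        runs = [json.loads(line) for line in f if line.strip()]
    chooser = max if higher_is_better else min
    return chooser(runs, key=lambda run: run["metrics"][metric])
```

For example, after logging two runs with AUCs of 0.81 and 0.84, `best_run("auc")` returns the second record; for an error metric such as RMSE, pass `higher_is_better=False`. A flat file like this works for one person, but it quickly shows its limits once several people and many runs are involved, which is what dedicated tools address.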

The following information is worth tracking:

  1. When you build your experiment:
  • Dataset metadata (dataset name, number of rows, number of columns, size in MB/GB)
  • Experiment settings: feature engineering process, feature selection process, list of algorithms tried, model selection process, stacked models
  • Role of features: target, id, fold, weight, features to be selected and features to be dropped
  • Hyperparameters of algorithms
  • Evaluation metrics (AUC, log loss, RMSE, MAPE, …)
  • Time related information (time to train, time to predict)
  • Feature importance
  • Version of experiment
  • Different versions of training data
  • Code used in model development

  2. When your model is in production:

  • Datasets used for predictions (dataset name, number of rows, number of columns, size in MB/GB)
  • Statistical metadata of features (for example, to compare the distribution of each feature in the training dataset and in the production dataset)
  • Predictions
  • Usage of predictions
  • Versioning of models you use (main/challenger concept)
  • Backtest of your model with feedback loop and retraining if necessary
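For the production side, one simple way to record statistical metadata of features is to store per-feature summary statistics at training time and compare them with the data seen at prediction time. The sketch below flags a feature when its production mean moves more than a chosen number of training standard deviations away from the training mean. This threshold rule and the names `feature_stats` and `drifted_features` are assumptions for illustration; it is a rough heuristic, not a substitute for a proper statistical drift test.

```python
import statistics


def feature_stats(rows):
    """Compute the mean and standard deviation of each numeric feature.

    `rows` is a list of dicts mapping feature name -> numeric value.
    """
    stats = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        stats[name] = {
            "mean": statistics.fmean(values),
            "std": statistics.pstdev(values),
        }
    return stats


def drifted_features(train_stats, prod_stats, max_shift=2.0):
    """Return the features whose production mean moved more than
    `max_shift` training standard deviations from the training mean."""
    flagged = []
    for name, ref in train_stats.items():
        shift = abs(prod_stats[name]["mean"] - ref["mean"])
        # Guard against constant training features (std == 0).
        scale = ref["std"] if ref["std"] > 0 else 1.0
        if shift / scale > max_shift:
            flagged.append(name)
    return flagged
```

Storing the output of `feature_stats` alongside each model version is what makes this comparison possible later, once the model is in production.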

⇒ These lists are not exhaustive, since the experiment metadata to prioritize may vary with the project's characteristics. For instance, in a computer vision project, it would be more useful to track visual information such as the training curve or examples of correctly and incorrectly classified images.

⇒ It is strongly recommended to set up alerts and to monitor data, models, and usage throughout the data science project life cycle. We will cover this subject in a future article.

Why is experiment tracking essential?

Experiment tracking allows data scientists to identify the factors that affect a model's performance, compare results, and select the best version. It is very important to stay organized throughout the iterative learning process, even if your models never make it into production.

“Save everything in one place and never lose your progress again.”

What is the most common way to organize data, pipeline and models in an experiment?

Several solutions are possible, depending on your habits, the need for collaboration, the need for real-time monitoring, graphical analysis, etc.

  • Level 0, the most basic, consists of using Excel or Google Sheets to record experiments as you go along.

Extract from an Excel file tracing the different machine learning experiments
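Level 0 can be partly automated: instead of typing results by hand, a script can append one row per experiment to a CSV file, which both Excel and Google Sheets can open. The column names and the helper `append_row` below are hypothetical; adapt them to the information you chose to track.

```python
import csv
from pathlib import Path

# Hypothetical columns for the tracking spreadsheet.
COLUMNS = ["experiment", "dataset", "algorithm", "params", "auc", "train_seconds"]


def append_row(path, row):
    """Append one experiment as a CSV row, writing the header on first use."""
    path = Path(path)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

This removes copy-paste errors, but comparing runs, plotting curves, and sharing results still happen by hand, which is why the next levels exist.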

  • Level 1 consists of tracking the different versions of your code in your GitHub account. This requires another tool to manage the versions of the datasets used, such as Google Drive, Google Cloud Storage, or Amazon S3.

Extract from a versioned directory under
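At Level 1, Git versions the code but not the datasets stored elsewhere. A common lightweight complement, shown here as a sketch, is to record a content hash of each dataset file next to the code version, so that any change to the data yields a new identifier. The function name `dataset_fingerprint` and the 12-character short hash are illustrative assumptions.

```python
import hashlib
from pathlib import Path


def dataset_fingerprint(path, chunk_size=1 << 20):
    """Return a short SHA-256 fingerprint of a dataset file.

    The file is read in 1 MiB chunks so large datasets need not fit in memory.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()[:12]
```

Recording this fingerprint together with the output of `git rev-parse HEAD` for each run ties every result to an exact pair of data and code versions.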

  • Level 2 involves using dedicated solutions such as Weights & Biases, which offers a fully documented, graphical, interactive, and collaborative environment to trace all the experiments you run. Since it works across environments, you can run your code locally or in a third-party environment such as Google Colab or Kaggle.

Experiment Tracking with Weights and Biases

  • Level 3 consists of centralizing everything you do, from training to production, in a single environment: this is what offers.

What are the advantages of using as an experiment tracking solution?

From our point of view, here are the experiment tracking and management features covered by the platform.

See all you do in a central dashboard:

  • View all your activity in a complete experiments dashboard
  • Control your model building and experimentation
  • Record everything you care about for every ML job you run (in or out)
  • Trace on which dataset, parameters, and code every model was trained on
  • Organize all the metrics, charts, and any other ML metadata in a single place
  • Make your model training runs reproducible and comparable with almost no extra effort
  • Generate an automatic report to document your experiment

Be more productive:

  • Don’t waste time looking for folders and spreadsheets with models or configs. Have everything easily accessible in one place
  • Significantly speed up the model-building process
  • Reduce context switching by having everything you need in a single dashboard
  • Find the information you need quickly in a dashboard that was built for ML model management
  • Debug and compare your models and experiments with no extra effort
  • Use code (Python/R SDK) or no code (User Interface)

Focus on ML: (we manage the traceability)

  • Provide dedicated modules for vision, text, and tabular data
  • Help your team get started with excellent examples, documentation, and a support team ready to help at any time
  • Know when your runs fail and react right away
  • Don’t re-run experiments because you forgot to track parameters. Make experiments reproducible and run them once

Collaboration features and project management tools:

  • Share your work with your team and stop duplicating expensive training runs
  • Cut unproductive meetings by sharing results, dashboards, or logs with a link

Scalable solution:

  • Use computational resources more efficiently

Build reproducible, compliant, and traceable models:

  • Make every ML job reproducible. Keep track of everything you need
  • Have everything backed up and accessible from anywhere, even years later
  • Know who trained a model, on what dataset, code, and parameters. Do it for every model you build
  • Be compliant by keeping a record of everything that happens in your model development
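Knowing who trained a model, on what data, with which code and parameters, comes down to capturing a provenance record at training time. A minimal standard-library sketch follows; the field names and the `provenance` helper are hypothetical, and the code version (for example, a Git commit hash) is passed in rather than detected automatically.

```python
import getpass
import platform
import random
import sys
from datetime import datetime, timezone


def current_user():
    """Return the login name, or "unknown" if it cannot be resolved."""
    try:
        return getpass.getuser()
    except Exception:  # e.g. containers without a user database entry
        return "unknown"


def provenance(dataset_id, code_version, params, seed=42):
    """Seed the stdlib RNG and return a record describing this training run."""
    random.seed(seed)  # makes stdlib `random` draws reproducible
    return {
        "trained_by": current_user(),
        "trained_at": datetime.now(timezone.utc).isoformat(),
        "dataset": dataset_id,
        "code_version": code_version,
        "params": params,
        "seed": seed,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
```

Note that this only seeds Python's `random` module; libraries with their own random generators (NumPy, deep learning frameworks) need to be seeded separately for a run to be fully reproducible.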


Focus on the whole lifecycle of the machine learning process:

  • Make it possible to include external models that can be integrated with any ML library or language
  • No additional software or systems, such as containers, are required: does it for you

Experiment tracking in will be explored concretely, with screenshots, in future articles of this blog post series.


In this first blog post, I wanted to introduce you to the subject of experiment tracking through answering these simple questions:

  • What is experiment tracking in machine learning?
  • Why is experiment tracking essential?
  • How should you manage/organize your machine learning experiments?
  • What is the most common way to organize data, pipeline and models in an experiment?

Do you want to track your first experiment without writing a single line of code? Head over to the next blog post. [link to second article]

Thanks for reading.

Mathurin Aché

About the author

Mathurin Aché

Expert Data Science Advisory