In this article, we present a comparison between two tools with automated machine learning (AutoML) capabilities: TPOT and Prevision. The comparison is based on the availability of software development tools, the user interface, selected features, the resources needed to run an experiment, performance on classification and regression tasks, and finally the deployment and monitoring capabilities.



Machine learning (ML) and artificial intelligence (AI) have been revolutionizing the world in almost every domain. From predicting customer churn to drug discovery, efficient algorithms capable of completing complex tasks are everywhere. To that end, ML and AI practitioners carry out a workflow generally composed of three components: data, models, and production. The first component consists of data collection and preparation for analysis. Once prepared, the data is used to train AI/ML models. This step is an iterative process in which practitioners have to choose, among other things, the feature engineering techniques, the models to train and compare, the comparison metrics, and the hyperparameter optimization strategies. The production component oversees model deployment and ongoing monitoring and maintenance.

While each component raises its own challenges, the model component requires a huge amount of manual work. Fortunately, this process can be fully automated thanks to AutoML tools such as Prevision, the first AI management platform developed by data scientists for data scientists, and TPOT, an open-source AutoML tool developed at the University of Pennsylvania.

We refer readers to this article for a detailed discussion of the importance of AutoML in data science.

UI and SDK availability

For the first comparison, we compare the availability of development tools in TPOT and Prevision. While TPOT offers only a Python SDK, Prevision, in addition to its Python SDK, also offers an R SDK. Along with the SDKs, Prevision offers a user interface from which users can upload data or connect to databases, train AutoML or externally created models, create experiments, configure training models (see the figure below), and deploy and monitor ML/AI applications.

Figure 1: Training configuration from the Prevision UI

Features comparisons

  • Object detection and text classification: In its current version (0.11.7), TPOT AutoML supports only classification and regression tasks, which limits its application to other machine learning tasks such as object detection and text classification. Prevision covers all of these features through the UI or the SDKs. We recommend readers check out the articles on disaster tweet classification, YouTube ads detection, and a French cheese detection project (Cheezam), which use Prevision's AutoML text classification and object detection features.
  • Command line: TPOT offers a command-line tool to run the AutoML, contrary to Prevision.
  • Pipeline export: As described above, AutoML tools automate the training pipeline of a machine learning model. Exporting the pipeline code helps reuse it in other applications. This feature is available in TPOT and planned in the 2022 roadmap of Prevision.
  • Automated date and textual feature engineering: Despite the feature engineering automation capabilities of AutoML, some feature types, such as dates and text, are often not taken into account. However, such features can improve model performance when engineered correctly. In Prevision, users can parametrize the feature engineering process so that these types of features are engineered automatically (see Figure 2).
  • Deployment capabilities: Deploying machine learning models can be a complex and tedious task. However, no matter the performance of a model, in some scenarios the most important thing is how it behaves in production. Prevision allows users to deploy their models with a single click. On the other hand, deployment of TPOT models must be done manually (e.g. exporting the pipeline code, then loading or retraining the model in production, etc.).
  • Monitoring capabilities: An important aspect of machine learning is model monitoring. This feature helps, for instance, to know when a model has become obsolete and must be retrained. Prevision offers this capability, contrary to TPOT.
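The automated date feature engineering mentioned above can be sketched as follows: a raw timestamp column is expanded into model-ready numeric features. This is a stdlib-only illustration of the idea; the function name and the set of derived features are our own, not Prevision's actual implementation.

```python
from datetime import datetime

def date_features(timestamps):
    """Expand ISO-format date strings into numeric features (illustrative)."""
    rows = []
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)
        rows.append({
            "year": dt.year,
            "month": dt.month,
            "day": dt.day,
            "weekday": dt.weekday(),              # 0 = Monday
            "is_weekend": int(dt.weekday() >= 5),  # Saturday or Sunday
        })
    return rows

print(date_features(["2022-03-14"]))
```

Features like `is_weekend` are exactly the kind of signal a model cannot extract from a raw date string on its own.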

Pipeline comparisons

Both TPOT and Prevision automate feature selection, model selection, and evaluation. The two tools differ in the pipeline used to accomplish them. TPOT provides an option called template that allows a user to set the training pipeline steps to be performed. For instance, in the figure below the template value is Transformer-Selector-Classifier, so the pipeline steps are the following:

  1. Feature transformation (e.g. PCA, Polynomial features)
  2. Feature selection (e.g. SelectPercentile from Scikit-Learn)
  3. Classification

Figure 3: TPOT pipeline example. Image from TPOT documentation.


When the template option is not set by the user, TPOT automatically determines the pipeline steps that produce optimal performance.
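The Transformer-Selector-Classifier template corresponds to a pipeline like the following sketch, built here with scikit-learn directly. The specific components (PCA, SelectPercentile, logistic regression) are illustrative choices, not what TPOT would necessarily select.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=42
)

pipeline = Pipeline([
    ("transformer", PCA(n_components=3)),                      # 1. feature transformation
    ("selector", SelectPercentile(f_classif, percentile=80)),  # 2. feature selection
    ("classifier", LogisticRegression(max_iter=1000)),         # 3. classification
])
pipeline.fit(X_train, y_train)
print(round(pipeline.score(X_test, y_test), 2))
```

TPOT's search then consists of choosing the concrete component and its hyperparameters for each of these three slots.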

The total number of pipelines evaluated by TPOT corresponds to:

population_size + generations × offspring_size pipelines


Where population_size, generations, and offspring_size correspond respectively to the number of individuals to retain in the genetic programming population every generation, the number of iterations to run the pipeline optimization process, and the number of offspring to produce in each genetic programming generation. By default, the number of offspring is set to the population size.
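The formula above can be turned into a small helper to see how quickly the search budget grows:

```python
def total_pipelines(population_size, generations, offspring_size=None):
    """Number of candidate pipelines TPOT evaluates.

    By default TPOT sets offspring_size equal to population_size.
    """
    if offspring_size is None:
        offspring_size = population_size
    return population_size + generations * offspring_size

# With the settings used later in this article (50 / 50 / 50):
print(total_pipelines(50, 50, 50))  # → 2550
```

So the settings used in our experiments below make TPOT evaluate 2,550 candidate pipelines per run.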


Similar to TPOT, Prevision automatically determines the pipeline steps to be performed to produce the highest performance. A typical pipeline with Prevision AutoML is as follows:

  1. Datasets: column types are automatically detected and basic statistics are computed
  2. Features Preprocessing: according to the selected training configuration, different levels of feature engineering are performed:
  • Simple Level Feature Transformations: basic transformations that are always performed, whatever the options selected by the user. They include label encoding for categorical features, missing-value imputation, basic scaling for numeric variables…
  • Lite Level Feature Transformations: the basic transformations are enhanced with additional ones, such as min-max scaling for numeric features and other encodings for categorical features, such as one-hot encoding…
  • Advanced Level Feature Transformations: the advanced transformations applied include:
    • Row Statistics: new features based on row-wise counts are created, such as the number of zeros, the number of missing values, …
  3. Dataset Statistics
  • Univariate Descriptive Statistics: computes univariate statistics such as central tendency (mean, mode, and median) and dispersion (range, variance, minimum, maximum, quartiles, and standard deviation).
  • Bivariate Descriptive Statistics: for each feature, Prevision performs a bivariate analysis against the TARGET feature; the analysis depends on the type of the target column (linear, binary, or multilabel).
  4. Models: Prevision uses the models specified in the user training configuration (see Figure 1). Prevision offers a graphical tool called a directed acyclic graph (DAG) that allows users to visualize the pipeline, thereby providing a more flexible way to explain the models.
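The Row Statistics transformation mentioned above can be sketched in a few lines: each row of the dataset is summarized by counts computed across its columns. The function and feature names here are hypothetical, not Prevision's actual implementation.

```python
def row_statistics(rows):
    """Derive row-wise count features (illustrative sketch)."""
    features = []
    for row in rows:
        n_missing = sum(1 for v in row if v is None)  # count of missing values
        n_zeros = sum(1 for v in row if v == 0)       # count of zero values
        features.append({"n_zeros": n_zeros, "n_missing": n_missing})
    return features

data = [[0, None, 3.5], [1.2, 0, 0]]
print(row_statistics(data))
```

These derived columns would then be appended to the original features before model training.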

Figure 4: Example of an execution graph from Prevision AutoML


The final step in the TPOT AutoML pipeline corresponds to model selection. In Prevision, users can set advanced pipeline parameters such as model blending by enabling the blending option (see Figure 1). Blending has been shown to improve the performance of ML models in various applications, such as data science competitions on Kaggle.
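In its simplest form, blending averages the predictions of several base models. The sketch below shows an unweighted blend of predicted probabilities; Prevision's actual blending strategy may be more sophisticated (e.g. weighted or stacked).

```python
def blend(predictions_per_model):
    """Average predicted probabilities across base models.

    predictions_per_model: one list of probabilities per base model,
    all aligned on the same samples.
    """
    n_models = len(predictions_per_model)
    n_samples = len(predictions_per_model[0])
    return [
        sum(preds[i] for preds in predictions_per_model) / n_models
        for i in range(n_samples)
    ]

# Three hypothetical base models' probabilities for four samples:
model_preds = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.1, 0.7, 0.5],
    [0.7, 0.3, 0.5, 0.6],
]
print([round(p, 2) for p in blend(model_preds)])  # → [0.8, 0.2, 0.6, 0.5]
```

Averaging smooths out the individual models' errors, which is why blends often score better than any single base model.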


Hyperparameter optimization

The optimization process used in AutoML tools directly affects the time and memory complexity, and consequently the performance. TPOT uses genetic algorithms to optimize model hyperparameters, while Prevision uses Hyperopt.


As seen in the pipeline comparison, TPOT's number of pipelines can grow large depending on the number of generations and the population and offspring sizes. The optimal values of these hyperparameters can themselves be hard to set, as a tradeoff between model complexity and performance must be taken into account. In Prevision, the Hyperopt optimization process tests only a small number of iterations (generally fewer than 50), which still leads to good performance.
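To illustrate the idea of a small, fixed search budget, here is a stdlib-only random search over a toy hyperparameter space, capped at 50 iterations. This is only an illustration of budget-limited search: Hyperopt itself uses a smarter Tree-structured Parzen Estimator rather than pure random sampling, and the objective and search space below are placeholders.

```python
import random

def objective(params):
    """Toy loss surface: minimum near lr = 0.1, depth = 6."""
    return (params["lr"] - 0.1) ** 2 + (params["depth"] - 6) ** 2 / 100

def random_search(objective, n_iter=50, seed=0):
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_iter):
        # Sample one candidate configuration from the search space.
        params = {"lr": rng.uniform(0.001, 0.5), "depth": rng.randint(2, 12)}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

best_params, best_loss = random_search(objective, n_iter=50)
print(best_params)
```

Even this naive strategy finds a reasonable configuration in 50 evaluations, versus the 2,550 pipelines evaluated by TPOT with the settings used in our experiments.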


Performance comparison

We compared the performance of TPOT and Prevision AutoML on six datasets, four of which are used for classification and two for regression. The characteristics of each dataset are presented in the following table.




Characteristics (number of objects, variables, and classes) of the six datasets: IRIS, WINE, DIGITS, BREAST-CANCER, DIABETES, and HOUSING.
We split each dataset into training and testing sets with proportions 2/3 and 1/3. We used the log loss as the metric for the classification tasks and the root mean square error (RMSE) for regression. We set the training parameters in TPOT and Prevision as follows:

  • TPOT
    • Generations: 50
    • Population size: 50
    • Offspring size: 50
    • Number of CPUs used: 12
    • Set of algorithms: default
  • Prevision
    • Training profile: advanced
    • Number of CPUs used: 147
    • Set of algorithms: logistic regression, decision tree, random forest, extra trees, XGBoost, CatBoost, LightGBM, and neural network
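The two evaluation metrics can be written in a few lines of stdlib Python. The log loss below is the binary form for simplicity (the classification datasets in this comparison are multiclass, for which the metric generalizes by summing over classes):

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log loss: y_true in {0, 1}, p_pred = predicted P(y = 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error for regression."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )

print(round(rmse([3.0, 5.0], [2.0, 7.0]), 3))  # → 1.581
```

For both metrics, lower is better, which is how the figures and tables below should be read.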


The classification results of the test sets are reported in Figure 5.

Figure 5: Log loss (the lower the better) obtained from TPOT and Prevision


From Figure 5 it can be seen that Prevision outperformed TPOT on three of the four datasets. While there is a significant difference between TPOT and Prevision on the IRIS, WINE, and DIGITS datasets, the scores on BREAST-CANCER are close.


We reported the regression scores in the following table.










RMSE scores (the lower the better) obtained from TPOT and Prevision on the DIABETES and HOUSING datasets


It follows from the table above that Prevision outperformed TPOT on the DIABETES dataset, even though the difference is not significant, and conversely TPOT outperformed Prevision on the HOUSING dataset.


As the two AutoML tools were not run on the same compute setup (12 CPUs for TPOT and 147 CPUs for Prevision), we do not report training durations, as the comparison would be biased.



In this article, we compared the TPOT and Prevision AutoML offerings based on different characteristics. The main difference between the two tools lies in the optimization process, which is linked to the models' complexity (time and memory) and performance. While TPOT uses genetic programming, Prevision AutoML uses Hyperopt. We showed that each tool has features that are not yet available in the other, and that Prevision's AutoML, along with some connected features (deployment and monitoring capabilities, pipeline visualization, SDKs, …), offers ML practitioners all the necessary tools to conduct an end-to-end ML project. Finally, our analysis on six datasets showed that Prevision's AutoML outperformed TPOT in 67% of the cases.


We provide below a table that summarizes the comparisons done in this article.




Feature                                       TPOT    Prevision
Python SDK                                    Yes     Yes
Command line tools                            Yes     No
Pipeline export                               Yes     Available soon
Object detection                              No      Yes
Date feature engineering                      No      Yes
Textual variables feature engineering         No      Yes
Pipeline visualization                        No      Yes
Genetic programming                           Yes     No
Deployment capability                         No      Yes
Monitoring capability                         No      Yes
Outperforming performance over 6 datasets     No      Yes




We encourage you to test the two offerings for your use case. There is no fee to use Prevision: a free trial is available on the website. Or, you can access it via the Google Cloud Marketplace. Again, no software fees apply; you are only charged for the GCP cloud resources you consume once your free credits end.

Happy testing!

Abdoul Djiberou

About the author

Abdoul Djiberou

Machine Learning Scientist