
So far in this blog post series, we have discussed why choosing the right metric matters for machine learning models and presented common metrics used in binary classification. In this article, we present common metrics that can be used to evaluate regression tasks.

 

Introduction

Regression refers to predictive modeling problems that involve predicting a numeric value. It is different from classification, which involves predicting a class label. Unlike in classification, you cannot use classification accuracy to evaluate the predictions made by a regression model.

Instead, you must use error metrics specifically designed for evaluating predictions made on regression problems.

 

In this article, you will discover how to calculate error metrics for regression predictive modeling projects.

 

I would like to thank Abhishek Thakur for allowing us to reuse the implementation code for the metrics discussed in this article. We invite you to read his excellent book Approaching (Almost) Any Machine Learning Problem.

 

Mean Square Error (MSE)

Mean Square Error (MSE) is defined as the mean of the squares of the differences between actual and estimated values. MSE takes the distances from the points to the regression line (these distances are the “errors”) and squares them to remove any negative signs. The MSE score incorporates both the variance and the bias of the predictor. It also gives more weight to larger differences: the bigger the error, the more it is penalized.

 

MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²

 

def mean_squared_error(y_true, y_pred):
    """
    This function calculates mse
    :param y_true: list of real numbers, true values
    :param y_pred: list of real numbers, predicted values
    :return: mean squared error
    """
    # initialize error at 0
    error = 0
    # loop over all samples in the true and predicted list
    for yt, yp in zip(y_true, y_pred):
        # calculate squared error
        # and add to error
        error += (yt - yp) ** 2
    # return mean error
    return error / len(y_true)
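As a quick sanity check, this hand-rolled version should agree with scikit-learn's mean_squared_error (assuming scikit-learn is installed; the toy values below are our own):

from sklearn.metrics import mean_squared_error as sk_mse

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print(mean_squared_error(y_true, y_pred))  # 0.375
print(sk_mse(y_true, y_pred))              # 0.375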

 

Root Mean Square Error (RMSE)

 

The Root Mean Squared Error (RMSE) is the square root of the mean of the squares of all the errors. The use of RMSE is very common, and it is considered an excellent general-purpose error metric for numerical predictions.

 

RMSE = √( (1/n) Σᵢ (Sᵢ − Oᵢ)² )

where:

  • Oᵢ are the observations,
  • Sᵢ are the predicted values of a variable,
  • n is the number of observations available for analysis.

The RMSE score is a good measure of accuracy, but only to compare prediction errors of different models or model configurations for a particular variable and not between variables, as it is scale-dependent.


import numpy as np

def root_mean_squared_error(y_true, y_pred):
    """
    This function calculates rmse
    :param y_true: list of real numbers, true values
    :param y_pred: list of real numbers, predicted values
    :return: root mean squared error
    """
    # initialize error at 0
    error = 0
    # loop over all samples in the true and predicted list
    for yt, yp in zip(y_true, y_pred):
        # calculate squared error
        # and add to error
        error += (yt - yp) ** 2
    # return the square root of the mean error
    return np.sqrt(error / len(y_true))

 

Root Mean Square Logarithmic Error (RMSLE)

The RMSLE measures the ratio between actual and predicted values. It is the Root Mean Squared Error of the log-transformed predicted and log-transformed actual values. RMSLE adds 1 to both the actual and predicted values before taking the natural logarithm, to avoid taking the log of possible 0 (zero) values.

 

In the case of RMSLE, you take the log of the predictions and of the actual values, so what changes is essentially the scale of the errors you are measuring. RMSLE is usually used when you don’t want to penalize huge absolute differences between the predicted and the actual values when both are large numbers.

 

 

RMSLE = √( (1/n) Σᵢ (log(yᵢ + 1) − log(ŷᵢ + 1))² )

where:

  • yᵢ are the real values,
  • ŷᵢ are the predicted values of a variable,
  • n is the number of observations available for analysis.


import numpy as np
def root_mean_squared_log_error(y_true, y_pred):
    """
    This function calculates rmsle
    :param y_true: list of real numbers, true values
    :param y_pred: list of real numbers, predicted values
    :return: root mean squared logarithmic error
    """
    # initialize error at 0
    error = 0
    # loop over all samples in true and predicted list
    for yt, yp in zip(y_true, y_pred):
        # calculate squared log error
        # and add to error
        error += (np.log(1 + yt) - np.log(1 + yp)) ** 2
    # return the square root of the mean error
    return np.sqrt(error / len(y_true))
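A small illustration of the ratio behaviour, using the function above with our own toy numbers: an absolute error of 100 on a true value of 1000 and an absolute error of 1 on a true value of 10 produce similar RMSLE values, because the predicted/actual ratios are close.

# similar actual/predicted ratios give similar RMSLE,
# despite very different absolute errors (100 vs 1)
print(root_mean_squared_log_error([1000], [1100]))  # ≈ 0.095
print(root_mean_squared_log_error([10], [11]))      # ≈ 0.087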

 

Root Mean Square Percentage Error (RMSPE)

 

The RMSPE, which is mostly used as an advanced forecasting metric, is defined by the following equation:

RMSPE = √( (1/n) Σᵢ ((yᵢ − ŷᵢ) / yᵢ)² )

where:

  • yᵢ are the observations,
  • ŷᵢ are the predicted values of a variable,
  • n is the number of observations available for analysis.

 

import numpy as np

def root_mean_squared_percentage_error(y_true, y_pred):
    """
    This function calculates rmspe
    :param y_true: list of real numbers, true values
    :param y_pred: list of real numbers, predicted values
    :return: root mean squared percentage error
    """
    # initialize error at 0
    error = 0
    # loop over all samples in the true and predicted list
    for yt, yp in zip(y_true, y_pred):
        # calculate squared percentage error
        # and add to error
        error += ((yt - yp) / yt) ** 2
    # return the square root of the mean error
    return np.sqrt(error / len(y_true))
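Note that this implementation divides by yt, so it fails whenever an actual value is 0. A common workaround, a guarded variant of our own rather than part of the original code, is to epsilon-guard zero actuals:

import numpy as np

def safe_rmspe(y_true, y_pred, eps=1e-9):
    # hypothetical guarded variant, not part of the original implementation
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # replace zero actuals with a tiny epsilon to avoid division by zero
    denom = np.where(y_true == 0, eps, y_true)
    return np.sqrt(np.mean(((y_true - y_pred) / denom) ** 2))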

 

Mean Absolute Error (MAE)

The MAE score measures the average magnitude of the errors in a set of predictions, without considering their direction. It’s the average over the test sample of the absolute differences between prediction and actual observation where all individual differences have equal weight.

The mean absolute error uses the same scale as the data. This is known as a scale-dependent accuracy measure and, therefore, cannot be used to make comparisons between series using different scales.

MAE = (1/n) Σⱼ |yⱼ − ŷⱼ|

where:

  • yⱼ are the observations,
  • ŷⱼ are the predicted values of a variable,
  • n is the number of observations available for analysis.

import numpy as np
def mean_absolute_error(y_true, y_pred):
    """
    This function calculates mae
    :param y_true: list of real numbers, true values
    :param y_pred: list of real numbers, predicted values
    :return: mean absolute error
    """
    # initialize error at 0
    error = 0
    # loop over all samples in the true and predicted list
    for yt, yp in zip(y_true, y_pred):
        # calculate absolute error
        # and add to error
        error += np.abs(yt - yp)
    # return mean error
    return error / len(y_true)

Mean Absolute Percentage Error (MAPE)

The MAPE score measures the accuracy of a forecasting method in statistics, for example in trend estimation. It is also used as a loss function for regression problems in machine learning. MAPE usually expresses accuracy as a percentage and is defined as MAPE = (1/n) Σᵢ |(yᵢ − ŷᵢ) / yᵢ|.

From Wikipedia: although the concept of MAPE sounds very simple and convincing, it has major drawbacks in practical application, and there are many studies on its shortcomings and the misleading results it can produce.

  • It cannot be used if there are zero values (which happens frequently in demand data for example) because there would be a division by zero.
  • For forecasts which are too low the percentage error cannot exceed 100%, but for forecasts which are too high there is no upper limit to the percentage error.
  • MAPE puts a heavier penalty on negative errors than on positive errors. As a consequence, when MAPE is used to compare the accuracy of prediction methods it is biased in that it will systematically select a method whose forecasts are too low. This little-known but serious issue can be overcome by using an accuracy measure based on the logarithm of the accuracy ratio (the ratio of the predicted to actual value). This approach leads to superior statistical properties and leads to predictions which can be interpreted in terms of the geometric mean.

To overcome these issues with MAPE, other measures have been proposed in the literature, such as the Symmetric Mean Absolute Percentage Error (SMAPE) presented later.
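As a rough sketch of the log-accuracy-ratio idea mentioned in the last bullet above (the exact measure proposed in the literature may differ; this illustration of ours assumes strictly positive values):

import numpy as np

def mean_squared_log_accuracy_ratio(y_true, y_pred):
    # assumes strictly positive actual and predicted values
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # log of the accuracy ratio: symmetric in over- and under-forecasting
    return np.mean(np.log(y_pred / y_true) ** 2)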

import numpy as np
def mean_abs_percentage_error(y_true, y_pred):
    """
    This function calculates MAPE
    :param y_true: list of real numbers, true values
    :param y_pred: list of real numbers, predicted values
    :return: mean absolute percentage error
    """
    # initialize error at 0
    error = 0
    # loop over all samples in true and predicted list
    for yt, yp in zip(y_true, y_pred):
        # calculate percentage error
        # and add to error
        error += np.abs(yt - yp) / yt
    # return mean percentage error
    return error / len(y_true)
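The asymmetry mentioned above is easy to see on a toy example of ours: the same absolute error of 50 is penalized very differently depending on whether the forecast is below or above the actual value.

# same absolute error (50), very different MAPE depending on the direction
print(mean_abs_percentage_error([100], [50]))   # 0.5 -> forecast too low, error bounded by 100%
print(mean_abs_percentage_error([50], [100]))   # 1.0 -> forecast too high, error unbounded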

Median Absolute Error (MedAE)

 

Median absolute error output is a non-negative floating point value, with an optimal value of 0. The median absolute error is particularly interesting because it is robust to outliers. The loss is calculated by taking the median of all absolute differences between the target and the prediction. If ŷᵢ is the predicted value of the ith sample and yᵢ is the corresponding true value, then the median absolute error estimated over n samples is defined as follows:

MedAE(y, ŷ) = median(|y₁ − ŷ₁|, …, |yₙ − ŷₙ|)

Where:

  • yᵢ are the true values,
  • ŷᵢ are the predicted values,
  • n is the total number of observations.

 

import numpy as np
def median_absolute_error(y_true, y_pred):
    """
    Median absolute error regression loss.
    The output is a non-negative floating point value; the best value is 0.0.
    :param y_true: list of real numbers, true values
    :param y_pred: list of real numbers, predicted values
    :return: median absolute error
    """
    # convert to arrays so the subtraction is element-wise
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # take the median of all absolute differences
    return np.median(np.abs(y_pred - y_true))
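A quick illustration of the robustness to outliers, using the two functions defined above with made-up values: a single wild prediction inflates the mean absolute error but barely moves the median absolute error.

y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 2.1, 2.9, 4.2, 50.0]  # the last prediction is a wild outlier

print(mean_absolute_error(y_true, y_pred))    # ≈ 9.1, dominated by the outlier
print(median_absolute_error(y_true, y_pred))  # ≈ 0.1, barely affected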

Symmetric Mean Absolute Percentage Error (SMAPE)

The Symmetric Mean Absolute Percentage Error (SMAPE) is an accuracy measure based on percentage (or relative) errors. It is defined as:

SMAPE = (100% / n) Σᵢ |Fᵢ − Aᵢ| / ((|Aᵢ| + |Fᵢ|) / 2)

Where:

  • Aᵢ is the actual value,
  • Fᵢ is the forecast value,
  • n is the total number of observations.

 

In contrast to the mean absolute percentage error, SMAPE has both a lower bound and an upper bound: the formula above yields a result between 0% and 200%. A variant of the formula that omits the factor of 2 is bounded between 0% and 100%, which is easier to interpret.

A limitation of SMAPE is that if either the actual value or the forecast value is 0, the error jumps to its upper limit (200% for the formula above, 100% for the variant).


import numpy as np

def smape(y_true, y_pred):
    # convert to arrays so the operations are element-wise
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # result is a percentage between 0% and 200%
    return np.mean(2 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred)) * 100)
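For instance, with the function above and values of our own choosing, a zero actual value immediately saturates the error at the upper bound:

print(smape([0.0], [5.0]))      # 200.0: the upper bound is hit as soon as the actual is 0
print(smape([100.0], [110.0]))  # ≈ 9.52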

R2 (R squared)

R² tells us how good our regression model is compared to a very simple baseline model that always predicts the mean value of the target (computed on the train set).

 

R² = 1 − SS_RES / SS_TOT = 1 − Σᵢ (yᵢ − ŷᵢ)² / Σᵢ (yᵢ − ȳ)²

Where:

  • SS_RES is the sum of the squared distances between the actual points and the predicted points on the best-fit line,
  • SS_TOT is the sum of the squared distances between the actual points and the mean of all the points,
  • yᵢ are the observations,
  • ŷᵢ are the predicted values,
  • ȳ is the mean of the observed values.


import numpy as np
def r2(y_true, y_pred):
    """
    This function calculates r-squared score
    :param y_true: list of real numbers, true values
    :param y_pred: list of real numbers, predicted values
    :return: r2 score
    """
    # calculate the mean value of true values
    mean_true_value = np.mean(y_true)

    # initialize numerator and denominator with 0
    numerator = 0
    denominator = 0

    # loop over all true and predicted values
    for yt, yp in zip(y_true, y_pred):
        # update numerator
        numerator += (yt - yp) ** 2
        # update denominator
        denominator += (yt - mean_true_value) ** 2
    # calculate the ratio
    ratio = numerator / denominator
    # return 1 - ratio
    return 1 - ratio
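Again as a sanity check, this should match scikit-learn's r2_score (assuming scikit-learn is available; the toy values are our own):

from sklearn.metrics import r2_score

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print(r2(y_true, y_pred))        # ≈ 0.9486
print(r2_score(y_true, y_pred))  # same value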



Regression metrics summary table

 

       

The star columns give the Prevision.io notation for each metric (VAR and STD denote the variance and the standard deviation of the target).

| Metric | Range | Lower is better | Weights accepted | 3 stars | 2 stars | 1 star | 0 stars | Sensitive to outliers | Tips |
|---|---|---|---|---|---|---|---|---|---|
| MSE | 0 – ∞ | True | True | [0 ; 0.01·VAR[ | [0.01·VAR ; 0.1·VAR[ | [0.1·VAR ; VAR[ | [VAR ; +∞[ | Yes | |
| RMSE | 0 – ∞ | True | True | [0 ; 0.1·STD[ | [0.1·STD ; 0.3·STD[ | [0.3·STD ; STD[ | [STD ; +∞[ | Yes | |
| MAE | 0 – ∞ | True | True | [0 ; 0.1·STD[ | [0.1·STD ; 0.3·STD[ | [0.3·STD ; STD[ | [STD ; +∞[ | No | |
| MAPE | 0 – ∞ | True | True | [0 ; 0.1[ | [0.1 ; 0.3[ | [0.3 ; 1[ | [1 ; +∞[ | No | Use when target values are across different scales |
| RMSLE | 0 – ∞ | True | True | [0 ; 0.095[ | [0.095 ; 0.262[ | [0.262 ; 0.693[ | [0.693 ; +∞[ | Yes | |
| RMSPE | 0 – ∞ | True | True | [0 ; 0.1[ | [0.1 ; 0.3[ | [0.3 ; 1[ | [1 ; +∞[ | Yes | Use when target values are across different scales |
| SMAPE | 0 – ∞ | True | True | [0 ; 0.1[ | [0.1 ; 0.3[ | [0.3 ; 1[ | [1 ; +∞[ | No | Use when target values are close to 0 |
| R2 | -∞ – 1 | False | True | ]0.9 ; 1] | ]0.7 ; 0.9] | ]0.5 ; 0.7] | ]-∞ ; 0.5] | No | Use when you want performance scaled between 0 and 1 |
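To illustrate how the table reads, here is a small helper of our own (not a Prevision.io function) that maps an RMSE value to its star rating, using the standard deviation of the target:

import numpy as np

def rmse_stars(rmse, y_true):
    # thresholds taken from the summary table, as fractions of the target's std
    std = np.std(y_true)
    if rmse < 0.1 * std:
        return 3
    if rmse < 0.3 * std:
        return 2
    if rmse < std:
        return 1
    return 0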



Conclusion

We have introduced the main regression metrics, in particular those implemented in Prevision.io.

In this article we have seen:

  • the main regression metrics,
  • their code implementation in Python,
  • in which situations they are used,
  • a summary table of these metrics.

Example of gradient descent. Courtesy of towardsdatascience.com

 

In this example we saw why it is important for our objective function to be differentiable.

Now, let’s imagine we add a regularization parameter to our linear regression. We can use Ridge regression, which keeps the prediction Ŷᵢ = a·x1ᵢ + b·x2ᵢ + c but adds a penalty term λ·(a² + b²) to the objective function. We want to choose the best λ through hyperparameter optimization. For simplicity we will try λ = 1, 2, 3:

  1. fix λ = 1

  2. repeat the above process to determine a, b, c (with the objective function)

  3. evaluate metric(Y, Ŷ, λ=1), with the metric we want. It can be anything

We do the same thing with λ = 2 and λ = 3 and choose the one which gives the best result from the metric perspective, as in the sketch below.
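In code, this selection loop could look like the following sketch, using scikit-learn's Ridge; the toy data and the choice of MAE as the metric are our own placeholders:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error as mae

# toy data, stand-ins for the real X and Y
rng = np.random.RandomState(0)
X_train, X_valid = rng.rand(80, 2), rng.rand(20, 2)
y_train = 3 * X_train[:, 0] - 2 * X_train[:, 1] + 0.1 * rng.randn(80)
y_valid = 3 * X_valid[:, 0] - 2 * X_valid[:, 1] + 0.1 * rng.randn(20)

best_lambda, best_score = None, float("inf")
for lam in [1, 2, 3]:
    # step 2: fit a, b, c with the (regularized) objective function
    model = Ridge(alpha=lam).fit(X_train, y_train)
    # step 3: evaluate the metric of our choice on held-out data
    score = mae(y_valid, model.predict(X_valid))
    if score < best_score:
        best_lambda, best_score = lam, score

print(best_lambda, best_score)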

Why do we need metrics?

 

As we saw above, different models can be evaluated thanks to metrics. Metrics are more flexible than objective functions and must be business oriented. The advantage is that you will be able to communicate with less sophisticated audiences. Never forget that the end-user of your model is generally not you, and they must be able to understand why your model performs well. Once your models have been trained, you can use different metrics to present them. For example, for a classification problem, you could show the true positive rate, false positive rate, etc. If you’re building an algorithm to determine whether someone has COVID, the true positive and false negative rates can be very important.

 

Metrics can also be used to calculate expected gain or ROI. If you can evaluate the gain or loss attached to the outcomes of your model, it can help you show the monetary value of your work, as you can see here with a cost matrix alongside a confusion matrix:

Cost matrix and confusion matrix
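A minimal sketch of that computation, with made-up counts and gains of our own:

import numpy as np

# confusion matrix counts: rows = actual (neg, pos), cols = predicted (neg, pos)
confusion = np.array([[900, 50],
                      [30, 20]])

# made-up gain/cost of each outcome, in the same layout:
# true negative 0, false positive -10, false negative -100, true positive +50
gain = np.array([[0, -10],
                 [-100, 50]])

# expected monetary value of the model = element-wise product, summed
expected_gain = np.sum(confusion * gain)
print(expected_gain)  # -2500 with these numbers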

 

How to choose a good metric?

The first question you have to ask yourself is “why am I building this machine learning model?” And in which context will it be used? The algorithms we build are usually used in a very well-defined work environment. Let’s take a (very basic) example.

 

You work in a telecommunications firm. The marketing team wants to tackle the issue of churn in your company. They have noticed that customers seem to leave the company and want to prevent it. One idea is to launch a customized email campaign with a special offer for each customer. If we had an unlimited budget, we could send a personal email to every customer in our database, and it would be just great, wouldn’t it? Well, we don’t have unlimited funds and resources, so we want to target the customers who have the highest probability of churning. This is when you get to show your magic.

Now back to metrics. We are dealing with a classification problem, so you will probably use a log-loss as an objective function. But log-loss is not very business oriented, and I can see the long faces from here when you say “hey, I have a log-loss of 0.12, let’s put my model in production”.

One question you should ask the marketing team is “is it worse to miss a churner or to treat a non-churner as a churner?” In other words, is it better to optimize recall or precision, which will lead you to choose a threshold, as we will see in another blog post on classification metrics.

 

This reasoning can be applied to any machine learning use case. You have to discuss with the people who will actually use your model, and ask questions in a business oriented manner.

 

Now that we have covered an overview of what a machine learning metric is and how it differs from an objective function, we will go deeper into typical metrics for regression, classification and multiclass classification use cases and see which ones are the best for you.