Bias Variance Trade-off, Overfitting & Underfitting
Whenever we discuss model prediction, it's important to understand prediction errors (bias and variance). There is a tradeoff between a model's ability to minimize bias and variance. Gaining a proper understanding of these errors helps us not only build accurate models but also avoid the mistakes of overfitting and underfitting.
So let's start with the basics and see how they make a difference to our machine learning models.
Bias is the difference between the average prediction of our model and the correct value we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the problem. It leads to high error on both training and test data.
Low Bias: The model makes fewer assumptions about the form of the target function.
High Bias: The model makes stronger assumptions about the form of the target function.
Examples of low bias machine learning algorithms are Decision Trees, k-Nearest Neighbors, and Support Vector Machines.
Examples of high bias machine learning algorithms are Linear Regression, Linear Discriminant Analysis, and Logistic Regression.
Variance is the variability of model prediction for a given data point; it tells us how spread out our predictions are. A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn't seen before. As a result, such models perform very well on training data but have high error rates on test data.
Low Variance: The estimate of the target function changes little when the training dataset changes.
High Variance: The estimate of the target function changes a lot when the training dataset changes.
Examples of low variance machine learning algorithms include Linear Regression, Linear Discriminant Analysis, and Logistic Regression.
Examples of high variance machine learning algorithms are Decision Trees, k-Nearest Neighbors, and Support Vector Machines.
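To make these lists concrete, here is a minimal sketch (using scikit-learn on a synthetic nonlinear dataset; the data and settings are purely illustrative) that contrasts a high-bias linear model with a high-variance decision tree:

```python
# Contrast a high-bias model (linear regression) with a high-variance model
# (an unpruned decision tree) on the same synthetic nonlinear dataset.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)  # nonlinear signal + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("high bias (linear)", LinearRegression()),
                    ("high variance (deep tree)", DecisionTreeRegressor())]:
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

The linear model shows similar (and high) error on both splits, the signature of bias; the unpruned tree drives its training error to nearly zero while its test error stays noticeably higher, the signature of variance.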
Irreducible error is the error that can't be reduced by building better models. It is a measure of the amount of noise in our data. Here it is important to understand that no matter how good we make our model, our data will always have a certain amount of noise, an irreducible error that cannot be removed.
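As a quick illustration (with a synthetic noise level chosen only for demonstration), even a hypothetical perfect model, one that knows the true function exactly, cannot score below the noise variance:

```python
# Even the true target function cannot beat the noise floor: its MSE
# equals the variance of the label noise. That floor is the irreducible error.
import numpy as np

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=10_000)
noise = rng.normal(scale=0.3, size=X.shape)   # noise baked into the labels
y = np.sin(X) + noise                         # observed labels

perfect_predictions = np.sin(X)               # a "perfect" model: the true function
mse = np.mean((y - perfect_predictions) ** 2)
print(f"MSE of the perfect model: {mse:.4f} (noise variance = {0.3 ** 2:.4f})")
```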
Bias and variance using the bulls-eye diagram
In the above diagram, the center of the target is a model that perfectly predicts the correct values. As we move away from the bulls-eye, our predictions get worse and worse. We can repeat our process of model building to get separate hits on the target.
In supervised learning, underfitting happens when a model is unable to capture the underlying pattern of the data. These models usually have high bias and low variance. It happens when we have too little data to build an accurate model, or when we try to fit a linear model to nonlinear data. Such models, like linear and logistic regression, are also too simple to capture complex patterns in data.
In supervised learning, overfitting happens when our model captures the noise along with the underlying pattern in the data. It happens when we train our model for too long on a noisy dataset. These models have low bias and high variance. They are typically very complex models, like deep decision trees, which are prone to overfitting.
Underfitting:
When an ML model performs poorly on both the seen (training) and unseen (test) datasets, it is called an underfitting model. This is a high-bias, low-variance situation.
Overfitting:
When an ML model performs well on the seen dataset but poorly on the unseen dataset, it is called an overfitting model. This is a low-bias, high-variance situation.
Good fitting:
When an ML model performs well on both the seen and unseen datasets, it is called a good-fitting model. This is a low-bias, low-variance situation.
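One common way to see all three regimes at once is to vary model flexibility on the same data. Here is a minimal sketch (the polynomial degrees and synthetic data are chosen only for illustration):

```python
# Sweep polynomial degree to move from underfitting through a good fit
# to overfitting on the same noisy dataset.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(1)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

for degree, label in [(1, "underfitting"), (5, "good fitting"), (15, "overfitting")]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d} ({label}): train MSE = {train_mse:.3f}, "
          f"test MSE = {test_mse:.3f}")
```

A degree-1 line misses the curvature (high error everywhere), a mid-range degree tracks the signal, and a very high degree chases the noise (low training error, inflated test error).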
Why is there a Bias-Variance Tradeoff?
If a model is simple and has a small number of features, it may have high bias and low variance; in contrast, if a model has a huge number of features, it may have low bias and high variance. So as bias decreases, variance increases, and vice versa. We want a model with both low bias and low variance, and that is why a trade-off is required.
Bias can be reduced by making the model more flexible (for example, by adding features), while variance can be reduced by training on more data or by using regularization methods.
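Regularization makes this trade-off directly tunable. Here is a minimal sketch (the alpha values and synthetic data are illustrative) using L2 regularization (Ridge) on a deliberately flexible polynomial model:

```python
# Increase Ridge's alpha to trade a little extra bias for less variance:
# the flexible polynomial model stops chasing noise as alpha grows.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(2)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

for alpha in [0.001, 0.1, 10.0, 1000.0]:  # larger alpha = more bias, less variance
    model = make_pipeline(PolynomialFeatures(degree=15),
                          StandardScaler(),
                          Ridge(alpha=alpha))
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"alpha = {alpha:8.3f}: train MSE = {train_mse:.3f}, "
          f"test MSE = {test_mse:.3f}")
```

A very small alpha tends to leave the model overfit, a moderate alpha usually gives the best test error, and a huge alpha over-constrains the model so that bias dominates again.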
Total Error
To build a good model, we need to find a good balance between bias and variance such that it minimizes the total error:

Total Error = Bias² + Variance + Irreducible Error
An optimal balance of bias and variance would never overfit or underfit the model.
Therefore, understanding bias and variance is critical for understanding the behavior of prediction models.
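To make the decomposition concrete, here is a minimal sketch (synthetic data and illustrative settings) that estimates each term empirically by fitting the same model on many independent training sets and examining its predictions at a single test point:

```python
# Estimate bias^2, variance, and irreducible error at one test point by
# refitting the same model on many fresh training sets drawn from the
# same data-generating process.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(3)
true_f = np.sin                 # the true target function (known here by design)
noise_sd = 0.3                  # label noise: the source of irreducible error
x0 = np.array([[1.0]])          # the test point where we decompose the error

preds = []
for _ in range(200):            # 200 independent training sets
    X = rng.uniform(-3, 3, size=(50, 1))
    y = true_f(X).ravel() + rng.normal(scale=noise_sd, size=50)
    model = DecisionTreeRegressor(max_depth=3).fit(X, y)
    preds.append(model.predict(x0)[0])

preds = np.array(preds)
bias_sq = (preds.mean() - true_f(x0[0, 0])) ** 2
variance = preds.var()
irreducible = noise_sd ** 2
print(f"bias^2 = {bias_sq:.4f}, variance = {variance:.4f}, "
      f"irreducible = {irreducible:.4f}, "
      f"total = {bias_sq + variance + irreducible:.4f}")
```

The printed total approximates the expected squared error of a freshly trained model at that point; shrinking max_depth raises the bias term while lowering the variance term, and vice versa.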


