Skewness in Statistics
What is skewness?
Skewness is a measure of the asymmetry of a probability distribution and is given by the third standardized moment. If that sounds way too complex, don’t worry! Let me break it down for you.
In simple words, skewness measures how far the probability distribution of a random variable departs from a symmetric shape, such as that of the normal distribution.
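For readers who want the formula, the third standardized moment is usually written as below. The symbols are assumptions for illustration and do not come from this post: X is the random variable, \mu its mean, and \sigma its standard deviation.

```latex
\tilde{\mu}_3
  = \operatorname{E}\!\left[\left(\frac{X - \mu}{\sigma}\right)^{3}\right]
  = \frac{\operatorname{E}\!\left[(X - \mu)^{3}\right]}{\sigma^{3}}
```

A value of zero indicates a symmetric distribution, a positive value a longer right tail, and a negative value a longer left tail.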
The normal distribution is a probability distribution without any skewness. The image below shows a symmetrical distribution, which is basically the normal distribution, and you can see that it is symmetrical on both sides of the dashed line. Apart from this, there are two types of skewness:
- Positive Skewness
- Negative Skewness
Example for Right Skewness:
Suppose we have a dataset with outliers, D = [1, 2, 3, 4, 1, 2, 8, 100, 101, 110]
We can see that 100, 101, and 110 are outliers here, and they are the cause of the skewness.
If we measure the mean and median here, we get Mean = 33.2 and Median = 3.5.
Here Mean > Median, which means that there are positive outliers. This results in right-skewed data, where most of the data lies on the left-hand side.
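If you want to check these numbers yourself, here is a minimal sketch using Python's built-in statistics module. The variable names data_mean and data_median are just illustrative choices.

```python
from statistics import mean, median

# Right-skewed example data from the text
D = [1, 2, 3, 4, 1, 2, 8, 100, 101, 110]

data_mean = mean(D)
data_median = median(D)

print(f"Mean = {data_mean}, Median = {data_median}")

# A mean above the median hints at a long right tail
if data_mean > data_median:
    print("Mean > Median: the data looks right-skewed")
```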
Example for Left Skewness:
Suppose we have a dataset with outliers, D = [2, 3, 4, 100, 101, 110, 200, 150, 160, 140]
We can see that 2, 3, and 4 are outliers here, and they are the cause of the skewness.
If we measure the mean and median here, we get Mean = 97 and Median = 105.5.
Here Median > Mean, which means that there are negative outliers. This results in left-skewed data, where most of the data lies on the right-hand side.
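Comparing mean and median is a quick check, but you can also compute the skewness value itself. The sketch below uses the population form of the third standardized moment (third central moment divided by the cubed standard deviation). The helper name skewness is hypothetical, and libraries may apply a sample-size correction that gives slightly different numbers.

```python
def skewness(values):
    """Population skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n  # variance
    m3 = sum((x - mu) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# The two example datasets from the text
right_skewed = [1, 2, 3, 4, 1, 2, 8, 100, 101, 110]
left_skewed = [2, 3, 4, 100, 101, 110, 200, 150, 160, 140]

print("Right-skewed example:", skewness(right_skewed))  # positive value
print("Left-skewed example:", skewness(left_skewed))    # negative value
```

The sign is what matters here: a positive result matches the right-skewed example and a negative result matches the left-skewed one.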
Why is Skewness Important?
First, linear models are often built on the assumption that the distributions of the independent variables and the target variable are similar. Therefore, knowing about the skewness of the data helps us create better linear models.
Second, skewness tells us about outliers in our dataset. With that knowledge, we can try to reduce the skewness and get a more normally distributed dataset that the model will fit more efficiently.
Some of the ways you can transform your skewed data:
- Power Transformation
- Log Transformation
- Exponential Transformation
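To see what a transformation does in practice, here is a small sketch that applies a log transformation and a square-root transformation (one simple kind of power transformation) to the right-skewed example from earlier. The skewness helper is a hypothetical function written for this illustration, and the log step assumes every value is positive.

```python
import math

def skewness(values):
    """Population skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Right-skewed example data from the text
D = [1, 2, 3, 4, 1, 2, 8, 100, 101, 110]

log_D = [math.log(x) for x in D]    # log transformation (needs x > 0)
sqrt_D = [math.sqrt(x) for x in D]  # power transformation with exponent 0.5

print("Original skewness:", skewness(D))
print("After log:        ", skewness(log_D))
print("After square root:", skewness(sqrt_D))
```

For this dataset the log-transformed values come out less skewed than the original, though they are still not perfectly symmetric. Which transformation works best depends on the data, so it is worth comparing a few.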


