Skewness in Statistics

 



What is skewness? 


Skewness is a measure of the asymmetry of a probability distribution and is given by the third standardized moment. If that sounds way too complex, don’t worry! Let me break it down for you.

In simple words, skewness measures how far the probability distribution of a random variable deviates from a symmetric shape such as the normal distribution.
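To make the "third standardized moment" less abstract, here is a minimal sketch in plain Python. The function name skewness is my own choice for illustration, and it uses the population formula (dividing by n); statistics libraries may apply a bias correction and return slightly different numbers.

```python
def skewness(values):
    """Population skewness: third central moment divided by
    the standard deviation cubed (the third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    # Second and third central moments (averages of squared / cubed deviations)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    # Note: m2 is zero when all values are identical, so this would divide by zero
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
print(skewness(symmetric))  # 0.0, because the data is perfectly symmetric
```

A positive result means the longer tail is on the right, a negative result means it is on the left, and a value near zero means the data is roughly symmetric.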

 

The normal distribution is a probability distribution without any skewness. The image below shows a symmetrical distribution, which is basically the normal distribution: it is symmetrical on both sides of the dashed line. Apart from this, there are two types of skewness:

  • Positive Skewness
  • Negative Skewness


Example for Right Skewness:

Suppose we have data with outliers, D = [1, 2, 3, 4, 1, 2, 8, 100, 101, 110]

We can see that 100, 101, and 110 are outliers here, and they are the cause of the skewness.

If we measure the mean and median, we get Mean = 33.2 and Median = 3.5.

Here Mean > Median, which means the large positive outliers pull the mean upward. The result is right-skewed data: most of the values sit on the left-hand side, with a long tail to the right.
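If you want to check these numbers yourself, here is a quick sketch using Python's built-in statistics module. The variable names are just for illustration.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 1, 2, 8, 100, 101, 110]

data_mean = mean(data)      # sum of the values divided by their count
data_median = median(data)  # average of the two middle values after sorting

print(data_mean, data_median)   # 33.2 3.5
print(data_mean > data_median)  # True: the mean is pulled toward the large outliers
```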


Example for Left Skewness:

Suppose we have data with outliers, D = [2, 3, 4, 100, 101, 110, 200, 150, 160, 140]

We can see that 2, 3, and 4 are outliers here, and they are the cause of the skewness.

If we measure the mean and median, we get Mean = 97 and Median = 105.5.

Here Median > Mean, which means the small outliers pull the mean downward. The result is left-skewed data: most of the values sit on the right-hand side, with a long tail to the left.
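The mean-versus-median comparison used in these examples can be wrapped in a small helper. This is only a sketch: skew_direction is a hypothetical name, and comparing mean and median is a rule of thumb that can mislead on unusual distributions, so treat the result as a hint rather than a proof.

```python
from statistics import mean, median


def skew_direction(values):
    """Guess the skew direction by comparing mean and median (rule of thumb)."""
    m, med = mean(values), median(values)
    if m > med:
        return "right-skewed"
    if m < med:
        return "left-skewed"
    return "no skew detected"


left = [2, 3, 4, 100, 101, 110, 200, 150, 160, 140]
print(mean(left), median(left))  # mean is 97, median is 105.5
print(skew_direction(left))      # left-skewed
```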



Why is Skewness Important?

 

First, linear models tend to work best when the independent variables and the target variable are roughly symmetric. Strictly speaking, the usual assumption is that the model's errors are normally distributed, and heavily skewed variables often break it. Therefore, knowing about the skewness of data helps us in creating better linear models.

Skewness also tells us about outliers in our dataset. With that knowledge, we can try to reduce the skewness and get a dataset that is closer to normally distributed, which will fit the model better.

Some of the ways you can transform your skewed data:

  • Power Transformation
  • Log Transformation
  • Exponential Transformation
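As a rough illustration, the sketch below applies a log transformation and a square-root transformation (one simple kind of power transformation) to the right-skewed example data and compares the skewness before and after. The skewness helper is my own population-formula version, not a library function, and the log transformation assumes every value is strictly positive.

```python
import math


def skewness(values):
    """Population skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


data = [1, 2, 3, 4, 1, 2, 8, 100, 101, 110]

# Log transformation: compresses large values far more than small ones
log_data = [math.log(x) for x in data]

# Power transformation with exponent 0.5 (square root): a milder compression
sqrt_data = [x ** 0.5 for x in data]

print("original:", skewness(data))
print("log:     ", skewness(log_data))
print("sqrt:    ", skewness(sqrt_data))
```

On this data both transformations bring the skewness closer to zero, and the log does so more strongly. They do not make the data normal, though: the gap between the small values and the outliers is still there, just less extreme.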


