Normalization and Standardization



Probability Density Function:

A distribution provides a parameterized mathematical function that can be used to calculate the probability of observations from the sample space. The function that describes the grouping, or density, of the observations is called the probability density function (PDF). For a continuous variable, the probability that an observation falls in a range is the area under the PDF over that range. We can also calculate the likelihood of an observation having a value equal to or less than a given value. The function that summarizes this is called the cumulative distribution function (CDF).
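As a small illustration of these two functions, here is a sketch using Python's standard-library `statistics.NormalDist`. The mean of 100 and standard deviation of 15 are made-up values chosen only for the example.

```python
from statistics import NormalDist

# Hypothetical distribution: mean 100, standard deviation 15 (assumed values).
dist = NormalDist(mu=100, sigma=15)

# Density of the distribution at a single value (a density, not a probability).
density_at_100 = dist.pdf(100)

# Cumulative probability: chance of observing a value less than or equal to 115.
prob_up_to_115 = dist.cdf(115)

print(round(density_at_100, 4))   # 0.0266
print(round(prob_up_to_115, 4))   # 0.8413
```

The second number says that roughly 84% of observations from this distribution are expected to be 115 or lower.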

Normal Distribution:

A sample of data forms a distribution, and by far the most well-known distribution is the Gaussian distribution, often called the normal distribution.

Data from many fields of study can, perhaps surprisingly, be described using a Gaussian distribution. It is so common that it is often simply called the "normal distribution".

A Gaussian distribution can be described using two parameters:

·        Mean: Denoted by the Greek lowercase letter mu (μ), it is the expected value of the distribution.

·        Variance: Denoted by the Greek lowercase letter sigma raised to the second power (σ²), because its units are the square of the variable's units. It describes the spread of the observations around the mean. Its square root, sigma (σ), is the standard deviation.
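The two parameters can be estimated from a sample. The sketch below uses the standard-library `statistics` module on a small made-up dataset, treating the data as the whole population (hence `pvariance` and `pstdev`).

```python
from statistics import fmean, pvariance, pstdev

# Made-up sample values, used only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = fmean(data)            # mean (expected value): 5.0
sigma_sq = pvariance(data)  # variance, in squared units: 4.0
sigma = pstdev(data)        # standard deviation, square root of the variance: 2.0

print(mu, sigma_sq, sigma)
```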

When the observations cluster symmetrically around the mean and become steadily rarer further away from it, a histogram of the data forms a bell-shaped curve.

We then say the data is "normally distributed".

The Normal Distribution has:

  • mean = median = mode
  • symmetry about the center
  • 50% of values less than the mean
  • and 50% greater than the mean
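These properties can be checked numerically. The following is a minimal sketch with `statistics.NormalDist`; the parameters (mean 10, standard deviation 2) are assumptions made for the example.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only for this check.
dist = NormalDist(mu=10, sigma=2)

# The median is the value with half of the probability below it.
median = dist.inv_cdf(0.5)

# Share of values that fall below the mean.
below_mean = dist.cdf(dist.mean)

# Symmetry: the density at equal distances on either side of the mean matches.
left = dist.pdf(dist.mean - 3)
right = dist.pdf(dist.mean + 3)

print(median, below_mean, left, right)
```

The median comes out equal to the mean, half of the probability lies below the mean, and the two densities agree.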



A normal distribution has no skewness, and values far from the mean are very rare, so truly normally distributed data contains few outliers.

In normalization, we change the scale of the dataset without changing what the data represents or the order of its values. The aim is to reduce the dispersion of the data. To lessen the influence of outliers, we can apply transformations such as the square root, the cube root, or the logarithm of the data.
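To see how these transformations pull in large values, here is a short sketch on made-up data with one very large value. The helper `value_range` is a hypothetical name introduced just for this example.

```python
import math

# Made-up positive values with one very large observation.
data = [1, 4, 9, 16, 100, 10000]

sqrt_data = [math.sqrt(x) for x in data]
cbrt_data = [x ** (1 / 3) for x in data]   # cube root; valid here because x > 0
log_data = [math.log(x) for x in data]     # natural log; requires x > 0

def value_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)

for name, values in [("raw", data), ("sqrt", sqrt_data),
                     ("cbrt", cbrt_data), ("log", log_data)]:
    print(name, round(value_range(values), 2))
```

Each transformation keeps the values in the same order but shrinks the gap between the smallest and largest value, with the logarithm compressing the most.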

When the dataset contains very large values, we use normalization to bring them onto a smaller range. This is called scaling.

Normalization typically rescales the values so that they lie between 0 and 1.
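One common way to get values between 0 and 1 is min-max scaling. The text does not name a specific technique, so treat this as an assumed example; `min_max_scale` is a hypothetical helper written for this sketch.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([10, 20, 25, 50])
print(scaled)  # [0.0, 0.25, 0.375, 1.0]
```

The values keep their order and relative spacing; only the scale changes.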


Normal Distribution Function:
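For reference, the density of a normal distribution is usually written as follows, using the mean mu and the variance sigma squared described earlier:

```latex
f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```

The curve peaks at x = mu, and a larger sigma squared makes it wider and flatter.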


Standardization of Data:

The normal distribution function is one specific example of a probability density function.

If normally distributed data has mean = 0 and standard deviation = 1, it follows the standard normal distribution.

Standardization rescales the data so that it has mean = 0 and standard deviation = 1. We do this by subtracting the mean from each value and dividing the result by the standard deviation.
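Standardizing a dataset is usually done with the z-score: subtract the mean from each value and divide by the standard deviation. Below is a minimal sketch of that idea; `standardize` is a hypothetical helper, and the data is made up.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("cannot standardize data with zero spread")
    return [(v - mu) / sigma for v in values]

z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After this step the values have mean 0 and standard deviation 1. Note that standardizing changes only the scale and location of the data, not the shape of its distribution.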

The area under the standard normal curve is 1, as it is for any probability density function.
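The claim about the area can be checked approximately with simple numerical integration. This sketch uses the trapezoid rule over the interval from -8 to 8, which is an assumed cut-off wide enough that the tails left out are negligible; `trapezoid_area` is a hypothetical helper.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def trapezoid_area(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Integrate the standard normal density over a wide interval.
area = trapezoid_area(std_normal.pdf, -8.0, 8.0)
print(round(area, 6))
```

The result is very close to 1, which matches the statement above.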












