Normalization and Standardization
Probability Density Function:
A distribution provides a parameterized mathematical function that describes
how likely each observation in the sample space is.
For a continuous variable, this function describes the grouping or the density
of the observations and is called the probability density function (PDF). We can
also calculate the probability of an observation having a value less than or
equal to a given value. The function that summarizes this for every value is
called the cumulative distribution function (CDF).
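As a rough illustration of the two functions, the sketch below evaluates the PDF and CDF of the Gaussian distribution introduced in the next section. The function names `normal_pdf` and `normal_cdf` are made up for this example, and it assumes only the Python standard library.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and std dev sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Probability that an observation is less than or equal to x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

print(normal_pdf(0.0))                     # density at the center, about 0.399
print(normal_cdf(0.0))                     # 0.5: half of the values lie below the mean
print(normal_cdf(1.0) - normal_cdf(-1.0))  # share within one std dev, about 0.683
```

Note that the PDF value is a density, not a probability. Probabilities come from the CDF, for example as the difference between two CDF values.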
Normal distribution:
A sample of data will form a distribution, and by far the most
well-known distribution is the Gaussian distribution.
Data from many fields of study
can, surprisingly, be described using a Gaussian distribution, so much so that
the distribution is often called the "normal distribution" because
it is so common.
A Gaussian distribution can be described using two parameters:
· Mean: Denoted with the Greek lowercase letter mu (μ), it is the expected value of the distribution.
· Variance: Denoted with the Greek lowercase letter sigma raised to the second power (σ², because the units of the variable are squared), it describes the spread of the observations around the mean.
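The two parameters can be estimated from a sample. The sketch below uses Python's `statistics` module on a small made-up list of heights; the data and variable names are assumptions for illustration only.

```python
import math
import statistics

# Made-up sample of heights in cm (illustrative values only).
heights = [160.0, 165.0, 170.0, 175.0, 180.0]

mu = statistics.mean(heights)        # average of the values: 170
var = statistics.pvariance(heights)  # (100 + 25 + 0 + 25 + 100) / 5 = 50, in cm squared
sigma = math.sqrt(var)               # standard deviation, back in cm

print(mu, var, sigma)
```

The variance is in squared units (cm squared here), which is why the standard deviation, its square root, is usually the easier number to interpret.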
If the data are spread symmetrically around the mean, with most values close
to the center and fewer values further out, a plot of the data gives a bell-shaped curve.
We say the data is
"normally distributed".
The Normal Distribution has:
- mean = median = mode
- symmetry about the center
- 50% of values less than the mean
- and 50% greater than the mean
A normal distribution has zero skewness, and extreme values are rare: about
99.7% of values fall within three standard deviations of the mean.
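These properties can be checked approximately on simulated data. The sketch below draws a sample with `random.gauss`; the mean of 10, standard deviation of 2, sample size, and seed are arbitrary choices for illustration, and the results are only approximate because the sample is random.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
sample = [random.gauss(10.0, 2.0) for _ in range(20_000)]

mean = sum(sample) / len(sample)
median = statistics.median(sample)
sd = statistics.pstdev(sample)

# Share of values below the mean (expected to be close to 0.5).
below = sum(1 for x in sample if x < mean) / len(sample)
# Share of values within three standard deviations of the mean.
within_3sd = sum(1 for x in sample if abs(x - mean) <= 3 * sd) / len(sample)

print(mean, median)  # the two should be close to each other
print(below)         # close to 0.5
print(within_3sd)    # close to 0.997
```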
Normalization:
In Normalization, we change the scale of the dataset, but we
are not trying to change the meaning or significance of the data. We are trying to
reduce the dispersion between values. To reduce the effect of outliers and skewness
we can apply transformations such as the square root, the cube root, or the log of the data.
When the dataset contains very large values, or features on very different
ranges, we rescale it. This method is called Scaling.
Normalization by min-max scaling gives values between 0 and 1.
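A minimal sketch of two such techniques follows, assuming min-max scaling as the rescaling method and the natural log as the transformation. The helper name `min_max_scale` and the income figures are invented for the example.

```python
import math

def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to rescale, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up incomes with one very large value.
incomes = [20.0, 30.0, 50.0, 100.0, 1000.0]

scaled = min_max_scale(incomes)          # every value now lies in [0, 1]
logged = [math.log(v) for v in incomes]  # log compresses the large value

print(scaled)
print(logged)
```

Both operations keep the order of the values. Min-max scaling only changes the range, while the log transform also pulls the large value closer to the rest.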
Standardization of Data:
The normal distribution is one particular probability density
function.
If normally distributed data have mean = 0 and standard deviation = 1, then
the data follow the standard normal distribution.
Standardization rescales the data so that mean = 0
and standard deviation = 1: subtract the mean from each value and divide by the
standard deviation. It changes the scale of the data, not the shape of its distribution.
The area under the standard normal curve, as under any probability density curve, is 1.
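Standardization is commonly done with the z-score: subtract the mean from each value and divide by the standard deviation. The sketch below assumes that method; the helper name `standardize` and the scores are made up for illustration.

```python
import statistics

def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize data with zero spread")
    return [(v - mu) / sigma for v in values]

# Made-up scores with mean 5 and population standard deviation 2.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(scores)

print(z)
print(statistics.mean(z), statistics.pstdev(z))  # approximately 0 and 1
```

The population standard deviation (`pstdev`) is used here. Using the sample standard deviation (`stdev`) instead is an equally common choice and gives slightly different z-scores on small samples.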