Logistic Regression
Logistic regression is a regression algorithm that can be used for classification problems. It estimates the probability that a given observation belongs to a particular class. If the probability is more than 50%, the observation is assigned to that class; otherwise, it is assigned to the other class. In this sense, logistic regression acts as a binary classifier.
In linear regression we model y = mx + c. In classification, for an input x we need a probability y that lies between 0 and 1, but y = mx + c can also produce values less than 0 or greater than 1. So we use the sigmoid function instead.
Sigmoid function
We use the sigmoid function as the underlying function in logistic regression.
- The sigmoid function's range is bounded between 0 and 1, so it is useful for calculating probabilities in the logistic function.
- Its derivative is easier to compute than that of many other functions, which is useful during gradient descent.
- It is a simple way of introducing non-linearity into the model.
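As a small sketch of these properties, the sigmoid and its derivative can be written in a few lines of NumPy (the function names here are just illustrative):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid (logistic) function: squeezes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """Derivative of the sigmoid, conveniently expressed via the sigmoid itself."""
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.array([-10.0, 0.0, 10.0])
print(sigmoid(z))                # values near 0, exactly 0.5, and near 1
print(sigmoid_derivative(0.0))   # 0.25, the maximum slope
```

Note how the derivative reuses the sigmoid's own output, which is what makes it cheap to evaluate during gradient descent.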
The logistic function is given as:

p = 1 / (1 + e^(-(mx + c)))
The cost function (log loss) over the whole training set of m examples is given as:

J = -(1/m) Σ [ yᵢ log(pᵢ) + (1 - yᵢ) log(1 - pᵢ) ]
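This cost (binary cross-entropy, averaged over the training set) can be sketched directly in NumPy; the clipping is a standard numerical guard, not part of the formula itself:

```python
import numpy as np

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy cost averaged over the training set."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1, 0, 1, 1])
y_prob = np.array([0.9, 0.1, 0.8, 0.6])
print(log_loss(y_true, y_prob))  # small value: predictions mostly agree with labels
```

The loss grows rapidly as a confident prediction lands on the wrong side of the true label, which is exactly the behavior gradient descent exploits.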
Multiple Logistic Function
We can generalize the simple logistic function for multiple features as:

p = 1 / (1 + e^(-(b0 + b1x1 + b2x2 + … + bnxn)))

And the logit function can be written as:

logit(p) = log(p / (1 - p)) = b0 + b1x1 + b2x2 + … + bnxn
Multinomial Logistic Regression
When we have more than two classes, we can extend logistic regression to multi-class classification. The logic is simple: we train a logistic model for each class and calculate the probability that a given example belongs to that class. Once we have trained a model for every class, we predict a new example's class by choosing the class with the maximum probability. Although libraries exist for performing multinomial logistic regression, we rarely use logistic regression for classification problems where the number of classes is more than 2.
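The one-model-per-class idea described above (often called one-vs-rest) can be sketched with plain NumPy gradient descent; the toy data, learning rate, and function names here are all illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y, lr=0.1, epochs=500):
    """Fit one binary logistic model by gradient descent; returns its weights."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)
    return w

def train_one_vs_rest(X, y, n_classes):
    """Train one logistic model per class: class k vs. everything else."""
    return np.array([train_binary(X, (y == k).astype(float)) for k in range(n_classes)])

def predict(W, X):
    """Pick the class whose model assigns the highest probability."""
    return np.argmax(sigmoid(X @ W.T), axis=1)

# toy data: 3 well-separated 2-D classes, with a bias column prepended
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2))
               for c in [(0, 0), (4, 0), (0, 4)]])
X = np.hstack([np.ones((60, 1)), X])
y = np.repeat([0, 1, 2], 20)

W = train_one_vs_rest(X, y, 3)
print((predict(W, X) == y).mean())  # training accuracy
```

In practice a library implementation (e.g. scikit-learn's `LogisticRegression`) would be used instead of hand-rolled gradient descent.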
Evaluation of a Classification Model
For a regression problem, we have metrics like the R-squared score, Mean Squared Error, etc., where accuracy is generally measured in terms of the difference between the actual and predicted values. In a classification problem, the credibility of the model is measured using the confusion matrix it generates, i.e., how accurately the true positives and true negatives were predicted. The different metrics used for this purpose are:
- Accuracy
- Recall
- Precision
- F1 Score
- Specificity
- AUC (Area Under the Curve)
- ROC (Receiver Operating Characteristic)
Confusion Matrix
True Positive (TP): A result that was predicted as positive by the classification model and is actually positive.
True Negative (TN): A result that was predicted as negative by the classification model and is actually negative.
False Positive (FP): A result that was predicted as positive by the classification model but is actually negative.
False Negative (FN): A result that was predicted as negative by the classification model but is actually positive.
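These four counts are straightforward to compute from paired true and predicted labels; this is a minimal sketch assuming binary labels with 1 as the positive class:

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, where 1 = positive."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # predicted positive, actually positive
    tn = np.sum((y_pred == 0) & (y_true == 0))  # predicted negative, actually negative
    fp = np.sum((y_pred == 1) & (y_true == 0))  # predicted positive, actually negative
    fn = np.sum((y_pred == 0) & (y_true == 1))  # predicted negative, actually positive
    return tp, tn, fp, fn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 3, 1, 1)
```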
Accuracy
The mathematical formula is:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

In other words, it is the total number of correct classifications divided by the total number of classifications.
Recall or Sensitivity
The mathematical formula is:

Recall = TP / (TP + FN)

As the name suggests, it measures, out of the total number of actual positives, how many were correctly predicted by the model. It shows how relevant the model is in terms of positive results only.
Precision
Precision is a measure of, amongst all the positive predictions, how many were actually positive. Mathematically,

Precision = TP / (TP + FP)
F1 Score
From the previous examples, it is clear that we need a metric that considers both Precision and Recall for evaluating a model. One such metric is the F1 score, defined as the harmonic mean of Precision and Recall. The mathematical formula is:

F1 score = 2 × (Precision × Recall) / (Precision + Recall)
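All four metrics fall out directly from the confusion-matrix counts. As a sketch, using the counts from the obesity example later in this section (TP=3, TN=2, FP=2, FN=1):

```python
def accuracy(tp, tn, fp, fn):
    # correct classifications over all classifications
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    # of all actual positives, how many did the model catch?
    return tp / (tp + fn)

def precision(tp, fp):
    # of all predicted positives, how many were right?
    return tp / (tp + fp)

def f1_score(tp, fp, fn):
    # harmonic mean of precision and recall
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

tp, tn, fp, fn = 3, 2, 2, 1
print(accuracy(tp, tn, fp, fn))  # 0.625
print(recall(tp, fn))            # 0.75
print(precision(tp, fp))         # 0.6
print(f1_score(tp, fp, fn))      # ≈ 0.667
```

Note how the F1 score sits between precision and recall but is pulled toward the lower of the two, which is why it penalizes lopsided models.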
ROC (Receiver Operating Characteristic)
A threshold is set: any probability value below the threshold is a negative outcome, and anything above the threshold is a favorable or positive outcome. For example, if the threshold is 0.5, any probability value below 0.5 means a negative or unfavorable outcome, and any value above 0.5 indicates a positive or favorable outcome.
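Thresholding is just an elementwise comparison against the model's predicted probabilities; the probability values below are made up for illustration:

```python
import numpy as np

# predicted probabilities from a (hypothetical) logistic model
probs = np.array([0.1, 0.4, 0.35, 0.8, 0.62, 0.9])

# threshold 0.5: everything at or above it becomes a positive prediction
labels = (probs >= 0.5).astype(int)
print(labels)       # [0 0 0 1 1 1]

# lowering the threshold flips the borderline cases to positive
labels_low = (probs >= 0.3).astype(int)
print(labels_low)   # [0 1 1 1 1 1]
```

Every choice of threshold yields a different confusion matrix, which is the idea the ROC curve builds on.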
The following diagram shows a typical logistic regression curve.
- The horizontal lines represent the various values of thresholds ranging from 0 to 1.
- Suppose our classification problem is to identify the obese people from the given data.
- The green markers represent obese people and the red markers represent non-obese people.
- Our confusion matrix will depend on the value of the threshold we choose.
- For example, if 0.25 is the threshold, then:
  - TP (actually obese) = 3
  - TN (not obese) = 2
  - FP (not obese but predicted obese) = 2 (the two red squares above the 0.25 line)
  - FN (obese but predicted as not obese) = 1 (the green circle below the 0.25 line)
A typical ROC curve looks like the following figure.
Each black point represents the confusion matrix for one threshold value.
The marked red circle is the best threshold value because it has the minimum false positive rate and the maximum true positive rate.
The green dotted line represents the scenario where the true positive rate equals the false positive rate.
The ROC curve answers our question of which threshold to choose. But if we use different classification algorithms, each with its own ROC curve, which algorithm should we choose? For that we use the AUC (Area Under the Curve).
- It helps us choose the best model amongst the models for which we have plotted ROC curves.
- The best model is the one whose ROC curve encompasses the maximum area under it.
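One convenient way to sketch AUC without plotting anything uses its rank interpretation: the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The two "models" below are just made-up score vectors for comparison:

```python
import numpy as np

def auc_score(y_true, y_prob):
    """AUC via the rank (Mann-Whitney) formulation: the fraction of
    positive/negative pairs where the positive gets the higher score."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    pos, neg = y_prob[y_true == 1], y_prob[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()   # positive outranks negative
    ties = (pos[:, None] == neg[None, :]).sum()     # ties count as half
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
model_a = [0.1, 0.4, 0.35, 0.8]   # hypothetical scores from model A
model_b = [0.2, 0.3, 0.6, 0.9]    # hypothetical scores from model B
print(auc_score(y_true, model_a))  # 0.75
print(auc_score(y_true, model_b))  # 1.0 -> model B ranks every positive above every negative
```

Here model B would be preferred, since its higher AUC means it separates the classes better across all thresholds.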
What is the significance of the ROC curve and AUC?
In real life, we create various models using different algorithms for classification purposes. We use AUC to determine which model is best for a given dataset. Suppose we have created logistic regression, SVM, and other classification models. We calculate the AUC for each model separately; the model with the highest AUC value is the best one to use.
Advantages of Logistic Regression
- It is very simple and easy to implement.
- The output is more informative than that of many other classification algorithms.
- It expresses the relationship between the independent and dependent variables.
- It is very effective with linearly separable data.
Disadvantages of Logistic Regression
- It is not effective with data that is not linearly separable.
- It is not as powerful as other classification models.
- Multiclass classification is much easier to do with other algorithms than with logistic regression.
- It can only predict categorical outcomes.