Logistic Regression
Logistic regression is a regression algorithm that can be used for classification problems. It estimates the probability that a given observation belongs to a particular class. If the probability is more than 50%, the observation is assigned to that class; otherwise, it is assigned to the other class. In this sense, logistic regression acts as a binary classifier.
In linear regression we model y = mx + c. In classification, for an input x we need a probability y that lies between 0 and 1, but y = mx + c can also produce values less than 0 or greater than 1. So we use the sigmoid function instead.
Sigmoid function
We use the sigmoid function as the underlying function in logistic regression.
- The sigmoid function's range is bounded between 0 and 1, so it is useful for calculating probabilities in the logistic function.
- Its derivative is easier to compute than that of many other functions, which is useful during gradient descent.
- It is a simple way of introducing non-linearity into the model.
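As a small sketch of these properties, the sigmoid and its derivative can be written in a few lines of NumPy (the function names here are just illustrative):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid (logistic) function: squeezes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """Derivative of the sigmoid, conveniently expressed via the sigmoid itself."""
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.array([-10.0, 0.0, 10.0])
print(sigmoid(z))                # values near 0, exactly 0.5, and near 1
print(sigmoid_derivative(0.0))   # 0.25, the maximum slope
```

Note how the derivative reuses the sigmoid's own output, which is what makes it cheap to evaluate during gradient descent.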
The logistic function is given as:

p = 1 / (1 + e^(-(mx + c)))
The cost function (log loss) over the whole training set of m examples is given as:

J = -(1/m) Σ [ yᵢ log(pᵢ) + (1 - yᵢ) log(1 - pᵢ) ]
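This cost (binary cross-entropy, averaged over the training set) can be sketched directly in NumPy; the clipping is a standard numerical guard, not part of the formula itself:

```python
import numpy as np

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy cost averaged over the training set."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1, 0, 1, 1])
y_prob = np.array([0.9, 0.1, 0.8, 0.6])
print(log_loss(y_true, y_prob))  # small value: predictions mostly agree with labels
```

The loss grows rapidly as a confident prediction lands on the wrong side of the true label, which is exactly the behavior gradient descent exploits.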
Multiple Logistic Function
We can generalize the simple logistic function for multiple features as:

p = 1 / (1 + e^(-(b0 + b1x1 + b2x2 + … + bnxn)))

And the logit function can be written as:

logit(p) = log(p / (1 - p)) = b0 + b1x1 + b2x2 + … + bnxn
Multinomial Logistic Regression
When we have more than two classes, we can extend logistic regression to multi-class classification. The logic is simple: we train a logistic model for each class and calculate the probability that a given example belongs to that class. Once we have trained a model for every class, we predict a new example's class by choosing the class with the maximum probability. Although libraries exist for performing multinomial logistic regression, we rarely use logistic regression for classification problems where the number of classes is more than 2.
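The one-model-per-class idea described above (often called one-vs-rest) can be sketched with plain NumPy gradient descent; the toy data, learning rate, and function names here are all illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y, lr=0.1, epochs=500):
    """Fit one binary logistic model by gradient descent; returns its weights."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)
    return w

def train_one_vs_rest(X, y, n_classes):
    """Train one logistic model per class: class k vs. everything else."""
    return np.array([train_binary(X, (y == k).astype(float)) for k in range(n_classes)])

def predict(W, X):
    """Pick the class whose model assigns the highest probability."""
    return np.argmax(sigmoid(X @ W.T), axis=1)

# toy data: 3 well-separated 2-D classes, with a bias column prepended
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2))
               for c in [(0, 0), (4, 0), (0, 4)]])
X = np.hstack([np.ones((60, 1)), X])
y = np.repeat([0, 1, 2], 20)

W = train_one_vs_rest(X, y, 3)
print((predict(W, X) == y).mean())  # training accuracy
```

In practice a library implementation (e.g. scikit-learn's `LogisticRegression`) would be used instead of hand-rolled gradient descent.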
Evaluation of a Classification Model
For a regression problem, we have metrics like the R-squared score, Mean Squared Error, etc., where accuracy is generally measured in terms of the difference between the actual and predicted values. In a classification problem, the credibility of the model is measured using the confusion matrix it generates, i.e., how accurately the true positives and true negatives were predicted. The different metrics used for this purpose are:
- Accuracy
- Recall
- Precision
- F1 Score
- Specificity
- AUC (Area Under the Curve)
- ROC (Receiver Operating Characteristic)
Confusion Matrix
True Positive (TP): A result that was predicted as positive by the classification model and is actually positive.
True Negative (TN): A result that was predicted as negative by the classification model and is actually negative.
False Positive (FP): A result that was predicted as positive by the classification model but is actually negative.
False Negative (FN): A result that was predicted as negative by the classification model but is actually positive.
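These four counts are straightforward to compute from paired true and predicted labels; this is a minimal sketch assuming binary labels with 1 as the positive class:

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, where 1 = positive."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # predicted positive, actually positive
    tn = np.sum((y_pred == 0) & (y_true == 0))  # predicted negative, actually negative
    fp = np.sum((y_pred == 1) & (y_true == 0))  # predicted positive, actually negative
    fn = np.sum((y_pred == 0) & (y_true == 1))  # predicted negative, actually positive
    return tp, tn, fp, fn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 3, 1, 1)
```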
Accuracy
The mathematical formula is:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

In other words, it is the total number of correct classifications divided by the total number of classifications.
Recall or Sensitivity
The mathematical formula is:

Recall = TP / (TP + FN)

As the name suggests, it measures, out of the total number of actual positives, how many were correctly predicted by the model. It shows how relevant the model is in terms of positive results only.
Precision
Precision is a measure of, amongst all the positive predictions, how many were actually positive. Mathematically,

Precision = TP / (TP + FP)
F1 Score
From the previous examples, it is clear that we need a metric that considers both Precision and Recall for evaluating a model. One such metric is the F1 score, defined as the harmonic mean of Precision and Recall. The mathematical formula is:

F1 score = 2 × (Precision × Recall) / (Precision + Recall)
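All four metrics fall out directly from the confusion-matrix counts. As a sketch, using the counts from the obesity example later in this section (TP=3, TN=2, FP=2, FN=1):

```python
def accuracy(tp, tn, fp, fn):
    # correct classifications over all classifications
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    # of all actual positives, how many did the model catch?
    return tp / (tp + fn)

def precision(tp, fp):
    # of all predicted positives, how many were right?
    return tp / (tp + fp)

def f1_score(tp, fp, fn):
    # harmonic mean of precision and recall
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

tp, tn, fp, fn = 3, 2, 2, 1
print(accuracy(tp, tn, fp, fn))  # 0.625
print(recall(tp, fn))            # 0.75
print(precision(tp, fp))         # 0.6
print(f1_score(tp, fp, fn))      # ≈ 0.667
```

Note how the F1 score sits between precision and recall but is pulled toward the lower of the two, which is why it penalizes lopsided models.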
ROC (Receiver Operating Characteristic)
A threshold is set: any probability value below the threshold is a negative outcome, and anything above the threshold is a favorable or positive outcome. For example, if the threshold is 0.5, any probability value below 0.5 means a negative or unfavorable outcome, and any value above 0.5 indicates a positive or favorable outcome.
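Thresholding is just an elementwise comparison against the model's predicted probabilities; the probability values below are made up for illustration:

```python
import numpy as np

# predicted probabilities from a (hypothetical) logistic model
probs = np.array([0.1, 0.4, 0.35, 0.8, 0.62, 0.9])

# threshold 0.5: everything at or above it becomes a positive prediction
labels = (probs >= 0.5).astype(int)
print(labels)       # [0 0 0 1 1 1]

# lowering the threshold flips the borderline cases to positive
labels_low = (probs >= 0.3).astype(int)
print(labels_low)   # [0 1 1 1 1 1]
```

Every choice of threshold yields a different confusion matrix, which is the idea the ROC curve builds on.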
The following diagram shows a typical logistic regression curve.
- The horizontal lines represent the various values of thresholds ranging from 0 to 1.
- Suppose our classification problem is to identify the obese people from the given data.
- The green markers represent obese people and the red markers represent non-obese people.
- Our confusion matrix will depend on the value of the threshold we choose.
- For example, if 0.25 is the threshold, then:
  - TP (actually obese) = 3
  - TN (not obese) = 2
  - FP (not obese but predicted obese) = 2 (the two red squares above the 0.25 line)
  - FN (obese but predicted as not obese) = 1 (the green circle below the 0.25 line)
A typical ROC curve looks like the following figure.
Each black point represents the confusion matrix for one threshold value.
The marked red circle is the best threshold value because it has the minimum false positive rate and the maximum true positive rate.
The green dotted line represents the scenario where the true positive rate equals the false positive rate.
The ROC curve answers our question of which threshold to choose. But if we use different classification algorithms, each with its own ROC curve, which algorithm should we choose? For that we use the AUC (Area Under the Curve).
- It helps us choose the best model amongst the models for which we have plotted ROC curves.
- The best model is the one whose ROC curve encompasses the maximum area under it.
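One convenient way to sketch AUC without plotting anything uses its rank interpretation: the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The two "models" below are just made-up score vectors for comparison:

```python
import numpy as np

def auc_score(y_true, y_prob):
    """AUC via the rank (Mann-Whitney) formulation: the fraction of
    positive/negative pairs where the positive gets the higher score."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    pos, neg = y_prob[y_true == 1], y_prob[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()   # positive outranks negative
    ties = (pos[:, None] == neg[None, :]).sum()     # ties count as half
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
model_a = [0.1, 0.4, 0.35, 0.8]   # hypothetical scores from model A
model_b = [0.2, 0.3, 0.6, 0.9]    # hypothetical scores from model B
print(auc_score(y_true, model_a))  # 0.75
print(auc_score(y_true, model_b))  # 1.0 -> model B ranks every positive above every negative
```

Here model B would be preferred, since its higher AUC means it separates the classes better across all thresholds.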
What is the significance of the ROC curve and AUC?
In real life, we create various models using different algorithms for classification purposes. We use AUC to determine which model is best for a given dataset. Suppose we have created logistic regression, SVM, and other classification models. We calculate the AUC for each model separately; the model with the highest AUC value is the best one to use.
Advantages of Logistic Regression
- It is very simple and easy to implement.
- The output is more informative than that of many other classification algorithms.
- It expresses the relationship between the independent and dependent variables.
- It is very effective with linearly separable data.
Disadvantages of Logistic Regression
- It is not effective with data that is not linearly separable.
- It is not as powerful as other classification models.
- Multiclass classification is much easier to do with other algorithms than with logistic regression.
- It can only predict categorical outcomes.