How does the Logistic Regression Algorithm learn?

When you feed training data to the algorithm, it predicts the value of the target variable for each row, and the cost function computes the difference between this predicted value and the actual value. This is how learning takes place: when the difference between the actual and predicted values is small, the model learns that it has identified the right curve; if the difference is large, the model understands that the curve it identified is not suitable and keeps searching for a better one.

The logistic regression algorithm learns using gradient descent, particularly batch gradient descent, where a batch constitutes the entire dataset. As described above, after every forward pass the difference between the actual and predicted values is measured and the curve is altered accordingly. By altering the curve, we mean that the slope and intercept of the line separating the two classes are tweaked. This process continues until we reach the optimum values of the slope and the intercept.
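The learning loop above can be sketched with a minimal batch gradient descent on a one-feature problem. The data values, learning rate, and iteration count here are made up for illustration:

```python
import numpy as np

# Toy 1-D training data: one feature x, binary target y (values invented).
x = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 0, 1, 1, 1])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

slope, intercept = 0.0, 0.0   # start from an arbitrary line
lr = 0.1                      # learning rate (an assumed value)

for _ in range(5000):
    # Forward pass over the entire batch (all rows at once).
    pred = sigmoid(slope * x + intercept)
    error = pred - y          # difference between predicted and actual
    # Gradient of the log-loss with respect to slope and intercept.
    slope -= lr * np.mean(error * x)
    intercept -= lr * np.mean(error)

print(slope, intercept)       # tweaked toward the optimum values
```

Each pass through the loop is one forward pass over the full batch followed by one adjustment of the slope and intercept, exactly the "alter the curve" step described above.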

What is multi class Logistic Regression?

Multi class logistic regression covers those problem statements where the target variable has more than two categories. The logistic regression algorithm is designed for binary classification problems, so we need to do some data engineering, i.e. find a workaround, to apply the algorithm to a multiclass problem.

Enumerate the methods applied in multi class Logistic Regression.

The following methods can be used to apply the logistic regression algorithm to multi class problems.

  1. One vs Rest approach: Here, if there are n classes, we create n duplicates of the original dataset. Each duplicate has two classes: one from the original dataset and one created by clubbing the rest of the classes together.
  2. One vs One approach: Here, if there are n classes, we create n(n-1)/2 datasets by splitting the original dataset. Each dataset contains two classes from the original dataset.

What is multiclass Logistic Regression?

Multi class logistic regression covers those problem statements where the target variable has more than two categories. The logistic regression algorithm is designed for binary classification problems, so we need to do some data engineering, i.e. find a workaround, to apply the algorithm to a multiclass problem. The following two methods can be employed:

  1. One vs Rest approach: Here, if there are n classes, we create n duplicates of the original dataset. Each duplicate has two classes: one from the original dataset and one created by clubbing the rest of the classes together. To illustrate with an example, say we had three classes – mango, apple & banana – in the original dataset. Then each of the duplicate datasets would have two classes: mango & other fruits, apple & other fruits, and banana & other fruits. We apply the logistic regression algorithm to each of the datasets and collate the probability score for each fruit. The fruit that gets the highest probability score for the given row is selected, i.e. if the scores of mango, apple & banana are [0.3, 0.4, 0.3] respectively, then apple would be chosen.
  2. One vs One approach: Here, if there are n classes, we create n(n-1)/2 datasets by splitting the original dataset. Each dataset contains two classes from the original dataset. Applying our previous fruit example to this approach, we get 3 datasets with the following classes – apple & banana, banana & mango, and mango & apple. We apply logistic regression to each of the datasets, and the class that receives the maximum number of votes is selected. Using our example, say the scores for the three datasets were [apple 0.6, banana 0.4], [banana 0.3, mango 0.7] and [mango 0.7, apple 0.3]; then mango, with two votes, would be chosen.
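Both approaches are available off the shelf in scikit-learn. A minimal sketch of the fruit example, where the feature values (weight in grams, length in cm) are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Hypothetical fruit measurements: [weight_g, length_cm]; values are invented.
X = np.array([[150, 10], [160, 11], [170, 10],   # mango
              [120, 7],  [130, 8],  [125, 7],    # apple
              [110, 18], [115, 20], [120, 19]])  # banana
y = np.array(["mango"] * 3 + ["apple"] * 3 + ["banana"] * 3)

# One vs Rest: fits one binary logistic model per class (n = 3 models).
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
# One vs One: fits n(n-1)/2 = 3 binary models, one per pair of classes.
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(ovr.predict([[128, 7]]))   # a point near the apple cluster
print(ovo.predict([[112, 19]]))  # a point near the banana cluster
```

OvR collates per-class probability scores, while OvO tallies pairwise votes, mirroring the two selection rules described above.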

What is multiple Logistic Regression?

Multiple logistic regression involves binary classification problems where there are multiple independent variables. The independent variables can be either categorical or continuous.

Just as we try to find the optimal line separating the data points of the two classes in simple logistic regression, here we try to find the optimal hyperplane. In this sense, simple logistic regression & multiple logistic regression are analogous to simple linear regression and multiple linear regression.
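A minimal sketch of this idea with scikit-learn, using synthetic data with three independent variables (the data and the assumed separating hyperplane are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem with three continuous independent variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Assume the true separating hyperplane is 1*x0 + 2*x1 - 1*x2 = 0.
y = (X @ np.array([1.0, 2.0, -1.0]) > 0).astype(int)

model = LogisticRegression().fit(X, y)
# One coefficient per independent variable plus an intercept together
# define a separating hyperplane rather than a single line.
print(model.coef_, model.intercept_)
print(model.score(X, y))  # training accuracy
```

With one feature the fitted coefficients describe a line; with several they describe a hyperplane, which is the analogy to multiple linear regression drawn above.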

What is the cost function?

The accuracy of every machine learning algorithm is measured using a cost function. The cost function calculates the difference between the actual value of the target variable and its predicted value.

Thus, the smaller the output of the cost function, the better the model's prediction. But a cost function that almost eliminates the error rate for the given algorithm might result in overfitting.

Thus the choice of cost function for a given algorithm depends on the following two considerations:

  1. It accentuates the error value, i.e. the difference between the actual and predicted values.
  2. It reduces the chance of overfitting.
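For logistic regression the usual cost function is binary cross-entropy (log-loss). A minimal sketch, using invented prediction values to show how the cost grows as predictions move away from the actual labels:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average log-loss; confident wrong predictions are penalized heavily."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 0])
good = np.array([0.9, 0.1, 0.8, 0.2])   # close to the actual values
bad = np.array([0.4, 0.6, 0.3, 0.7])    # far from the actual values

print(binary_cross_entropy(y_true, good))  # small cost
print(binary_cross_entropy(y_true, bad))   # larger cost
```

The logarithm accentuates the error, satisfying the first consideration above; overfitting is typically controlled separately, e.g. by adding a regularization term to this cost.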

Which attributes of sigmoid function make it a suitable candidate for logistic regression algorithm?

The attributes of the sigmoid function that make it best suited for the logistic regression algorithm are:

  1. Accentuation: Due to the exponential nature of the curve described by the sigmoid function, a small change in input near the decision boundary results in a large change in output. Therefore, for a small variation in the slope or intercept of the classification line, the change in error rate is significant. This helps improve model accuracy.
  2. Range: The range of the function is between 0 & 1, making it a suitable candidate for probability computations.
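Both attributes are easy to verify numerically. A small sketch (the sample input values are chosen arbitrarily):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(z))  # every output lies strictly between 0 and 1

# Near zero, a unit change in input moves the output a lot;
# far from zero the curve saturates toward 0 or 1.
print(sigmoid(1.0) - sigmoid(0.0))   # change near the middle
print(sigmoid(9.0) - sigmoid(8.0))   # change in the saturated tail
```

The large step near the middle and the tiny step in the tail illustrate the accentuation property, and the outputs never leaving (0, 1) illustrates the range property.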

Why is the Sigmoid function used for Logistic Regression?

Logistic regression deals with binary class problems, in which we try to predict which of the two classes a data point belongs to. We compute this probability using the distance of the point from the line that separates the two classes. In other words, the prediction accuracy depends on the slope and intercept values of the aforementioned line.

The attributes of the sigmoid function that make it best suited for the logistic regression algorithm are:

  1. Accentuation: Due to the exponential nature of the curve described by the sigmoid function, a small change in input near the decision boundary results in a large change in output. Therefore, for a small variation in the slope or intercept of the classification line, the change in error rate is significant. This helps improve model accuracy.
  2. Range: The range of the function is between 0 & 1, making it a suitable candidate for probability computations.