In the article ‘REGRESSION FOR BUSINESS, USING R & PYTHON: PART I – INTRO’, I explained how regression is applied in business to improve operations and revenues. In this article, I will introduce LOGIT Regression and PROBIT Regression and their modeling using R and Python.
For one or more Explanatory variables (x), if the Response variable (y) is dichotomous, i.e. a variable with only two values such as 0 or 1, or TRUE or FALSE, then limited dependent variable models like Logit and Probit are used.
Logit Regression Equation:
Logit(P) = ln(P / (1 − P)) = β0 + β1X1 + β2X2 + … + βnXn
where P denotes probability and ln is the natural logarithm; the ratio P/(1 − P) is the odds, so the logit is the log-odds. The logit is the inverse of the cumulative logistic distribution function.
Probit Regression Equation:
Probit(P) = Φ⁻¹(P) = β0 + β1X1 + β2X2 + … + βnXn
Here P denotes probability, and Φ⁻¹ is the inverse of the Cumulative Distribution Function (CDF) of the standard normal distribution.
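The two models differ only in the function that maps the linear predictor z = β0 + β1X1 + … + βnXn to a probability: the logistic CDF for Logit, the standard normal CDF for Probit. A minimal sketch (standard library only; the normal CDF is written via the error function):

```python
import math

def logistic_cdf(z):
    # Logit model: P(y=1) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    # Probit model: P(y=1) = Phi(z), the standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (-2, 0, 2):
    print(z, round(logistic_cdf(z), 3), round(normal_cdf(z), 3))
```

At z = 0 both links give probability 0.5; in the tails the normal CDF approaches 0 and 1 faster than the logistic, which is why the two models give similar but not identical fits.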
As said in the article ‘REGRESSION FOR BUSINESS, USING R & PYTHON: PART III – LINEAR REGRESSION’, Simple linear regression equation has an error term, which is calculated by ‘Method of Least Squares’. Whereas in logit and probit models, for the latent variable interpretations, Logit assumes a standard logistic distribution of errors and Probit uses a standard normal distribution of errors.
Both models produce Sigmoid Curves, i.e. ‘S’-shaped curves mapping the linear predictor to a probability between 0 and 1, as shown in the figure above.
For this analysis, a dataset with 5 variables and 400 observations is selected. In this dataset, ‘Purchased’ is the Dependent/Response variable (y) and ‘EstimatedSalary’ and ‘Age’ are the Independent/Explanatory variables (x). The dataset has been reduced to 3 variables since only these are used for the analysis. The head of the dataset looks like below.
IMPORTING DATASET IN R AND PYTHON
The data, which is in ‘CSV’ format, has been imported using the following code.
# Importing the dataset
dataset = read.csv('Social_Network_Ads.csv')
dataset = dataset[3:5]
Here, the ‘read.csv’ function is used to import the file, and ‘dataset[3:5]’ keeps only columns 3 to 5 (Age, EstimatedSalary, Purchased).
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
Here, the ‘pandas’ library is used for importing the file, and ‘iloc’ is used to select the required columns by position.
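To see what the ‘iloc’ selection does, here is a sketch on a hypothetical two-row frame standing in for Social_Network_Ads.csv (the column layout assumes the usual User ID, Gender, Age, EstimatedSalary, Purchased order):

```python
import pandas as pd

# Hypothetical miniature frame standing in for Social_Network_Ads.csv
df = pd.DataFrame({
    'User ID': [1, 2], 'Gender': ['M', 'F'],
    'Age': [19, 35], 'EstimatedSalary': [19000, 20000],
    'Purchased': [0, 1],
})

X = df.iloc[:, [2, 3]].values  # columns 2 and 3: Age, EstimatedSalary
y = df.iloc[:, 4].values       # column 4: Purchased
print(X.shape, y.shape)
```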
Before fitting any regression model to the dataset, the data should be pre-processed as explained in the article ‘REGRESSION FOR BUSINESS, USING R & PYTHON: PART II – DATA’. However, the given dataset needs only two of these pre-processing steps:
- Feature Scaling
- Splitting the data into training and test sets
# Splitting the dataset into the Training set and Test set
install.packages('caTools')
library(caTools)
split = sample.split(dataset$Purchased, SplitRatio = 0.75)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
# Feature Scaling
training_set[-3] = scale(training_set[-3])
test_set[-3] = scale(test_set[-3])
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split  # 'sklearn.cross_validation' is deprecated
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
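What StandardScaler computes is simply z = (x − mean) / std, with the mean and standard deviation learned on the training data only. A sketch on hypothetical Age/EstimatedSalary rows:

```python
import numpy as np

# Standardization: z = (x - mean) / std, learned from the training data.
# The rows below are hypothetical Age/EstimatedSalary values.
X_train = np.array([[19., 19000.], [35., 20000.], [26., 43000.]])
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_scaled = (X_train - mu) / sigma

print(X_scaled.mean(axis=0).round(6))  # each column now has mean ~0
print(X_scaled.std(axis=0).round(6))   # and standard deviation 1
```

Scaling matters here because Age (tens) and EstimatedSalary (tens of thousands) are on very different scales, which would otherwise distort distance-based plots and slow solver convergence.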
The dataset is split into a training set of 3 variables with 300 observations and a test set of 3 variables with 100 observations.
LOGIT & PROBIT REGRESSION USING R
STEP-ONE: Fitting both Logit and Probit regression models to the training set
# Fitting Logistic Regression to the Training set
Regressor.Logit = glm(formula = Purchased ~ .,
                      family = binomial(link = "logit"),
                      data = training_set)
# Fitting Probit Regression to the Training set
Regressor.Probit = glm(formula = Purchased ~ .,
                       family = binomial(link = "probit"),
                       data = training_set)
‘glm’ is the generalized linear model function; the ‘link’ argument selects the logit or probit link.
STEP-TWO: Testing the test set
# Predicting the Test set results using Logit
prob_pred_Logit = predict(Regressor.Logit, type = 'response', newdata = test_set[-3])
y_pred_Logit = ifelse(prob_pred_Logit > 0.5, 1, 0)
# Predicting the Test set results using Probit
prob_pred_Probit = predict(Regressor.Probit, type = 'response', newdata = test_set[-3])
y_pred_Probit = ifelse(prob_pred_Probit > 0.5, 1, 0)
STEP-THREE: Validation Score
# Making the Confusion Matrix for logit
cm_Logit = table(test_set[, 3], y_pred_Logit)
# Making the Confusion Matrix for probit
cm_Probit = table(test_set[, 3], y_pred_Probit)
LOGIT REGRESSION USING PYTHON
STEP-ONE: Fitting Logit regression model to the dataset
# Fitting Logistic Regression to the Training set
from sklearn.linear_model import LogisticRegression
regressor = LogisticRegression(random_state = 0)
regressor.fit(X_train, y_train)
STEP-TWO: Testing the test set
# Predicting the Test set results
y_pred = regressor.predict(X_test)
STEP-THREE: Validation Score
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
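Under the hood, LogisticRegression fits the same logit model by maximizing the logistic likelihood. A minimal gradient-descent sketch on simulated data (illustrative only, with made-up coefficients; not scikit-learn's actual solver):

```python
import numpy as np

# Simulate a toy logit dataset with known (hypothetical) coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w, true_b = np.array([2.0, -1.0]), 0.5
p = 1 / (1 + np.exp(-(X @ true_w + true_b)))
y = (rng.random(200) < p).astype(float)

# Fit by gradient descent on the log-loss.
w, b = np.zeros(2), 0.0
for _ in range(2000):
    pred = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (pred - y) / len(y))  # gradient of the log-loss in w
    b -= 0.5 * (pred - y).mean()            # gradient in the intercept

acc = ((1 / (1 + np.exp(-(X @ w + b))) > 0.5) == y).mean()
print(w.round(2), round(b, 2), acc)
```

The recovered coefficients carry the same signs as the true ones, and in-sample accuracy is well above chance, mirroring what the fitted scikit-learn model does on the social-network data.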
The table above is a confusion matrix, which shows the model's type I and type II errors, i.e. the numbers of true and false predictions. The R model made 83% true predictions, while the Python model's true-prediction rate is 89%.
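Accuracy comes straight from the confusion matrix: the diagonal (true negatives plus true positives) over the total. A sketch with hypothetical counts (not the article's exact matrix) that sum to the Python model's 89%:

```python
# Accuracy from a 2x2 confusion matrix: (TN + TP) / total.
# Hypothetical counts for illustration only.
cm = [[57, 7],
      [4, 32]]
total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total
print(accuracy)  # 0.89
```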
Despite following different methods, both Logit and Probit Regressions come up with similar results, and accuracy improves as the number of observations increases. Typical business applications of these models include:
- To predict customer retention, i.e. yes or no
- To model the pattern of customers buying a product
- To detect whether a credit card transaction is fraudulent
- To predict whether a customer will default on a loan
- To predict whether a given user will buy an insurance product
- To predict whether viewers will click on an advertisement