In the article ‘REGRESSION FOR BUSINESS, USING R & PYTHON: PART I – INTRO’, I explained how regression is applied in business to improve operations and revenues. In this article, I briefly introduce LOGIT Regression and PROBIT Regression and their modeling using R and Python.
For one or more explanatory variables (x), if the response variable (y) is a dichotomous variable, i.e. a variable with only two values such as 0 or 1, TRUE or FALSE, etc., then limited dependent variable models such as Logit and Probit are used.
Logit Regression Equation:
Logit(P) = ln(P / (1 − P)) = β0 + β1 X1 + β2 X2 + … + βn Xn
where P denotes the probability that y takes the value 1, and ln is the natural logarithm. The logit is the inverse of the Cumulative Distribution Function (CDF) of the standard logistic distribution.
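To make the logit transformation concrete, here is a minimal Python sketch that uses only the standard library. The function names `logit` and `inverse_logit` are illustrative choices and do not come from the original article.

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Logistic (sigmoid) function: maps any real number z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# A probability of 0.5 means even odds, so the log-odds are zero
print(logit(0.5))
# Applying the inverse recovers the original probability (up to rounding)
print(inverse_logit(logit(0.8)))
```

The right-hand side of the regression equation, β0 + β1 X1 + … + βn Xn, plays the role of `z` here, so `inverse_logit` turns a fitted linear score into a predicted probability.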
Probit Regression Equation:
Probit(P) = Φ⁻¹(P) = β0 + β1 X1 + β2 X2 + … + βn Xn
Here P denotes the probability that y takes the value 1, Φ is the Cumulative Distribution Function (CDF) of the standard normal distribution, and Φ⁻¹ is its inverse.
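The probit transformation can be sketched in the same way. This example assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`. The names `probit` and `inverse_probit` are illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def probit(p):
    """Inverse standard normal CDF of a probability p, with 0 < p < 1."""
    return STANDARD_NORMAL.inv_cdf(p)

def inverse_probit(z):
    """Standard normal CDF: maps any real number z back to a probability."""
    return STANDARD_NORMAL.cdf(z)

# The familiar 97.5% quantile of the standard normal distribution
print(round(probit(0.975), 2))
# Applying the CDF recovers the original probability (up to rounding)
print(inverse_probit(probit(0.8)))
```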
As said in the article ‘REGRESSION FOR BUSINESS, USING R & PYTHON: PART III – LINEAR REGRESSION’, the simple linear regression equation has an error term, and its coefficients are estimated by the ‘Method of Least Squares’. Logit and Probit models, in contrast, are estimated by maximum likelihood. In their latent variable interpretation, Logit assumes a standard logistic distribution of errors and Probit assumes a standard normal distribution of errors.
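The latent variable interpretation can be illustrated with a small simulation. In this sketch, an unobserved variable y* equals a linear score plus a random error, and the observed outcome is 1 whenever y* is positive. The score value 0.5, the helper names and the sample size are assumptions made for illustration only.

```python
import math
import random
from statistics import NormalDist

def logistic_error(rng):
    """Draw from the standard logistic distribution by inverse-CDF sampling."""
    u = max(rng.random(), 1e-12)  # guard against log(0)
    return math.log(u / (1.0 - u))

def normal_error(rng):
    """Draw from the standard normal distribution."""
    return rng.gauss(0.0, 1.0)

def simulated_share(score, draw_error, n=100_000, seed=0):
    """Share of simulated cases in which y* = score + error is positive."""
    rng = random.Random(seed)
    positives = sum(1 for _ in range(n) if score + draw_error(rng) > 0)
    return positives / n

score = 0.5  # hypothetical value of b0 + b1 * x for one observation

logit_share = simulated_share(score, logistic_error)
probit_share = simulated_share(score, normal_error)

# Closed-form probabilities implied by each error distribution
logit_prob = 1.0 / (1.0 + math.exp(-score))
probit_prob = NormalDist().cdf(score)

print(logit_share, logit_prob)
print(probit_share, probit_prob)
```

With logistic errors the simulated share of ones approaches the logistic CDF evaluated at the score, and with normal errors it approaches Φ evaluated at the score, which is where the two link functions come from.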
Both models form sigmoid curves, i.e. ‘S’-shaped curves, whose predicted probabilities lie between 0 and 1, as shown in the figure above.
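The shape of the two curves can also be checked numerically. The following sketch evaluates the logistic CDF and the standard normal CDF at a few illustrative points. The function names are not from the original article.

```python
import math
from statistics import NormalDist

def logistic_cdf(z):
    """CDF of the standard logistic distribution (the Logit curve)."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """CDF of the standard normal distribution (the Probit curve)."""
    return NormalDist().cdf(z)

points = [-4, -2, 0, 2, 4]
logit_curve = [logistic_cdf(z) for z in points]
probit_curve = [normal_cdf(z) for z in points]

for z, a, b in zip(points, logit_curve, probit_curve):
    print(z, round(a, 3), round(b, 3))
```

Both curves rise from near 0 to near 1 and pass through 0.5 at z = 0. The logistic curve approaches 0 and 1 more slowly, because the logistic distribution has heavier tails than the normal distribution.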
For this analysis, a dataset with 5 variables and 400 observations is used. In this dataset, ‘Purchased’ is the dependent/response variable (y), and ‘EstimatedSalary’ and ‘Age’ are the independent/explanatory variables (x). The dataset has been reduced to these 3 variables, since only they are used in the analysis. The head of the dataset is shown below.
IMPORTING DATASET IN R AND PYTHON
The data, which is in CSV format, is imported using the following code.
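# Importing the dataset
dataset = read.csv('Social_Network_Ads.csv')
# Keep only columns 3 to 5, the three variables used in the analysis
dataset = dataset[3:5]
Here, the ‘read.csv’ function is used to import the file, and ‘dataset[3:5]’ keeps only the three columns needed.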
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
# Columns at positions 2 and 3 are the explanatory variables; position 4 is the response
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
Here, the ‘pandas’ library is used for importing the file. ‘iloc’ selects the required columns by position, counting from 0.
Before fitting any regression model to the dataset, the data should be pre-processed as explained in the article ‘REGRESSION FOR BUSINESS, USING R & PYTHON: PART II – DATA’. However, the given dataset needs only two of these pre-processing steps:
- Feature Scaling
- Splitting data into training and test sets
# Splitting the dataset into the Training set and Test set
install.packages('caTools')  # only needed the first time
library(caTools)
split = sample.split(dataset$Purchased, SplitRatio = 0.75)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)

# Feature Scaling (column 3, Purchased, is left unscaled)
training_set[-3] = scale(training_set[-3])
test_set[-3] = scale(test_set[-3])
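# Splitting the dataset into the Training set and Test set
# (train_test_split lives in sklearn.model_selection; the older sklearn.cross_validation module has been removed)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature Scaling (fit on the training set, then apply the same scaling to the test set)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)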
The dataset is split into a training set of 3 variables with 300 observations and a test set of 3 variables with 100 observations.
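To show what these two pre-processing steps do, here is a library-free sketch. The helper names and the age values are made up for illustration. The code mirrors the idea of the Python code above, where the mean and standard deviation are learned from the training set and then reused for the test set. It uses the population standard deviation, as `StandardScaler` does, whereas R's `scale` divides by the sample standard deviation.

```python
import random
from statistics import mean, pstdev

def split_indices(n_rows, test_size=0.25, seed=0):
    """Shuffle the row indices and split them into training and test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_size)
    return indices[n_test:], indices[:n_test]

def fit_scaler(column):
    """Learn the mean and standard deviation from the training values only."""
    return mean(column), pstdev(column)

def apply_scaler(column, center, spread):
    """Standardize values with a previously learned mean and standard deviation."""
    return [(value - center) / spread for value in column]

# 400 observations with a 25% test share give 300 training and 100 test rows
train_idx, test_idx = split_indices(400)
print(len(train_idx), len(test_idx))  # 300 100

# Made-up ages, not taken from the article's dataset
ages_train = [19, 35, 26, 27, 47]
ages_test = [32, 58]
center, spread = fit_scaler(ages_train)
scaled_train = apply_scaler(ages_train, center, spread)
scaled_test = apply_scaler(ages_test, center, spread)
print(scaled_train)
print(scaled_test)
```

Reusing the training statistics for the test set keeps information about the test observations out of the fitted model.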
LOGIT & PROBIT REGRESSION USING R
STEP-ONE: Fitting both Logit and Probit regression models to the training set
# Fitting Logistic Regression to the Training set
Regressor.Logit = glm(formula = Purchased ~ .,
                      family = binomial(link = "logit"),
                      data = training_set)

# Fitting Probit Regression to the Training set
Regressor.Probit = glm(formula = Purchased ~ .,
                       family = binomial(link = "probit"),
                       data = training_set)
‘glm’ is R's function for fitting generalized linear models. The ‘link’ argument of the binomial family selects the Logit or the Probit model.
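To give an idea of what fitting a Logit model involves, the sketch below estimates an intercept and one slope by maximizing the log-likelihood. R's `glm` uses iteratively reweighted least squares for this. The sketch swaps in plain gradient ascent because it is shorter to write. The data, the function names, the learning rate and the iteration count are all assumptions for illustration.

```python
import math

def sigmoid(z):
    """Logistic function: turns a linear score into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(xs, ys, learning_rate=0.1, n_iter=5000):
    """Estimate (b0, b1) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        # Residuals between observed outcomes and current predicted probabilities
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += learning_rate * sum(errors) / n
        b1 += learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1

# Made-up, already scaled predictor values and 0/1 outcomes (not the article's data)
xs = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logit(xs, ys)
fitted = [sigmoid(b0 + b1 * x) for x in xs]
print(b0, b1)
print([round(p, 2) for p in fitted])
```

At the maximum, the average fitted probability equals the observed share of ones, which is a quick sanity check on the estimates.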
STEP-TWO: Testing the test set
# Predicting the Test set results using Logit
prob_pred_Logit = predict(Regressor.Logit, type = 'response', newdata = test_set[-3])
# Classify as 1 when the predicted probability exceeds 0.5
y_pred_Logit = ifelse(prob_pred_Logit > 0.5, 1, 0)

# Predicting the Test set results using Probit
prob_pred_Probit = predict(Regressor.Probit, type = 'response', newdata = test_set[-3])
# Classify as 1 when the predicted probability exceeds 0.5
y_pred_Probit = ifelse(prob_pred_Probit > 0.5, 1, 0)
STEP-THREE: Validation Score
# Making the Confusion Matrix for logit (rows: actual values, columns: predicted values)
cm_Logit = table(test_set[, 3], y_pred_Logit)

# Making the Confusion Matrix for probit (rows: actual values, columns: predicted values)
cm_Probit = table(test_set[, 3], y_pred_Probit)
LOGIT REGRESSION USING PYTHON
STEP-ONE: Fitting Logit regression model to the dataset
# Fitting Logistic Regression to the Training set
from sklearn.linear_model import LogisticRegression
regressor = LogisticRegression(random_state = 0)
regressor.fit(X_train, y_train)
STEP-TWO: Testing the test set
# Predicting the Test set results
y_pred = regressor.predict(X_test)
STEP-THREE: Validation Score
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
The table above is a confusion matrix, which shows the number of correct predictions together with the model's type I errors (false positives) and type II errors (false negatives). The R model made 83% correct predictions, while the Python model's rate of correct predictions is 89%.
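How such an accuracy percentage is derived from a confusion matrix can be sketched in a few lines. The labels and probabilities below are made up and are not the article's results. The layout with rows as actual classes and columns as predicted classes follows the convention of scikit-learn's `confusion_matrix`.

```python
def classify(probabilities, threshold=0.5):
    """Turn predicted probabilities into 0/1 class labels."""
    return [1 if p > threshold else 0 for p in probabilities]

def confusion_counts(actual, predicted):
    """Return [[TN, FP], [FN, TP]]: rows are actual classes, columns are predicted classes."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def accuracy(matrix):
    """Share of correct predictions: (TN + TP) divided by all predictions."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total

# Made-up values for illustration only
actual = [0, 0, 1, 1, 0, 1, 0, 1]
probabilities = [0.1, 0.6, 0.8, 0.4, 0.2, 0.9, 0.3, 0.7]

predicted = classify(probabilities)
matrix = confusion_counts(actual, predicted)
print(matrix)            # [[3, 1], [1, 3]]
print(accuracy(matrix))  # 0.75
```

Here the single false positive is the type I error and the single false negative is the type II error.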
Despite following different methods, Logit and Probit regressions come up with similar results. As the number of observations increases, the accuracy tends to increase.
Typical business uses of these models:
- To predict customer retention, i.e. yes or no
- To identify the pattern in which customers buy the product
- To see whether a credit card transaction is fraudulent or not
- To predict whether a customer will default on a loan
- To predict whether a given user will buy an insurance product
- To predict whether viewers will click an advertisement