REGRESSION FOR BUSINESS, USING R & PYTHON: PART III – LINEAR REGRESSION


In the article ‘REGRESSION FOR BUSINESS, USING R & PYTHON: PART I – INTRO’, I explained how regression is applied in business to improve operations and revenue. Data produced by a business needs to be analyzed with different regression models, because no single model fits every dataset: each model has its own strengths and specific conditions. In this article, I go into detail about the linear regression model and its application using R and Python.

LINEAR REGRESSION

Linear Regression is the most basic and most common regression model used for data analysis. It gives a relationship between independent variables (x1, x2, ...) and a dependent variable (y), which is often read as a CAUSE AND EFFECT relationship, although a fitted model by itself shows association rather than proof of causation. Linear regression attempts to model the relationship between the variables by fitting a linear equation to observed data.

SIMPLE LINEAR REGRESSION

Here the value of the response variable (y) depends on the value of a single explanatory variable (x).

 

MULTIPLE LINEAR REGRESSION

Here the value of the response variable (y) depends on the values of multiple explanatory variables (x1, x2, x3, ...).

LEAST SQUARES METHOD

The most common method for fitting a regression line is ‘METHOD OF LEAST SQUARES’. This method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line.
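As a minimal sketch of the least squares method for a single explanatory variable, the slope and intercept can be computed directly from the data. The toy values of x and y below are hypothetical and are not taken from the article's dataset.

```python
import numpy as np

# Hypothetical toy data: x is the explanatory variable, y the response.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Closed-form least squares estimates for the line y = b0 + b1 * x.
x_mean, y_mean = x.mean(), y.mean()
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b0 = y_mean - b1 * x_mean

# Sum of squared vertical deviations, the quantity the method minimizes.
residuals = y - (b0 + b1 * x)
sse = np.sum(residuals ** 2)

print("intercept:", b0)
print("slope:", b1)
print("sum of squared deviations:", sse)
```

Any other choice of intercept and slope gives a larger sum of squared deviations for the same data.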

DATASET

A dataset with 5 variables and 50 observations has been selected for this analysis. In this dataset ‘Profit’ is the dependent/response variable (y) and the remaining variables are the independent/explanatory variables (x1, x2, x3, and x4). The first 5 observations of the dataset are shown in the table below.

R&D Spend | Administration | Marketing Spend | State      | Profit
165349.2  | 136897.8       | 471784.1        | New York   | 192261.83
162597.7  | 151377.59      | 443898.53       | California | 191792.06
153441.51 | 101145.55      | 407934.54       | Florida    | 191050.39
144372.41 | 118671.85      | 383199.62       | New York   | 182901.99
142107.34 | 91391.77       | 366168.42       | Florida    | 166187.94

IMPORTING DATASET IN R AND PYTHON

The data, which is in ‘CSV’ format, has been imported using the following code.

R

# Importing the dataset

dataset = read.csv('50_Startups.csv')

Here the ‘read.csv’ function is used to import the file.

PYTHON

# Importing the dataset

import pandas

dataset = pandas.read_csv('50_Startups.csv')

X = dataset.iloc[:, :-1].values

y = dataset.iloc[:, 4].values

Here the ‘pandas’ library is used to import the file. ‘iloc’ selects the required columns by position: X takes every column except the last, and y takes the column at index 4 (‘Profit’).

DATA PRE-PROCESSING

Before fitting any regression model to the dataset, the data should be pre-processed as explained in the article ‘REGRESSION FOR BUSINESS, USING R & PYTHON: PART II – DATA’.

Steps involved in Data Preprocessing

  1. Replacing the missing data
  2. Converting categorical variables to numerical variables
  3. Splitting the data into a training set and a test set
  4. Adding dummy variables (only for Python)

‘Feature Scaling’ is not required for the linear regression model, because the least squares fit adjusts each coefficient to the scale of its variable.
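Steps 2 and 3 above can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the five rows from the DATASET section instead of the full file, and the use of pandas ‘get_dummies’, the 20% test size and the random seed are choices made for this sketch, not necessarily the ones used in Part II.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# The five rows from the DATASET section; the article itself uses all 50.
dataset = pd.DataFrame({
    "R&D Spend": [165349.2, 162597.7, 153441.51, 144372.41, 142107.34],
    "Administration": [136897.8, 151377.59, 101145.55, 118671.85, 91391.77],
    "Marketing Spend": [471784.1, 443898.53, 407934.54, 383199.62, 366168.42],
    "State": ["New York", "California", "Florida", "New York", "Florida"],
    "Profit": [192261.83, 191792.06, 191050.39, 182901.99, 166187.94],
})

# Step 2: turn the categorical 'State' column into 0/1 dummy columns.
# drop_first=True leaves out one state so the dummies are not redundant.
X = pd.get_dummies(dataset.drop(columns="Profit"), columns=["State"], drop_first=True)
y = dataset["Profit"]

# Step 3: hold out 20% of the rows as a test set (the seed is arbitrary).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(list(X.columns))
print(len(X_train), len(X_test))
```

With the full 50-row file, the same 20% split gives the 40 training and 10 test observations used in this article.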

RESULT

After data pre-processing, the dataset with 50 observations and 5 variables is divided as follows:

  • Training set of 40 observations and 5 variables
  • Test set of 10 observations and 5 variables

LINEAR REGRESSION USING R

STEP- ONE

# Fitting Multiple Linear Regression to the Training set

regressor = lm(formula = Profit ~ State+Marketing.Spend+Administration+R.D.Spend,

               data = training_set)

Here ‘lm’ (Linear Models) is the function used to fit the linear regression. The dependent variable is written before ‘ ~ ’ and the independent variables after it.

 

STEP- TWO

Now the fitted ‘regressor’ model is used to predict the profits of the test set from the test set's independent variables.

# Predicting the Test set results

y_pred = predict(regressor, newdata = test_set)
 

Here the ‘predict’ function is used to calculate the predicted values of the dependent variable.

LINEAR REGRESSION USING PYTHON

STEP- ONE

# Fitting Multiple Linear Regression to the Training set

from sklearn.linear_model import LinearRegression

regressor = LinearRegression()

regressor.fit(X_train, y_train)

‘LinearRegression’ is the class used to fit a linear model to the dataset.

STEP- TWO

# Predicting the Test set results

y_pred = regressor.predict(X_test)

Here the ‘predict’ method is used to calculate the predicted values of the dependent variable.
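To see what ‘fit’ and ‘predict’ do, it can help to run them on data where the true relationship is known. The sketch below uses hypothetical, noise-free data generated from y = 3 + 2*x1 - x2, not the startup dataset. The names ‘X_demo’, ‘y_demo’ and ‘demo_regressor’ are made up for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data generated from y = 3 + 2*x1 - 1*x2.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(20, 2))
y_demo = 3 + 2 * X_demo[:, 0] - 1 * X_demo[:, 1]

demo_regressor = LinearRegression()
demo_regressor.fit(X_demo, y_demo)

# After fitting, the estimates are stored on the model object.
print("intercept:", demo_regressor.intercept_)
print("coefficients:", demo_regressor.coef_)

# predict() applies the fitted intercept and coefficients to new rows.
new_rows = np.array([[1.0, 1.0]])
print("prediction:", demo_regressor.predict(new_rows))
```

Because the data has no noise, the fitted intercept and coefficients match the generating equation up to rounding. With real data such as the startup dataset they are only estimates.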

FINAL RESULT

The predicted and real profits of the test set are shown below.

Y_Predict Y_Real
173981.09 182901.99
172655.64 166187.94
160250.02 155752.6
135513.9 146121.95
146059.36 129917.04
114151.03 122776.86
117081.62 118474.03
110671.31 108733.99
98975.29 99937.59
96867.03 97483.56
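The gap between the two columns can also be summarized with a few standard error measures. The sketch below copies the ten pairs of values from the table above. The choice of measures (mean absolute error, root mean squared error and R squared) is an addition for illustration and is not part of the original analysis.

```python
import numpy as np

# Predicted and real profits copied from the table above.
y_pred = np.array([173981.09, 172655.64, 160250.02, 135513.9, 146059.36,
                   114151.03, 117081.62, 110671.31, 98975.29, 96867.03])
y_real = np.array([182901.99, 166187.94, 155752.6, 146121.95, 129917.04,
                   122776.86, 118474.03, 108733.99, 99937.59, 97483.56])

errors = y_pred - y_real

# Mean absolute error: average size of the gap, in profit units.
mae = np.mean(np.abs(errors))

# Root mean squared error: like MAE but penalizes large gaps more.
rmse = np.sqrt(np.mean(errors ** 2))

# R squared: share of the variation in real profit explained by the predictions.
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_real - y_real.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("MAE:", round(mae, 2))
print("RMSE:", round(rmse, 2))
print("R squared:", round(r2, 3))
```

An R squared close to 1 and errors that are small relative to the profit values support the reading that the predictions track the real profits well on this test set.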

CONCLUSION

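Since the predicted values are close to the real values (the largest gap in the test set is about 16,000, on a real profit of about 130,000), ‘regressor’ can be used to estimate ‘Profit’ for new values of the independent variables, i.e. R&D Spend, Administration, Marketing Spend and State, provided they lie in a range similar to the training data. The accuracy of ‘regressor’ generally improves as the number of observations increases.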

APPLICATIONS

  • Evaluating Trends and Sales Estimates
  • Analyzing the Impact of Price Changes
  • Assessing Risk
  • Effect of interest rates on stock price
  • Sensitivity of sales to advertising expenditures
  • Predicting future sales, requirements, etc.

 

-Avinash Reddy
