REGRESSION FOR BUSINESS, USING R & PYTHON: PART IV – POLYNOMIAL REGRESSION


In the article ‘REGRESSION FOR BUSINESS, USING R & PYTHON: PART I – INTRO’, I explained how regression can be applied in business to improve operations and revenues. In this article, I will briefly explain POLYNOMIAL REGRESSION and show how to apply it using R and Python.

When the Response variable (y) changes linearly with the Explanatory variable (x), Linear Regression is applied, as explained in the article ‘REGRESSION FOR BUSINESS, USING R & PYTHON: PART III – LINEAR REGRESSION’. But when y changes nonlinearly with x (for example, rising more and more steeply as x grows), a simple linear regression model cannot predict accurate results. For this kind of dataset, POLYNOMIAL REGRESSION is used.

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \varepsilon$

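In the above Polynomial Regression equation, β0, β1 and β2 are the coefficients, X1 is the explanatory variable (x) and ε is the error term. The highest power of x (here, two) is the degree of the polynomial. Although the curve is nonlinear in x, the equation is still linear in the coefficients, which is why it can be fitted with ordinary linear-regression tools.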
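Because the degree-two equation is linear in β0, β1 and β2, it can be estimated by ordinary least squares on the three columns 1, x and x². The minimal Python sketch below illustrates this with NumPy. It is not part of the original walkthrough, and it uses made-up, noise-free toy data generated from β0 = 2, β1 = 3 and β2 = 0.5. Since ε = 0 here, the estimates come back exactly.

```python
import numpy as np

# Hypothetical noise-free toy data (not from the article): y = 2 + 3x + 0.5x^2
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 + 3 * x + 0.5 * x**2

# Design matrix with columns [1, x, x^2]; the model is linear in beta0, beta1, beta2
X_design = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least-squares estimates of (beta0, beta1, beta2)
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(beta)  # approximately [2.0, 3.0, 0.5]
```

With real data, ε is not zero, so the estimates only approximate the underlying coefficients.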

For example, a firm's fixed cost per unit decreases as the number of units produced increases: it falls steeply at first and then levels off. In such a case, a curved (nonlinear) line fits the data better than a straight line.
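Here is a small Python sketch of this example. It assumes a hypothetical total fixed cost of Rs. 1,00,000 spread over 1 to 10 units (numbers invented for illustration). It fits both a straight line and a degree-two polynomial with `numpy.polyfit`, then compares their sums of squared errors (SSE).

```python
import numpy as np

FIXED_COST = 100_000.0                 # hypothetical total fixed cost in Rs.
units = np.arange(1, 11, dtype=float)  # hypothetical output: 1..10 units
avg_fixed_cost = FIXED_COST / units    # fixed cost per unit falls as output rises


def sse(degree):
    """Sum of squared errors of a least-squares polynomial fit of the given degree."""
    coeffs = np.polyfit(units, avg_fixed_cost, degree)
    fitted = np.polyval(coeffs, units)
    return float(np.sum((avg_fixed_cost - fitted) ** 2))


sse_line, sse_quad = sse(1), sse(2)
print(f"straight line SSE: {sse_line:,.0f}")
print(f"quadratic SSE:     {sse_quad:,.0f}")
```

The quadratic can never do worse than the straight line on the same data, because the line is simply the special case β2 = 0.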

DATASET

For this analysis, a dataset with 2 variables, ‘LEVEL’ and ‘SALARY’, covering 10 different levels is selected (a third column, ‘Position’, only names each level). In this dataset, ‘SALARY’ is the Dependent/Response variable (y) and ‘LEVEL’ is the Independent/Explanatory variable (x).

Position              Level   Salary (Rs.)
Business Analyst          1          45000
Junior Consultant         2          50000
Senior Consultant         3          60000
Manager                   4          80000
Country Manager           5         110000
Region Manager            6         150000
Partner                   7         200000
Senior Partner            8         300000
C-level                   9         500000
CEO                      10        1000000
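If the CSV file is not at hand, the minimal pandas sketch below rebuilds it from the table above. The file name ‘Position_Salaries.csv’ comes from the import code. The column order Position, Level, Salary is the one that the R and Python import code selects by position.

```python
import pandas as pd

# The 10 rows of the table above, in the column order Position, Level, Salary
dataset = pd.DataFrame({
    "Position": ["Business Analyst", "Junior Consultant", "Senior Consultant",
                 "Manager", "Country Manager", "Region Manager", "Partner",
                 "Senior Partner", "C-level", "CEO"],
    "Level": list(range(1, 11)),
    "Salary": [45000, 50000, 60000, 80000, 110000,
               150000, 200000, 300000, 500000, 1000000],
})

# Write the file that the import code reads (without the pandas index column)
dataset.to_csv("Position_Salaries.csv", index=False)
print(dataset.tail(3))
```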

IMPORTING DATASET IN R AND PYTHON

The data, stored in the CSV file ‘Position_Salaries.csv’, is imported using the following code.

R

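# Importing the dataset
dataset = read.csv('Position_Salaries.csv')
# Keep only columns 2 and 3 (Level and Salary); Position is just a label
dataset = dataset[2:3]

Here, the ‘read.csv’ function is used to import the file. The ‘Position’ column is dropped because it only names each level and carries no information beyond ‘Level’.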

PYTHON

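# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values  # Level, kept as a 2-D array (10 rows x 1 column)
y = dataset.iloc[:, 2].values    # Salary, a 1-D array

Here, the ‘read_csv’ function from the ‘pandas’ library is used to import the file, and ‘iloc’ selects the required columns by position. The ‘Position’ column is dropped because it only names each level. The slice 1:2 (rather than 1) keeps X two-dimensional, which is the shape scikit-learn expects for its inputs.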
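The shape difference between a slice and a single column index can be checked directly. This sketch rebuilds the Level and Salary values from the table in memory. The Position labels are placeholders used only to keep the column positions the same as in the CSV.

```python
import pandas as pd

# Level and Salary from the table; Position labels are placeholders here
dataset = pd.DataFrame({
    "Position": [f"P{i}" for i in range(1, 11)],
    "Level": range(1, 11),
    "Salary": [45000, 50000, 60000, 80000, 110000,
               150000, 200000, 300000, 500000, 1000000],
})

X = dataset.iloc[:, 1:2].values    # slice -> 2-D, shape (10, 1)
x_1d = dataset.iloc[:, 1].values   # single index -> 1-D, shape (10,)
y = dataset.iloc[:, 2].values      # target, 1-D, shape (10,)
print(X.shape, x_1d.shape, y.shape)  # (10, 1) (10,) (10,)
```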

DATA PRE-PROCESSING

Before fitting any regression model, the data should be pre-processed as explained in the article ‘REGRESSION FOR BUSINESS, USING R & PYTHON: PART II – DATA’. However, this dataset does not need any of those steps, as it is already clean and well arranged. With only 10 observations, it is also not split into training and test sets.

LINEAR & POLYNOMIAL REGRESSION USING R

STEP- ONE: Fitting both simple linear and polynomial regression models to the dataset

# Fitting Linear Regression to the dataset
lin_reg = lm(formula = Salary ~ Level,
             data = dataset)

# Fitting Polynomial Regression to the dataset
# Add the powers of Level (up to degree 5) as extra columns
dataset$Level2 = dataset$Level^2
dataset$Level3 = dataset$Level^3
dataset$Level4 = dataset$Level^4
dataset$Level5 = dataset$Level^5
poly_reg = lm(formula = Salary ~ .,
              data = dataset)

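Here, the ‘lm’ (Linear Model) function is used for both models. The independent variables come after ‘~’, and ‘.’ stands for all remaining columns of the dataset (Level, Level2, Level3, Level4 and Level5). Because the polynomial model is linear in its coefficients, ‘lm’ can fit it once the powers of Level have been added as columns.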
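Readers following the Python half can reproduce the same idea as R's formula `Salary ~ .` with the sketch below. It adds Level2 to Level5 as columns and fits an ordinary linear model on all of them. It uses pandas and scikit-learn's `LinearRegression`, with the data typed in directly, and the column names mirror the R code.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

dataset = pd.DataFrame({
    "Level": range(1, 11),
    "Salary": [45000, 50000, 60000, 80000, 110000,
               150000, 200000, 300000, 500000, 1000000],
})

# Same extra columns as the R code: Level2 ... Level5
for p in range(2, 6):
    dataset[f"Level{p}"] = dataset["Level"] ** p

# Counterpart of R's `Salary ~ .`: every column except Salary is a predictor
features = dataset.drop(columns="Salary")
poly_model = LinearRegression().fit(features, dataset["Salary"])
print(list(features.columns))  # ['Level', 'Level2', 'Level3', 'Level4', 'Level5']
```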

STEP- TWO: Visualization of Regression lines

# Visualising the Linear and Polynomial Regression results
# install.packages('ggplot2')  # run once if ggplot2 is not yet installed
library(ggplot2)
ggplot() +
  geom_point(aes(x = dataset$Level, y = dataset$Salary),
             colour = 'navyblue') +
  geom_line(aes(x = dataset$Level, y = predict(lin_reg, newdata = dataset)),
            colour = 'green4') +
  geom_line(aes(x = dataset$Level, y = predict(poly_reg, newdata = dataset)),
            colour = 'red3') +
  ggtitle('Polynomial Regression with Degree FIVE') +
  xlab('Level') +
  ylab('Salary')

The ‘ggplot2’ package is used to plot the data points (navy blue), the linear regression line (green) and the polynomial regression curve (red).

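RESULT: figure ‘Polynomial Regression with Degree FIVE’ (Salary vs Level, with the data points, the linear fit and the polynomial fit)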

STEP- THREE: Predicting the values

# Predicting a new result with Linear Regression
predict(lin_reg, data.frame(Level = 6.25))

# Predicting a new result with Polynomial Regression
# (the new data must supply every power column used in the model)
predict(poly_reg, data.frame(Level = 6.25,
                             Level2 = 6.25^2,
                             Level3 = 6.25^3,
                             Level4 = 6.25^4,
                             Level5 = 6.25^5))

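Here, the ‘predict’ function calculates the dependent variable (Salary) for a new level of 6.25. For the polynomial model, the new data must contain every power column (Level2 to Level5) that was used during fitting.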
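Typing each power by hand is easy to get wrong, since a duplicated or missing column makes the prediction fail. Below is a small Python helper that builds the new row of power columns from a single level, using the same Level, Level2, ... naming as the R code. The helper `power_row` is hypothetical and not part of the article.

```python
def power_row(level, degree=5):
    """Hypothetical helper: {'Level': x, 'Level2': x**2, ..., f'Level{degree}': x**degree}."""
    row = {"Level": level}
    for p in range(2, degree + 1):
        row[f"Level{p}"] = level ** p
    return row


print(power_row(6.25, degree=3))
# {'Level': 6.25, 'Level2': 39.0625, 'Level3': 244.140625}
```

With pandas, `pd.DataFrame([power_row(6.25)])` would give a one-row frame with the columns Level to Level5.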

RESULT (predicted salary for Level = 6.25)

Linear Regression= Rs.310159.1

Polynomial Regression= Rs.163491.7

 

LINEAR & POLYNOMIAL REGRESSION USING PYTHON

STEP- ONE: Fitting both LINEAR & POLYNOMIAL regression models to the dataset

# Fitting Linear Regression to the dataset
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)

# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 5)
X_poly = poly_reg.fit_transform(X)  # columns: 1, x, x^2, ..., x^5
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)

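‘LinearRegression’ is the class used to fit a linear model to the dataset. ‘PolynomialFeatures’ builds the polynomial terms: with degree = 5, it turns each level x into the columns 1, x, x², x³, x⁴ and x⁵. ‘fit_transform’ fits this transformer and transforms X in one step. A second ‘LinearRegression’ model (lin_reg_2) is then fitted on those columns.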
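A quick way to see what ‘PolynomialFeatures’ produces is to transform a single value. The sketch below uses level 2 purely as an illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=5)
row = poly.fit_transform(np.array([[2.0]]))  # one sample, one feature
print(row.tolist())  # [[1.0, 2.0, 4.0, 8.0, 16.0, 32.0]]
```

The leading 1.0 is the bias (intercept) column, which is included by default.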

STEP- TWO: Visualization of regression lines

# Visualising the Linear and Polynomial Regression results
plt.scatter(X, y, color = 'blue')                                          # data points
plt.plot(X, lin_reg_2.predict(poly_reg.fit_transform(X)), color = 'red')  # degree-5 polynomial
plt.plot(X, lin_reg.predict(X), color = 'green')                          # straight line
plt.title('Polynomial Regression with degree FIVE')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

‘matplotlib.pyplot’ is used for plotting: the data points in blue, the polynomial curve in red and the linear regression line in green.
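The plot code above evaluates the fitted model only at the 10 observed levels, so the red line is made of straight segments between them. The sketch below evaluates the degree-5 model on a finer grid of levels instead, which gives a smooth curve. The 0.1 step (91 grid points), the non-interactive ‘Agg’ backend and the output file name ‘poly_smooth.png’ are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000], dtype=float)

poly = PolynomialFeatures(degree=5)
model = LinearRegression().fit(poly.fit_transform(X), y)

# 91 evenly spaced levels from 1 to 10 (step 0.1) instead of only the 10 observed ones
X_grid = np.linspace(1, 10, 91).reshape(-1, 1)
y_grid = model.predict(poly.transform(X_grid))

plt.scatter(X, y, color="blue")
plt.plot(X_grid, y_grid, color="red")
plt.title("Polynomial Regression with degree FIVE (smooth curve)")
plt.xlabel("Position level")
plt.ylabel("Salary")
plt.savefig("poly_smooth.png")  # hypothetical output file name
```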

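RESULT: figure ‘Polynomial Regression with degree FIVE’ (Salary vs Position level, with the data points, the polynomial fit and the linear fit)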

STEP- THREE: Predicting the values

# Predicting a new result with Linear Regression
# (scikit-learn expects a 2-D input: one row, one column)
lin_reg.predict([[6.25]])

# Predicting a new result with Polynomial Regression
lin_reg_2.predict(poly_reg.fit_transform([[6.25]]))

Result (predicted salary for Level = 6.25)

Linear Regression= Rs.310159.09090909

Polynomial Regression= Rs.163491.70775529
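In current scikit-learn versions, ‘predict’ expects a 2-D array of shape (n_samples, n_features), so a single level has to be passed as [[6.25]] rather than 6.25. The sketch below predicts several levels at once by reshaping a 1-D array with reshape(-1, 1). The extra levels 6.5 and 7.5 are arbitrary examples.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(1, 11, dtype=float).reshape(-1, 1)  # levels 1..10 as a column
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000], dtype=float)

lin_reg = LinearRegression().fit(X, y)
poly_reg = PolynomialFeatures(degree=5)
lin_reg_2 = LinearRegression().fit(poly_reg.fit_transform(X), y)

# A 1-D list of levels reshaped to (n_samples, 1) for predict
new_levels = np.array([6.25, 6.5, 7.5]).reshape(-1, 1)
print(lin_reg.predict(new_levels))
print(lin_reg_2.predict(poly_reg.transform(new_levels)))
```

The first linear prediction is the Rs. 310159.09 reported above for Level 6.25.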

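How to Determine the Right Degree?

A polynomial equation must be built with the right degree to give accurate predictions. Here, the power of x was increased step by step until the curve fitted the dataset; for the given dataset, a polynomial of degree FIVE gives the best fit, as shown in the figure below. Note that a higher degree always fits the training points at least as well as a lower one. With only 10 observations, the degree should therefore not be raised further than needed, or the curve starts chasing individual points instead of the overall trend (overfitting).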
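One rough way to compare degrees is to fit each candidate and look at R² on the same data. The sketch below does this with `numpy.polyfit`, which is used here instead of scikit-learn purely for brevity. Training R² can only rise as the degree grows, so the output shows how much each extra power helps. It does not show which degree will generalise best.

```python
import numpy as np

levels = np.arange(1, 11, dtype=float)
salaries = np.array([45000, 50000, 60000, 80000, 110000,
                     150000, 200000, 300000, 500000, 1000000], dtype=float)


def training_r2(degree):
    """R^2 of a least-squares polynomial fit of the given degree, on the training data."""
    fitted = np.polyval(np.polyfit(levels, salaries, degree), levels)
    ss_res = np.sum((salaries - fitted) ** 2)
    ss_tot = np.sum((salaries - salaries.mean()) ** 2)
    return 1 - ss_res / ss_tot


for degree in range(1, 6):
    print(degree, round(training_r2(degree), 4))
```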

CONCLUSION

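The predicted value from Polynomial Regression is more realistic than the value from Linear Regression. A level of 6.25 lies between Region Manager (Rs. 1,50,000) and Partner (Rs. 2,00,000), and the polynomial prediction of Rs. 1,63,492 falls within that range. The linear prediction of Rs. 3,10,159, by contrast, is even higher than the Partner salary, and the two predictions differ by about Rs. 1,46,667. From this, we can conclude that polynomial regression is the better choice when the Response variable (y) changes nonlinearly (here, increasingly steeply) with the Explanatory variable (x).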
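The gap between the two predictions can be recomputed independently of scikit-learn. The sketch below uses `numpy.polyfit` as a stand-in for the article's R and scikit-learn code, since all of them fit the same least-squares models. It predicts the salary at level 6.25 with degree 1 and with degree 5, then prints the difference.

```python
import numpy as np

levels = np.arange(1, 11, dtype=float)
salaries = np.array([45000, 50000, 60000, 80000, 110000,
                     150000, 200000, 300000, 500000, 1000000], dtype=float)

# Least-squares fits of degree 1 (straight line) and degree 5, evaluated at level 6.25
linear_pred = np.polyval(np.polyfit(levels, salaries, 1), 6.25)
poly_pred = np.polyval(np.polyfit(levels, salaries, 5), 6.25)
gap = linear_pred - poly_pred

print(f"linear: {linear_pred:,.2f}  polynomial: {poly_pred:,.2f}  gap: {gap:,.2f}")
```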

APPLICATION OF POLYNOMIAL REGRESSION

  • Predicting the average cost for a particular level of output
  • Predicting output for a given number of laborers hired
  • Estimating the ordering cost for a specific number of units
  • Estimating salaries of job applicants with a specific number of years of experience, etc.

-Avinash Reddy

 
