A general multiple regression model consists of one dependent variable and several independent variables. It is tempting to assume that more independent variables always make a better model, but that does not hold in all cases: an added variable may be collinear with the others (i.e. it can be linearly predicted from the other independent variables with a substantial degree of accuracy), or it may simply add noise without reducing the residual error. So blindly adding all the independent variables is not an option, and variable selection becomes an important job. There are various methods of variable selection; stepwise regression is one of the naive but must-know methods.
Stepwise regression selects the model with the least AIC (Akaike's Information Criterion) value. AIC grows with both the residual sum of squares (RSS) and the number of estimated parameters, so adding or eliminating an independent variable changes the AIC noticeably: a variable is worth keeping only if the drop in RSS outweighs the penalty for the extra parameter. The lower the AIC value, the better the model.
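For a linear model, the AIC that R's step() reports (via extractAIC()) is n·log(RSS/n) + 2k, where k is the number of estimated coefficients. A quick sketch to see the relationship (the two-predictor formula here is just an illustration):

```r
# Sketch: how step()'s AIC relates to RSS and the number of parameters.
# extractAIC() is the criterion step() uses internally for lm models.
fit <- lm(mpg ~ wt + drat, data = mtcars)

n   <- nrow(mtcars)
rss <- sum(residuals(fit)^2)      # residual sum of squares
k   <- length(coef(fit))          # intercept + 2 slopes = 3 parameters

manual_aic <- n * log(rss / n) + 2 * k
extractAIC(fit)                   # returns c(edf, AIC); AIC matches manual_aic
```

This makes the trade-off concrete: an extra predictor must reduce RSS enough to pay for the +2 it adds to the penalty term.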
Stepwise regression is done in 3 ways:
- Forward Stepwise Regression
- Backward Stepwise Regression
- Both (bidirectional) Stepwise Regression
Forward Stepwise Regression
Forward selection is a very attractive approach, because it is both tractable and it gives a good sequence of models. It starts with a null model. The null model has no predictors, just an intercept (the mean of the dependent variable).
Fit p simple linear regression models, each with one of the variables plus the intercept. You then search through all the single-variable models for the best one (the one that results in the lowest residual sum of squares), pick it, and fix it in the model. R can do this for you.
Now search through the remaining p − 1 variables and find which one should be added to the current model to best improve the residual sum of squares. The procedure stops when adding another predictor variable would increase the AIC.
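In R, a forward run needs a null model to start from plus a scope of candidate predictors. Using the same mtcars variables that appear in the backward example later in this post, it might look like this:

```r
# Forward stepwise: start from the intercept-only (null) model and let
# step() add predictors from `scope` while the AIC keeps decreasing.
null_model <- lm(mpg ~ 1, data = mtcars)

forward_fit <- step(null_model,
                    scope = ~ wt + drat + disp + qsec,
                    direction = "forward")

summary(forward_fit)   # the model step() settled on
```

At each step the printed table shows, for every candidate, the AIC the model would have if that variable were added; step() adds the best one and repeats.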
Backward Stepwise Regression
Backward stepwise regression starts with all the independent variables in the model. It then removes the variable with the largest p-value, i.e. the variable that is the least statistically significant. The new (p − 1)-variable model is fit, and again the variable with the largest p-value is removed.
This continues until removing one more independent variable would increase the AIC value.
Let’s take the example of “mtcars”, a dataset present in R.
> step(lm(mpg~wt+drat+disp+qsec,data=mtcars),direction="backward")
Start:  AIC=65.63
mpg ~ wt + drat + disp + qsec

       Df Sum of Sq    RSS    AIC
- disp  1     1.506 183.52 63.891
<none>              182.01 65.627
- drat  1    13.447 195.46 65.908
- qsec  1    61.739 243.75 72.974
- wt    1   109.330 291.35 78.681

Step:  AIC=63.89
mpg ~ wt + drat + qsec

       Df Sum of Sq    RSS    AIC
<none>              183.52 63.891
- drat  1    11.942 195.46 63.908
- qsec  1    85.720 269.24 74.156
- wt    1   275.686 459.21 91.241

Call:
lm(formula = mpg ~ wt + drat + qsec, data = mtcars)

Coefficients:
(Intercept)           wt         drat         qsec
    11.3945      -4.3978       1.6561       0.9462
In this example we have four independent variables, namely wt, drat, disp and qsec; the dependent variable is mpg. The starting value of AIC is 65.63, when all the variables are in the model. We can see that if we remove the variable disp (shown as “- disp”), the new AIC will be 63.891; removing the other variables drat, qsec and wt would instead give AIC values of 65.908, 72.974 and 78.681 respectively. The least AIC value is obtained by removing disp.
So the AIC after removing disp is 63.89, and as we can see in this step’s model (mpg ~ wt + drat + qsec), removing any further variable will not give a lower AIC value. So we call mpg ~ wt + drat + qsec the best multiple regression model, i.e. mpg = 11.3945 - 4.3978wt + 1.6561drat + 0.9462qsec.
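The selected model can be refit directly to read off the same coefficients and use it for prediction (a small sketch; the numbers come from the output above):

```r
# Refit the model selected by backward stepwise and inspect it.
best_fit <- lm(mpg ~ wt + drat + qsec, data = mtcars)

coef(best_fit)   # (Intercept) 11.3945, wt -4.3978, drat 1.6561, qsec 0.9462

# Fitted mpg for the first car in the dataset, using the selected model.
predict(best_fit, newdata = mtcars[1, ])
```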
Both Stepwise Regression
Unlike forward or backward stepwise regression, here we both add and remove predictor variables while checking for the lowest AIC value. Let’s see this with the dataset anscombe, and find the linear model where y1 is the dependent variable and the others are, say, the independent variables.
Here, at each step, we also check whether a previously removed variable (such as y2) would improve the AIC value if added back in. By both adding and removing variables we search for the best model.
> step(lm(y1~.,data = anscombe),direction = "both")
Start:  AIC=7.19
y1 ~ x1 + x2 + x3 + x4 + y2 + y3 + y4

Step:  AIC=7.19
y1 ~ x1 + x2 + x4 + y2 + y3 + y4

Step:  AIC=7.19
y1 ~ x1 + x4 + y2 + y3 + y4

       Df Sum of Sq     RSS     AIC
- y2    1    0.1117  7.2122  5.3567
- x4    1    0.4086  7.5092  5.8005
<none>               7.1006  7.1850
- y4    1    1.4359  8.5365  7.2110
- y3    1    3.2024 10.3030  9.2799
- x1    1    8.1726 15.2732 13.6102

Step:  AIC=5.36
y1 ~ x1 + x4 + y3 + y4

       Df Sum of Sq     RSS     AIC
- x4    1    0.2976  7.5099  3.8015
- y4    1    1.3274  8.5396  5.2150
<none>               7.2122  5.3567
+ y2    1    0.1117  7.1006  7.1850
- y3    1    3.6941 10.9064  7.9060
- x1    1   17.4080 24.6202 16.8624

Step:  AIC=3.8
y1 ~ x1 + y3 + y4

       Df Sum of Sq     RSS     AIC
- y4    1    1.4175  8.9274  3.7035
<none>               7.5099  3.8015
+ x4    1    0.2976  7.2122  5.3567
+ y2    1    0.0007  7.5092  5.8005
- y3    1    3.7729 11.2828  6.2792
- x1    1   17.3674 24.8773 14.9767

Step:  AIC=3.7
y1 ~ x1 + y3

       Df Sum of Sq    RSS     AIC
<none>               8.927  3.7035
+ y4    1    1.4175  7.510  3.8015
+ x4    1    0.3878  8.540  5.2150
+ y2    1    0.1809  8.746  5.4783
- y3    1    4.8353 13.763  6.4647
- x1    1   23.2779 32.205 15.8166

Call:
lm(formula = y1 ~ x1 + y3, data = anscombe)

Coefficients:
(Intercept)           x1           y3
     4.7802       0.7964      -0.5929
Finally we arrive at the linear model y1 ~ x1 + y3, i.e. y1 = 4.7802 + 0.7964x1 - 0.5929y3. Stepwise regression is a method of variable selection for regression models on the basis of AIC values alone; there can be other criteria for selecting variables, such as p-values. This is one of the simplest methods to start with.