Hey guys, this is a small article wherein I talk about ARIMA modelling in R. The method is simple and straightforward, so let's start off. First, we need some data for the modelling. Head over to the link to get some time series data.

The data that we are using here is time series data, meaning it is recorded at equally spaced intervals of time.

Now that you have the data, let's get to work. First, let's load some libraries. The required libraries are

- forecast
- tseries

Below is the R code for loading the libraries.

```r
require(forecast)
require(tseries)
```

Now we need to assign the data to a variable. We do this by:

```r
y <- read.csv(file.choose(), header = TRUE)
```

The data is stored in a variable called y. Now let's turn it into a time series. This can be done by using the ts command in R.

```r
yobs <- ts(y$Obs, start = c(2003, 1), frequency = 12)
```

Here I am setting the frequency to 12 since the data is monthly, starting in January 2003.
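If you don't have a CSV handy, you can still follow along with a made-up series. Everything in this sketch — the column name Obs, the trend, the seasonal shape — is invented purely for illustration:

```r
# A synthetic monthly series: upward trend + yearly cycle + noise
set.seed(42)
n <- 120                                  # 10 years of monthly observations
trend  <- seq(100, 200, length.out = n)
season <- 10 * sin(2 * pi * (1:n) / 12)
y <- data.frame(Obs = trend + season + rnorm(n, sd = 5))

yobs <- ts(y$Obs, start = c(2003, 1), frequency = 12)
```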

Let’s plot the data now to see if there are any abnormalities in it.

```r
plot(yobs)
```

We need to remove the trend, and the best way to do this is by differencing the data. To difference the data, the diff command is used.

```r
plot(diff(yobs))
```

Looking at the plot above, the variance is not constant over time. To stabilise the variance, taking the logarithm of the data is a good method.

```r
plot(log10(yobs))
plot(diff(log10(yobs)))
```

To check whether the series is stationary, we conduct the augmented Dickey-Fuller test.

```r
adf.test(log10(yobs))
```

```
Augmented Dickey-Fuller Test

data:  log10(yobs)
Dickey-Fuller = -4.7137, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary
```

The p-value is below 0.05, so we reject the null hypothesis of a unit root: the series is stationary, and we can move on to the ARIMA modelling. Using the auto.arima command, the AR and MA orders are obtained automatically.

```r
# Conducting the ARIMA
a1 <- auto.arima(log10(yobs), approximation = FALSE, trace = FALSE)
# ARIMA modelling done!
summary(a1)
```

```
Series: log10(yobs)
ARIMA(1,0,1)(0,1,1)[12] with drift

Coefficients:
         ar1      ma1     sma1   drift
      0.8213  -0.2572  -0.6454  0.0048
s.e.  0.0902   0.1508   0.1124  0.0003

sigma^2 estimated as 0.0002722:  log likelihood=224.21
AIC=-438.43   AICc=-437.66   BIC=-426.27

Training set error measures:
                       ME       RMSE        MAE        MPE      MAPE      MASE        ACF1
Training set 0.0004781441 0.01506067 0.01091572 0.01796991 0.4745754 0.1828288 -0.02601957
```

Now let's check that the residuals of the model are independent:

```r
# Checking for independence in the residuals of the fitted model
Box.test(a1$residuals, lag = 12, type = "Ljung-Box")
```

We fail to reject the null hypothesis of independence, so the model is a success!

```
Box-Ljung test

data:  a1$residuals
X-squared = 4.8448, df = 12, p-value = 0.963
```
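Beyond the Ljung-Box test, it's worth eyeballing the residuals of a1 as well. A minimal sketch of the usual visual checks, continuing from the model fitted above (Acf comes from the forecast package we loaded at the start):

```r
# Visual diagnostics for the residuals of the fitted model
Acf(residuals(a1))     # autocorrelations should stay inside the bounds
hist(residuals(a1))    # roughly bell-shaped residuals are a good sign
```

Recent versions of the forecast package also bundle these plots, plus a Ljung-Box test, into a single checkresiduals(a1) call.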

Forecasting for 12 months

```r
f <- forecast(a1, h = 12)
plot(f)
```
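One caveat: since the model was fitted on log10(yobs), the forecasts in f live on the log10 scale. A sketch of getting them back into the original units (10^ is the inverse of log10; strictly speaking this gives median-type forecasts rather than bias-corrected means):

```r
# Back-transform the point forecasts and prediction intervals
f_mean  <- 10^f$mean     # point forecasts in original units
f_lower <- 10^f$lower    # lower prediction bounds
f_upper <- 10^f$upper    # upper prediction bounds
plot(f_mean)
```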

Decomposing the data

```r
h1 <- stl(yobs, s.window = "periodic")
plot(h1)
```
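The object returned by stl keeps the decomposed pieces in a matrix called time.series, with one column per component, so you can pull any of them out directly:

```r
# Extract the individual components of the decomposition
seasonal_part <- h1$time.series[, "seasonal"]
trend_part    <- h1$time.series[, "trend"]
remainder     <- h1$time.series[, "remainder"]
plot(trend_part)
```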
