question 4


## **Exercise 4**

**4.1)** We first plot the United States GDP series from the `global_economy` dataset as it is, and then the same series after a Box-Cox transformation, with λ chosen by Guerrero's method. After the transformation the variation in the series is more stable over time, which justifies applying it.
```{r, figures-side, fig.show="hold", out.width="50%"}

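library(fpp3)  # loads tsibble, fable, feasts and the global_economy data

# Keep only the United States GDP series
US_GDP <- global_economy %>%
  filter(Country == "United States") %>%
  select(Country, GDP)

# Choose the Box-Cox parameter with Guerrero's method
lambda <- US_GDP %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

# Transformed copy of the series
box_US_GDP <- US_GDP %>%
  mutate(GDP = box_cox(GDP, lambda))

# Original and transformed series, side by side
US_GDP %>% autoplot(GDP)
box_US_GDP %>% autoplot(GDP)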
```
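
For reference, the Box-Cox family is $w_t = \log(y_t)$ when $\lambda = 0$ and $w_t = (\text{sign}(y_t)|y_t|^\lambda - 1)/\lambda$ otherwise. The base-R sketch below reimplements it, together with its inverse, so the transformation can be checked by hand; the helper names `bc` and `inv_bc` are hypothetical, and this is not the `box_cox()` source code. The toy usage shows why the transformation helps here: with $\lambda = 0$ an exponentially growing series becomes one that rises by a constant step.

```{r}
# Hypothetical helper: Box-Cox transform (log when lambda is 0)
bc <- function(y, lambda) {
  if (lambda == 0) log(y) else (sign(y) * abs(y)^lambda - 1) / lambda
}

# Hypothetical helper: inverse Box-Cox, mapping transformed values back
inv_bc <- function(w, lambda) {
  if (lambda == 0) {
    exp(w)
  } else {
    z <- lambda * w + 1
    sign(z) * abs(z)^(1 / lambda)
  }
}

# Toy series that doubles every period (not real GDP values)
gdp <- c(1e12, 2e12, 4e12, 8e12)
w <- bc(gdp, 0)
diff(w)                     # every step equals log(2), about 0.693
all.equal(inv_bc(w, 0), gdp)
```
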
The time plot shows clear non-stationarity: a strong, steady upward trend. The ACF of the transformed GDP series has significant spikes at every lag shown and decays only slowly as the lag increases. The data are annual, so there is no seasonality; the slow decay is caused by the trend, and the high autocorrelation indicates that the series needs differencing.
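
The slow decay of the ACF can be reproduced on simulated data. The sketch below uses base R only and a hypothetical random walk with drift (not the GDP series): the autocorrelations of the level are large and fall away slowly, while those of the first differences are close to zero, which is why a trending series calls for differencing.

```{r}
set.seed(42)
n <- 300
# Simulated random walk with drift: y_t = y_{t-1} + 0.5 + e_t
y <- cumsum(0.5 + rnorm(n))

# Sample ACF up to lag 10, dropping lag 0
acf_level <- acf(y, lag.max = 10, plot = FALSE)$acf[-1]
acf_diff  <- acf(diff(y), lag.max = 10, plot = FALSE)$acf[-1]

round(acf_level[1:5], 2)   # large values that decay slowly
round(acf_diff[1:5], 2)    # small values, roughly within +/- 2 / sqrt(n)
```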

```{r}
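# Time plot, ACF and PACF of the transformed series
box_US_GDP %>%
  gg_tsdisplay(GDP, plot_type = 'partial') +
  labs(title = "United States GDP (Box-Cox transformed)")

# Automatic ARIMA selection: full search with exact likelihood
arima_fit <- box_US_GDP %>%
  model(arima = ARIMA(GDP, stepwise = FALSE, approx = FALSE))

# Residual diagnostics for the selected model
arima_fit %>% gg_tsresiduals()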
```
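In the PACF the only significant spike is at lag 1, which is consistent with an ARIMA(1,1,0) model with drift. The Box-Cox transformation stabilises the variance of the series before the model is fitted. Note that AICc values are only comparable between models fitted to the same (transformed) data, so the AICc cannot be used to show that the transformation itself improves the model.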
```{r}
print(arima_fit)
```
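
An ARIMA(1,1,0) model with drift says that the first differences $y'_t = y_t - y_{t-1}$ follow an AR(1) process with a constant: $y'_t = c + \phi_1 y'_{t-1} + \varepsilon_t$. The base-R sketch below simulates such a series with hypothetical values $\phi_1 = 0.5$ and $c = 0.5$ and refits it with `stats::arima()`. One detail differs from fable: `arima()` ignores the mean once d > 0, so the drift is passed as a time-index regressor, whose coefficient estimates the mean of the differences, $c/(1-\phi_1)$.

```{r}
set.seed(1)
n   <- 500
phi <- 0.5   # hypothetical AR coefficient of the differenced series
c0  <- 0.5   # hypothetical constant c, so the differences have mean c0 / (1 - phi) = 1

# Differenced series: a zero-mean AR(1) shifted by its mean c0 / (1 - phi)
dy <- as.numeric(arima.sim(model = list(ar = phi), n = n)) + c0 / (1 - phi)
y  <- cumsum(dy)   # integrate once to get the level series y_t

# Drift enters as a linear time trend, because arima() drops the mean when d > 0
t_index <- seq_along(y)
fit <- arima(y, order = c(1, 1, 0), xreg = t_index)
coef(fit)   # "ar1" estimates phi; "t_index" estimates the drift c0 / (1 - phi)
```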

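**4.2)** We compared a grid of candidate ARIMA models (up to three AR terms, up to two MA terms and zero, one or two differences) together with the automatic stepwise and full searches. The model labelled `arima022` has the lowest AICc, 648.7516. It is an ARIMA(0,2,2) model: p = 0 autoregressive terms, d = 2 differences and q = 2 moving-average terms. Strictly, AICc is only comparable between models with the same order of differencing d, so this ranking is most reliable among the d = 2 candidates.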

```{r}
# Fit a grid of ARIMA(p,d,q) candidates plus the two automatic searches
fit_models <- box_US_GDP %>%
  model(
    arima000 = ARIMA(GDP ~ pdq(0,0,0)),
    arima010 = ARIMA(GDP ~ pdq(0,1,0)),
    arima110 = ARIMA(GDP ~ pdq(1,1,0)),
    arima210 = ARIMA(GDP ~ pdq(2,1,0)),
    arima020 = ARIMA(GDP ~ pdq(0,2,0)),
    arima120 = ARIMA(GDP ~ pdq(1,2,0)),
    arima220 = ARIMA(GDP ~ pdq(2,2,0)),
    arima320 = ARIMA(GDP ~ pdq(3,2,0)),
    arima011 = ARIMA(GDP ~ pdq(0,1,1)),
    arima111 = ARIMA(GDP ~ pdq(1,1,1)),
    arima211 = ARIMA(GDP ~ pdq(2,1,1)),
    arima021 = ARIMA(GDP ~ pdq(0,2,1)),
    arima121 = ARIMA(GDP ~ pdq(1,2,1)),
    arima221 = ARIMA(GDP ~ pdq(2,2,1)),
    arima321 = ARIMA(GDP ~ pdq(3,2,1)),
    arima012 = ARIMA(GDP ~ pdq(0,1,2)),   # was mislabelled arima021 (duplicate name)
    arima112 = ARIMA(GDP ~ pdq(1,1,2)),
    arima212 = ARIMA(GDP ~ pdq(2,1,2)),
    arima022 = ARIMA(GDP ~ pdq(0,2,2)),
    arima122 = ARIMA(GDP ~ pdq(1,2,2)),
    arima222 = ARIMA(GDP ~ pdq(2,2,2)),
    arima322 = ARIMA(GDP ~ pdq(3,2,2)),
    stepwise = ARIMA(GDP),
    search   = ARIMA(GDP, stepwise = FALSE)
  )
report(fit_models)

# One row per model, showing the orders that were fitted
fit_models %>% pivot_longer(!Country, names_to = "Model name", values_to = "Orders")

# Rank all models by AICc
glance(fit_models) %>% arrange(AICc) %>% select(.model:AICc)

# Residual diagnostics for the full automatic search
fit_models %>% select(search) %>% gg_tsresiduals()
```
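
For reference, AICc adds a small-sample correction to AIC: $\text{AICc} = \text{AIC} + \frac{2K(K+1)}{T-K-1}$, where $K$ is the number of estimated parameters including the noise variance $\sigma^2$ and $T$ is the number of observations remaining after differencing. The base-R sketch below (the helper names `aicc` and `fit_aicc` and the simulated series are hypothetical) computes it for `stats::arima()` fits and ranks a small grid with d held at 1, so that every likelihood is computed on the same differenced data.

```{r}
# Hypothetical helper: AICc for a stats::arima() fit.
# K = estimated coefficients + 1 for sigma^2; fit$nobs counts the
# observations actually used, i.e. after differencing.
aicc <- function(fit) {
  K     <- sum(fit$mask) + 1
  n_obs <- fit$nobs
  fit$aic + 2 * K * (K + 1) / (n_obs - K - 1)
}

set.seed(7)
# Simulated ARIMA(0,1,1) series, a stand-in for real data
y <- cumsum(as.numeric(arima.sim(model = list(ma = 0.6), n = 120)))

# Hold d = 1 fixed so the AICc values below are comparable
fit_aicc <- function(p, q) {
  tryCatch(aicc(arima(y, order = c(p, 1, q), method = "ML")),
           error = function(e) NA_real_)   # NA if a model fails to fit
}
candidates <- expand.grid(p = 0:2, q = 0:2)
candidates$AICc <- mapply(fit_aicc, candidates$p, candidates$q)
candidates[order(candidates$AICc), ]
```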

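**4.3)** The best ARIMA model is plotted alongside an ETS model. The ETS model has an AICc of 3191.941, against 648 for the ARIMA model, but the two figures should not be compared directly: ETS and ARIMA are different model classes whose likelihoods are computed differently, and here the ETS model is fitted to the original GDP while the ARIMA model is fitted to the Box-Cox transformed series. A lower AICc indicates a better model only among models of the same class fitted to the same data; to choose between ETS and ARIMA, forecast accuracy on held-out data is a fairer comparison.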

```{r}
# ETS model on the original (untransformed) GDP
fit_ets <- US_GDP %>% model(ETS(GDP))
fit_ets %>% forecast(h = 10) %>%
  autoplot(US_GDP) +
  labs(title = "United States GDP 10 Year Forecast Using ETS")

# 10-year forecasts from the full ARIMA search, shown on the Box-Cox scale
fit_models %>% forecast(h = 10) %>% filter(.model == 'search') %>%
  autoplot(box_US_GDP) +
  labs(title = "United States GDP 10 Year Forecast Using ARIMA (Box-Cox scale)")
```
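
Because AICc cannot rank ETS against ARIMA, a fairer check is forecast accuracy on held-out years. The sketch below is base R on a simulated annual series (hypothetical data, not the GDP series): it holds out the last 10 observations, fits an ARIMA(1,1,0) with drift and Holt's linear trend method via `HoltWinters(gamma = FALSE)`, and compares test-set RMSE. `HoltWinters()` is a swapped-in stand-in for an additive-trend exponential smoothing model, not fable's `ETS()`. With fpp3 the same idea would use a training window filtered on `Year` and `accuracy()` on the resulting forecasts.

```{r}
set.seed(123)
# Simulated annual series: cumulative sum of AR(1) growth around 0.5 per year
dy <- 0.5 + as.numeric(arima.sim(model = list(ar = 0.5), n = 60))
y  <- cumsum(dy)

h     <- 10
train <- y[1:(length(y) - h)]
test  <- y[(length(y) - h + 1):length(y)]

# Candidate 1: ARIMA(1,1,0) with drift, drift supplied as a time-trend regressor
t_train <- seq_along(train)
fit_ar  <- arima(train, order = c(1, 1, 0), xreg = t_train)
fc_ar   <- predict(fit_ar, n.ahead = h,
                   newxreg = length(train) + seq_len(h))$pred

# Candidate 2: Holt's linear trend method (no seasonal component)
fit_holt <- HoltWinters(ts(train), gamma = FALSE)
fc_holt  <- predict(fit_holt, n.ahead = h)

# Root mean squared error on the held-out years
rmse <- function(actual, pred) sqrt(mean((actual - as.numeric(pred))^2))
c(arima = rmse(test, fc_ar), holt = rmse(test, fc_holt))
```

The model with the smaller test-set RMSE is preferred; with only 10 held-out points this is a noisy signal, so time-series cross-validation over several origins would be more robust.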