Set 5 Prophet Lab
Before doing this lab, read the paper on Canvas (“Forecasting at Scale,” in Other Resources within Modules) that describes the method implemented by the prophet package. Install and load the prophet package, and skim the online tutorial so you know roughly where to look for further guidance if you get stuck. In this lab you will model daily US conversions generated across Kayak’s marketing channels from November 2014 through August 2015.
- Read in the conversions data (see code below). Manipulate the data frame so that it is compatible with prophet. That is, make it a data frame with columns ds and y, containing the date (starting with 2014-11-01) and the response variable (total US conversions across all marketing channels by day, starting with 18,669), respectively. See https://facebook.github.io/prophet/docs/quick_start.html#r-api for more details. In your written response, just show the first 6 rows of your data frame, named df.
conversions = read.csv("https://raw.githubusercontent.com/dbreynol/DS809/main/data/conversions.csv")
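As a rough sketch of the reshaping step, the base R snippet below sums a toy data set to one row per day and renames the columns to ds and y. The column names datestamp, marketing_channel, and conversions are assumptions made for illustration; check names(conversions) on the real file, and filter to US rows first if other countries are present.

```r
# Toy stand-in for the conversions data. The column names are assumptions
# for illustration and may differ from the real CSV.
toy <- data.frame(
  datestamp = rep(c("2014-11-01", "2014-11-02"), each = 2),
  marketing_channel = rep(c("Display Ads", "Search"), times = 2),
  conversions = c(10, 5, 7, 3)
)

# Sum conversions across channels within each day.
df <- aggregate(conversions ~ datestamp, data = toy, FUN = sum)

# prophet expects a date column named ds and a numeric column named y.
names(df) <- c("ds", "y")
df$ds <- as.Date(df$ds)
df <- df[order(df$ds), ]

head(df)
```

The same pattern should carry over to the real data once the column names are swapped for the actual ones.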
- Fit the prophet model, using pmodel = prophet(df). Plot the model predictions along with the observed values. This can be achieved with the generic plot function, by passing in the model and the forecast data frame, plot(pmodel, forecast), where forecast = predict(pmodel).
- Plot the components of the model fit using the function prophet_plot_components(pmodel, forecast).
- What is the average squared difference between the observed values of \(y_t\) (i.e., US conversions on day \(t\)) and the predicted values, \(\hat{y}_t\)? The predicted values can be found in the forecast data frame, as defined in question 2. That is, calculate:
\[ \text{MSE} = \frac{1}{T} \sum_{t=1}^T (y_t - \hat{y}_t)^2. \]
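The MSE formula above translates almost directly into R. The sketch below uses small made-up vectors; in the lab, y would be df$y and yhat would be forecast$yhat, assuming the rows of df and forecast are in the same date order (sorting df by ds before fitting is a simple way to make that likely).

```r
# Toy observed and fitted values standing in for df$y and forecast$yhat.
y    <- c(10, 12, 9, 11)
yhat <- c(11, 12, 8, 13)

# Mean of the squared residuals, i.e. (1/T) * sum((y_t - yhat_t)^2).
mse <- mean((y - yhat)^2)
mse
```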
- Now, include a regressor which is the total number of site visits per day in the US. This data is below (but will need to be processed). Details on adding regressors to a prophet model are here: https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html#additional-regressors. What is the MSE for the updated model?
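One possible way to add a regressor is sketched below on synthetic data. The data frame df_x and the column name visits are hypothetical; the calls prophet(), add_regressor(), and fit.prophet() follow the pattern shown in the prophet documentation linked above. The prophet part is wrapped in a check so the snippet still runs if the package is not installed.

```r
# Synthetic data for illustration only: 60 days with a weekly pattern and a
# made-up `visits` column (a hypothetical name, not the real visits data).
set.seed(1)
n <- 60
df_x <- data.frame(ds = seq(as.Date("2014-11-01"), by = "day", length.out = n))
df_x$visits <- 1000 + 100 * sin(2 * pi * seq_len(n) / 7) + rnorm(n, sd = 20)
df_x$y <- 0.02 * df_x$visits + rnorm(n, sd = 1)

if (requireNamespace("prophet", quietly = TRUE)) {
  library(prophet)
  m_x <- prophet()                     # unfitted model object (no data passed)
  m_x <- add_regressor(m_x, "visits")  # register the extra regressor by name
  m_x <- fit.prophet(m_x, df_x)        # df_x must contain ds, y, and visits
  forecast_x <- predict(m_x, df_x)     # the prediction frame needs visits too
  mse_x <- mean((df_x$y - forecast_x$yhat)^2)
  print(mse_x)
}
```

Note that the regressor must be added before fitting, which is why the model is created without data and fitted in a separate step.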
- Compare the widths of the prediction intervals for the initial model, pmodel, and the updated model with the visits covariate. The prediction interval width is the difference between yhat_upper and yhat_lower, both of which are in the forecast data frame as defined in question 2.
- Coerce the response, y, from your df into a time series object with frequency = 7 (since there is a clear weekly seasonality). Using auto.arima, fit a seasonal ARIMA model to this series. Write out the fitted model.
- Provide a summary comparison of the three models: pmodel, pmodel + covariates, and the seasonal ARIMA fit in question 7. You can use MSE or more visual methods to compare the fits of these three models.
- What remaining questions do you have about the prophet model? How about ARIMA models? Answer this individually. There are no wrong answers.
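For the interval-width and ARIMA questions, the sketch below shows one way to set up the calculations on toy numbers. The data frames fc_base and fc_reg are hypothetical stand-ins for the two forecast data frames, and y_toy is a synthetic series, not the conversions data. The auto.arima function comes from the forecast package, so that part only runs if the package is installed.

```r
# Toy forecast frames standing in for the two prophet forecasts.
fc_base <- data.frame(yhat_lower = c(8, 9, 7),  yhat_upper = c(12, 14, 11))
fc_reg  <- data.frame(yhat_lower = c(9, 10, 8), yhat_upper = c(11, 13, 10))

# Interval width per day, summarized here by its mean for each model.
width_base <- mean(fc_base$yhat_upper - fc_base$yhat_lower)
width_reg  <- mean(fc_reg$yhat_upper - fc_reg$yhat_lower)
print(c(base = width_base, with_visits = width_reg))

# Synthetic daily series with a weekly cycle, coerced to a ts object
# with frequency 7 so that seasonal models see the weekly period.
set.seed(2)
y_toy <- 100 + 10 * sin(2 * pi * (1:70) / 7) + rnorm(70)
y_ts <- ts(y_toy, frequency = 7)

if (requireNamespace("forecast", quietly = TRUE)) {
  fit_arima <- forecast::auto.arima(y_ts)   # selects a (seasonal) ARIMA order
  print(fit_arima)                          # shows the chosen order and coefficients
  mse_arima <- mean(residuals(fit_arima)^2) # in-sample MSE from one-step residuals
  print(mse_arima)
}
```

The mean width is only one possible summary; plotting the daily widths of both models over time may be more informative.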