Set 5 Prophet Lab

Before doing this lab, read the paper on Canvas (“Forecasting at Scale,” in Other Resources within Modules) that describes the method implemented by the prophet package. Install and load the prophet package. And, skim the online tutorial so you know roughly where to look for further guidance if you get stuck anywhere. In this lab you will model daily US conversions generated across Kayak’s marketing channels from November 2014 through August 2015.

  1. Read in the conversions data (see code below). Manipulate the data frame so that it is compatible with prophet. That is, make it a dataframe with columns ds and y, containing the date (starting with 2014-11-01) and response variable (total US conversions across all marketing channels by day, starting with 18,669) respectively. See https://facebook.github.io/prophet/docs/quick_start.html#r-api for more details. In your written response, just show the first 6 rows of your dataframe, named df.
conversions = read.csv("https://raw.githubusercontent.com/dbreynol/DS809/main/data/conversions.csv")
  1. Fit the prophet model, using pmodel = prophet(df). Plot the model predictions along with the observed values. This can be achieved with the generic plot function, by passing in the model and the forecast dataframe, plot(pmodel, forecast), where forecast = predict(pmodel).

  2. Plot the components of the model fit using the function prophet_plot_components(model, forecast).

  3. What is the average squared difference between the observed values of \(y_t\) (i.e., us conversions on day t) versus the predicted values, \(\hat{y}_t\)? The predicted values can be found in the forecast dataframe, as defined in question 2. That is, calculate:

\[\begin{align} \text{MSE} = \frac{1}{T} \sum_{t=1}^T (y_t - \hat{y}_t)^2. \end{align}\]

  1. Now, include a regressor which is the total number of site visits per day in the US. This data is below (but will need to be processed). Details on adding regressors to a prophet model are here: https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html#additional-regressors. What is the MSE for the updated model?
visits = read.csv("https://raw.githubusercontent.com/dbreynol/DS809/main/data/visits.csv")
  1. Compare the widths of the prediction intervals for the initial model, pmodel, and the updated model with the visits covariate. The prediction interval width is the difference between yhat_upper and yhat_lower, both of which are in the forecast dataframe as defined in question 2.

  2. Coerce the response, y, from your df into a time series object with frequency = 7 (since there is a clear weekly seasonality). Using auto.arima, fit a seasonal ARIMA model to this series. Write out the fitted model.

  3. Provide a summary comparison of the three models: pmodel, pmodel + covariates, and the seasonal ARIMA fit in question 7. You can use MSE or more visual methods to compare the fits of these three models.

  4. What are remaining questions you have about the prophet model? How about with ARIMA models? Answer this individually. There are no wrong answers.