Set 7 Bayesian Inference
7.1 Bayesian logic
In the Bayesian paradigm, parameters (\(\theta\)) are random and data (\(y\)) are fixed. Inference is carried out through the posterior distribution of parameters given data.
\[p(\theta \vert y) = \frac{p(y|\theta) p(\theta)}{p(y)}\] For very simple models, we can write the analytical solution for the posterior. For example, suppose we are flipping a coin and we are interested in learning about the parameter \(\theta = P(\text{Heads})\).
Before observing any data, we believe that \(\theta\) is probably around 0.5 but we are not sure. We can encode this belief with a prior distribution that has expected value 0.5 and support on the interval \((0, 1)\). A good choice is the Beta distribution. In particular, we can use a Beta(\(\alpha = 2\), \(\beta = 2\)) prior, whose density is
\[\begin{align} p(\theta) = 6\theta (1-\theta). \end{align}\]
Now we generate some data. We flip the coin (independently) 7 times and observe 5 heads. Letting \(y\) denote the number of heads, we can use the binomial distribution for the likelihood:
\[\begin{align} p(y \vert \theta) &= {7 \choose 5} \theta^5 (1-\theta)^2. \end{align}\]
Now we can multiply them together to get the posterior distribution of \(\theta\):
\[\begin{align} p(\theta \vert y) &= \frac{p(y|\theta) p(\theta)}{p(y)} \\ &\propto \theta^5 (1-\theta)^2 \times \theta (1-\theta) \\ &= \theta^6 (1-\theta)^3, \end{align}\] which is the kernel of a Beta density. Hence the posterior is \(\theta \vert y \sim \text{Beta}(7,4)\).
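To see how the data update the prior, here is a short R sketch comparing the two densities with base graphics:
theta = seq(0, 1, length.out = 500)
plot(theta, dbeta(theta, 7, 4), type = "l", ylab = "density")  # posterior: Beta(7, 4)
lines(theta, dbeta(theta, 2, 2), lty = 2)                      # prior: Beta(2, 2)
7 / (7 + 4)  # posterior mean, alpha / (alpha + beta) = 0.636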
7.2 Monte Carlo Idea
Suppose we can sample from \(p(\theta \vert \text{data})\). Then we could generate,
\[\begin{align} \theta^{1},\ldots,\theta^{S} \sim p(\theta | \text{data}) \end{align}\]
and obtain Monte Carlo approximations of posterior quantities:
\[\begin{align} E(g(\theta) \vert \text{data}) \approx \frac{1}{S} \sum_{i=1}^S g(\theta^i). \end{align}\]
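In the coin example we can sample from the Beta(7, 4) posterior directly; a minimal sketch with \(g(\theta) = \theta\), so the average approximates the posterior mean:
set.seed(1)
S = 1e4
theta_draws = rbeta(S, 7, 4)  # theta^1, ..., theta^S ~ p(theta | data)
mean(theta_draws)             # Monte Carlo estimate; exact posterior mean is 7/11
mean(theta_draws > 0.5)       # another posterior quantity: P(theta > 0.5 | data)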
But what if you can’t sample from \(p(\theta \vert \text{data})\)?
7.3 Metropolis Algorithm
The Metropolis algorithm proceeds as follows:
1. Sample \(\theta^{\star} \sim J(\theta \vert \theta^{s})\), where \(J\) is called the proposal distribution. For the Metropolis algorithm, we assume that this distribution is symmetric.
2. Compute the acceptance ratio, \(r\):
\[r = \frac{p(\theta^\star \vert \text{data})}{p(\theta^{s} \vert \text{data})}\]
3. Let
\[\begin{equation} \theta^{s+1} = \begin{cases} \theta^{\star} & \text{with probability min}(r,1) \\ \theta^{s} & \text{otherwise.} \end{cases} \end{equation}\]
Step 3 can be accomplished by sampling \(u \sim \text{Unif}(0,1)\) and setting \(\theta^{s+1} = \theta^\star\) if \(u < r\) and setting \(\theta^{s+1} = \theta^s\) otherwise.
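As an illustration (not part of the original notes), here is the algorithm targeting the Beta(7, 4) posterior from Section 7.1, with a symmetric uniform random-walk proposal:
set.seed(1)
S = 1e4
theta = numeric(S)
theta[1] = 0.5  # starting value
# unnormalized posterior from Section 7.1
post = function(t) ifelse(t > 0 & t < 1, t^6 * (1 - t)^3, 0)
for (s in 2:S) {
  theta_star = theta[s - 1] + runif(1, -0.1, 0.1)  # symmetric proposal
  r = post(theta_star) / post(theta[s - 1])        # acceptance ratio
  theta[s] = ifelse(runif(1) < r, theta_star, theta[s - 1])
}
mean(theta)  # should be close to the exact posterior mean 7/11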
7.4 Example
Let’s consider a multiple linear regression model. That is, we will model our response variable, \(y\), as
\[\begin{align} y = X\beta + \epsilon. \end{align}\]
For simplicity, let us assume that \(\epsilon \sim N(0,1)\). The parameter we want to do inference on is therefore \(\beta\), so we want to sample from \(p(\beta \vert y)\). To do so, we need two ingredients: a likelihood function and a prior distribution for \(\beta\), \(p(\beta)\).
\[\begin{align} p(\beta \vert y) &\propto p(y \vert \beta) p(\beta) \\ &= \text{likelihood} \times \text{prior} \end{align}\]
For the linear regression model, our likelihood function can be written as,
\[\begin{align} p(y \vert \beta) = (2\pi \sigma^2)^{-n/2} \exp{\bigg( -\frac{1}{2 \sigma^2} (y - X \beta)^T (y - X \beta) \bigg)}. \end{align}\]
For the prior, we will assume that we know nothing about \(\beta\). So, we will use the flat (improper) prior distribution,
\[\begin{align} p( \beta) \propto 1. \end{align}\]
We will use the proposal distribution,
\[\begin{align} \beta^{\star} \vert \beta \sim N(\beta, cI), \end{align}\] where \(c\) is a tuning parameter controlling the step size of the proposals.
Let us start by sampling some data that follows this model.
set.seed(1)
n = 100
X = matrix(data = c(rep(1, n), rnorm(n)), ncol = 2)  # design matrix: intercept and one covariate
beta_true = c(1, 2)                                  # true coefficients
y = rnorm(n, mean = X %*% beta_true, sd = 1)         # simulate responses with sigma = 1
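As a quick sanity check (not in the original notes), the ordinary least-squares fit should recover coefficients close to beta_true:
coef(lm(y ~ X[, 2]))  # intercept and slope should be near (1, 2)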
Now let’s implement the Metropolis algorithm and sample from the posterior distribution of \(\beta\). Our MCMC output for the slope coefficient, \(\beta_1\), is summarized below.
nIter = 1e5
betas = matrix(0, nrow = nIter, ncol = 2)  # storage for the draws
beta_current = betas[1, ]                  # start the chain at (0, 0)
# unnormalized posterior at the current value (flat prior, sigma = 1);
# for larger problems, work on the log scale to avoid numerical underflow
p_current = exp( -1/2 * t(y - X %*% beta_current) %*% (y - X %*% beta_current) )
for (j in 2:nIter) {
  # propose a new value for beta (symmetric normal random walk)
  beta_prop = rnorm(2, mean = beta_current, sd = .2)
  # unnormalized posterior at the proposed value
  p_prop = exp( -1/2 * t(y - X %*% beta_prop) %*% (y - X %*% beta_prop) )
  # acceptance ratio
  r = p_prop / p_current
  # accept with probability min(r, 1)
  if (r > 1) {
    beta_current = beta_prop
    p_current = p_prop
  } else if (runif(1) < r) {
    beta_current = beta_prop
    p_current = p_prop
  }
  betas[j, ] = beta_current
}
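A minimal sketch of one way to summarize the draws for \(\beta_1\) (the second column of betas), assuming a burn-in of 1000 iterations:
burn = 1000                  # assumed burn-in length
keep = betas[-(1:burn), ]    # discard the burn-in draws
plot(keep[, 2], type = "l", xlab = "iteration", ylab = "beta_1")  # trace plot
hist(keep[, 2], xlab = "beta_1", main = "")                       # posterior histogram
colMeans(keep)               # posterior means; should be close to beta_true = (1, 2)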
7.5 Dynamic Regression
Let’s look at this concept at work with a dynamic regression model. We will use the Seatbelts dataset and fit a model using y = DriversKilled and x = PetrolPrice.
The model we will fit is:
\[\begin{align} y_t &= \mu_t + \beta_t x_t + \epsilon_t \\ \mu_t &= \mu_{t-1} + \eta_t \\ \beta_t &= \beta_{t-1} + \tau_t \end{align}\]
library(bsts)
y = Seatbelts[, 1]  # drivers killed
x = Seatbelts[, 6]  # petrol price
# build the state specification: local level + monthly seasonal + dynamic regression
ss = list()
ss = AddLocalLevel(ss, y)
ss = AddSeasonal(ss, y, nseasons = 12)
ss = AddDynamicRegression(ss, y ~ x)
# fit the model
dynmodel <- bsts(y, state.specification = ss, niter = 1000)
A relevant question is whether this model improves predictive performance compared with the simpler model:
\[\begin{align} y_t &= \mu_t + \beta x_t + \epsilon_t \\ \mu_t &= \mu_{t-1} + \eta_t. \end{align}\]
To address this, we can fit the simpler model and compare the cumulative error of the two models.
ss2 = list()
ss2 = AddLocalLevel(ss2, y)
ss2 = AddSeasonal(ss2, y, nseasons = 12)  # note: assign to ss2 so the seasonal term is added to this model
fixedmodel <- bsts(y ~ x, state.specification = ss2, niter = 1000)
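The cumulative-error comparison can be drawn with bsts’s CompareBstsModels function; a minimal sketch (the burn-in value here is an assumption):
# top panel shows cumulative absolute one-step-ahead prediction errors
CompareBstsModels(list("dynamic regression" = dynmodel,
                       "fixed coefficient" = fixedmodel),
                  burn = 100)  # assumed burn-in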
7.6 Stochastic Volatility Model
A stochastic volatility model is a special case of a state space model. The simplest version is:
\[\begin{align} y_t &= \beta_0 + \beta_1 y_{t-1} + \exp(h_t/2) \epsilon_t \\ h_t &= \mu + \phi(h_{t-1} - \mu) + \sigma \eta_t \end{align}\]
where \(\epsilon_t \sim N(0,1)\) and \(\eta_t \sim N(0,1)\).
library(stochvol)
set.seed(1)
data("exrates")  # daily exchange rates against the euro, shipped with stochvol
ind = which(exrates$date >= as.Date("2011-03-01") & exrates$date <= as.Date("2012-03-01"))
CHF_price <- exrates$CHF[ind]  # EUR/CHF prices over one year
# fit the stochastic volatility model with an AR(1) mean equation
res_sv <- svsample(CHF_price, designmatrix = "ar1")
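The fitted object can be inspected with stochvol’s summary and plot methods, for example:
summary(res_sv)  # posterior summaries of the SV parameters (mu, phi, sigma) and AR coefficients
plot(res_sv)     # estimated volatilities and parameter posterior densities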
7.7 Lab 6
- Adjust the MH algorithm from the example in Section 7.4 by including an informative prior for \(\beta\). In particular, use
\[\begin{align} \beta \sim N\Big(\mathbf{0}, \frac{1}{\lambda} I\Big), \end{align}\]
where \(I\) is a \(2 \times 2\) identity matrix and \(\mathbf{0}\) is a vector with two elements. Use the same simulated data as in the example. Run the algorithm for three different \(\lambda\) values (10, 50, 100). For each of the three \(\lambda\) values, make a trace plot and a histogram for \(\beta_1\).
- Fit a dynamic regression model to the livestock data used in Lab 5. Compare with the state space model fit in question 5 of Lab 5 using CompareBstsModels. Based on this plot, which model fits the data better?
- Suppose you have data generated from a Normal distribution with known variance. That is,
\[y_1, \ldots, y_n \sim \text{Normal}(\mu, \sigma^2).\]
You are interested in doing (Bayesian) inference on \(\mu\). You decide to use a normal prior distribution for \(\mu\),
\[\mu \sim \text{Normal}(\mu_0, \tau^2).\]
The posterior distribution, \(p(\mu \vert y_1, \ldots, y_n)\), is also Normal. What are the mean and variance of the posterior distribution for \(\mu\)?