Basic regression in Gaussian processes

Working through coded examples of basic Gaussian process regression in R. I’ve just read the first few chapters of Rasmussen & Williams (2006); this implements the examples discussed in Sections 2.1-2.5.

Required R libraries (MASS for the multivariate normal, reshape2 and ggplot2 for reshaping and plotting):

require(MASS)
require(reshape2)
require(ggplot2)

Set a seed for repeatable plots

set.seed(12345)

Define the points at which we want to compute the function values (the x values of the prediction or test points), and set the length-scale parameter of the covariance function to \(\ell = 1\)

x_predict <- seq(-5,5,len=50)
l <- 1

We will use the squared exponential (also called radial basis or Gaussian, though it is not this that gives the Gaussian process its name; any covariance function would do) as the covariance function, \(\operatorname{cov}(X_i, X_j) = \exp\left(-\frac{(X_i - X_j)^2}{2 \ell ^ 2}\right)\)

SE <- function(Xi,Xj, l) exp(-0.5 * (Xi - Xj) ^ 2 / l ^ 2)
cov <- function(X, Y) outer(X, Y, SE, l)
COV <- cov(x_predict, x_predict)

Generate a number of functions from the process

values <- mvrnorm(3, rep(0, length=length(x_predict)), COV)

Reshape the data into long (tidy) form, listing x value, y value, and sample number

dat <- data.frame(x=x_predict, t(values))
dat <- melt(dat, id="x")
head(dat)
       x variable   value
1 -5.000       X1 -0.6450
2 -4.796       X1 -0.9227
3 -4.592       X1 -1.1587
4 -4.388       X1 -1.3277
5 -4.184       X1 -1.4139
6 -3.980       X1 -1.4103

Plot the result

fig2a <- ggplot(dat,aes(x=x,y=value)) +
  geom_rect(xmin=-Inf, xmax=Inf, ymin=-2, ymax=2, fill="grey80") +
  geom_line(aes(group=variable)) +   theme_bw() +
  scale_y_continuous(lim=c(-3,3), name="output, f(x)") +
  xlab("input, x")
fig2a
[Figure: draws from the null distribution]

Posterior distribution given the data

Imagine we have some data,

obs <- data.frame(x = c(-4, -3, -1,  0,  2),
                  y = c(-2,  0,  1,  2, -1))

In general we aren’t interested only in drawing from the prior; we want to include information from the data as well. Under the prior, the joint distribution of the observed values and the values at the prediction points is:

\[\begin{pmatrix} y_{\textrm{obs}} \\ y_{\textrm{pred}} \end{pmatrix} \sim \mathcal{N}\left( \mathbf{0}, \begin{bmatrix} cov(X_o,X_o) & cov(X_o, X_p) \\ cov(X_p,X_o) & cov(X_p, X_p) \end{bmatrix} \right)\]
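Just to make the block structure concrete, here is a sketch of how that joint covariance matrix could be assembled from the cov() helper defined above (the K_* names are purely illustrative; the conditioning step below never needs the full joint matrix explicitly):

# Blocks of the joint covariance matrix (names K_* are just for illustration)
K_oo <- cov(obs$x, obs$x)          # cov(X_o, X_o), 5 x 5
K_op <- cov(obs$x, x_predict)      # cov(X_o, X_p), 5 x 50
K_po <- cov(x_predict, obs$x)      # cov(X_p, X_o), 50 x 5
K_pp <- cov(x_predict, x_predict)  # cov(X_p, X_p), 50 x 50
K_joint <- rbind(cbind(K_oo, K_op),
                 cbind(K_po, K_pp))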

No observation noise

Assume the data are observed without noise. Because the joint distribution above is Gaussian, conditioning on the observed values is straightforward, and the values at the prediction points given the data are again Gaussian,

\[y_{\textrm{pred}} \mid y_{\textrm{obs}} \sim \mathcal{N}\left(E,C\right)\] \[E = cov(X_p, X_o) cov(X_o,X_o)^{-1} y_{\textrm{obs}}\] \[C = cov(X_p, X_p) - cov(X_p, X_o) cov(X_o,X_o)^{-1} cov(X_o, X_p) \]

(We use solve(M), which with no second argument simply inverts the matrix M; in practice one should use the Cholesky decomposition instead for better numerical stability, as sketched below.)

cov_xx_inv <- solve(cov(obs$x, obs$x))
Ef <- cov(x_predict, obs$x) %*% cov_xx_inv %*% obs$y
Cf <- cov(x_predict, x_predict) - cov(x_predict, obs$x)  %*% cov_xx_inv %*% cov(obs$x, x_predict)
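For reference, here is a sketch of the same computation via the Cholesky factorization rather than an explicit inverse (the names cov_oo, cov_po, R, alpha, v, Ef_chol, and Cf_chol are illustrative; chol() returns an upper-triangular factor, and forwardsolve()/backsolve() do the triangular solves):

cov_oo <- cov(obs$x, obs$x)
cov_po <- cov(x_predict, obs$x)
R <- chol(cov_oo)                                  # upper triangular: t(R) %*% R == cov_oo
alpha <- backsolve(R, forwardsolve(t(R), obs$y))   # equals cov_oo^{-1} %*% obs$y
Ef_chol <- cov_po %*% alpha
v <- forwardsolve(t(R), t(cov_po))                 # solves t(R) %*% v == t(cov_po)
Cf_chol <- cov(x_predict, x_predict) - t(v) %*% v  # equals Cf above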

Now let’s take 3 random samples from the posterior distribution,

values <- mvrnorm(3, Ef, Cf)

and plot our solution (mean, ± 2 standard deviations, and the random samples).

dat <- data.frame(x=x_predict, t(values))
dat <- melt(dat, id="x")

fig2b <- ggplot(dat,aes(x=x,y=value)) +
  geom_ribbon(data=NULL, 
              aes(x=x_predict, y=Ef, ymin=(Ef-2*sqrt(diag(Cf))), ymax=(Ef+2*sqrt(diag(Cf)))),
              fill="grey80") +
  geom_line(aes(color=variable)) + #REPLICATES
  geom_line(data=NULL,aes(x=x_predict,y=Ef), size=1) + #MEAN
  geom_point(data=obs,aes(x=x,y=y)) +  #OBSERVED DATA
  scale_y_continuous(lim=c(-3,3), name="output, f(x)") +
  xlab("input, x")
fig2b
[Figure: draws from the posterior, with no process noise]

Additive noise

In general the model may have process error, and rather than observe the deterministic map \(f(x)\) we only observe \(y = f(x) + \varepsilon\). Let us assume for the moment that \(\varepsilon\) are independent, normally distributed random variables with variance \(\sigma_n^2\).
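The conditional mean and covariance then take the same form as before, with the noise variance \(\sigma_n^2\) added to the diagonal of the observed covariance block:

\[E = cov(X_p, X_o) \left[cov(X_o,X_o) + \sigma_n^2 I\right]^{-1} y_{\textrm{obs}}\] \[C = cov(X_p, X_p) - cov(X_p, X_o) \left[cov(X_o,X_o) + \sigma_n^2 I\right]^{-1} cov(X_o, X_p) \]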

sigma.n <- 0.25
cov_xx_inv <- solve(cov(obs$x, obs$x) + sigma.n^2 * diag(1, length(obs$x)))
Ef <- cov(x_predict, obs$x) %*% cov_xx_inv %*% obs$y
Cf <- cov(x_predict, x_predict) - cov(x_predict, obs$x)  %*% cov_xx_inv %*% cov(obs$x, x_predict)

Now let’s take 3 random samples from the posterior distribution,

values <- mvrnorm(3, Ef, Cf)

and plot

dat <- data.frame(x=x_predict, t(values))
dat <- melt(dat, id="x")

fig2c <- ggplot(dat,aes(x=x,y=value)) +
  geom_ribbon(data=NULL, 
              aes(x=x_predict, y=Ef, ymin=(Ef-2*sqrt(diag(Cf))), ymax=(Ef+2*sqrt(diag(Cf)))),
              fill="grey80") + # Var
  geom_line(aes(color=variable)) + #REPLICATES
  geom_line(data=NULL,aes(x=x_predict,y=Ef), size=1) + #MEAN
  geom_point(data=obs,aes(x=x,y=y)) +  #OBSERVED DATA
  scale_y_continuous(lim=c(-3,3), name="output, f(x)") +
  xlab("input, x")
fig2c
[Figure: draws from the posterior, with additive observation noise]

Note that unlike the previous case, the posterior no longer collapses completely in the neighborhood of the observed data points.

We can also compute the likelihood (and the marginal likelihood over the prior) of the data directly from the corresponding multivariate normal distribution, which lets us tune hyperparameters such as the characteristic length-scale \(\ell\) and the observation noise \(\sigma_n\). The most obvious approach is to do so by maximum likelihood, giving point estimates of the hyperparameters, though presumably we could be Bayesian about these as well; a sketch of the maximum-likelihood route follows.
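As a rough sketch (the helper name minus_loglik and the starting values are assumptions made here for illustration), the log marginal likelihood \(-\tfrac{1}{2} y^\top K_y^{-1} y - \tfrac{1}{2}\log|K_y| - \tfrac{n}{2}\log 2\pi\), with \(K_y = cov(X_o,X_o) + \sigma_n^2 I\), can be passed to optim() to estimate \(\ell\) and \(\sigma_n\):

# Negative log marginal likelihood of the observations, parameterized by
# log(l) and log(sigma.n) so both stay positive during optimization
minus_loglik <- function(par) {
  l <- exp(par[1])
  sigma.n <- exp(par[2])
  K <- outer(obs$x, obs$x, SE, l) + sigma.n^2 * diag(length(obs$x))
  R <- chol(K)
  alpha <- backsolve(R, forwardsolve(t(R), obs$y))  # K^{-1} y
  0.5 * sum(obs$y * alpha) + sum(log(diag(R))) +
    0.5 * length(obs$y) * log(2 * pi)
}
fit <- optim(c(log(1), log(0.25)), minus_loglik)    # start from l = 1, sigma.n = 0.25
exp(fit$par)                                        # point estimates of l and sigma.n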