# Tgp Mcmc For Gaussian Processes

Exploring implementation through tgp package MCMC routine and background reading (manual, vignettes, thesis).

Example call:

gp <- bgp(X=X, XX=XX, Z=Z, meanfn="constant", bprior="b0", BTE=c(1000,6000,2), m0r1=TRUE, verb=4, corr="exp", trace=TRUE, s2.p=c(5,10), tau2.p=c(5,10), s2.lam="fixed", tau2.lam="fixed")

Verbose return:

n=39, d=1, nn=101
BTE=(1000,6000,2), R=1, linburn=0
RNG state RK using rk_seed
preds: data krige
T[alpha,beta,nmin,smin,bmax]=[0,0,10,1,1]
mean function: constant
beta prior: b0 hierarchical
s2[a0,g0]=[5,10]
s2 prior fixed
tau2[a0,g0]=[5,10]
tau2 prior fixed
corr prior: isotropic power
nug[a,b][0,1]=[1,1],[1,1]
nug prior fixed
gamlin=[0,0.2,0.7]
d[a,b][0,1]=[1,20],[10,10]
d prior fixed

The parameters in this run are constant across the trace points (because hyperparameters are fixed).

• X is the observed X values (or matrix of appropriate dimension)
• XX the desired predicted grid. (will also predict on X)
• Z is observed Y values (or matrix of appropriate dimension)
• m0r1 means scale data to mean zero and unit range
• bprior is the prior distribution, defaulting to bfalt. b0 is a Gaussian. This prior is for the Gaussian process and is not to be confused with the hyperpriors describing the distribution of various parameters.
• BTE Burn-in B steps, Terminate after T steps, sample Every E steps.
• verb Very verbose output
• Other options (for adaptive/query learning, etc, igored for now. Focus on priors).
• The priors for the hyperparameters are given by s2.p, tau.p etc (both are Inverse Gammas). Their parameters can vary hiearchically in general, but we hold them fixed, s2.lam = 'fixed', etc.)

• s2 is $$\sigma^2$$ in Gramacy and $$\tau$$ in Rassmussen and Williams, is the scale of the noise. Comes from an inverse gamma prior. Set by inverse-gamma parameters a0 and g0.

• tau2 Also comes from an inverse gamma prior. It is also set by inverse-gamma parameters a0 and g0.

$Z | \beta, \sigma^2, K \sim N_n(\mathbf{F} \beta, \sigma^2 \mathbf{K})$ $\beta | \sigma^2, \tau^2, \mathbf{W}, \beta_0 ~ N_m(\beta_0, \sigma^2 \tau^2 \mathbf{W} )$ $\beta_0 | N_m(\mathbf{\mu}, \mathbf{B})$

• We keep the heirachical priors fixed in the tau.lm, s2.lm, etc. nug fixed by default. Note this results in the the priors from which these are sampled are fixed: e.g. gp$trace$hier$s2.a0, gp$trace$hier$s2.g0

• nug.p This is the “nugget,” the measurement error. Comes from a mixture of gamma prior parameter (initial values) for the nugget parameter c(a1,g1,a2,g2) where g1 and g2 are scale (1/rate) parameters. Default reduces to simple exponential prior. Specifying nug.p = 0 fixes the nugget parameter to the “starting” value in gd, i.e., it is excluded from the MCMC

• index is the sampling point (from BTE, we see we sample starting at step 1000 and ending at step 6000, recording every 2 steps, so there are $$(T-B)/E = 2500$$ index points).
• lambda possibly a mixing parameter in the importance sampler temperature??
• beta0 No mention in the manual
• d The parameter for priors using the hierachical exponential distribution for the parameters a1, a2, g1, g2.
• b ??

• Where are the hyperparameters for the correlation function (e.g. the length-scale for the Gaussian??)

### Traces

pars <- melt(gp$trace$XX[], id = "index")
ggplot(pars, aes(index, value)) + geom_line() + facet_wrap(~variable, scales="free_y")

It is possible to calculate summary statistics and check out the distribution of these fellows.

mean(gp$trace$XX[][,"nug"])
 0.009755
ggplot(pars, aes(value)) + geom_histogram() + facet_wrap(~variable, scales="free")

## Extracting the estimated process

(The MAP, maximum a posteriori expected mean and variance).

Extract the posterior Gaussian process mean and the $$\pm 2$$ standard deviations over the predicted grid from the fit:

V <- gp$ZZ.ks2 dat <- data.frame(x = gp$XX[],
y    = gp$ZZ.km, ymin = gp$ZZ.km - 1.96 * sqrt(gp$ZZ.ks2), ymax = gp$ZZ.km + 1.96 * sqrt(gp$ZZ.ks2)) Plot the posterior Gaussian Process:  p1 <- ggplot(dat) + geom_ribbon(aes(x=x,y=y, ymin=ymin, ymax=ymax), fill="grey80") + # Var geom_line(aes(x=x,y=y), size=1) + #MEAN geom_point(data=gp$obs,aes(x=x,y=y)) +
labs(title=paste("llik =", prettyNum(gp$llik))) if(!is.null(true)) p1 <- p1 + geom_line(data = true, aes(x, y), col = "red", lty = 2) p1 + theme_notebook plot of chunk gp-plot Find the actual optimum policy from the true parametric model x_grid <- dat$x
h_grid <- x_grid
matrices_true <- f_transition_matrix(f, p, x_grid, h_grid)
opt_true <- find_dp_optim(matrices_true, x_grid, h_grid, 20, 0, profit, delta=.01)

Estimate a policy from the Gaussian Process:

rownorm <- function(M) t(apply(M, 1, function(x) x/sum(x)))
matrices_gp <- lapply(h_grid, function(h){
S <- dat$y - h F_ <- t(sapply(1:length(S), function(i){ if(S[i]>0) { out <- dlnorm(x_grid/S[i], 0, V[i]) } else { out <- numeric(length(x_grid)) out <- 1 out } })) F_ <- rownorm(F_) }) opt_gp <- find_dp_optim(matrices_gp, x_grid, h_grid, 20, 0, profit, delta=.01) and plot policies <- melt(data.frame(stock=x_grid, GP = x_grid[opt_gp$D[,1]], Exact = x_grid[opt_true$D[,1]]), id="stock") policy_plot <- ggplot(policies, aes(stock, stock - value, color=variable)) + geom_point() + xlab("stock size") + ylab("escapement") policy_plot + theme_notebook plot of chunk policy_plot We can see what happens when we attempt to manage a stock using this: z_g <- function() rlnorm(1,0, sigma_g) set.seed(1) sim_gp <- ForwardSimulate(f, p, x_grid, h_grid, K, opt_gp$D, z_g, profit=profit)
set.seed(1)
sim_true <- ForwardSimulate(f, p, x_grid, h_grid, K, opt_true$D, z_g, profit=profit) df <- data.frame(time = sim_gp$time, stock_gp = sim_gp$fishstock, stock_true = sim_true$fishstock, harvest_gp = sim_gp$harvest, havest_true = sim_true$harvest)
df <- melt(df, id="time")
simplot <- ggplot(df) + geom_line(aes(time,value, color=variable))
simplot + theme_notebook