Multiple Uncertainty Value Function Iteration

Multiple Uncertainty algorithm

Define each of the transition matrices:

    D <- matrix(NA, nrow=length(x_grid), ncol=Tmax)  
    P <- outer(x_grid, h_grid, profit)
    F <- outer(x_grid, f(x_grid, 0, p), pdfn, sigma_g)
    M <- outer(x_grid, x_grid, pdfn, sigma_m)
    I <- outer(h_grid, h_grid, pdfn, sigma_i)
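
To make the indexing convention of these outer calls concrete, here is a minimal sketch using a hypothetical stand-in for pdfn: a normal density with arguments in the order (x, mean, sd). The real pdfn is not shown in these notes, so that argument order is an assumption. The sketch shows that extra arguments such as sigma_g are passed through to the function, and that element [i, j] pairs the i-th entry of the first vector with the j-th entry of the second.

```r
# Hypothetical stand-in for pdfn: density of x given mean mu (assumed argument order)
pdfn <- function(x, mu, sigma) dnorm(x, mean = mu, sd = sigma)

x_grid <- c(0, 1, 2)      # illustrative grid, not the one used in the notes
mu     <- c(0.5, 1.5)     # illustrative means, standing in for f(x_grid, 0, p)

# The fourth argument is handed to pdfn as sigma
A <- outer(x_grid, mu, pdfn, 0.3)

dim(A)                                # rows follow x_grid, columns follow mu
A[3, 1] == pdfn(x_grid[3], mu[1], 0.3)
```

Under this assumed argument order each column is a density over the first argument, so whether rows or columns should be normalized depends on how pdfn orders its arguments.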

Probably row-normalize each:

    rownorm <- function(M) t(apply(M, 1, function(x) x/sum(x)))

(Note the transpose is needed because apply over rows (margin 1) returns the result for each row as a column of its output, so the normalized rows come back as columns. This is not an issue with margin 2.)
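
As a quick check of the transpose remark, here is a small sketch on a hypothetical 2 x 3 matrix of weights (the numbers are made up for illustration). It compares the shape with and without the transpose and confirms the rows sum to one.

```r
# Row-normalize a matrix so each row sums to 1
rownorm <- function(M) t(apply(M, 1, function(x) x / sum(x)))

# Hypothetical unnormalized weights
A <- matrix(c(1, 1, 2,
              3, 0, 1), nrow = 2, byrow = TRUE)

raw <- apply(A, 1, function(x) x / sum(x))  # no transpose: comes back 3 x 2
N   <- rownorm(A)                           # transposed back to 2 x 3

dim(raw)
dim(N)
rowSums(N)
```

Dividing by the row sums directly, A / rowSums(A), gives the same matrix and avoids the transpose, since R recycles the row sums down each column.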

If we have no uncertainty in measurement or implementation, then the algorithm is:

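    V <- P
    for(t in 1:Tmax){
      # store the index of the value-maximizing harvest for each state
      D[,(Tmax-t+1)] <- apply(V, 1, which.max)
      # add the discounted continuation value under the stored policy
      V <- P + delta * F %*% V %*% D[, (Tmax-t+1)]
    }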
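
For comparison, here is a runnable sketch of the same backward recursion under stated assumptions. The grids, parameters, Beverton-Holt f, catch-based profit and normal pdfn are all hypothetical stand-ins, not the functions used in these notes. It also swaps the matrix shorthand for an explicit selection step: one transition matrix per harvest level, and the row maximum of V in place of the product with D.

```r
# Row-normalize so each row is a probability distribution
rownorm <- function(M) t(apply(M, 1, function(x) x / sum(x)))

# Hypothetical grids and parameters (assumptions, not from the notes)
x_grid  <- seq(0, 10, length.out = 11)
h_grid  <- x_grid
delta   <- 0.95
Tmax    <- 5
sigma_g <- 0.5
p       <- c(2, 0.1)

# Assumed model pieces
f <- function(x, h, p) {              # Beverton-Holt growth of the escapement
  s <- pmax(x - h, 0)
  p[1] * s / (1 + p[2] * s)
}
profit <- function(x, h) pmin(x, h)   # profit equals the realized catch
pdfn   <- function(x, mu, sigma) dnorm(x, mean = mu, sd = sigma)

P <- outer(x_grid, h_grid, profit)

# One row-stochastic transition matrix per harvest level:
# F_list[[j]][i, k] = P(x_{t+1} = x_k | x_t = x_i, h = h_j)
F_list <- lapply(h_grid, function(h) {
  rownorm(outer(f(x_grid, h, p), x_grid,
                function(mu, x) pdfn(x, mu, sigma_g)))
})

D <- matrix(NA, nrow = length(x_grid), ncol = Tmax)
V <- P                                 # final harvest period: no future value
for (t in 1:Tmax) {
  D[, Tmax - t + 1] <- apply(V, 1, which.max)  # optimal harvest index per state
  Vmax <- apply(V, 1, max)                     # value under that harvest
  # expected continuation value for each (state, harvest) pair
  cont <- sapply(F_list, function(Fh) as.vector(Fh %*% Vmax))
  V <- P + delta * cont
}
```

With this profit function the last column of D is just the identity map (harvest index equals state index), which matches the "harvest what you see" result for the final period.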

With uncertainty

    Ep <- M %*% P %*% I      # expected profit given measured stock and quota
    U <- t(M) %*% F %*% M    # transition between measured stocks
    V <- Ep
    for(t in 1:Tmax){
      D[,(Tmax-t+1)] <- apply(V, 1, which.max)
      V <- Ep + delta * U %*% V %*% I %*% D[, (Tmax-t+1)]
    }


**rambling thinking while working out the above**

The October 1st entry derives the rule for updating the state (measured stock),

\[ Y_{t+1} = \mathbb{F} \mathbb{M} \mathbb{I}_h \vec Y_t, \]

but this isn’t sufficient to complete the value iteration. Recall from the Bellman recursion,

\[V_t = \max_{h_t} \operatorname{E}\left( \Pi(Y_t, h_t) + \delta V_{t+1}(Y_{t+1}) \right) \]

In the final timestep with no scrap value, we assert that the vector of values for each possible state is zero, \(\vec V_T = \vec 0\). The previous year is then the first year, working backwards, that can have profits, given by \(\Pi(Y_{T-1}, h_{T-1})\). Of course \(Y_{T-1}\) is unknown, but it follows from the state-update recursion above.

Our strategy in discrete space is to work backwards from \(T\), keeping track of the \(h\) that maximizes the value-to-go for each possible state. So at \(T-1\), we put each possible state \(y\) and each possible harvest \(h\) into \(\Pi(y,h)\) and select the \(h\) that maximizes the profit. In the case of a deterministic harvest and no measurement error this first iteration is trivial: harvest what you see, \(h=y\) (unless \(h\) is bounded or there is a cost on effort), so the value \(V_{T-1}\) for each state \(y\) is \(\Pi(y,y)\). When profits are proportional to harvest, up to a constant this is simply \(y\).

For a generic profit function that may have a cost on effort, we must store the map of which \(h\) is optimal for each \(y\). In discrete space, call this the decision vector \(D_{T-1}\), where the indices correspond to the states \(y\) and the values correspond to the harvest (or harvest index) that is optimal for that state, and we write \(\Pi(y_{T-1}, D_{T-1}(y_{T-1}))\).
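
A small sketch of this last-period decision vector, assuming a hypothetical profit function in which effort is catch divided by stock and costs a fixed amount per unit. The grid, the cost value and the functional form are made up for illustration. It shows a case where "harvest what you see" fails for small stocks, so the map from state to optimal harvest has to be stored.

```r
# Hypothetical grids: stock and harvest share the same integer grid
y_grid <- 0:5
h_grid <- 0:5
cost   <- 2   # assumed cost per unit effort, with effort = catch / stock

# Assumed profit: realized catch minus the cost of effort
profit <- function(y, h) {
  catch  <- pmin(y, h)
  effort <- ifelse(y > 0, catch / y, 0)
  catch - cost * effort
}

P_last <- outer(y_grid, h_grid, profit)

# Decision vector: index of the optimal harvest for each state
D_last <- apply(P_last, 1, which.max)

# Optimal harvest and the resulting value Pi(y, D(y)) for each state
h_opt  <- h_grid[D_last]
V_last <- P_last[cbind(seq_along(y_grid), D_last)]
```

Here stocks of 2 or less are not worth fishing at this assumed cost, while larger stocks are harvested completely; which.max breaks ties by taking the smallest harvest.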

If there is uncertainty in either implementation of the harvest or measurement of the stock, or both, then we must integrate over this in determining the optimal \(h\). The expected profit derived from choosing a harvest quota \(q\) and having measured a stock at \(y\) is given by

\[ \Pi(y,q) = \int \int \Pi(x,h) P(x \mid y) \, dx \, P(h \mid q) \, dh \]

In matrix form, if \(\mathbb{P}\) is the matrix of profits where element \(p_{ij} = \Pi(x_i, h_j)\) and the vector \(\vec H\) with elements \(H_i\) gives the probability of harvesting \(h_i\) fish given the quota \(q\), then \(\mathbb{P} \vec H\) is a vector of expected profits for each \(x_i\). Representing the distribution of the true stock given the measurement similarly as \(\vec X\), the integration for the expected profit is \(\vec X^T \mathbb{P} \vec H\) (giving the scalar value \(\Pi(y,q)\)).

We can store and look up \(\vec X\) using a matrix whose rows give \(\vec X\) for each value \(y\), and \(\vec H\) using a matrix whose columns give \(\vec H\) for each \(q\). These are our matrices \(\mathbb{M}\) (each row is a fixed observed \(y\), mapping true \(x\) into observed \(y\), with elements \(P_m(y, x)\)) and \(\mathbb{I}\) (mapping quota \(q\) (columns) to implemented harvest \(h\), with elements \(P_I(h_i, q_j)\)) from before. Then we can write \(\operatorname{E}(\Pi(x,h) | y,q)\) as a matrix resulting from the product of matrices:

\[\tilde{\mathbb{P}} = \mathbb{M} \mathbb{P} \mathbb{I} \]
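
A tiny numerical check of this product, using made-up 3 x 3 matrices (all values are hypothetical). It assumes the orientation described above: rows of M are distributions over true x for each observed y, and columns of I are distributions over realized h for each quota q. Each element of the product should then equal the scalar \(\vec X^T \mathbb{P} \vec H\) for the matching row and column.

```r
x_grid <- c(0, 1, 2)
h_grid <- c(0, 1, 2)

# Assumed profit: realized catch
P <- outer(x_grid, h_grid, function(x, h) pmin(x, h))

# Hypothetical measurement matrix: row y is P(x | y), rows sum to 1
M <- matrix(c(1.00, 0.00, 0.00,
              0.25, 0.50, 0.25,
              0.00, 0.50, 0.50), nrow = 3, byrow = TRUE)

# Hypothetical implementation matrix: column q is P(h | q), columns sum to 1
I <- matrix(c(1.0, 0.5, 0.0,
              0.0, 0.5, 0.5,
              0.0, 0.0, 0.5), nrow = 3, byrow = TRUE)

# Expected profit for every (measured stock, quota) pair
Ep <- M %*% P %*% I

# The same quantity for one pair, written as X^T P H
X <- M[2, ]   # distribution over true stock given the 2nd measurement
H <- I[, 3]   # distribution over harvest given the 3rd quota
scalar <- drop(t(X) %*% P %*% H)

# With no measurement or implementation error, M and I are identities
Ep_exact <- diag(3) %*% P %*% diag(3)
```

Ep[2, 3] and scalar agree, and Ep_exact is just P, so the uncertain case collapses to the deterministic profit matrix when both error distributions are point masses.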

In the next iteration, \(T-2\), we realize that our choice of \(h\) impacts the value we could get at time \(T-1\) as well. In choosing the harvest quota in this interval we must also consider what value it introduces in the \(T-1\) interval. We already have the matrix above giving the profits for each \(y\) and \(h\); we just need to transition from the space of the current \(y\) to the future \(y\) through application of the transition matrix \(T: y_t \to y_{t+1}\). Here \(F: x_t \to x_{t+1}\) with elements \(F(x_{t+1}, x_t)\), \(M: x \to y\), and \(I: q \to h\), so

\[\mathbb{T} = M^T F M \tilde{P} I D_t \]