Adapted from Marescot et al., Appendix 5, to the Reed optimal control problem, including a direct comparison against the (semi-)analytic optimum.

## step 1: define objectives

This is a conceptual step that does not require coding.

## step 2: define states

```
K <- 150 # state space limit
states <- 0:K # Vector of all possible states
```

## step 3: define control actions

```
# Vector of actions: harvest
H <- states
```

## step 4: define dynamic model (with demographic parameters)

```
# Beverton-Holt growth parameters
p <- c(6, 0.05)
# Growth function: escapement s = x - h grows to A * s / (1 + B * s)
f <- function(x, h){
  A <- p[1]
  B <- p[2]
  s <- pmax(x - h, 0)
  A * s / (1 + B * s)
}
# Standard deviation of the log-normal growth shocks
sigma_g <- 0.1
```
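
A quick sanity check on these parameters: the deterministic, unharvested equilibrium solves \(A s / (1 + B s) = s\), giving \(s = (A - 1)/B = 100\), comfortably below the state-space cap `K = 150`. For example:

```
# Deterministic equilibrium without harvest: f(s, 0) = s at s = (A - 1) / B
s_eq <- (p[1] - 1) / p[2]  # 100
f(s_eq, 0)                 # also 100, confirming the fixed point
```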

## step 5: define utility

```
# Utility function: realized harvest cannot exceed the available stock
get_utility <- function(x, h) {
  pmin(x, h)
}
```
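
The `pmin` cap simply ensures that the realized harvest never exceeds the stock actually present; for example:

```
get_utility(10, 50)  # returns 10: only 10 individuals are available
get_utility(50, 10)  # returns 10: the full quota of 10 can be taken
```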

## step 6: solve bellman equation with value iteration

```
# Initialize transition matrix
transition <- array(0, dim = c(length(states), length(states), length(H)))
# Initialize utility matrix
utility <- array(0, dim = c(length(states), length(H)))
```

```
# Fill in the transition and utility matrices
# Loop over all states
for (k in 0:K) {
  # Loop over all actions
  for (i in 1:length(H)) {
    # Expected population at the next step, given the current
    # state k and the harvest H[i]
    nextpop <- f(k, H[i])
    if (nextpop <= 0) {
      # Extinction is absorbing: all probability mass goes to state 0
      transition[k+1, , i] <- c(1, rep(0, length(states) - 1))
    } else {
      # Implement stochasticity in the growth by drawing transition
      # probabilities from a log-normal density around the expected population.
      # For a continuous distribution we first have to normalize the densities
      # (a discrete distribution would not need this, as long as `states`
      # included every possible value below the cap, e.g. all non-negative
      # integers less than K). Use a long-tailed fine grid as the
      # normalizing denominator:
      fine_states <- seq(min(states), 10 * max(states), by = states[2] - states[1])
      N <- sum(dlnorm(fine_states, log(nextpop), sdlog = sigma_g))
      transition[k+1, , i] <- dlnorm(states, log(nextpop), sdlog = sigma_g) / N
      # Correct this density for the final capping state ("pile on boundary").
      # Floating-point error can make the remainder slightly negative,
      # so we take max(., 0):
      transition[k+1, K+1, i] <- max(1 - sum(transition[k+1, -(K+1), i]), 0)
    }
    # Compute utility
    utility[k+1, i] <- get_utility(k, H[i])
  } # end of action loop
} # end of state loop
```
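
A quick optional check before solving: each row of every action's transition matrix should now be a proper probability distribution, i.e. sum to one.

```
# Rows of the transition array should each sum to one
stopifnot(all(abs(apply(transition, c(1, 3), sum) - 1) < 1e-8))
```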

```
# Discount factor
discount <- 0.95
# Action value vector at Tmax (terminal values)
Vtmax <- numeric(length(states))
# Action value vector at t and t+1
Vt <- numeric(length(states))
Vtplus <- numeric(length(states))
# Optimal policy vector
D <- numeric(length(states))
# Time horizon
Tmax <- 150
```

## Solution calculated explicitly:

The backward iteration consists in storing in the vector `Vt` the action values at time `t`, which are the maximum over actions of the utility plus the discounted future action values over all possible next states. Knowing the final action values, we then reset `Vtplus` to the new value `Vt` at each backward step. We start the backward iteration at time `Tmax - 1`, since we have already defined the action values at `Tmax`.

```
for (t in (Tmax - 1):1) {
  # Matrix Q stores the updated action values for all states (rows)
  # and actions (columns)
  Q <- array(0, dim = c(length(states), length(H)))
  for (i in 1:length(H)) {
    # For each harvest action, fill the ith column of Q for all states:
    # the utility of the ith action plus the discounted product of the
    # ith action's transition matrix and the action values of all states
    Q[, i] <- utility[, i] + discount * (transition[, , i] %*% Vtplus)
  } # end of the harvest loop
  # The optimal action value at time t is the row-wise maximum of Q
  Vt <- apply(Q, 1, max)
  # Having filled the vector Vt of action values for all states, we
  # update Vtplus to Vt and move on to the previous time step t - 1,
  # since we iterate backward
  Vtplus <- Vt
} # end of the time loop

# Find the optimal action for each state
for (k in 0:K) {
  # For each state, find which column of Q attains the maximum of the
  # last updated Vt. If there is more than one optimal action we
  # choose the minimum harvest
  D[k + 1] <- H[min(which(Q[k + 1, ] == Vt[k + 1]))]
}
```

## plot solution

`plot(states, states - D, xlab="Population size", ylab="Escapement")`

## proof of optimality: compare with analytical solution

From Reed (1979) we know that the optimal solution is a constant-escapement rule: the optimal escapement \(s^*\) satisfies

\[f'(s^*) = 1/\alpha\]

for the growth function \(f\), where \(\alpha\) is the discount factor and \(s^*\) is the stock size left behind under the constant-escapement policy. Analytic solutions are possible for certain growth functions, but here I've just implemented a generic numerical solution.
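
For the Beverton-Holt growth function used here, the condition can also be solved in closed form, which makes a handy cross-check on the numerical result below:

\[f'(s) = \frac{A}{(1 + B s)^2} = \frac{1}{\alpha} \quad\Longrightarrow\quad s^* = \frac{\sqrt{A \alpha} - 1}{B} \approx 27.7\]

for \(A = 6\), \(B = 0.05\), \(\alpha = 0.95\).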

```
# Maximize f(s) - s / discount: the first-order condition is
# Reed's rule f'(s*) = 1 / discount
fun <- function(x) -f(x, 0) + x / discount
out <- optimize(f = fun, interval = c(0, K))
S_star <- out$minimum
# Constant-escapement policy: do nothing below S_star,
# harvest down to S_star above it
exact_policy <- sapply(states, function(x) if (x < S_star) 0 else x - S_star)
```
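
If the closed-form expression above is right, `S_star` from `optimize()` should agree with it closely:

```
# Compare the numerical optimum against the closed-form Beverton-Holt escapement
S_closed <- (sqrt(p[1] * discount) - 1) / p[2]
c(numeric = S_star, closed_form = S_closed)
```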

```
plot(states, states - D, xlab="Population size", ylab="Escapement")
# The difference between Bellman equation solution and the analytical
# solution is small:
lines(states, states - exact_policy)
```
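
Rather than eyeballing the plot, the gap between the two policies can be quantified directly, for example:

```
# Largest disagreement (in individuals harvested) between the
# value-iteration policy and the analytic constant-escapement rule
max(abs(D - exact_policy))
```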

## using MDPtoolbox

```
library("MDPtoolbox")
mdp_check(P = transition, R = utility)
```

`[1] ""`

`out <- mdp_value_iteration(transition, utility, discount = discount, epsilon = 0.001, max_iter = 5e3, V0 = Vtmax)`

`[1] "MDP Toolbox: iterations stopped, epsilon-optimal policy found"`

```
plot(states, states - D, xlab="Population size", ylab="Escapement")
lines(states, states - H[out$policy], col="red", lty=2)
```
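
The toolbox policy can likewise be compared numerically against the explicit value-iteration solution (`out$policy` indexes into the action vector `H`):

```
# Maximum disagreement between the hand-rolled solution and MDPtoolbox
max(abs(D - H[out$policy]))
```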