Monday: submodels for full traits

Two shifts

alpha

sigma

likelihood

Pharyngeal shift

alpha

sigma

likelihood

Intramandibular shift

alpha

sigma

likelihood

Intramandibular shift (parrotfish only)

alpha

sigma

likelihood

Troubleshooting

It seems alpha is often higher in the regime that has a the higher sigma when variance is restricted to sigma. This cannot be right.

Creating unit tests:

Sigma behaves correctly in the following way. Simulate data where regimes have different sigmas.


require(wrightscape)
data(labrids)
s1_spec  <- list(alpha = "global", sigma = "indep", theta = "global")
s1 <- multiTypeOU(data = dat["close"], tree = tree, regimes = pharyngeal,
             model_spec = s1_spec,  control = list(maxit=5000))
# Order of entries in sigma is the order regime names are given by levels (alphabetical)
names(s1$sigma) <- levels(pharyngeal)
s1$sigma["other"] <- .01
s1$sigma["pharyngeal"] <- 10
testcase <- simulate(s1)[[1]]
testcase[pharyngeal=="other" & !is.na(testcase) & testcase != 0 ] -> lowvar
testcase[pharyngeal!="other" & !is.na(testcase) & testcase != 0 ] -> highvar

Results check out.


> var(lowvar)
[1] 0.009888678
> var(highvar)
[1] 12.22513
> # and the recovered estimates are pretty close to the underlying simulation parameters
> dummy <- update(s1, testcase)
> dummy$sigma
[1] 0.1015016 7.6793429
> levels(pharyngeal)
[1] "other"      "pharyngeal"

lowvar indeed has the lower variance (the regime with smaller sigma). And the higher sigma is estimated for the pharyngeal regime (with some expected error: 7.7 instead of 10, and 0.1 instead of 0.01).

BUT repeat with alpha:


data(labrids)
a1_spec  <- list(alpha = "indep", sigma = "global", theta = "global")
a1 <- multiTypeOU(data = dat["close"], tree = tree, regimes = pharyngeal,
             model_spec = a1_spec,  control = list(maxit=5000))
names(a1$alpha) <- levels(pharyngeal)

data(labrids)
a1_spec  <- list(alpha = "indep", sigma = "global", theta = "global")
a1 <- multiTypeOU(data = dat["close"], tree = tree, regimes = pharyngeal,
             model_spec = a1_spec,  control = list(maxit=5000))
names(a1$alpha) <- levels(pharyngeal)
a1$alpha["other"] <- 10
a1$alpha["pharyngeal"] <- 1e-12
a1$sigma <- c(5,5)
dat[["simulated_a1"]] <-simulate(a1)[[1]]
dummy <- update(a1, dat[["simulated_a1"]])

testcase <- dat[["simulated_a1"]]
testcase[pharyngeal=="other" & !is.na(testcase) & testcase != 0 ] -> lowvar
testcase[pharyngeal!="other" & !is.na(testcase) & testcase != 0 ] -> highvar

and we get


> var(lowvar)
[1] 1.187643
> var(highvar)
[1] 0.000385144
> dummy$alpha
[1] 10.0551960  0.2670818

And the variances are backwards. Note that the parameter estimates reasonably reflect the simulated regimes (other = 10.1, vs 10, intramandibular = 0.27 vs 1e-12).

How is this possible? The indexing routine is handling the sigma case correctly, and the same code is used to handle the indexing of each.

Does this have anything to do with the simulation function: No. We can recreate the issue by drawing data as iid variables instead of using the simulation along the tree:


require(wrightscape)
data(labrids)
spec = list(alpha="indep", sigma="global", theta="global")
#spec = list(alpha="global", sigma="indep", theta="global")
testcase <- dat[["prot.y"]]
other.prot.data <- testcase[intramandibular=="other" & !is.na(testcase)]
intra.prot.data <- testcase[intramandibular!="other" & !is.na(testcase)] 
## Set the distribution of "other"
testcase[intramandibular=="other" & !is.na(testcase)] <- rnorm(length(other.prot.data), sd=.1)
#Set the distribution of the focal trait
testcase[intramandibular!="other" & !is.na(testcase)] <- rnorm(length(intra.prot.data), sd=10)

m <- multiTypeOU(data=testcase, tree=tree, regimes=intramandibular, model_spec=spec)
names(m$sigma) <- levels(intramandibular)
names(m$alpha) <- levels(intramandibular)

The higher alpha is estimated on the regime that has higher variance


> print(m$alpha)
intramandibular           other 
      13.121679        5.981398 

The data has higher variance among the intramandibular traits. (same is true for pharyngeal). This problem does not face the sigmas model, hence appears to do with the indexing of alphas somehow.

Trying direct debug:


require(wrightscape)
data(labrids)
spec = list(alpha="indep", sigma="global", theta="global")
testcase <- dat[["prot.y"]]
other.prot.data <- testcase[intramandibular=="other" & !is.na(testcase)]
intra.prot.data <- testcase[intramandibular!="other" & !is.na(testcase)] 
testcase[intramandibular=="other" & !is.na(testcase)] <- rnorm(length(other.prot.data), sd=.1)
testcase[intramandibular!="other" & !is.na(testcase)] <- rnorm(length(intra.prot.data), sd=10)

# Load internal functions
require(devtools)
load_all("..")
data <- testcase
lca <- lca_calc(tree)
multiOU_lik_lca(data, tree, intramandibular, alpha=c(2,7), sigma=c(16,5), theta=c(0.5, 0.5), Xo=0.5, lca)

Looks to be keeping the alpha with sigma at least.

Check that in global ou mode, smaller alphas fit high variance better: (yup)


# Test of single OU
test <- dat[["prot.y"]]
test[!is.na(test)] <- rnorm(sum(!is.na(test)), sd=10)
ou  <- list(alpha = "global", sigma = "global", theta = "global")
m1 <- multiTypeOU(data=test, tree=tree, regimes=intramandibular, model_spec=ou)
test[!is.na(test)] <- rnorm(sum(!is.na(test)), sd=.1)
m2 <- multiTypeOU(data=test, tree=tree, regimes=intramandibular, model_spec=ou)
m1$alpha # larger variance has smaller alpha
m2$alpha

All okay. But compare fits


testcase <- dat[["prot.y"]]
other.prot.data <- testcase[intramandibular=="other" & !is.na(testcase)]
intra.prot.data <- testcase[intramandibular!="other" & !is.na(testcase)] 
testcase[intramandibular=="other" & !is.na(testcase)] <- rnorm(length(other.prot.data), sd=.1)
testcase[intramandibular!="other" & !is.na(testcase)] <- rnorm(length(intra.prot.data), sd=10)
# Load internal functions
require(devtools)
load_all("..")
data <- testcase
lca <- lca_calc(tree)

# expect higher alpha on "other" (alpha2) to be best, higher on alpha1 is worst
# BUT NO!
> multiOU_lik_lca(testcase, tree, intramandibular, alpha=c(2,2), sigma=c(5,5), theta=c(0.5, 0.5), Xo=0.5, lca)
[1] -1802.169
> multiOU_lik_lca(testcase, tree, intramandibular, alpha=c(5,2), sigma=c(5,5), theta=c(0.5, 0.5), Xo=0.5, lca)
[1] -248.6667
> multiOU_lik_lca(testcase, tree, intramandibular, alpha=c(2,5), sigma=c(5,5), theta=c(0.5, 0.5), Xo=0.5, lca)
[1] -110821.8



> a1_spec  <- list(alpha = "indep", sigma = "global", theta = "global")
> a1 <- multiTypeOU(data=testcase, tree=tree, regimes=intramandibular, model_spec=a1_spec)
> a1$alpha
[1] 14.165247  6.579535

[flickr-gallery mode=“search” tags=“phylogenetics” min_upload_date=“2011-11-28 10:15:00” max_upload_date=“2011-11-28 10:30:20”]