Wednesday: Simulated model fit analysis on wrightscape, parrotfish tree

  • Reviewed last night’s simulations, see yesterday’s post.

Simulated data testing:

Fitting the general model well on the parrotfish tree even with clear simulated data seems hard.  Simulating with alpha(intramandibulars) = .01, and others = 20, and fitting general model and the alpha-only independent model (release of constraint model).  The release model MLE recovers reasonable values: alpha(intra) = 1.7, alpha(others) = 21.8, while the general model fit is clearly struggling even with SANN.

[table id=3 /]

May help to try seeding with optima of the sub-models.  This is expressed in the bootstraps of course, where the release model produces rather clear distributions (all with 80 reps):

though the general model is much more confused:

Repeating with more replicates, but should try the remaining submodels.  This obviously shows up in the model comparison by bootstrapped likelihood ratio:

Note how crucial it is we did the test distribution.  Without it, we’d think this a rather clear rejection of the variable alphas model in place of the general model.

MCMCs

  • MCMCs on these models.  Having difficulty with undefined Hastings ratios, probably on out-of-bound proposals, though not sure why those aren’t being caught.  Good to know the MLE runs with the same likelihood function.

  • MCMC errors are occurring in MCMCMC, dropping the heated chains function for the moment works fine.  Some difficulty in recombining chain intervals in the coupling, will need a close look.  FIXME.

  • Running long MCMCs on alpha (on zero, n8) and general (on one, n15) models.  Combines 16 chains (minus burnin), of 1e5 reps.  Simulated_mcmc.R example and sim_general_mcmc.R examples.

Ooh, much faster when not merging multiple chains.  let’s run 16 x 1e6.

Max Likelihood + bootstraps

  • Piecewise initialized simulated annealing from the submodels: Works!  Also note: the aggressive SANN is still needed to get convergence:


> gen_fit <- smart_multiType(data=sim_trait, tree=labrid$tree,
+                   regimes=intramandibular,Xo=NULL, alpha = .1,
+                   sigma = .1, theta=NULL)
[1] "alpha:  3.02538925492894e-07" "alpha:  0.0585819705289946"
[1] "sigma:  0.00166254126803767" "sigma:  2.90960629757019"
[1] "theta:  1.23973641541187" "theta:  1.84059077604827"
>
> print(getParameters.multiOU(gen_fit))
 alpha1       alpha2       theta1       theta2       sigma1       sigma2
 0.756620314  3.758072410 13.732765801  4.853751270  0.006160961  3.913957909
 Xo
 4.667250491
>
>
> sann_fit <- smart_multiType(data=sim_trait, tree=labrid$tree,
+                   regimes=intramandibular,Xo=NULL, alpha = .1,
+                   sigma = .1, theta=NULL,
+                   method ="SANN", control=list(maxit=80000,temp=25,tmax=50))
[1] "alpha:  0.271749815040882" "alpha:  20.4096202543866"
[1] "sigma:  0.0037342888296498" "sigma:  5.5349835595542"
[1] "theta:  4.81905476727201" "theta:  4.58842972148977"
>
> print(getParameters.multiOU(sann_fit))
 alpha1     alpha2     theta1     theta2     sigma1     sigma2         Xo
 0.2609878 21.9790836 -5.5837586  4.6213975 17.9955025  9.6907227  7.3360959
>
  • Should probably compare likelihoods of all models as well to check convergence.

  • Hmm, likelihood of sann_fit isn’t actually better than the very far off combo of nelder-mead?  Change SANN starting temp?

  • Seems to be lack of power.  Seeding from a theta-only estimate doesn’t help at all in this case, as chooses wildly different theta.  meanwhile, sigma model is doing slightly better anyway:

Each model fit creates reasonably tight bootstraps around its MLE estimates:

(plotted on different axes due the large difference in y max).

  • Fixing “fixed” parameter style/ brownie.  has no alpha value stored, so can’t simulate.  FIXED.

  • Try on larger (labrid? simulated?) tree, will need a reasonable painting defined as well.  Seems to be a convergence problem, but more power may help tame the ridges.

Database projects

TreeBASE

  • iEvoBio talk accepted!

  • Johan Nylander’s advice: try readNexus() from phylobase package.  Unfortunately no go so far.  Can’t handle urls so download the file (needs wget interface).  See today’s update onissue tracker.

RPLoS

  • Wrote an R function implementing CrossRef API for use in rplos.

  • Oooh, PLoS and Mendeley have combined forces in the competition to use their APIs.  Wonder if they’ll be very responsive to feature requests on the API itself…