Approximate Bayesian Computation (ABC) methods

Erick Matsen asked me about the question at the end of Luke Harmon’s phyloseminar concerning a critique of Approximate Bayesian Computation (ABC) methods.  The question wasn’t any more specific, though Luke recognized the paper.  Searching the literature didn’t turn anything up, but after asking Leo through the MCMC group on Mendeley, I think this is the paper:

Robert, C.P. et al. Lack of confidence in ABC model choice. PNAS 108, 15112-15117 (2011). (arXiv)

This seems a more pointed critique of ABC than Templeton’s discussion (Templeton, 2010), particularly as the lead author is a critic of Templeton (Berger et al. 2010).

At first I thought this would be a relatively intuitive argument that the method is often applied when we lack appropriate sufficient statistics for the models.  Of course it does come down to a question of sufficient statistics, but the result is somewhat stronger:

a straightforward proof that a model-wise sufficient statistic is usually not sufficient across models, i.e. for model comparison.  An immediate corollary is that the ABC-MC approximation does not converge to the exact Bayes factor.

Hence the emphasis on model choice in the title – lacking sufficient statistics for the particular model would obviously impact posterior density estimation even within models.
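To make the point concrete for myself, here is a minimal sketch using the Poisson-versus-geometric comparison, a standard toy example in this literature.  The sum S of the observations is sufficient within each model, yet the exact Bayes factor also depends on the product of the x_i!.  The priors (Exponential(1) on the Poisson rate, Uniform(0,1) on the geometric success probability), the geometric support {0, 1, 2, ...}, and all function names are assumptions of mine chosen so the marginal likelihoods have closed forms; none of this is code from the paper.

```python
import math
import numpy as np


def log_exact_bf(x):
    """Log Bayes factor (Poisson vs geometric) computed from the full data x.

    Assumed priors: lambda ~ Exponential(1), p ~ Uniform(0, 1).
    """
    n, s = len(x), sum(x)
    # m1(x) = S! / ((n+1)^(S+1) * prod(x_i!))
    log_m1 = (math.lgamma(s + 1) - (s + 1) * math.log(n + 1)
              - sum(math.lgamma(xi + 1) for xi in x))
    # m2(x) = Beta(n+1, S+1) = n! S! / (n+S+1)!
    log_m2 = math.lgamma(n + 1) + math.lgamma(s + 1) - math.lgamma(n + s + 2)
    return log_m1 - log_m2


def log_summary_bf(n, s):
    """Log Bayes factor when only the summary S = sum(x) is observed."""
    # Marginal of S under the Poisson model: n^S / (n+1)^(S+1)
    log_m1 = s * math.log(n) - (s + 1) * math.log(n + 1)
    # Marginal of S under the geometric model: n / ((n+S)(n+S+1))
    log_m2 = math.log(n) - math.log(n + s) - math.log(n + s + 1)
    return log_m1 - log_m2


def abc_bf(x_obs, n_sims=200_000, seed=0):
    """ABC rejection estimate of the Bayes factor, matching on S exactly.

    Equal numbers of simulations are drawn from each model, which stands in
    for equal prior model probabilities.
    """
    rng = np.random.default_rng(seed)
    n, s_obs = len(x_obs), sum(x_obs)
    lam = rng.exponential(1.0, size=n_sims)
    s1 = rng.poisson(lam[:, None], size=(n_sims, n)).sum(axis=1)
    p = 1.0 - rng.random(n_sims)  # values in (0, 1], avoids p == 0
    # NumPy's geometric counts trials (support starts at 1), so shift to 0.
    s2 = (rng.geometric(p[:, None], size=(n_sims, n)) - 1).sum(axis=1)
    accepted_1 = np.count_nonzero(s1 == s_obs)
    accepted_2 = np.count_nonzero(s2 == s_obs)
    return accepted_1 / accepted_2
```

Comparing x = (2, 2, 2) with x = (6, 0, 0): both have S = 6, so `abc_bf` and `log_summary_bf` cannot tell them apart, while `log_exact_bf` favours the Poisson model for the first and the geometric model for the second.  More simulations only tighten the ABC estimate around the summary-based value, not the exact one.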

There’s some rather excellent discussion of these issues on Christian Robert’s blog (again, thanks to Leo for pointing me to this).  I still have some more reading to do on this before we hear more about ABC from Brian O’Meara on phyloseminar next week.

Update

Dan Lawson (comments below) drew my attention to another paper treating this problem of model-wise sufficient statistics (Didelot et al. 2011).  The authors propose a potentially straightforward work-around: define a model M in which both the original models M1 and M2 are embedded.  They then demonstrate, for exponential family models, that sufficient statistics for M are also sufficient for comparing M1 and M2 by Bayes factors, if I’ve followed the argument correctly.  I’ll try to work out an example in a phylogenetics context in the notebook when I get a chance.
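In the meantime, a non-phylogenetic toy case helps me check my reading of the embedding idea.  This is a sketch under my own assumptions (Poisson versus geometric on {0, 1, 2, ...} with n observations, an Exponential(1) prior on the rate and a Uniform(0,1) prior on p), not an example taken from the paper.  Writing S for the sum of the x_i and T for the sum of log(x_i!):

```latex
\begin{align*}
M_1 \text{ (Poisson)}:\quad
  & f_1(x \mid \lambda) = \exp\{\, S \log\lambda - n\lambda - T \,\}, \\
M_2 \text{ (geometric)}:\quad
  & f_2(x \mid p) = \exp\{\, S \log(1-p) + n \log p \,\}, \\
M \text{ (encompassing)}:\quad
  & f(x \mid \theta, \alpha) = \exp\{\, \theta S - \alpha T - n A(\theta, \alpha) \,\},
  \qquad A(\theta, \alpha) = \log \sum_{k \ge 0} \frac{e^{\theta k}}{(k!)^{\alpha}}, \\[4pt]
\text{Bayes factor}:\quad
  & B_{12}(x) = \frac{\int_0^\infty f_1(x \mid \lambda)\, e^{-\lambda}\, d\lambda}
                     {\int_0^1 f_2(x \mid p)\, dp}
            = \frac{(n+S+1)!}{n!\,(n+1)^{S+1}}\; e^{-T}.
\end{align*}
```

Setting α = 1 and θ = log λ recovers M1, while α = 0 and θ = log(1 - p) recovers M2.  The pair (S, T) is sufficient for M, and the Bayes factor depends on the data only through (S, T), whereas S alone is sufficient within each model but misses the factor exp(-T).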

Further reading

The search also led me to two nice books on applying these methods in R.

Robert, C.P. & Casella, G. Introducing Monte Carlo Methods with R (Use R). 284 (Springer Verlag: 2009). (Amazon link).

Albert, J. Bayesian Computation with R (Use R). 300 (Springer: 2009). (Amazon link).

References

  • Templeton A (2010). “Coherent And Incoherent Inference in Phylogeography And Human Evolution.” Proceedings of The National Academy of Sciences, 107. ISSN 0027-8424, https://dx.doi.org/10.1073/pnas.0910647107.

  • Berger J, Fienberg S, Raftery A and Robert C (2010). “Incoherent Phylogeographic Inference.” Proceedings of The National Academy of Sciences, 107. ISSN 0027-8424, https://dx.doi.org/10.1073/pnas.1008762107.

  • Didelot X, Everitt R, Johansen A and Lawson D (2011). “Likelihood-Free Estimation of Model Evidence.” Bayesian Analysis, 6. https://dx.doi.org/10.1214/11-BA602.