Lab Notebook

Entries

Monday

12 Dec 2011

  • Wrote method2 test functions out in knitr/sweave. Sent document to Peter.
  • Move my ropensci git origins to the ropensci github page.
  • Ropensci comments on : recommended for anything non-trivial, including web calls. Richer tests should be put into testthat.
  • Reply to Charles Jervis, biology high school teacher interested in tools to teach math/modeling in ecology.  Sounds like he’s already doing some cool stuff with phylogenies with his students.
  • prepare vignette for rfishbase based on tutorial.
  • prepare vignette for treebase base on tutorial.

Misc Wordpress/Notebook Stuff

Kcite upgrade – moves to a new platform, becomes author-year format, smaller format bibliography.  Whoops, it also breaks my dynamic header-footers – for some reason they no longer appear on any page or post not calling the kcite citation!  Fixed with a hack call to kcite function in my footer

<!-- MY ADDITION: KCITE HACK -->
<script type="text/javascript">
       var kcite_citation_data;
if( kcite_citation_data == undefined ){
kcite_citation_data = [];
}
</script>

but not happy about this.

Read more



Better dynamic documents (Sweave) with syntax highlighting, caching, etc

12 Dec 2011

The highlight package is a simple solution for very nice syntax highlighted code boxes in latex documents.  Requires switching the driver, which is best done from within R and requires creating a makefile though.  Needs the “highlight” package installed.  Here’s a simple makefile.

A wealth/mess of Sweave related packages on the CRAN taskview for Reproducible Research (now what other software platform has a the equivalent of a Reproducible Research task view?)

But why not really do things correctly? The knitr package is beautiful. Abstracts the concept beyond just latex integration.  You can write in pure latex, (instead of noweb mix), in pure markdown, or pure html, embedding the R commands in comments.  knitr can run the code in those comments to generate a file of the same format, but with the dynamic content embedded.  Handles caching, syntax highlighting, and figures in a much more seamless manner.  It can also use the package author’s formatRcode to tidy up your code before including it.  You can convince R to use this for vignettes with a similar makefile.

[gist id=1470786]

The github markdown integration makes github look like a better and better candidate for a lab notebook.  Wonder how it handles mathjax or citations?

Another nice find for bringing my older package documentation into roxygen form: Rd2roxygen.

More ambitious dynamic documentation

These improvements to Sweave are nice, though the dynamic document concept can certainly go much further.  Dexy.it is an interesting multi-lingual version, but doesn’t seem to combine source and the documentation in the same file.  Guess that can be kinda good, kinda bad. It seem to use a non-standard syntax, and while it’s python easy_install sounds fine, might still be a barrier to adoption among R users…

Meanwhile, some notes form a presentation I saw recently that have a much more ambitious view of what’s possible.  Nov 22 (which I’ve recast into the language Tim Berners-Lee uses to describe linked-data on the web:

  • One star – sweave

  • two stars - XDynDocs multiple views (alternate analyses, full lab notebook)

  • three stars - rfirefox - interactive web view

  • star four - live data sources (self updating)

Some people involved: Deborah Nolan, Thomas Lee, Paul Baines, Wolfgang Polonik, Duncan Temple-Lang, Gabriel Becker.

some of my questions:

  • In the context of these tools, do we distinguish documenting code from publishing analyses?

  • Does this make remaining errors the more insidious?

Read more



Some Notes on Open Notebooks, Open peer review practices.

11 Dec 2011

What follows is a collection of notes, examples, and reflections I’ve been meaning to write down.

Revealing Revisions

Sharing version history is an important part of an open notebook really being open, and one reason why wikis have remained so popular for Open Notebook Science.  Wordpress (this platform) does version management internally, and thepost-revision display plug-inmakes this visible on posts.  I had experimented but disabled this a while ago because I didn’t like visual impact of the long list at the end of my posts.  I may often put a post up early in the day and add subsequent notes as I go along, so my revisions are nearly exclusively additions (I show any cross-outs on the final copy anyway) so it seemed unnecessary.  Anyway, this FF discussion made me feel guilty so I added them back.  Luckily the manual mode lets me have this display in the metadata section, which feels a bit more out of the way. ((Configuration notes to self:  Directions basically follow the package instructions – I added  the two php lines to single.php in the sidebar section.  Would be overwritten by theme updates so I should write a child theme, but I understand my theme is no longer being updated anyhow.  I’d have to edit the css to get color highlighting of the changes, doesn’t really fit with my theme.  My theme also doesn’t use the standard CSS file, but it’s own directory for css files which means I have to go outside the wordpress editor to edit it.)).  We’ll see.

Revealing Impact

A while ago Heather Piwowar asked me about setting up WP Post Views on my notebook so that projects like total-impact could include the data.  There should be a good plugin that just uses the API from the standard statistics plugin (now part of jetpack) to display views on individual posts, but it seems there isn’t.  WP Post Views requires modifying the theme files, and requires the ajax-the-views plugin to update the display on sites using caching, and uses it’s own counter (so it’s view counts only start once it’s installed), but I don’t have time to write my own, so it now appears in the metadata section of the posts.

ONS Logos

Interesting discussion by Andy Maloney on the use of ONS logos to track Open Notebook science.  (See mine in the footer “further information.”  The choice of ONS claim is a different discussion, e.g. here).

Journals with Open Peer Review

Been meaning to jot these notes down, prompted to by the open-knowledge / G+ discussion here.  Here’s a few different models of open peer review.

Biology Direct has always practiced open peer review in which reviews are solicited; then having received the reviews (either positive or negative) the author can opt to have the paper published, can make changes & publish, or can withdraw the paper. Any time the paper is published the reviews are published with it, with the reviewers name, and with the author’s reply to the reviews. http://www.biology-direct.com/

The Nature journal EMBO has an innovative but less extreme process where reviews are solicited, identities are included, but the reviews are published (as a supplement) only if the paper is accepted. The open review is opt-in, with 95% opting in. Interestingly reviewers are encouraged to “cross-review” or comment on remarks of other reviewers. They have stats showing that about 10% of the time people download the peer-review comments when they download the paper, comparable to downloads of traditional supplements. http://www.nature.com/emboj/about/process.html

Nature did an experiment with its flagship journal on unsolicited open peer review in 2006 in parallel with it’s traditional peer review. They deemed it unsuccessful due to low opt-in rates among authors and few & lower quality reviews in the open. http://www.nature.com/nature/peerreview/debate/nature05535.html

BMC Molecular Biology journals has authors recommend reviewers, non-anonymous review, and publishing of the reviews with the article.  http://www.biomedcentral.com/bmcmolbiol/authors/peerreview

Some vibrant discussion on non-commercial (NC) (and to some extent, share-alike, or SA) clauses in licenses on the open-knowledge list (and in the recent published literature: (Carroll, 2011), (Hagedorn et. al. 2011)). The question at hand is targeting Open Access publishers who use NC license, but has interesting reflections onto bloggers and software licenses.

While I’d heard discontent with NC before (particularly around PLoS ONE clones using NC clause), I was surprised at the extent – Creative Commons is apparently considering removing it since it is so problematic.  Similar troubles have dogged the share-alike/GPL clause for software (license incompatibility, etc), and it sounds that GPL is out of favor in preference for more open/permissive BSD licenses.

References

Read more



Friday

09 Dec 2011

Wrightscape/pmc

  • Writing up derivation from Thursday properly / for manuscript.
  • Writing up test cases for method2, direct evaluation of covariance matrix for comparisons
  • Minor revisions submitted to Evolution

pdg-control To Do

  • Implement the NCO.
  • Implement HS SDP
  • Closer read of (Sethi et. al. 2005), (Su & Peterman, 2012) and Woodward & Tomberlin

Warning signals

  • Allee effects in linear programming with different classes.
  • Timescales of noise using van Kampen expansion.

Image Log

[flickr-gallery mode=“search” tags=“phylogenetics, PDG_Control” min_upload_date=“2011-12-09 7:00:37” max_upload_date=“2011-12-09 23:23:37”]

References

  • Sethi G, Costello C, Fisher A, Hanemann M and Karp L (2005). “Fishery Management Under Multiple Uncertainty.” Journal of Environmental Economics And Management, 50. ISSN 00950696, http://dx.doi.org/10.1016/j.jeem.2004.11.005.

  • Su Z and Peterman R (2012). “Performance of A Bayesian State-Space Model of Semelparous Species For Stock-Recruitment Data Subject to Measurement Error.” Ecological Modelling, 224. ISSN 03043800, http://dx.doi.org/10.1016/j.ecolmodel.2011.11.001.

Read more



Thursday

08 Dec 2011

Another day of working on 4 different projects at once.

Warning signals

  • review Alan pulses manuscript

  • review (Dulvy et. al. 2003) Paper discussing the data you want $ $ the data you want

No way in Anchoveta data, look at time series:

Example in the cod data

using only up until 1991 data:

  • Way forward – construct the step-wise predictor of probabilities.  Add the economic costs of both.  run in replicate — eek this will definitely be one for the NERSC cluster.  Is there really no policy to compare against?

wrightscape

Evolution

  1. describe the differences in optimization methods, and modify our mention of the Hessian to say that it may provide reasonable CIs, but not necessarily, and can be problematic to compute. – Done

  2. Consider AICc for comparison?  – supplement?

  3. Cite Revel & Collar 2009 – Done

Reply letter.

Code polish and submit.

pdg-control

Very clever! State-space too big for SDP?  Just simulate forward a few steps and call it good. Now with citation.  Nicol & Chadès, 2011

Discussing with Paul – read:

Checking expected behavior when the cost of change is high and the stock has significant scrap value (reward for non-zero terminal condition):

in this limit the trivial solution is not to fish.

Just implement NCO tracking on Reed’s S.

References

  • Dulvy N, Sadovy Y and Reynolds J (2003). “Extinction Vulnerability in Marine Populations.” Fish And Fisheries, 4. ISSN 1467-2960, http://dx.doi.org/10.1046/j.1467-2979.2003.00105.x.

  • Beyond Stochastic Dynamic Programming: A Heuristic Sampling Method For Optimizing Conservation Decisions in Very Large State Spaces, Sam Nicol, Iadine Chadès, (2011) Methods in Ecology And Evolution, 2 10.1111/j.2041-210X.2010.00069.x

Read more



Wednesday - wrightscape runs and various other things

07 Dec 2011

pdg-control

  • plots for changing costs
  • Report out to Training Problem II group.

ggplot note

defining functions that return a ggplot object can be tricky. If you compute some objects in the plot function (stats etc), those stats are not stored by the object (due to lazy evaluation, usually a very nice time-saving feature), so the plot cannot be produced unless the function returns those things to the global environment (or from wherever the plot will be printed/evaluated). Can return them to the global env with “<<-” assignment, but seems poor form.

Evolution

power in phylogenies

  • Hessian reply.  manuscript reply.
  • Select wrightscape examples.

wrightscape

  • 1530089 series: bm vs a2 on parrotfish tree. looks like close & SH.y reject brownie, open almost does. kt almost does.
  • 7a347b0 series: bm vs a1 on parrotfish tree: (missing lr line).  More power, smaller diffs.  Kt maybe? must rerun these!
  • 4335204  bm vs a1 with the lines.

prot.y sig. open sig in reverse

  • cce5a5a/93a8630 series: labrid tree, intramandibular division, bm vs a1   (more power on larger tree, go for finer division between models?)   only bodymass rejects.  (cce5a5a missing lr line).

  • 8382283 now running labrid bm vs a2.

KT significant 

closing: boarderline
closing: boarderline
bodymass sig
bodymass sig
open: kinda not really
open: kinda not really
  • open & prot.y significant in reverse
  • 6befae4 Check out open and kt on a2 vs bm when regime=two_shifts.
Prot
Prot
open
open
kt
kt

bodymass are significant.  Close is significant reversed (funny pattern at a2 anyhow).

Make sure to run against standard ouch too!

Warning Signals

  • Alan Manuscript
  • Reading: re data sources (Dulvy et. al. 2003)
  • (Le Quesne & Jennings, 2012)
  • Example decisions under uncertainty of crash

Reading

  • (Molloy, 2011)
  • And on NC clause (Carroll, 2011), (Hagedorn et. al. 2011)
  • Fiction: Non-commerical license means no one can make money from it. “I don’t want to give some entrepreneur a free lunch”
  • Fact: NC means the journal (by holding the copyright) retains the right to sell your work to pharmaceuticals, teachers, researchers interested in text-mining,  (rather than those groups having free use).
  • Data reviews – F1000?

[flickr-gallery mode=“search” tags=“phylogenetics, PDG_Control” min_upload_date=“2011-12-07 7:00:37” max_upload_date=“2011-12-08 12:23:37”]

References

Read more



Optimal control -- costs to policy shifts

07 Dec 2011

Considering the case where there is a cost to changing the harvest quota (representing the cost of efforts to change policy).  We can imagine several scenarios that could introduce this:

  • Fixed cost to change

  • Cost proportional to the size of the change (L1 norm)

  • Cost proportional to the square of the size (L2 norm, i.e. small changes aren’t expensive, large ones are very expensive)

  • asymmetric costs – lowering quotas may be harder than raising them

Implementation:

Before, we could calculate the optimal harvest rate for each value of the state variable $ x_t $ on the grid, at each time step (using Bellman’s equation for the value-to-go).  The optimal solution was given by a matrix of size n by T, where n is the grid size and T the stopping time  - n rows for the states, T columns for the timesteps.  Now we have to have the optimal harvest level at state $ x_t $ calculated out for each possible value of last year’s quota ( h_{t-1}  ), to know how big a change we’re considering.  This makes our matrix n by n by T (if harvest can assume any value on the grid, it has n options – we might make the harvest grid coarser or bounded at some h_max < x_max).

I think I have this right(?)  Slower to compute this much bigger optimum, but here we go:

Results

L1 norm tends to follow the optimal solution to a point, and then give up on changing the policy – putting the harvest all the way up and running the population to extinction:

The “alternate” line shows what the fishstock would have been if there was no cost to changing the policy (the classical Reed optimum), and its harvest rate is shown as harvest_alt. Looking across the ensemble we see this pattern repeat, with the manager throwing in the hat and fishing out the population at different times:

Higher costs make this happen instantly, lower costs nearly mimic the Reed optimum strategy.

Code

available in R package form now,install instructions.

(version of code used for figures should be linked from the figures as always too).

Read more



Tuesday -- two insights

06 Dec 2011

Control & Optimization

Dominique Bonvin visit

  • two-step control (optimize parameters, optimize mean-square error between model and measurement also by parameter adjustment) fails when reality isn’t in the model set.  Problem is under-determined (N free variables, 4N equations).  (converges, but onto the wrong optimum).

  • Better solution – adjust the penalty/regularization terms to to get better agreement between model and data

  • Bonvin approach – adjust the control instead (NCO Control).  Can be almost model free in the control process.   (**)

pdg-control Training Problem 1b & 2

  • Coded SDP with costs to changing quotas.

  • Best way to visualize the optimal solution?

Evolution

  • Hessian calculation?  Non-trivial – how to get Hessian from a non-gradient method, or a gradient method when you need to handle bounds?  Various numerical challenges.

  • Wrightscape model comparisons – comparing meaningful hypotheses! (**)

Day’s graphs log

[flickr-gallery mode=“search” tags=“phylogenetics, PDG_Control” min_upload_date=“2011-12-06 7:00:37” max_upload_date=“2011-12-06 23:23:37”]

Read more



Monday - wrightscape meetings, PDG meeting

05 Dec 2011

Wrightscape – added a user utility to to specify custom model types, such that some subset of the total number of regimes can share certain parameters.

Peter Meeting

  • Derivation – provide clear examples of the differences and by how much it matters.
  • Expected result from the biology: the jaw lever traits should show greater changes than the other traits after a release of constraint.
  • Based on previous studies, expect most of the action to be around the intramandibular joint innovation.

Luke Meeting

  • Set up wrightscape to run on the Anolis ecomorph data.  Requires custom_Multitype() to specify separate shared and independent parameters on different regimes.
  • Get wrightscape to compile on Luke’s Mac.  He has gsl libraries installed, yet somehow package won’t compile by R, complains that R cannot find gcc-4.2.  Solved aliased gcc to gcc-4.2, and compile worked fine.  Wish that didn’t take so long to think of.
  • Up and running on Luke’s Anolis data.

PDG Training Problem 1 Conference Call

1b: penalties on rate of change of control variable.

  • c depend on h – not a velocity penalty.
  • L_1 vs L_2  is a bigger penalty for large changes, small penalty for smaller
  • Operational question: do penalties change the nature of the optimal policy?
  • No cost penalties, upper bound on h
  • Jake with h_max stochastic problem – lose the state independence, alter the Reed solution.
  • Highlight differences from constant-escapement policy.

1a: curvature question

Focus on cases where state equation has stable node, chattering/limit cycles introduced by the control.

[flickr-gallery mode=“search” tags=“phylogenetics” min_upload_date=“2011-12-05 8:00:37” max_upload_date=“2011-12-05 23:23:37”]

Read more