Reflecting on five years of the open lab notebook

I have been keeping an open lab notebook since February of 2010 [1]. The years since have seen 271, 301, 164, 136, 86 and 44 posts, respectively. The notebook started on OpenWetWare, with code on (now defunct) Google Code (using svn!). By October of that year I had moved to a WordPress.org platform running on DreamHost, with code moving to GitHub earlier that year. WordPress meant my own domain name, control of layout and plugins, better mobile support, and, perhaps most importantly, an end to the occasional day-long downtimes I had seen on OpenWetWare. By the end of 2010 I had also cobbled together a system in which my scripts committed themselves to GitHub and automatically posted the figures they generated to flickr, tagged with the Git hash. Knitr was first released to CRAN in January of 2012, and by March I announced a new phase for my lab notebook focused on notes written with knitr markdown living on GitHub. For the first time, part of the daily research narrative no longer lived directly in the lab notebook itself (thus the decline in total notebook posts starting that year). Most of my posts in the first two years are much more journal-like: reflecting on what I had been working on or reading earlier that day, describing some equation or graph, or muddling over a sticking point. Some of this remained in the notebook, particularly for work unconnected to code (thoughts on some literature, mathematical derivations, etc.), but more migrated to knitr documents outside of the notebook. While knitr brought the code, results, and discussion elements closer together, it also often meant I was writing with less distance from the results, and so the text shifted towards more technical and less reflective content. Notebook entries became briefer and more bullet-pointed, though increasingly punctuated by entries that really were blog posts, aimed at an audience and sometimes drawing lengthy comment sections, primarily on topics related to open science and research workflow.

In May of 2012, I shifted platforms again, this time moving to Jekyll, primarily for reasons of performance and cost (as detailed in that post). I have always experimented heavily with different features and technology connected to the lab notebook, and Jekyll also opened new possibilities. Meanwhile, it fit nicely into the git/markdown-centric workflow I had already adopted. This made it easy to turn knitr outputs into notebook entries, though I waffled back and forth on whether it was more natural to post such material in the relevant project repository on GitHub or directly in the notebook. The notebook remained the home for work not (yet) connected to a project, for blog posts, and for occasional cross-posted material on a particular project, often capturing a key result. Meanwhile more of my other work became open – in particular, I began writing my manuscripts in public GitHub project repositories from the start, and also making greater use of GitHub issues in projects. The end of 2012 also saw me leave graduate school and begin a post-doc, luckily with advisers who were equally neutral to my open notebook habit.

The start of 2015 saw the next significant evolution in platform. Though still based on Jekyll, I first broke up what had been growing into one rather large labnotebook repo on GitHub into separate repos by year. This left me with a more lightweight repository and freed me to experiment with infrastructure without breaking all my old posts. I began writing all posts directly in knitr’s .Rmd format, allowing the code to be run automatically when the site was published, rather than executing it locally and copying a .md file over into the notebook. That this was possible is thanks to my adoption of Docker (making the myriad dependencies portable), continuous integration platforms that could run the code (using the Docker container) and automatically push the compiled site, and servr, another Yihui Xie R package that facilitates knitr+jekyll integration. Some entries involved particularly heavy computation, including a few that required several hours on a large Amazon EC2 machine to complete. knitr’s caching feature made it easy to snapshot the cached results (which I did in a somewhat convoluted way with another docker container) so that the whole site could still be rebuilt on a public CI service (primarily circle-ci) within the permitted 50-minute window. The Docker+knitr+CI combination brought reproducibility to a new level, though still not perfectly automated, since changing dependencies can still have some effect, particularly since I allow the container to be rebuilt from its Dockerfile rather than committed as a binary image. I’ve been pleased that this system has worked remarkably well the entire year, though I’m frequently nervous that some piece of its increasingly complex architecture will break. Moreover, the complicated re-running of all the R code was increasingly redundant, as most of my work continued to happen in GitHub repositories which had largely adopted their own systems of continuous integration and unit testing (for basic functions and final results, if not for individual notebook posts).

More and more, GitHub, knitr, and other such tools have allowed me to actually perform my research openly, not just discuss it in an open notebook. Perhaps ironically, this means less and less content in the ‘lab notebook’ itself (as I wrote in 2012, ‘the real notebook is on GitHub’). Interestingly, this has the side effect of making the ‘lab notebook’ much less visible. Living on my website, my open notebook has been hard to miss; a visitor could easily page through the entries to see what I had done recently without the familiarity required to navigate to the same material on GitHub. The idea that this was a scientist’s notebook, a concept both immediately familiar and yet so rarely public, perhaps gave it an outsized significance compared to having the same material scattered around GitHub (some under my account and other work in various GitHub Organization accounts), where openness was already the norm. It attracted the attention of reporters and, I’m told, my faculty search committee. So while I think I have benefited from this visibility, for the science it is perhaps much the same.

The middle of 2015 also saw a different kind of change to my notebook as I transitioned from a post-doc to a faculty position. Just as my homepage (now maintained in its own GitHub repo, though still appearing in the same theme as my notebook) shifted from being “me” to being “my research group,” part of my time also shifted from “my research” to mentoring the research of others. While I continue to believe in transparency in scientific process and results, I have also appreciated that as a graduate student open science was something I was allowed to choose for myself. I want those I mentor to make their own choices as well. Thus we have begun projects with students primarily in private GitHub repositories (now hosted on github.com/boettiger-lab).

So what does this mean for 2016 and the future of my open lab notebook? I suspect it will continue in its role as occasional blog and communication tool, where its greater visibility has a practical use, while becoming simpler technically, with the day-to-day activity of research (and teaching, etc.) confined to its own repositories.

1. though some material going back to 2009 and 2008 was posted later.