Reflections on the Mozilla Science Code Review Pilot

I was recently interviewed by Nature senior reporter Erika Check Hayden on the subject of the scientific code review project being conducted by Mozilla Science Lab. The piece appears in this week’s issue (Hayden 2013). My blog post sharing my own approach to code review is mentioned at the beginning of the article, though it is rather Roger Peng’s comments at the end that have stirred some interesting discussion.

Roger raises two concerns. First, that increased scrutiny will discourage researchers from sharing code (which, right or wrong, remains a voluntary choice in most journals):

One worry I have is that, with reviews like this, scientists will be even more discouraged from publishing their code

and second, that code review does not focus on what matters most:

We need to get more code out there, not improve how it looks

(Erika provides a bit more context to Roger’s comments below).

@ctitusbrown @cboettig @kaythaney @nickbarnes see whole @simplystats quote on prof. code review discouraging sharing pic.twitter.com/pNQWT9Safz
— Erika Check Hayden (@Erika_Check) September 25, 2013

The Nature News piece thus nails a central tension in the community between promoting higher standards for code (a position exemplified in an earlier Nature column titled “Computational Science … Error: Why scientific programming does not compute”, and more recently in Science by Joppa et al. 2013, which explicitly calls for peer review of scientific software) and promoting more widespread sharing of software (as exemplified by Nick Barnes’s piece in the same issue, “Publish your computer code: it is good enough”).

The arguments made in each of the perspectives are excellent, and should be required reading for anyone interested in the subject. In addition to the shorter comment by Barnes, I also recommend the more recent Nature perspective, The Case for Open Computer Programs, which lays out the argument and modest practical recommendations (that have largely been ignored as far as I can tell) just brilliantly. In particular, I think they nail the issue of why describing the algorithm or providing pseudo-code is not a satisfactory description of the method.

However, I also think the tension between review and sharing is somewhat artificial. While each of these positions emphasizes the need to share source code, the calls for code review by Merali and by Joppa et al. (or in my own blog post mentioned earlier) focus on scientific software aimed at reuse by others. The concerns voiced in Roger Peng’s comments and echoed by Nick Barnes focus on another class of code entirely – code associated with a particular research publication that would primarily serve only to document and support those results, rather than be readily adapted to other uses.

@cboettig @ctitusbrown @kaythaney but far more important to get code out than to get it “right”, IYSWIM.
— Nick Barnes (@nickbarnes) September 25, 2013

Classes of Code: Snippets vs Software

For me, the crux of these concerns lies in the difference between “software papers”¹ and papers which merely use code in some element of the methodology. The Mozilla study focused exclusively on code appearing in the full text of publications in PLoS Computational Biology. Though I am pleased to see the Science Lab tackle the issue of software review and bring the expertise of their professional software developers to bear on scientific code, this is perhaps not the kind of code I would have chosen to focus on (something I shared with the team early on, upon seeing the announcement).

Without knowing which papers are included it is of course difficult to say too much. But knowing that the code appears in the full text of the papers themselves, we can assume that it is not a complete software package intended for reuse by other researchers. Using code within the body of a manuscript implies the intent to communicate methodology more concisely and precisely than might be done in prose, in much the same manner that we use equations in place of prose. This is an important development in scientific communication, but is also rather distinct from the use of code in other contexts, in which the code itself is meant to be read primarily by machines. It is code that is already intended to help explain.

Code included in appendices to scientific papers is meant rather to document exactly what has been done, in a manner that assists replication, and a reader may need considerable effort to decipher it. It is not primarily meant to explain; it merely supports the more readable but less precise description, and potentially the pseudo-code, that would appear in the body text.

Code intended for reuse as research software (in software papers) is another class entirely. Ostensibly, the user never needs to see the code itself, but only interacts with the user interface or end-user functions (API) provided. Code that is written clearly and concisely still has value, helping identify bugs and facilitating future researcher-developers extending the software, but most of its functionality can be accessed and assessed without looking at the source. I think it is in this kind of review that we as a researcher-developer community could learn the most from the Mozilla software engineering experts.

I believe the most important focus of code review is scientific software rather than code snippets. And in reviewing software, I think the most important elements do not actually involve reading the source code at all (as I discuss in my revised position on reviewing software papers), but rather establishing that the software behaves as expected and follows software development practices that make it more sustainable, such as hosting in a software repository, version control, and example input and output.

Code vanity?

Roger’s second comment appears more dismissive of code review than I think it actually is:

“We need to get more code out there, not improve how it looks.”

@kaythaney @ctitusbrown yup, ‘pretty’ is dismissive terminology, though possibly short-hand for ‘human-readable’ not just ‘machine-readable’
— Carl Boettiger (@cboettig) September 25, 2013

Most modern languages include syntactic sugar: ways of expressing commands that are more easily interpretable to human readers. For instance, in C, a[i] is syntactic sugar for *(a+i). Higher-level languages are in some ways all sugar around existing lower-level libraries.² Like good mathematical notation or good prose, this is not just about being ‘pretty’, but being more effective in communicating with humans. Certainly this is something we can improve upon as researchers, but it is perhaps not the best starting point.

Whether code review should apply to all levels of code or be reserved for scientific software may still be an open question. What we should be able to agree on is the publishing of code in the first place:

@cboettig @kaythaney @nickbarnes I think code sharing should be mandatory shrug. It's part of the methods. I reject papers w/o it.
— Titus Brown (@ctitusbrown) September 25, 2013

It continues to surprise me how few journals require code deposition. Science explicitly adopted a new policy in 2011 stating

Science is extending our data access requirement listed above to include computer codes involved in the creation or analysis of data

which asks that the data be placed in an appropriate permanent repository or otherwise placed in the supplementary materials (see information for authors).

Yet I have not seen code provided for any analysis I have read in Science since 2011. Either we have a very different understanding of what it means to use computer codes in the analysis of data or Science grossly neglects its own policy.³ Not to pick on them, of course; few other journals have explicitly adopted such a policy.

.@cboettig @kaythaney @nickbarnes As one of my grad students said to me, “I don't understand why ‘must share code’ is a radical opinion.”
— Titus Brown (@ctitusbrown) September 25, 2013

Nick Barnes suggests that this alone may be enough to improve code quality:

@cboettig @ctitusbrown @kaythaney require sharing. Pride will then rapidly lead to review and other improvement techniques.
— Nick Barnes (@nickbarnes) September 25, 2013

Certainly it will help, though not enough if the state of open source scientific software is any indication (Nick does acknowledge a rather geological notion of ‘rapid’). Smaller codes used in particular analyses will certainly feel this pressure more, as they will be easier to scrutinize.

So what might we learn from the Mozilla Code review?

Focusing on code appearing in-line in papers certainly addresses a different beast than large scientific software packages intended for reuse. As Roger observed, we will likely learn that scientists aren’t software engineers. We may learn how to use code to communicate more effectively. We may learn some lessons that apply to larger codebases involved in scientific software, but I think there the problems are often outside of the individual lines of code themselves and arise from other development practices.

Still, learning how to do code review at all would be an invaluable start. As the discussion on my own post on the subject made clear, we researchers have no training in this practice. The Mozilla study would give the first taste. I only hope they turn their attention next to larger scientific software that lives outside of the publications themselves and is intended for re-purposing and reuse. Meanwhile, I also hope journals will become more serious about recognizing code as methods, as they have started to do with data.

¹ which I define as papers primarily aimed at promoting the reuse of a particular codebase and providing an indexed citation target to credit the work.
² It is frequently argued that ideal code should be ‘self-documenting’, with syntax so precise that no other explanation of the function is necessary. While I think this is an admirable ideal, it should never be a substitute for actually providing prose documentation as well. I find we too easily deceive ourselves as to just how self-documenting our own code really is (it’s all so obvious at the time, right?).
³ From my reading it appears that Science requirements are intentionally too vague on this issue, stating: “All computer codes involved in the creation or analysis of data must also be available to any reader of Science.” It is unclear if ‘available on request’ is considered appropriate for code, though it is explicitly not acceptable for data: “we have therefore required authors to enter into an archiving agreement, in which the author commits to archive the data on an institutional Web site, with a copy of the data held at Science”, and the editorial makes it clear that “the data access requirement … includes computer codes”. Meanwhile all studies involving ‘requests for data’ have shown a return rate of substantially less than 100% (usually less than 20%, e.g. this recent study). If that were a viable option we could just publish abstracts and have papers available on request too. If you need to know who wants your data, why not do the same for papers?
