Notes For Van Noorden Chat

Below are my scribbled notes in preparation for an interview with Richard Van Noorden, a writer for Nature.

Advice for early-career scientists on how to think about organizing and publishing research data online.

Things like - why do it,

It’s good for science (idealist) and good for your reputation (careerist), but most importantly, it helps you do your research.

Last summer I had the pleasure of meeting then-Secretary of Energy Stephen Chu at a small Department of Energy computational science conference. Secretary Chu said that what he missed most about being away from his research at Lawrence Berkeley National Laboratory was the joy of sharing the excitement of scientific discovery. He described reading a recent article on an advance in quantum computing during a return flight on Air Force One, and probably boring the Secretary of Agriculture sitting next to him to death as he narrated just how clever the study was. Whether you are a Nobel laureate or just starting your career, there’s nothing like the joy of engaging other scientists in discussing the details of research you are passionate about.

Drake story: Because he had access to my data, he could engage more substantively, directly demonstrating the point he wanted to make about the pattern observed in the data. His ideas then prompted me to run several additional analyses to test whether my original intuition was correct or whether he had indeed discovered a novel indicator. Because my workflow was already in place, I could do this quickly and easily. Our exchange will be published in PRSB, accompanied by the data in Dryad, so others can weigh in on the question.

which specific sites/applications/tools are there,

There’s a wealth of tools available today. Anyone who can submit a paper through Editorial Manager or Manuscript Central without swearing will have no problem using them (though the experience may deepen their resentment of those clunky submission systems).

Dryad and Figshare are both free at the moment, and both provide DOIs, CLOCKSS archival backup, discoverability, and APIs.

Published data: Dryad. Integrated with our journals, the data DOI appears on the article’s cover page, good usage statistics, good API, familiar to the community.
Figshare: Beautiful online display, with private (e.g. within-group) and public options. Drawbacks so far: no download or citation data yet, less journal integration (PLoS, some NPG), and a start-up rather than a non-profit model, so the sustainability plan is less obvious.

GitHub: ‘a better Dropbox.’ Free, effectively unlimited, open, collaborative.

Effortlessly share rapidly changing content (data, code, manuscript drafts). Fast, verifiable, and persistent identifiers for every version. Collaborative writing and feedback. Provider-independent technology (git is free, open-source software and works with or without GitHub, Inc.). No CLOCKSS backup. Workable, but not ideal, for binary data files (e.g. images). Web hosting for sharing truly custom content, with full analytics.
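The “verifiable identifiers” point can be made concrete with a minimal sketch, assuming a repository that uses git’s default SHA-1 object format: git names each stored file version by hashing a short header plus the file’s raw bytes, so anyone holding the same data can recompute the identifier and confirm nothing has changed. The function name `git_blob_id` is hypothetical; commit identifiers are built analogously by hashing the tree, parent commits, and author metadata.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Recompute the SHA-1 identifier git assigns to a file's contents (a "blob").

    Assumes a default SHA-1 repository; SHA-256 repositories hash differently.
    """
    # git prepends the header "blob <size in bytes>\0" before hashing the raw bytes
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # An empty file always gets the same well-known identifier
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the identifier depends only on the bytes, a collaborator (or reviewer) can check that the data file they downloaded is exactly the version cited in a paper.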

Lab Notebook:

what things do you need to look out for (such as data permanence, how to make data citable and discoverable, how to fit in with your field’s requirements, etc).

Metadata.

I see that you’re a great advocate of sharing data and open science, and I’d be keen to talk about what you’ve got out of it,

what you’ve learned,

and any advice you’d give to early-career scientists who are wondering about how to start.

Start in your lab group, and make it a habit. You’ll be the biggest beneficiary…

Data sharing isn’t new, and established researchers are often better positioned and more willing to take risks, so I’d hate to give them the impression that “youngsters today think they invented this stuff.” That said, one of the best things early-career scientists can do is encourage this practice in their field.

(early Drosophila story)