iEvoBio lightning talk

Submitting a five-minute lightning talk on open science. Submitted version here; final text below. See the page revision history for earlier drafts, and the original call below. Special thanks to Sam, Alistair, Betta, Shaun, and Alex for feedback.

My experiment with open science: Why the benefits of sharing go beyond source code

The practice and philosophy of open source run deep within the iEvoBio community. Sharing source code has made software development faster and more reliable: reuse helps catch errors, develop standards, provide more extensible platforms, and ensure scientific reproducibility. It has helped foster a community that can tackle problems that would be impossible for any single lab. Can an “open-source” approach benefit other elements of research? I discuss how I am fostering open science throughout my research process, from reading the literature, through code development and my lab notebook, to feedback and conversation. I highlight three themes that I have found most valuable: (1) Facilitating access and reuse. Content that is legally and practically accessible makes for more reproducible science, better error checking, and broader impact, as more people can see and build upon it. (2) Linking data. In a web-native environment, everything can be archived, linked, tagged with keywords, and searched. In this way, my content can connect to that of other researchers working on similar questions, data, articles, or algorithms. (3) Networking people. The most successful tools link not only data but also people. This provides a network of expertise, a filter for exponentially growing amounts of information, and a source of inspiration and collaboration. I will briefly describe four tools that embrace this approach in different elements of my own research.

Literature (using Mendeley): My bibliographies for each topic become publicly visible collections. When other researchers do the same, the content is automatically linked. This shows how many people in my field are reading which articles, authors, and journals, and it generates recommendations. The tool also provides a network of contacts: I can follow other public collections or add collaboratively to one.

Code Development (using GitHub): My code is regularly committed to a public version control repository. Anyone can follow line-by-line changes, view my commit log, and visualize branches and merges in the project. The code can be downloaded in the state it was in at any point in its history, to replicate my work or to reuse it in other projects. GitHub’s social coding tools connect me to researchers with similar interests: I can follow their work, pose questions to them, or begin a direct collaboration.

Lab Notebook (using OpenWetWare): I keep a daily online lab notebook on a community wiki. It binds together and shares all elements of my work, from readings and meetings to mathematical derivations and simulations. Graphs, code snippets, spreadsheets, references, and even conversations can be embedded directly into the notebook. I can search the complete text and organize entries using tags, links, and the automatic versioning archive. This makes it easy to share my work with collaborators, and it has helped me discover new datasets and collaborations through other users on the site.

Feedback (using UserVoice and Stack Overflow): Before forums, asking a question meant knowing someone who knew the answer. Online tools make it possible to direct questions to particular groups of experts. Questions and replies can be tagged, linked, and searched, and these sites automatically search for existing similar questions. Posts can also be rated, which draws attention to those of most value to the community. Contributions can be tracked and rewarded, while user profiles help identify contributors and form the elements of a human network. This model is also well suited to feedback between users and developers.

I find that this approach saves time because it integrates naturally into my workflow. It reduces errors through greater transparency and review. It improves quality through better access to expert advice and by building on established work. It increases impact by connecting my work to other data and having it amplified by other researchers in the network. The more researchers adopt these approaches, the more effective they will become.


### Call For Lightning Talks

“Lightning talks are short presentations of 5 minutes. They are ideal for drawing the attention of the audience to new developments, tools, and resources, or to subsequent events where more in-depth information can be obtained. Please also see our FAQ for more information (https://ievobio.org/faq.html#lightning). Lightning talks will be part of the more interactive afternoon program on both conference days.

Submitted talks should be in the area of informatics aimed at advancing research in phylogenetics, evolution, and biodiversity, including new tools, cyberinfrastructure development, large-scale data analysis, and visualization.

Submissions consist of a title and an abstract at most 1 page long. The abstract should provide an overview of the talk’s subject. Reviewers will judge whether a submission is within scope of the conference (see above). If applicable, the abstract must also state the license and give the URL where the source code is available so reviewers can verify that the open-source requirement(*) is met.”

From iEvoBio Conference.

Reading / Notes

Excellent article on organizing computational projects:

  1. Noble WS. A quick guide to organizing computational biology projects. PLoS Comput Biol. 2009. pmid:19649301.