rOpenSci

Motivation

I strongly believe in sharing scientific data. I believe this is a win for individual scientists (http://dx.doi.org/10.1371/journal.pone.0000308), (http://dx.doi.org/10.1371/journal.pone.0026828) as well as for the larger enterprise of other researchers, publishers, funders, educators, and the general public. A lot of data has become available (helped along by recent requirements), but most researchers, at least in my field, continue to work only on their own data as before. We lack the tools and the training to leverage such data quickly and easily. To address some of these challenges, I have helped found a project called rOpenSci, together with ecologists Scott Chamberlain and Karthik Ram.

What we do

The project creates tools in the form of packages for R, a powerful data-manipulation and visualization environment familiar to most ecologists. These packages interface with the major data repositories to make serious data exploration and analysis easy and scriptable. We have several released projects and many under active development (listed below).

Advis­ing Team

We have assembled a world-class team to advise the project on R and scientific databases:

  • Duncan Temple Lang, Professor of Statistics at UC Davis and R core developer. Duncan is the author of many packages, including RCurl and XML, which underlie much of rOpenSci's web interfacing, and is a leading thinker in reproducible research, data mining, and user interaction with data.
  • Hadley Wickham, Professor of Statistics at Rice University and R core developer. Hadley is the author of ggplot and a leader in complex data visualization and analysis.
  • Matt Jones, Direc­tor of Infor­mat­ics Research and Devel­op­ment at the NCEAS, is a lead­ing expert in glob­ally scaled data-sharing net­works (KNB, DataONE), repro­ducible research work­flows (Kepler) and data semantics.
  • Bertram Ludaescher, Professor of Computer Science at UC Davis, is a leader in scientific data management and reproducible research.
  • J. J. Allaire, digital entrepreneur and founder of ColdFusion, Onfolio and now RStudio, brings expertise from beyond the sphere of academic research.

We hope to provide a model that other researchers and databases can replicate for interfacing with researchers and with R. Our packages are open-source and developed in the open from their inception. The rOpenSci project is becoming a means of discovering just how many databases are available, and a common environment in which they can interact.

Active and devel­op­ing projects

  • RMende­ley
    Imple­men­ta­tion of the Mende­ley API in R (now on CRAN)
  • taxize_
    Search web tax­on­omy sites and down­load data
  • rplos
    Wrap­per for the PLoS Jour­nals API
  • rspringer
    Wrap­per for the Springer Jour­nals API
  • rbold
    Inter­face to the Bold Sys­tems bar­code web­site
  • rbhl
    R inter­face to the Bio­di­ver­sity Her­itage Library (BHL) API
  • rnpn
    Wrap­per to the National Phe­nol­ogy Net­work data­base API
  • rgbif
    Wrap­per to the Global Bio­di­ver­sity Infor­ma­tion Facil­ity API
  • rvert­net
    Wrap­per to the Vert­Net API
  • rdat­acite
    Wrap­per to Dat­aCite meta­data
  • rfish­eries
    Package for interacting with fisheries databases
  • ritis
    Wrap­per to the Inte­grated Tax­o­nomic Infor­ma­tion Ser­vice (ITIS) API
  • rfly­base
    A wrap­per to Fly­Base data
  • rOpen­Sci
    R inter­face for lit­er­a­ture and data repos­i­to­ries
  • rEWDB
    An R wrap­per for Eco­log­i­cal Web Data­base
  • rfish­base
    R inter­face to the fishbase.org data­base
  • tree­BASE
    An R inter­face to the tree­BASE API
  • rfna
    Web page scrap­ing for Flora of North Amer­ica
  • use­cases
    Use cases for rOpen­Sci pack­ages
  • ropen­snp
    Wrap­per to the open­SNP data API
  • citeu­like
    R inter­face to CiteU­Like.
  • ropensci.github.com
    Project page for rOpen­Sci
  • ropen­sc­i­Toolkit
    Helper func­tions for ropen­sci pack­ages
  • rebird
    Wrap­per to the eBird API.
  • ralt­met
    Inter­face to many alt­met­rics data ser­vices.
  • rdryad
    R imple­men­ta­tion of the API for the Dryad bio­log­i­cal data repos­i­tory
  • rEco­Data
    Programmatic interface to the ecodata retriever
  • rpen­soft
    Wrap­per to PENSOFT jour­nals web ser­vices.
  • textmine
    Our textmin­ing using rOpen­Sci pkgs man­u­script

(this list is generated automatically from GitHub)

Bib­li­og­ra­phy