Motivation
I strongly believe in sharing scientific data. I believe it is a win for individual scientists (http://dx.doi.org/10.1371/journal.pone.0000308, http://dx.doi.org/10.1371/journal.pone.0026828) as well as for the larger enterprise of other researchers, publishers, funders, educators, and the general public. A great deal of data has become available, helped along by recent data-sharing requirements, but most researchers (at least in my field) continue to work only with their own data, as before. We lack both the tools and the training to leverage such data quickly and easily. To address some of these challenges, I have helped found a project called rOpenSci, together with ecologists Scott Chamberlain and Karthik Ram.
What we do
The project creates tools in the form of packages for R, a powerful data-manipulation and visualization environment familiar to most ecologists. These packages interface with the major data repositories to make serious data exploration and analysis easy and scriptable. We have several released and many actively developed projects (listed below).
Advising Team
We have assembled a world-class team to advise the project on R and scientific databases:
- Duncan Temple Lang, Professor of Statistics at UC Davis and R core developer. Duncan is the author of many packages, including RCurl and XML, which underlie much of rOpenSci's web interfacing, and a leading thinker in reproducible research, data mining, and user interaction with data.
- Hadley Wickham, Professor of Statistics at Rice University. Hadley is the author of ggplot and a leader in complex data visualization and analysis.
- Matt Jones, Director of Informatics Research and Development at the National Center for Ecological Analysis and Synthesis (NCEAS), is a leading expert in globally scaled data-sharing networks (KNB, DataONE), reproducible research workflows (Kepler), and data semantics.
- Bertram Ludaescher, Professor of Computer Science at UC Davis, is a leader in scientific data management and reproducible research.
- J. J. Allaire, digital entrepreneur and founder of ColdFusion, Onfolio, and now RStudio, brings expertise beyond the sphere of academic research.
We hope to provide a model that other researchers and databases can replicate, so that databases can interface with researchers through R. Our packages are open source and have been developed in the open from their inception. The rOpenSci project is becoming a means of discovering just how many databases are available, and a common environment in which they can interact.
Active and developing projects
- RMendeley: Implementation of the Mendeley API in R (now on CRAN)
- taxize: Search web taxonomy sites and download data
- rplos: Wrapper for the PLoS Journals API
- rspringer: Wrapper for the Springer Journals API
- rbold: Interface to the BOLD Systems barcode website
- rbhl: R interface to the Biodiversity Heritage Library (BHL) API
- rnpn: Wrapper to the National Phenology Network database API
- rgbif: Wrapper to the Global Biodiversity Information Facility API
- rvertnet: Wrapper to the VertNet API
- rdatacite: Wrapper to DataCite metadata
- rfisheries: Package for interacting with fisheries databases
- ritis: Wrapper to the Integrated Taxonomic Information Service (ITIS) API
- rflybase: A wrapper to FlyBase data
- rOpenSci: R interface for literature and data repositories
- rEWDB: An R wrapper for the Ecological Web Database
- rfishbase: R interface to the fishbase.org database
- treeBASE: An R interface to the treeBASE API
- rfna: Web page scraping for Flora of North America
- usecases: Use cases for rOpenSci packages
- ropensnp: Wrapper to the openSNP data API
- citeulike: R interface to CiteULike
- ropensci.github.com: Project page for rOpenSci
- ropensciToolkit: Helper functions for rOpenSci packages
- rebird: Wrapper to the eBird API
- raltmet: Interface to many altmetrics data services
- rdryad: R implementation of the API for the Dryad biological data repository
- rEcoData: Programmatic interface to the EcoData Retriever
- rpensoft: Wrapper to PENSOFT journals web services
- textmine: Our manuscript on text mining using rOpenSci packages
(This list is generated automatically from GitHub.)
Bibliography