Writing out a data management plan for myself. Suggestions and feedback welcome.
Data Types and Structure
All source code, documentation, scripts and data for the analyses performed in the course of this research shall be maintained in a digital compendium using the R package structure, as recommended in Gentleman and Temple Lang (2007). The progress and results of this research shall be regularly chronicled in an electronic lab notebook, maintained openly at carlboettiger.info/lab-notebook.html. Figures and other results are kept together with the source code required to reproduce them by writing the analyses as knitr dynamic documents.
Data Acquisition, Integrity and Quality
The lab notebook is written and maintained in plain text (UTF-8) and rendered in HTML5. Likewise, the R package compendium will maintain all source code, scripts and documentation in plain-text (UTF-8) files. Plain-text files with standard encodings help retain compatibility independent of software. The notebook entries and the compendium will each be maintained in their own git repository.
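As a rough illustration of how the plain-text requirement could be checked, the Python sketch below walks a directory and reports text files whose bytes are not valid UTF-8. The function name `non_utf8_files` and the list of suffixes are hypothetical, chosen only for this example; they are not part of the notebook or compendium tooling.

```python
from pathlib import Path

# Hypothetical set of suffixes treated as plain text in this sketch.
TEXT_SUFFIXES = {".md", ".R", ".Rmd", ".Rd", ".txt"}


def non_utf8_files(root, suffixes=TEXT_SUFFIXES):
    """Return sorted paths under `root` whose contents are not valid UTF-8."""
    bad = []
    for path in Path(root).rglob("*"):
        # Only inspect regular files with one of the chosen suffixes.
        if path.is_file() and path.suffix in suffixes:
            try:
                path.read_bytes().decode("utf-8")
            except UnicodeDecodeError:
                bad.append(path)
    return sorted(bad)
```

Calling `non_utf8_files(".")` from the root of a repository would list any offending files; an empty list means every inspected file decodes cleanly.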
Git identifies every stored object by a SHA-1 hash of its content, so corruption of the repository can be detected. Synchronized backups of the git repositories are maintained on local and remote servers (RAID 6) to protect against hardware failures, as well as on the public international software repository GitHub (github.com/cboettig). Version history preserves a timeline of changes and protects against user error.
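To make the integrity check concrete, here is a minimal Python sketch of how git derives the identifier of a file (a "blob") from its content. It assumes the default SHA-1 object format; repositories created with git's newer SHA-256 format would differ. The function name `git_blob_sha1` is my own label for illustration.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Compute the object ID git assigns to a blob with this content.

    Git hashes a short header ("blob <size>" followed by a NUL byte)
    concatenated with the raw file bytes.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


original = b"hello world\n"
corrupted = b"hello w0rld\n"  # a single changed byte

print(git_blob_sha1(original))
print(git_blob_sha1(original) == git_blob_sha1(corrupted))  # False
```

Because the identifier depends on every byte, a file that is altered on disk no longer matches the hash recorded in the history, which is what `git fsck` checks for.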
Archival copies of notebook entries shall be published annually to figshare, where they are assigned DOIs and preserved by the CLOCKSS geopolitically distributed, 12-node global archive. Likewise, an archival copy of the R compendium shall be published to figshare at the time of each peer-reviewed publication.
Rights Management & Dissemination
All products generated by this research will be released under permissive licenses that allow reuse, redistribution, and derivative works, free of charge and for any purpose. Each product will be available without request from a major online repository (Table 1).
| research notebook | CC0 | GitHub, figshare |
Peer-reviewed publications will target preprint-friendly publishers, and an author’s preprint will be posted on the arXiv under a Creative Commons Attribution license to facilitate access and distribution.
Gentleman R and Temple Lang D (2007). “Statistical Analyses and Reproducible Research.” Journal of Computational and Graphical Statistics, 16. ISSN 1061-8600, doi: 10.1198/106186007X178663