ugh, most journals don’t seem to hold to their own data publication policies.

PNAS theoretically endorses UPSIDE, r citet("10.1073/pnas.0400437101") principles from the National Academy of Science:

  • Principle 1. Authors should include in their publications the data, algorithms, or other information that is central or integral to the publication—that is, whatever is necessary to support the major claims of the paper and would enable one skilled in the art to verify or replicate the claims.
  • Principle 2. If central or integral information cannot be included in the publication for practical reasons (for example, because a data set is too large), it should be made freely (without restriction on its use for research purposes and at no cost) and readily accessible through other means (for example, online). Moreover, when necessary to enable further research, integral information should be made available in a form that enables it to be manipulated, analyzed, and combined with other scientific data.
  • Principle 3. If publicly accessible repositories for data have been agreed on by a community of researchers and are in general use, the relevant data should be deposited in one of the repositories by the time of publication.
  • Principle 4. Authors of scientific publications should anticipate which materials integral to their publications are likely to be requested and should state in the “Materials and Methods” section or elsewhere how to obtain them.
  • Principle 5. If a material integral to a publication is patented, the provider of the material should make the material available under a license for research use.

Kinda cool. Cooler if it was actually practiced. By the way, when did PNAS discontinue their HTML versions of articles? They were useful! (and one of few with mouse-over citations).


  • Added datatype detection, fixed some checks: ad2d333 and see issue #18.

  • Not sure how to handle recursive node definitions in R’s S4 representation of Schema (ask Duncan..) e.g. a meta element can have child meta elements, but setClass("meta", ..., contains="meta") seems problematic (arises in EML taxon definitions too..)

  • meta elements can also take arbitrary RDF / RDFa as child elements; possibly convenient for adding large blocks of existing semantic metadta, see issue #23

thinking about CalCofi data

Ooh, CalCofi data in Nature Comment this morning: r citet("10.1038/502163a"). Sam says data will be moving to NOAA’s ERDDAP when shutdown ends. Also see his new book on this data, r citet("").

dataone reml integration

Hmm, read-only scripts working for me now but not authentication, see issue #3. Using credentials from Google via


Citations correlate with … no other metric of merit. r citet("10.1371/journal.pbio.1001677")