• Mostly writing application today…
  • Review for EcoLet
  • Looking over POMDP/MOMDP
  • Notebook archives now on figshare (plain-text versions). Entries can now also be cited by DOI, e.g.: Lab Notebook (2010) doi:10.6084/m9.figshare.96916, and Lab Notebook (2011) doi:10.6084/m9.figshare.96919. Would publishing individual entries as individual figshare objects be preferable, or just annoying?
  • Meeting tomorrow with Annette Thomas, CEO of NPG/Macmillan. Review interview with David Worlock.

Munch data project

  • Data extraction for geospatial life history project (file

R script for converting place names into latitude/longitude:

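library(RCurl)    # getURL()
library(RJSONIO)  # fromJSON(..., simplify = FALSE); assumed JSON package
# The second argument's name is missing in these notes; `return.call` is an assumed name.
construct.geocode.url <- function(address, return.call = "json", sensor = "false") {
  root <- ""  # API root URL is missing in these notes; left blank
  u <- paste(root, return.call, "?address=", address, "&sensor=", sensor, sep = "")
  return(URLencode(u))  # assumed ending (the original is cut off): encode spaces etc.
}

gGeoCode <- function(address, verbose = FALSE) {
  if (verbose) cat(address, "\n")
  u <- construct.geocode.url(address)
  doc <- getURL(u)
  x <- fromJSON(doc, simplify = FALSE)
  if (x$status == "OK") {
    lat <- x$results[[1]]$geometry$location$lat
    lng <- x$results[[1]]$geometry$location$lng
    return(c(lat, lng))
  } else {
    return(c(NA, NA))  # assumed fallback; the original is cut off here
  }
}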
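
To run a geocoder like the one above over a whole column of place names, something along these lines might work. This is only a sketch: geocode_places() and stub_geocoder() are hypothetical names, and the stub returns made-up coordinates so the example runs without network access. A real lookup function of the same shape (place name in, c(lat, lng) out) would be passed in instead.

```r
# Hypothetical helper: apply a geocoding function to each place name and
# collect the results in a data frame with one row per place.
geocode_places <- function(places, geocoder) {
  coords <- t(vapply(places, function(p) as.numeric(geocoder(p)), numeric(2)))
  data.frame(place = places, lat = coords[, 1], lng = coords[, 2],
             row.names = NULL, stringsAsFactors = FALSE)
}

# Stand-in for a real lookup; the coordinates are invented for illustration.
stub_geocoder <- function(address) {
  known <- list("Place A" = c(10.5, -20.25))
  if (address %in% names(known)) known[[address]] else c(NA, NA)
}

locations <- geocode_places(c("Place A", "Unknown place"), stub_geocoder)
print(locations)
```

Passing the geocoder in as an argument keeps the table-building step separate from the web request, so failed lookups simply show up as NA rows.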

XML compliance in HTML5

See labnotebook/issues/21

HTML5 allows void elements to be left unclosed, e.g. <meta charset="utf-8"> instead of <meta charset="utf-8" />, but the unclosed form is not well-formed XML.

See:

  • Polyglot/xhtml5 author guide
  • Benefits of XML
  • W3C XHTML media type
  • Optional tags in html5

Note that tags with no content must be self-closed with /> (like the meta example), while tags with content must get a normal closing tag (<li> item </li>, etc.). That said, the HTML5 rules are fairly logical: the closing tags it lets you omit are syntactically redundant, even though omitting them is not valid XML. See the SO question for discussion.
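
A quick way to see the difference is to hand both forms of the meta tag to an XML parser. The sketch below assumes the XML package is installed; is_well_formed_xml() is a hypothetical helper name, and the two markup strings are made up for illustration. The parser may print its error messages for the unclosed form before the helper returns.

```r
library(XML)

# Hypothetical helper: TRUE if the string parses as well-formed XML.
# xmlParse() signals an error on malformed input, which is caught here.
is_well_formed_xml <- function(markup) {
  parsed <- tryCatch(xmlParse(markup, asText = TRUE), error = function(e) NULL)
  !is.null(parsed)
}

html_style <- '<html><head><meta charset="utf-8"></head></html>'
xml_style  <- '<html><head><meta charset="utf-8" /></head></html>'

unclosed_ok   <- is_well_formed_xml(html_style)  # unclosed meta tag
selfclosed_ok <- is_well_formed_xml(xml_style)   # self-closed meta tag
```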


Basic XML parsing is now possible, e.g. in R:

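library(RCurl)  # getURLContent()
library(XML)    # xmlParse(), getNodeSet()
tt <- getURLContent("")
doc <- xmlParse(tt)
# "x" is shorthand for a prefix bound to the document's default namespace
getNodeSet(doc, "//x:meta", namespaces = "x")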
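
The prefix is needed because elements in an XHTML page live in the XHTML namespace, so an XPath query has to name that namespace somehow. A minimal sketch with an inline page instead of a URL, assuming the XML package; the page content is made up, and the prefix x is bound explicitly to the XHTML namespace URI rather than through the one-string shorthand:

```r
library(XML)

# Made-up page for illustration; the default namespace is declared on <html>.
page <- paste0(
  '<html xmlns="http://www.w3.org/1999/xhtml">',
  '<head><meta charset="utf-8" />',
  '<meta property="dc:title" content="Example" /></head>',
  '<body><p>text</p></body></html>'
)
doc <- xmlParse(page, asText = TRUE)

# Bind the prefix "x" to the XHTML namespace for use in XPath queries.
ns <- c(x = "http://www.w3.org/1999/xhtml")
metas <- getNodeSet(doc, "//x:meta", namespaces = ns)

n_meta <- length(metas)
charset <- xmlGetAttr(metas[[1]], "charset")
title <- xmlGetAttr(metas[[2]], "content")
```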

Validation requires that meta tags appear only in <head> and not in <body> (but still use property rather than name for RDFa properties). Validation also requires that all images have alt text, which means I should remember to use anchor text in pandoc images (and so also name my knitr graph chunks intelligently). Otherwise the site validates as HTML5.
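
Both of these checks can be approximated locally with XPath before sending a page to a validator. This is a rough sketch, not a substitute for validation: it assumes the XML package and an XML-parseable page, and the sample page below is made up to contain one misplaced meta tag and one image without alt text.

```r
library(XML)

# Made-up page with one meta tag in <body> and one image lacking alt text.
page <- paste0(
  '<html xmlns="http://www.w3.org/1999/xhtml">',
  '<head><meta property="dc:title" content="Example" /></head>',
  '<body>',
  '<meta property="dc:creator" content="Example" />',
  '<img src="a.png" alt="Described figure" />',
  '<img src="b.png" />',
  '</body></html>'
)
doc <- xmlParse(page, asText = TRUE)
ns <- c(x = "http://www.w3.org/1999/xhtml")

# meta elements anywhere inside <body>
misplaced_meta <- getNodeSet(doc, "//x:body//x:meta", namespaces = ns)

# img elements with no alt attribute, reported by their src
images_without_alt <- getNodeSet(doc, "//x:img[not(@alt)]", namespaces = ns)
missing_alt_src <- vapply(images_without_alt, xmlGetAttr, character(1), name = "src")
```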

Ignoring the declared content type, the site nearly validates as XML. I had to change the attribute onClick from the Google tracker JavaScript to lowercase onclick, like all other attributes. Now the only challenge is that HTML5 needs lang= declared in the <html> tag (while it is tolerant of xml:lang), but XML doesn’t want to see lang in the <html> tag. Perhaps lang can go somewhere else and still be valid HTML5? (Using a meta tag for this is now obsolete.)


The MathML namespace is loaded by default in HTML5 (a native namespace). Currently using the JavaScript library MathJax, which has very nice display and interactive toggles (including getting the LaTeX or MathML source). It looks like MathJax can take either LaTeX or MathML input, and pandoc can render the LaTeX source (much, much nicer to type) into MathML. Just need to get pandoc to convert the source to MathML and then have the MathML rendered prettily by the MathJax CSS. This may also load faster than JavaScript-based rendering by MathJax(?)
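
For the pandoc step, the --mathml option asks pandoc to write TeX math as MathML in HTML output. A minimal sketch of calling it from R, assuming pandoc is on the PATH; the function names and the file names entry.md and entry.html are placeholders, not part of the notebook's actual build:

```r
# Hypothetical helper: command-line arguments for converting a markdown
# file to HTML5 with math written as MathML.
pandoc_mathml_args <- function(input, output) {
  c("--from", "markdown", "--to", "html5", "--mathml",
    "--standalone", "--output", output, input)
}

# Hypothetical wrapper: run pandoc with those arguments, quoting each one
# so that file names containing spaces survive the shell.
render_mathml <- function(input, output) {
  if (!nzchar(Sys.which("pandoc"))) stop("pandoc not found on PATH")
  system2("pandoc", shQuote(pandoc_mathml_args(input, output)))
}

args <- pandoc_mathml_args("entry.md", "entry.html")
print(paste(c("pandoc", args), collapse = " "))

# Only attempt the conversion when both pandoc and the input file exist.
if (nzchar(Sys.which("pandoc")) && file.exists("entry.md")) {
  render_mathml("entry.md", "entry.html")
}
```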

ALM goals

Headed to the PLOS ALM conference with rOpenSci next week. Should be a good chance to catch up on projects:

  • rfigshare to CRAN
  • semantic knitcitations
  • Mark Hahnel: figshare archiving of notebook – archive the markdown? archive code? archive git repos?
  • Martin Fenner: Scholarly HTML standards, notebook standards (can we get John Kunze involved?)