Using JSON Queries (JQ)

Pure JSON

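# Packages used throughout this post
library(magrittr)  # %>%
library(jqr)       # jq(), combine()
library(jsonlite)  # fromJSON(), toJSON()
library(jsonld)    # jsonld_compact(), jsonld_frame()
library(rdflib)    # rdf_parse(), rdf_serialize()

vita <- readr::read_file("../../static/js/vita.json")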

jq(vita, '."@reverse".author[]  | 
   { year: .dateCreated, 
     author: .author[] | [.givenName, .familyName]  | join(" ")
   }') %>% combine() %>% fromJSON()
year        author
2017-12-09  Wayne M. Getz
2017-12-09  Charles R. Marshall
2017-12-09  Colin J. Carlson
2017-12-09  Luca Giuggioli
2017-12-09  Sadie J. Ryan
2017-12-09  Stephanie S. Romañach
2017-12-09  Carl Boettiger
2017-12-09  Samuel D. Chamberlain
2017-12-09  Laurel Larsen
2017-12-09  Paolo D’Odorico
2017-12-09  David O’Sullivan
2017-03-15  Stephanie E. Hampton
2017-03-15  Matthew B. Jones
2017-03-15  Leah A. Wasser
2017-03-15  Mark P. Schildhauer
2017-03-15  Sarah R. Supp
2017-03-15  Julien Brun
2017-03-15  Rebecca R. Hernandez
2017-03-15  Carl Boettiger
2017-03-15  Scott L. Collins
2017-03-15  Louis J. Gross
2017-03-15  Denny S. Fernández
2017-03-15  Amber Budden
2017-03-15  Ethan P. White
2017-03-15  Tracy K. Teal
2017-03-15  Stephanie G. Labou
2017-03-15  Juliann E. Aukema
2016-8-19   T. Alex Perkins
2016-8-19   Carl Boettiger
2016-8-19   Benjamin L. Phillips
2016-5-6    Carl Boettiger
2016-5-6    Michael Bode
2016-5-6    James N. Sanchirico
2016-5-6    Jacob LaRiviere
2016-5-6    Alan Hastings
2016-5-6    Paul R. Armsworth
2015-11-16  Carl Boettiger
2015-11-16  Scott Chamberlain
2015-11-16  Ted Harte
2015-11-16  Karthik Ram
2015-9-3    Carl Boettiger
2015-9-3    Scott Chamberlain
2015-9-3    Rutger Vos
2015-9-3    Hilmar Lapp
2015-1-28   Carl Boettiger
2015-1-7    Boettiger
2015-1-7    M. Mangel
2015-1-7    S. Munch
2013-7-10   Boettiger
2013-7-10   Alan Hastings
2013-6-20   Carl Boettiger
2013-6-20   Noam Ross
2013-6-20   Alan Hastings
2013-01-08  Boettiger
2013-01-08  Alan Hastings
2012-10-10  Boettiger
2012-10-10  Alan Hastings
2012-11-6   Boettiger
2012-11-6   D. T. Lang
2012-11-6   P. C. Wainwright
2012-10-11  Carl Boettiger
2012-10-11  Duncan Temple Lang
2012-5-16
2012-5-16   Alan Hastings
2012-3-13   Jeremy M. Beaulieu
2012-3-13   Dwueng-Chwuan Jhwueng
2012-3-13
2012-3-13   Brian C. O’Meara
2012-2-19   Carl Boettiger
2012-2-19   Graham Coop
2012-2-19   Peter Ralph
2009-10-19  Carl Boettiger
2009-10-19  Jonathan Dushoff
2009-10-19  Joshua S. Weitz
2006-11-27  James J. Wray
2006-11-27  Neta A. Bahcall
2006-11-27  Paul Bode
2006-11-27  Carl Boettiger
2006-11-27  Philip F. Hopkins
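
The repeated dates in the table come from how jq expands .author[] inside an object constructor: the object is emitted once per author. Here is a minimal sketch of that behaviour on a made-up inline document. The toy string and its two records are illustrative assumptions, not the contents of vita.json, and the jqr and jsonlite packages are assumed to be installed.

```r
library(jqr)
library(jsonlite)

# Hypothetical stand-in for vita.json: two articles under "@reverse".author
toy <- '{
  "@reverse": {
    "author": [
      { "dateCreated": "2016-8-19",
        "author": [
          { "givenName": "T. Alex", "familyName": "Perkins" },
          { "givenName": "Carl",    "familyName": "Boettiger" }
        ] },
      { "dateCreated": "2015-1-28",
        "author": [
          { "givenName": "Carl", "familyName": "Boettiger" }
        ] }
    ]
  }
}'

# .author[] inside the object constructor yields one object per author,
# so each article's date is repeated for every one of its authors
rows <- jq(toy, '."@reverse".author[] |
  { year: .dateCreated,
    author: .author[] | [.givenName, .familyName] | join(" ")
  }')

# combine() wraps the separate jq results in one JSON array for fromJSON()
toy_df <- fromJSON(combine(rows))
toy_df
```

The first article has two authors and the second has one, so the result should have three rows, with the first date appearing twice.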

With JSON-LD frame

By first constructing a frame, we can get back just the subset of the data we are interested in. Framing is not as powerful as a graph query, but it still has aspects of schema-on-read.

frame <-
'{
"@context": "http://schema.org",
"@type": "ScholarlyArticle",
"author": {
  "@type": "Person",
  "givenName": {},
  "familyName": {},
  "@explicit": true
},
"dateCreated": {},
"@explicit": true
}'


vita <- jsonld::jsonld_frame("../../static/js/vita.json", frame)
as.character(vita) %>% 
  jq('."@graph"[] | { 
     year: .dateCreated, 
     author: .author[] | [.givenName, .familyName] | join(" ")    
     }') %>% combine() %>% fromJSON()
year        author
2017-12-09  Wayne M. Getz
2017-12-09  Charles R. Marshall
2017-12-09  Colin J. Carlson
2017-12-09  Luca Giuggioli
2017-12-09  Sadie J. Ryan
2017-12-09  Stephanie S. Romañach
2017-12-09  Carl Boettiger
2017-12-09  Samuel D. Chamberlain
2017-12-09  Laurel Larsen
2017-12-09  Paolo D’Odorico
2017-12-09  David O’Sullivan
2017-12-09  Carl Boettiger
2016-8-19   T. Alex Perkins
2016-8-19   Carl Boettiger
2016-8-19   Benjamin L. Phillips
2013-6-20   Carl Boettiger
2013-6-20   Noam Ross
2013-6-20   Alan Hastings
2009-10-19  Carl Boettiger
2009-10-19  Jonathan Dushoff
2009-10-19  Joshua S. Weitz
2013-01-08  Carl Boettiger
2013-01-08  Alan Hastings
2006-11-27  James J. Wray
2006-11-27  Neta A. Bahcall
2006-11-27  Paul Bode
2006-11-27  Carl Boettiger
2006-11-27  Philip F. Hopkins
2017-03-15  Stephanie E. Hampton
2017-03-15  Matthew B. Jones
2017-03-15  Leah A. Wasser
2017-03-15  Mark P. Schildhauer
2017-03-15  Sarah R. Supp
2017-03-15  Julien Brun
2017-03-15  Rebecca R. Hernandez
2017-03-15  Carl Boettiger
2017-03-15  Scott L. Collins
2017-03-15  Louis J. Gross
2017-03-15  Denny S. Fernández
2017-03-15  Amber Budden
2017-03-15  Ethan P. White
2017-03-15  Tracy K. Teal
2017-03-15  Stephanie G. Labou
2017-03-15  Juliann E. Aukema
2012-5-16   Carl Boettiger
2012-5-16   Alan Hastings
2012-10-10  Carl Boettiger
2012-10-10  Alan Hastings
2013-7-10   Carl Boettiger
2013-7-10   Alan Hastings
2015-1-7    Carl Boettiger
2015-1-7    M. Mangel
2015-1-7    S. Munch
2015-9-3    Carl Boettiger
2015-9-3    Scott Chamberlain
2015-9-3    Rutger Vos
2015-9-3    Hilmar Lapp
2012-11-6   Carl Boettiger
2012-11-6   D. T. Lang
2012-11-6   P. C. Wainwright
2012-2-19   Carl Boettiger
2012-2-19   Graham Coop
2012-2-19   Peter Ralph
2012-3-13   Jeremy M. Beaulieu
2012-3-13   Dwueng-Chwuan Jhwueng
2012-3-13   Carl Boettiger
2012-3-13   Brian C. O’Meara
2012-10-11  Carl Boettiger
2012-10-11  Duncan Temple Lang
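
To see what "@explicit": true does in a frame like the one above, here is a minimal sketch on a made-up inline document. Several things are assumptions for illustration: the example.org identifiers, the person, and the use of an inline "@vocab" context in place of the remote http://schema.org context, so that nothing has to be fetched. Properties not named in the frame (name and email here) should be dropped from the result.

```r
library(jsonld)
library(jsonlite)

# Hypothetical one-article document; the inline @vocab context stands in for schema.org
doc <- '{
  "@context": { "@vocab": "http://schema.org/" },
  "@id": "http://example.org/article/1",
  "@type": "ScholarlyArticle",
  "name": "A toy article",
  "dateCreated": "2016-8-19",
  "author": {
    "@id": "http://example.org/person/1",
    "@type": "Person",
    "givenName": "Jane",
    "familyName": "Doe",
    "email": "jane@example.org"
  }
}'

# Same shape as the frame above: keep only the listed properties
toy_frame <- '{
  "@context": { "@vocab": "http://schema.org/" },
  "@type": "ScholarlyArticle",
  "author": {
    "@type": "Person",
    "givenName": {},
    "familyName": {},
    "@explicit": true
  },
  "dateCreated": {},
  "@explicit": true
}'

framed <- jsonld_frame(doc, toy_frame)
out <- fromJSON(framed, simplifyVector = FALSE)

# Depending on the JSON-LD processor version, a single matching node
# may or may not be wrapped in "@graph", so handle both shapes
node <- if (is.null(out[["@graph"]])) out else out[["@graph"]][[1]]

names(node)              # article-level keys that survived the frame
names(node[["author"]])  # author-level keys that survived the frame
```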

SPARQL and RDF

A simple example

"http://dx.doi.org/10.1002/ece3.2314" %>%
  httr::GET(httr::add_headers(Accept="application/rdf+xml")) %>%
  httr::content(as = "parsed", type = "application/xml") %>%
  xml2::write_xml("ex.xml")

Our rdflib functions perform the simple task of parsing this RDF/XML file into R (as a redland rdf class object) and then writing it back out in JSON-LD serialization:

rdf_parse("ex.xml", "rdfxml") %>% 
  rdf_serialize("ex.json", "jsonld")

We now have a JSON file. We can clean this file up a bit by replacing the long URIs with short prefixes, by “compacting” the file into a specific JSON-LD context. FOAF, OWL, and Dublin Core are all recognized by schema.org, so we need not declare them at all here. The PRISM and BIBO ontologies are not, so we simply declare them as additional prefixes:

context <- 
'{ "@context": [
    "http://schema.org",
  {
    "prism": "http://prismstandard.org/namespaces/basic/2.1/",
    "bibo": "http://purl.org/ontology/bibo/"
  }]
}'
json <- jsonld_compact("ex.json", context)
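
The effect of compaction is easiest to see on a tiny input. The sketch below compacts a hand-written expanded node against only the two prefix declarations. The node and its two properties are assumptions for illustration, not the actual contents of ex.json, and the remote http://schema.org context is left out so that the example runs without a network call.

```r
library(jsonld)
library(jsonlite)

# Hypothetical expanded JSON-LD for one node, with full URIs as keys
expanded <- '[{
  "@id": "http://dx.doi.org/10.1002/ece3.2314",
  "http://prismstandard.org/namespaces/basic/2.1/startingPage": [{ "@value": "6425" }],
  "http://purl.org/ontology/bibo/doi": [{ "@value": "10.1002/ece3.2314" }]
}]'

# Only the prefix declarations; no remote context is fetched
prefix_context <- '{ "@context": {
  "prism": "http://prismstandard.org/namespaces/basic/2.1/",
  "bibo": "http://purl.org/ontology/bibo/"
} }'

compacted <- jsonld_compact(expanded, prefix_context)
compacted

# The long URIs should now appear as prefix:name keys
compact_keys <- names(fromJSON(compacted))
compact_keys
```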

Switching contexts and framing

context <- 
 '{
    "prism": "http://prismstandard.org/namespaces/basic/2.1/",
    "dc": "http://purl.org/dc/terms/",
    "bibo": "http://purl.org/ontology/bibo/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "schema": "http://schema.org/",

    "schema:pageStart": "prism:startingPage", 
    "schema:pageEnd": "prism:endingPage",
    "schema:volumeNumber": "prism:volume",
    "schema:identifier": {"@id": "prism:issn", "@type": "@id"},

    "schema:Periodical": "bibo:Journal",

    "schema:author": "dc:creator",
    "schema:isPartOf": "dc:isPartOf",
    "schema:publisher": "dc:publisher",
    "schema:name": "dc:title",

    "schema:familyName": "foaf:familyName",
    "schema:givenName": "foaf:givenName",
    "schema:Person": "foaf:Person",

    "schema:sameAs": {"@id": "owl:sameAs", "@type": "@id"},
    "schema:Date": "xsd:date",
    "schema:datePublished": {"@id": "http://purl.org/dc/terms/date", "@type": "schema:Date"}
}'

Compact raw JSON into this context

jsonld_compact("ex.json", context) %>% 
  fromJSON(simplifyVector = FALSE) -> X

Now replace that context with the schema.org context, which is a bit of a hack:

X[["@context"]] <- "http://schema.org"
X %>% 
  toJSON(auto_unbox = TRUE, pretty = TRUE) %>% 
  jsonld_compact("http://schema.org") -> Y

Now frame our desired results to explicitly include only the elements we request, giving the graph in the desired tree structure:

frame <- 
'{"@context": "http://schema.org",
 "@graph": {
   "id": {},
   "name": {},
   "pageStart": {},
    "pageEnd": {},
    "isPartOf": {
      "name": {},
      "identifier": {},
      "@explicit": true
    },
    "author": [
            {
              "givenName": {},
              "familyName": {},
              "@explicit": true
            }],
   "@explicit": true
 }
}'

jsonld_frame(Y, frame)
## {
##   "@context": "http://schema.org",
##   "@graph": [
##     {
##       "id": "http://dx.doi.org/10.1002/ece3.2314",
##       "author": [
##         {
##           "id": "http://id.crossref.org/contributor/carl-boettiger-2etprmps2zm1a",
##           "type": "Person",
##           "familyName": "Boettiger",
##           "givenName": "Carl"
##         },
##         {
##           "id": "http://id.crossref.org/contributor/t-alex-perkins-2etprmps2zm1a",
##           "type": "Person",
##           "familyName": "Perkins",
##           "givenName": "T. Alex"
##         },
##         {
##           "id": "http://id.crossref.org/contributor/benjamin-l-phillips-2etprmps2zm1a",
##           "type": "Person",
##           "familyName": "Phillips",
##           "givenName": "Benjamin L."
##         }
##       ],
##       "isPartOf": null,
##       "name": "After the games are over: life-history trade-offs drive dispersal attenuation following range expansion",
##       "pageEnd": "6434",
##       "pageStart": "6425"
##     }
##   ]
## }

Note that the RDF uses a different semantic model than schema.org. For instance, in the RDF, volume is a property of the scholarly article (well, it’s untyped in the RDF, but it’s a property of the object described by the article DOI). In schema.org, volumeNumber is a property of a Periodical (or PublicationVolume), whose hasPart values are PublicationIssue objects, whose hasPart values are in turn ScholarlyArticles. The whole purpose of the JSON-LD functions is to respect semantics, so there is no way we can use JSON-LD operations to alter these semantics.

As long as we aren’t changing the object structures though, we can change the vocabulary. This is really also something of a hack: we compact the original data, and then just chop off the @context and provide our own @context that gives schema.org definitions to the terms.
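
The hack described above can be sketched on a single toy node. This is a simplified variant under stated assumptions: the node is hand-written, the intermediate context uses a plain pageStart term rather than the schema:-prefixed terms of the full context, and an inline "@vocab" stands in for the remote schema.org context. Expanding the result shows that the same value now hangs off a schema.org URI.

```r
library(jsonld)
library(jsonlite)

# Hypothetical expanded node that uses the PRISM vocabulary
expanded <- '[{
  "@id": "http://dx.doi.org/10.1002/ece3.2314",
  "http://prismstandard.org/namespaces/basic/2.1/startingPage": [{ "@value": "6425" }]
}]'

# Step 1: compact into a context that gives the PRISM property a schema.org-style key
alias_context <- '{ "@context": {
  "pageStart": "http://prismstandard.org/namespaces/basic/2.1/startingPage"
} }'
X <- fromJSON(jsonld_compact(expanded, alias_context), simplifyVector = FALSE)

# Step 2: chop off that context and supply one in which the same key means schema.org
X[["@context"]] <- list("@vocab" = "http://schema.org/")
Y <- jsonld_expand(toJSON(X, auto_unbox = TRUE))

# The key now expands to a schema.org URI instead of the PRISM one
swapped_keys <- names(fromJSON(Y, simplifyVector = FALSE)[[1]])
swapped_keys
```

Nothing in the JSON-LD algorithms performs the swap. It happens only because the key is re-read under a different context, which is why this changes vocabulary but cannot change object structure.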

JSON-LD is commonly used to change key names, but this assumes that both contexts can be defined relative to the same URIs. For example, we can say that in the context of Dublin Core, implicitly "title": "http://schema.org/name", or explicitly: "https://purl.org/dc/title": "http://schema.org/name".

Perhaps this ought instead to be done with an ontological operation and the assertion of sameAs and similar relationships. Perhaps that would also permit moving between these different levels?

Note that items with specific types must be declared as such to match types expected in schema.org. Others can be captured as schema.org terms just by setting the default @vocab.
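
A minimal sketch of that difference, on a hand-written node with a hypothetical example.org link: with only a default "@vocab", plain values compact to bare schema.org terms, while a value that is itself an identifier stays wrapped as an "@id" object until its term is explicitly declared with "@type": "@id".

```r
library(jsonld)
library(jsonlite)

# Hypothetical expanded node: one plain value and one link to another resource
expanded <- '[{
  "@id": "http://dx.doi.org/10.1002/ece3.2314",
  "http://schema.org/pageStart": [{ "@value": "6425" }],
  "http://schema.org/sameAs": [{ "@id": "http://example.org/mirror/ece3.2314" }]
}]'

# Default vocabulary only
vocab_only <- '{ "@context": { "@vocab": "http://schema.org/" } }'

# Default vocabulary plus an explicit declaration that sameAs holds identifiers
vocab_typed <- '{ "@context": {
  "@vocab": "http://schema.org/",
  "sameAs": { "@id": "http://schema.org/sameAs", "@type": "@id" }
} }'

plain <- fromJSON(jsonld_compact(expanded, vocab_only), simplifyVector = FALSE)
typed <- fromJSON(jsonld_compact(expanded, vocab_typed), simplifyVector = FALSE)

plain[["pageStart"]]    # plain value, captured by @vocab alone
str(plain[["sameAs"]])  # still an object holding "@id"
typed[["sameAs"]]       # a bare string once the term is typed
```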
