Using JSON Queries (JQ)

Pure JSON

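library(jqr)       # jq(), combine()
library(jsonlite)  # fromJSON(), toJSON()
library(jsonld)    # jsonld_frame(), jsonld_compact()
library(rdflib)    # rdf_parse(), rdf_serialize()
library(magrittr)  # %>%
vita <- readr::read_file("../../static/js/vita.json")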

jq(vita, '."@reverse".author[]  | 
   { year: .dateCreated, 
     author: .author[] | [.givenName, .familyName]  | join(" ")
   }') %>% combine() %>% fromJSON()
year author
2017-12-09 Wayne M. Getz
2017-12-09 Charles R. Marshall
2017-12-09 Colin J. Carlson
2017-12-09 Luca Giuggioli
2017-12-09 Sadie J. Ryan
2017-12-09 Stephanie S. Romañach
2017-12-09 Carl Boettiger
2017-12-09 Samuel D. Chamberlain
2017-12-09 Laurel Larsen
2017-12-09 Paolo D’Odorico
2017-12-09 David O’Sullivan
2017-03-15 Stephanie E. Hampton
2017-03-15 Matthew B. Jones
2017-03-15 Leah A. Wasser
2017-03-15 Mark P. Schildhauer
2017-03-15 Sarah R. Supp
2017-03-15 Julien Brun
2017-03-15 Rebecca R. Hernandez
2017-03-15 Carl Boettiger
2017-03-15 Scott L. Collins
2017-03-15 Louis J. Gross
2017-03-15 Denny S. Fernández
2017-03-15 Amber Budden
2017-03-15 Ethan P. White
2017-03-15 Tracy K. Teal
2017-03-15 Stephanie G. Labou
2017-03-15 Juliann E. Aukema
2016-8-19 T. Alex Perkins
2016-8-19 Carl Boettiger
2016-8-19 Benjamin L. Phillips
2016-5-6 Carl Boettiger
2016-5-6 Michael Bode
2016-5-6 James N. Sanchirico
2016-5-6 Jacob LaRiviere
2016-5-6 Alan Hastings
2016-5-6 Paul R. Armsworth
2015-11-16 Carl Boettiger
2015-11-16 Scott Chamberlain
2015-11-16 Ted Harte
2015-11-16 Karthik Ram
2015-9-3 Carl Boettiger
2015-9-3 Scott Chamberlain
2015-9-3 Rutger Vos
2015-9-3 Hilmar Lapp
2015-1-28 Carl Boettiger
2015-1-7 Boettiger
2015-1-7 M. Mangel
2015-1-7 S. Munch
2013-7-10 Boettiger
2013-7-10 Alan Hastings
2013-6-20 Carl Boettiger
2013-6-20 Noam Ross
2013-6-20 Alan Hastings
2013-01-08 Boettiger
2013-01-08 Alan Hastings
2012-10-10 Boettiger
2012-10-10 Alan Hastings
2012-11-6 Boettiger
2012-11-6 D. T. Lang
2012-11-6 P. C. Wainwright
2012-10-11 Carl Boettiger
2012-10-11 Duncan Temple Lang
2012-5-16
2012-5-16 Alan Hastings
2012-3-13 Jeremy M. Beaulieu
2012-3-13 Dwueng-Chwuan Jhwueng
2012-3-13
2012-3-13 Brian C. O’Meara
2012-2-19 Carl Boettiger
2012-2-19 Graham Coop
2012-2-19 Peter Ralph
2009-10-19 Carl Boettiger
2009-10-19 Jonathan Dushoff
2009-10-19 Joshua S. Weitz
2006-11-27 James J. Wray
2006-11-27 Neta A. Bahcall
2006-11-27 Paul Bode
2006-11-27 Carl Boettiger
2006-11-27 Philip F. Hopkins

With JSON-LD frame

By first constructing a frame, we can get back a subset of the data we are interested in. This is not as powerful as a graph query, but still has aspects of schema-on-read.

frame <-
'{
"@context": "http://schema.org",
"@type": "ScholarlyArticle",
"author": {
  "@type": "Person",
  "givenName": {},
  "familyName": {},
  "@explicit": true
},
"dateCreated": {},
"@explicit": true
}'


vita <- jsonld::jsonld_frame("../../static/js/vita.json", frame)
as.character(vita) %>% 
  jq('."@graph"[] | { 
     year: .dateCreated, 
     author: .author[] | [.givenName, .familyName] | join(" ")    
     }') %>% combine() %>% fromJSON()
year author
2017-12-09 Wayne M. Getz
2017-12-09 Charles R. Marshall
2017-12-09 Colin J. Carlson
2017-12-09 Luca Giuggioli
2017-12-09 Sadie J. Ryan
2017-12-09 Stephanie S. Romañach
2017-12-09 Carl Boettiger
2017-12-09 Samuel D. Chamberlain
2017-12-09 Laurel Larsen
2017-12-09 Paolo D’Odorico
2017-12-09 David O’Sullivan
2017-12-09 Carl Boettiger
2016-8-19 T. Alex Perkins
2016-8-19 Carl Boettiger
2016-8-19 Benjamin L. Phillips
2013-6-20 Carl Boettiger
2013-6-20 Noam Ross
2013-6-20 Alan Hastings
2009-10-19 Carl Boettiger
2009-10-19 Jonathan Dushoff
2009-10-19 Joshua S. Weitz
2013-01-08 Carl Boettiger
2013-01-08 Alan Hastings
2006-11-27 James J. Wray
2006-11-27 Neta A. Bahcall
2006-11-27 Paul Bode
2006-11-27 Carl Boettiger
2006-11-27 Philip F. Hopkins
2017-03-15 Stephanie E. Hampton
2017-03-15 Matthew B. Jones
2017-03-15 Leah A. Wasser
2017-03-15 Mark P. Schildhauer
2017-03-15 Sarah R. Supp
2017-03-15 Julien Brun
2017-03-15 Rebecca R. Hernandez
2017-03-15 Carl Boettiger
2017-03-15 Scott L. Collins
2017-03-15 Louis J. Gross
2017-03-15 Denny S. Fernández
2017-03-15 Amber Budden
2017-03-15 Ethan P. White
2017-03-15 Tracy K. Teal
2017-03-15 Stephanie G. Labou
2017-03-15 Juliann E. Aukema
2012-5-16 Carl Boettiger
2012-5-16 Alan Hastings
2012-10-10 Carl Boettiger
2012-10-10 Alan Hastings
2013-7-10 Carl Boettiger
2013-7-10 Alan Hastings
2015-1-7 Carl Boettiger
2015-1-7 M. Mangel
2015-1-7 S. Munch
2015-9-3 Carl Boettiger
2015-9-3 Scott Chamberlain
2015-9-3 Rutger Vos
2015-9-3 Hilmar Lapp
2012-11-6 Carl Boettiger
2012-11-6 D. T. Lang
2012-11-6 P. C. Wainwright
2012-2-19 Carl Boettiger
2012-2-19 Graham Coop
2012-2-19 Peter Ralph
2012-3-13 Jeremy M. Beaulieu
2012-3-13 Dwueng-Chwuan Jhwueng
2012-3-13 Carl Boettiger
2012-3-13 Brian C. O’Meara
2012-10-11 Carl Boettiger
2012-10-11 Duncan Temple Lang

SPARQL and RDF

A simple example

"http://dx.doi.org/10.1002/ece3.2314" %>%
  httr::GET(httr::add_headers(Accept="application/rdf+xml")) %>%
  httr::content(as = "parsed", type = "application/xml") %>%
  xml2::write_xml("ex.xml")

The rdflib functions perform the simple task of parsing this RDF/XML file into R (as a redland rdf class object) and then writing it back out in JSON-LD serialization:

rdf_parse("ex.xml", "rdfxml") %>% 
  rdf_serialize("ex.json", "jsonld")

We now have a JSON-LD file. We can clean this file up a bit by replacing the long URIs with short prefixes, “compacting” the file into a specific JSON-LD context. FOAF, OWL, and Dublin Core are all recognized by schema.org, so we need not declare them at all here. The PRISM and BIBO ontologies are not, so we simply declare them as additional prefixes:

context <- 
'{ "@context": [
    "http://schema.org",
  {
    "prism": "http://prismstandard.org/namespaces/basic/2.1/",
    "bibo": "http://purl.org/ontology/bibo/"
  }]
}'
json <- jsonld_compact("ex.json", context)
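The parsed graph can also be queried with SPARQL. The following is a minimal sketch rather than part of the workflow above: so that it runs without the downloaded file, it builds a tiny graph by hand with rdf_add(). The two triples and their predicates are assumptions standing in for whatever the CrossRef record actually contains.

```r
library(rdflib)

doi <- "http://dx.doi.org/10.1002/ece3.2314"

# Hand-built stand-in for the parsed record (assumed triples, not the real file)
g <- rdf()
rdf_add(g,
        subject = doi,
        predicate = "http://purl.org/dc/terms/title",
        object = "After the games are over")
rdf_add(g,
        subject = doi,
        predicate = "http://prismstandard.org/namespaces/basic/2.1/startingPage",
        object = "6425")

# Select every subject that has a dc:title, together with that title
sparql <- '
PREFIX dc: <http://purl.org/dc/terms/>
SELECT ?article ?title
WHERE { ?article dc:title ?title }'

res <- rdf_query(g, sparql)
print(res)
```

With the real data, the same query could be run on the object returned by rdf_parse("ex.xml", "rdfxml"), provided the record uses that title predicate.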

Switching contexts and framing

context <- 
 '{
    "prism": "http://prismstandard.org/namespaces/basic/2.1/",
    "dc": "http://purl.org/dc/terms/",
    "bibo": "http://purl.org/ontology/bibo/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "schema": "http://schema.org/",

    "schema:pageStart": "prism:startingPage", 
    "schema:pageEnd": "prism:endingPage",
    "schema:volumeNumber": "prism:volume",
    "schema:identifier": {"@id": "prism:issn", "@type": "@id"},

    "schema:Periodical": "bibo:Journal",

    "schema:author": "dc:creator",
    "schema:isPartOf": "dc:isPartOf",
    "schema:publisher": "dc:publisher",
    "schema:name": "dc:title",

    "schema:familyName": "foaf:familyName",
    "schema:givenName": "foaf:givenName",
    "schema:Person": "foaf:Person",

    "schema:sameAs": {"@id": "owl:sameAs", "@type": "@id"},
    "schema:Date": "xsd:date",
    "schema:datePublished": {"@id": "http://purl.org/dc/terms/date", "@type": "schema:Date"}
}'

Compact raw JSON into this context

jsonld_compact("ex.json", context) %>% 
  fromJSON(simplifyVector = FALSE) -> X

Now replace that context with the schema.org context (a bit of a hack):

X[["@context"]] <- "http://schema.org"
X %>% 
  toJSON(auto_unbox = TRUE, pretty = TRUE) %>% 
  jsonld_compact("http://schema.org") -> Y

Now frame our desired results to explicitly include only the elements we request, giving the graph in the desired tree structure:

frame <- 
'{"@context": "http://schema.org",
 "@graph": {
   "id": {},
   "name": {},
   "pageStart": {},
    "pageEnd": {},
    "isPartOf": {
      "name": {},
      "identifier": {},
      "@explicit": true
    },
    "author": [
            {
              "givenName": {},
              "familyName": {},
              "@explicit": true
            }],
   "@explicit": true
 }
}'

jsonld_frame(Y, frame)
## {
##   "@context": "http://schema.org",
##   "@graph": [
##     {
##       "id": "http://dx.doi.org/10.1002/ece3.2314",
##       "author": [
##         {
##           "id": "http://id.crossref.org/contributor/carl-boettiger-2etprmps2zm1a",
##           "type": "Person",
##           "familyName": "Boettiger",
##           "givenName": "Carl"
##         },
##         {
##           "id": "http://id.crossref.org/contributor/t-alex-perkins-2etprmps2zm1a",
##           "type": "Person",
##           "familyName": "Perkins",
##           "givenName": "T. Alex"
##         },
##         {
##           "id": "http://id.crossref.org/contributor/benjamin-l-phillips-2etprmps2zm1a",
##           "type": "Person",
##           "familyName": "Phillips",
##           "givenName": "Benjamin L."
##         }
##       ],
##       "isPartOf": null,
##       "name": "After the games are over: life-history trade-offs drive dispersal attenuation following range expansion",
##       "pageEnd": "6434",
##       "pageStart": "6425"
##     }
##   ]
## }

Note that the RDF follows a different semantic model than schema.org. For instance, in the RDF, volume is a property of the scholarly article (well, it’s untyped in the RDF, but it’s a property of the object described by the article DOI). In schema.org, volumeNumber is a property of a Periodical (or PublicationVolume), whose parts (hasPart) are PublicationIssue objects, whose parts in turn are ScholarlyArticles. The whole purpose of the JSON-LD functions is to respect semantics, so there is no way to use JSON-LD operations to alter them.
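To make the structural mismatch concrete, here is a rough sketch of the two shapes as R lists. The volume value "1" is a placeholder, not taken from the record, and the nesting shown is only one way to express the schema.org model.

```r
library(jsonlite)

# Flat model, as in the RDF: the volume hangs directly off the article.
# "1" is a placeholder value.
flat <- list(
  "@id" = "http://dx.doi.org/10.1002/ece3.2314",
  "prism:volume" = "1"
)

# Nested model, as in schema.org: article -> issue -> volume -> periodical
nested <- list(
  "@type" = "ScholarlyArticle",
  "@id" = "http://dx.doi.org/10.1002/ece3.2314",
  isPartOf = list(
    "@type" = "PublicationIssue",
    isPartOf = list(
      "@type" = "PublicationVolume",
      volumeNumber = "1",
      isPartOf = list("@type" = "Periodical")
    )
  )
)

print(toJSON(flat, auto_unbox = TRUE, pretty = TRUE))
print(toJSON(nested, auto_unbox = TRUE, pretty = TRUE))
```

Moving the volume from the first shape into the second means creating new nodes, which compaction and framing do not do.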

As long as we aren’t changing the object structures though, we can change the vocabulary. This is really also something of a hack: we compact the original data, and then just chop off the @context and provide our own @context that gives schema.org definitions to the terms.

JSON-LD is commonly used to change key names, but this assumes that both contexts can be defined relative to the same URIs. For example, we can say that in the context of Dublin Core, implicitly "title": "http://schema.org/name", or explicitly "https://purl.org/dc/title": "http://schema.org/name".
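A small sketch of that renaming, assuming the jsonld package: the input calls the property "title", the target context calls it "name", and both point at the same URI, so expanding and then compacting changes only the key. The document and contexts here are made up for illustration.

```r
library(jsonld)
library(jsonlite)

# Input: the key "title" is defined as http://schema.org/name
doc <- '{
  "@context": {"title": "http://schema.org/name"},
  "title": "After the games are over"
}'

# Target context: the same URI, now called "name"
target <- '{"@context": {"name": "http://schema.org/name"}}'

# Expand to full URIs, then compact into the target context
out <- jsonld_compact(jsonld_expand(doc), target)
res <- fromJSON(out)
print(names(res))
```

This works only because both keys resolve to the same URI. If the two vocabularies used different URIs for the concept, compaction would leave the original URI in place.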

Perhaps this ought instead to be done with an ontological operation and the assertion of sameAs and similar relationships. Perhaps that would also permit moving between these different levels?

Note that items with specific types must be declared as such to match types expected in schema.org. Others can be captured as schema.org terms just by setting the default @vocab.
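As a sketch of that last point, assuming the jsonld package and made-up example.org identifiers: a default @vocab is enough to shorten a plain property such as givenName, while sameAs gets its own entry so that its value is treated as an identifier.

```r
library(jsonld)
library(jsonlite)

# Expanded input with full schema.org URIs (example.org values are made up)
expanded <- '[{
  "@id": "http://example.org/person/1",
  "http://schema.org/givenName": [{"@value": "Carl"}],
  "http://schema.org/sameAs": [{"@id": "http://example.org/person/one"}]
}]'

# @vocab covers untyped terms; sameAs is declared explicitly as an @id
context <- '{
  "@context": {
    "@vocab": "http://schema.org/",
    "sameAs": {"@id": "http://schema.org/sameAs", "@type": "@id"}
  }
}'

out <- fromJSON(jsonld_compact(expanded, context))
str(out[c("givenName", "sameAs")])
```

Without the explicit sameAs entry, the value would stay wrapped as an object with an "@id" key instead of collapsing to a plain string.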
