Parsing very nested JSON documents can be a pain. Here are some notes on co-opting the strategy of “Framing” used in JSON-LD. (Note that unlike the basic operations of compaction and expansion, the JSON-LD framing algorithm actually is essentially independent of the @context and any linked data concepts.

Here’s a toy example of some nested JSON. Very nested structures are usually the source of issues for me, even with purrr, because often I want to pull data found at various different levels of nesting into a single row for the data.frame I care about.

library("jsonlite")
library("jsonld")
library("magrittr")
json <-'{
      "@id": "http://example.org/library",
      "@type": "ex:Library",
      "ex:contains": {
        "@id": "http://example.org/library/the-republic",
        "@type": "ex:Book",
        "ex:contains": {
          "@id": "http://example.org/library/the-republic#introduction",
          "@type": "ex:Chapter",
          "dc:description": "An introductory chapter on The Republic.",
          "dc:title": "The Introduction"
        },
        "dc:creator": "Plato",
        "dc:title": "The Republic"
      }
    }
  '

The default behavior of jsonlite:flatten does not return a data frame here:

df <-fromJSON(json, flatten = TRUE)
class(df)
## [1] "list"

Note that df is still a (rather cumbersome!) list. This is particularly annoying because the type/structure is unpredictable (depends on how much a nesting a given element might have), so hard to program around, so we usually wind not flattening the data (but having to iterate over some often ugly nesting).

A JSON-LD framing solution

Let’s imagine I just want to pull out book titles from the middle of that nested structure. Here’s a frame for that:

frame <-
'{
  "@explicit": "true",
  "@type": "ex:Book",
  "dc:title": {}
}'
jsonld_frame(json, frame) %>% fromJSON()
## $`@graph`
##                                       @id   @type     dc:title
## 1 http://example.org/library/the-republic ex:Book The Republic

How about a data frame with the title and creator for all objects, regardless of nesting depth:

frame <-
'{
  "@explicit": "true",
  "@id": {},
  "dc:title": {"@default": "NA"},
  "dc:creator": {"@default": "NA"}
}'

jsonld_frame(json, frame) %>% fromJSON()
## $`@graph`
##                                                    @id      @type
## 1              http://example.org/library/the-republic    ex:Book
## 2 http://example.org/library/the-republic#introduction ex:Chapter
##   dc:creator         dc:title
## 1      Plato     The Republic
## 2         NA The Introduction

This strategy is also very effective when you either don’t know exactly how the data is structured, or the data structure changes either over time or across different records provided by the data provider (e.g. when some entries may have more nested content than other entries of the same type).

More details on the syntax used in specifying a frame can be found in the offical documentation.

comments powered by Disqus