Why JSON Pointer falls short (and why XPath for JSON would be …

Exploring XPath: A Flexible Selector Language for JSON Documents

Today, I stumbled upon an interesting discussion on my Twitter timeline about the JSON Pointer draft proposal and an Erlang JSON Pointer implementation on GitHub. I couldn't help but feel a surge of excitement, hoping to discover a solution that would revolutionize JSON document querying.

However, my enthusiasm quickly waned as I delved into the JSON Pointer spec draft. It became apparent that it lacked a crucial feature: the ability to retrieve all matching nodes in the JSON document representation tree for a given "pointer". This limitation posed a significant obstacle when attempting to extract multiple values efficiently. Let's take a closer look at a few examples to illustrate the issue.

JSON Pointer Examples

Imagine we have a JSON document representing a hilarious mock movie called "Java 4-ever." It contains details of the production, including a list of actors and their corresponding characters, like so:

{
  "title": "Java 4-ever",
  "url": "http://www.youtube.com/watch?v=H7QVITAWdBQ",
  "actors": [
    {
      "name": "Scala Johansson",
      "character": "A"
    },
    {
      "name": "William Windows",
      "character": "B"
    },
    {
      "name": "Eddie Larrison",
      "character": "C"
    },
    {
      "name": "Mona Lisa Harddrive",
      "character": "D"
    },
    {
      "name": "Lenny Linux",
      "character": "C (Young)"
    }
  ]
}

Suppose we want to extract the names of all the actors from this movie. With JSON Pointer, we would need to access each name individually using the respective indices:

/actors/0/name
/actors/1/name
/actors/2/name
/actors/3/name
/actors/4/name

This approach is undeniably cumbersome, verbose, and even inefficient. It hardly seems practical to employ JSON Pointers in this manner to retrieve a list of names.
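To make the cost concrete, here is a minimal Python sketch of pointer resolution. The `resolve_pointer` helper is my own illustration, not the Erlang implementation mentioned above; it assumes zero-based array indices and the '~0' / '~1' escapes of the later RFC 6901 form of the spec, and it skips validation of malformed indices.

```python
import json

MOVIE = json.loads("""
{
  "title": "Java 4-ever",
  "actors": [
    {"name": "Scala Johansson", "character": "A"},
    {"name": "William Windows", "character": "B"},
    {"name": "Eddie Larrison", "character": "C"},
    {"name": "Mona Lisa Harddrive", "character": "D"},
    {"name": "Lenny Linux", "character": "C (Young)"}
  ]
}
""")


def resolve_pointer(document, pointer):
    """Resolve a pointer such as '/actors/0/name' to exactly one value."""
    if pointer == "":
        return document  # the empty pointer addresses the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty pointer must start with '/'")
    node = document
    for token in pointer[1:].split("/"):
        # Undo the pointer escapes: '~1' stands for '/', '~0' for '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            node = node[int(token)]  # array positions are zero-based
        else:
            node = node[token]
    return node


# One pointer yields one value, so collecting every name takes one
# pointer (and one resolution) per actor.
names = [
    resolve_pointer(MOVIE, f"/actors/{index}/name")
    for index in range(len(MOVIE["actors"]))
]
print(names)
```

Note that the caller has to know how many actors there are before it can even build the pointers, which is exactly the bookkeeping a selector language should take off our hands.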

Now, consider a scenario where the JSON document contains not just one movie's details but an entire category of movies. A pointer still resolves to exactly one value, so something like /movies/0/actors/X/name (assuming, purely for illustration, that the movies sit in an array under a "movies" key and that X is a valid index) would retrieve only the name of the actor at index X in the first movie, as dictated by the aforementioned spec draft. Collecting every name would then take one pointer per actor per movie, which quickly becomes tiresome and impractical.

Enter XPath

While XML may no longer be the trendiest technology around, there are certain aspects that XML introduced and which remain incredibly useful. One such feature is XPath, in my humble opinion.

I first encountered XPath approximately 7-8 years ago when constructing a canonical object-oriented model for a wide range of financial products spanning multiple asset classes. The goal was to represent every financial product, regardless of complexity, using the same fundamental modeling building blocks. It was a challenging endeavor, but XPath proved invaluable when querying the rich object model, often represented as XML at the system's integration endpoints.

XPath's value manifested in several ways:

Flexibility: XPath offers the ability to describe various paths for data representation.
Descriptiveness: Almost anyone can read an XPath expression and grasp the data it intends to access.
Representation agnosticism: XPath allows clients to utilize it with either a fully realized object representation or an unparsed XML document. This flexibility enables laziness in the runtime where appropriate.

You might be wondering how XPath achieves flexibility. While the JSON Pointer spec draft I mentioned earlier was concise and easy to read, the XPath specification goes further. It provides the means to start matching at any depth of the document (by beginning a step with '//'), select the parent of a matching node (by appending '/..' to the XPath), choose attributes (by prefixing the attribute name with '@' in the XPath), and filter with predicates (e.g., last(), position(), index numbers, and comparisons such as > and <). However, the most significant distinction between XPath and JSON Pointer lies in the result set: a JSON Pointer resolves to at most one value, whereas XPath returns all elements or values that match the given selector.
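For readers who have not used XPath, here is a small Python sketch of those features using the standard library's xml.etree.ElementTree, which implements only a limited subset of XPath (for instance, attributes can be used in predicates but not selected directly). The XML below is a hypothetical rendering of the movie document, trimmed to three actors; the element and attribute names are my own choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of the movie document (names are illustrative).
MOVIES_XML = """
<movies>
  <movie title="Java 4-ever">
    <actors>
      <actor><name>Scala Johansson</name><character>A</character></actor>
      <actor><name>William Windows</name><character>B</character></actor>
      <actor><name>Eddie Larrison</name><character>C</character></actor>
    </actors>
  </movie>
</movies>
"""

root = ET.fromstring(MOVIES_XML)

# './/' starts matching at any depth below the current node and
# returns every match, not just the first.
names = [element.text for element in root.findall(".//name")]

# '..' steps from each matching node up to its parent.
parents = [element.tag for element in root.findall(".//name/..")]

# Positional predicates are 1-based; last() picks the final sibling.
second = root.find(".//actor[2]/name").text
final = root.find(".//actor[last()]/name").text

# '[@title]' keeps only the elements that carry a title attribute.
titles = [movie.get("title") for movie in root.findall("./movie[@title]")]

print(names, parents, second, final, titles)
```

Every call to findall returns a list, so "all matching nodes" is the default behavior rather than something the caller has to assemble by hand.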

To better understand this distinction, let's examine the aforementioned movie JSON document and how it would be selected using XPath:

XPath Expression: //actors/name Result: ["Scala Johansson", "William Windows", ..., "Lenny Linux"]

XPath Expression: title Result: ["Java 4-ever"]

XPath Expression: //actors/character Result: ["A", "B", "C", "D", "C (Young)"]

XPath Expression: //actors[2]/character Result: ["B"] (XPath positions are 1-based, so [2] selects the second actor)

As you can see, XPath provides an incredible degree of flexibility. Even if our JSON document contained multiple movie documents nested under a subtree, the expressions above that begin with '//' would still yield the desired results, now gathered across every movie.
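To show that such a selector is not hard to imagine, here is a rough Python sketch of a tiny XPath-like subset over parsed JSON. Everything in it is an assumption of mine rather than an existing library or spec: the `select` function, the rule that an array behaves like a run of repeated sibling elements, the leading '//' for "match at any depth", and 1-based positions as in XPath.

```python
import re

# A step is a key name with an optional 1-based position, e.g. "actors[2]".
STEP = re.compile(r"^(\w+)(?:\[(\d+)\])?$")

MOVIE = {
    "title": "Java 4-ever",
    "actors": [
        {"name": "Scala Johansson", "character": "A"},
        {"name": "William Windows", "character": "B"},
        {"name": "Eddie Larrison", "character": "C"},
        {"name": "Mona Lisa Harddrive", "character": "D"},
        {"name": "Lenny Linux", "character": "C (Young)"},
    ],
}


def children(node, key):
    """Yield the values under `key`, treating an array as repeated siblings."""
    if isinstance(node, dict) and key in node:
        value = node[key]
        if isinstance(value, list):
            yield from value
        else:
            yield value


def descendants(node):
    """Yield `node` and everything nested below it, depth first."""
    yield node
    if isinstance(node, dict):
        for value in node.values():
            yield from descendants(value)
    elif isinstance(node, list):
        for value in node:
            yield from descendants(value)


def select(document, path):
    """Return a list of ALL values matching `path`, in document order."""
    anywhere = path.startswith("//")
    contexts = list(descendants(document)) if anywhere else [document]
    for step in path.lstrip("/").split("/"):
        match = STEP.match(step)
        if match is None:
            raise ValueError(f"unsupported step: {step!r}")
        key, position = match.group(1), match.group(2)
        matched = []
        for context in contexts:
            found = list(children(context, key))
            if position is not None:
                # Keep only the n-th sibling (1-based) under this parent.
                found = found[int(position) - 1:int(position)]
            matched.extend(found)
        contexts = matched
    return contexts


print(select(MOVIE, "//actors/name"))
print(select(MOVIE, "title"))
print(select(MOVIE, "//actors[2]/character"))
# The same '//' expression still works when the movie is nested deeper.
print(select({"movies": [MOVIE]}, "//actors/name"))
```

The important design choice is that `select` always returns a list, so zero, one, or many matches all look the same to the caller.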

A Glaring Gap in JSON Document Selectors

At this point, you might be wondering, "So, what's the point of this post, Susan?" Admittedly, this article primarily serves as an outlet for my frustration and longing for a robust selector language specifically tailored for JSON documents. While XPath has proven its worth and versatility in the XML world, JSON currently lacks a similarly powerful and expressive selector language.

However, if you share my sentiments and believe that a subset of XPath, customized for JSON documents, could provide a viable solution, I invite you to collaborate with me. Let's work together to distill a practical and usable subset of XPath, specifically designed for JSON documents. Whether it's through a draft proposal or any other means, let's join forces to address this pressing need. Even HTML boasts a capable element selection mechanism in the DOM for applying styles and behaviors. Shouldn't JSON documents have a comparable solution?

If you're interested in contributing or discussing this further, please feel free to contact me. Let's kickstart the initiative and explore the possibilities together. Who knows? We might just revolutionize JSON document querying and make the lives of developers and data enthusiasts a whole lot easier.

Susan Potter
