Skip to content

Latest commit

 

History

History
579 lines (498 loc) · 18.3 KB

README.md

File metadata and controls

579 lines (498 loc) · 18.3 KB

jql

Hey there!

You're probably here cause you're fed up with other json query processors being too complicated to use for anything surpassing simple single field selection.

Well, at least that's what led me here. And that's why I've written jql, a json query processor with an even more cryptic - lispy - syntax (Maybe not that cryptic after all? I quite like it :) )

jql aims to be friendly, helpful, pat you on the back when you're faced with a monstrosity of a JSON blob. Help you mold it into something useful, step by step.

If you want to see benchmarks, they're at the end of the README.

If you want to see a cheatsheet with function types, it's right before the benchmarks.

Ok, let's check it out now, but first things first, you have to install it:

go get github.com/cube2222/jql

Ok. Done.

If you don't have the Go toolchain installed, just download one of the release binaries.

If you want an interactive window to get live output for queries, use:

echo '' | fzf --print-query --preview-window wrap --preview 'cat test.json | jql {q}'

Let's check out a few simple examples. (remember? That's explicitly not why we're here. But it aids understanding of the more complex examples, so stay with me just a little bit longer!)

We'll be working with this piece of json:

{ 
  "count": 3,
  "countries": [
    {
      "name": "Poland",
      "population": 38000000,
      "european": true,
      "eu_since": "2004"
    },
    {
      "name": "United States",
      "population": 327000000,
      "european": false
    },
    {
      "name": "Germany",
      "population": 83000000,
      "european": true,
      "eu_since": "1993"
    }
  ]
}

To start with, let's get the countries array only.

> cat test.json | jql '(elem "countries")'
[
  {
    "eu_since": "2004",
    "european": true,
    "name": "Poland",
    "population": 38000000
  },
  {
    "european": false,
    "name": "United States",
    "population": 327000000
  },
  {
    "eu_since": "1993",
    "european": true,
    "name": "Germany",
    "population": 83000000
  }
]

Whoa, whoa, whoa! What's up with the parentheses?!

Remember what I said before? Lispy syntax. In short, whenever you see (add 1 2), it's basically the same as add(1, 2). I like that syntax for big hierarchical expressions, if you really really really don't like it, then you can probably stop right here, though you'll be losing out!

I've warned you.

What if we wanted only the first country from that list?

> cat test.json | jql '(elem "countries" (elem 0))'
{
  "eu_since": "2004",
  "european": true,
  "name": "Poland",
  "population": 38000000
}

Let's break down what happened here. First we took the "countries" field. elem takes an additional argument, it says "how to transform the element" and is also an expression. Here we say we want to take the first element of the countries array. The default function is id, which stands for identity.

array

We can also pass an array of positions to elem, to get more than one country:

> cat test.json | jql '(elem "countries" (elem (array 0 2)))'
[
  {
    "eu_since": "2004",
    "european": true,
    "name": "Poland",
    "population": 38000000
  },
  {
    "eu_since": "1993",
    "european": true,
    "name": "Germany",
    "population": 83000000
  }
]

elem can work with single strings, single integers, arrays of those, and objects with them as values (but we won't cover those now to keep things simple).

keys

What if we want to get all the country names? A new friend - keys - is key in this situation. 🗝

cat test.json | jql '(elem "countries" (elem (keys) (elem "name")))'
[
  "Poland",
  "United States",
  "Germany"
]

It returns an array of all the keys of the given collection. Fields and Indices for Objects and Arrays respectively.

To illustrate, here's keys used on an object:

> cat test.json | jql '(elem "countries" (elem 0 (keys)))'
[
  "name",
  "population",
  "european",
  "eu_since"
]

Attention

Now we have to understand a very important mechanism underlying jql. All functions operate in the context of the JSON we're operating on.

Some functions, like elem, will cut down the context for expressions it evaluates. The first argument - which should evaluate to the positions we need - gets evaluated in the context of the entire array, that's why keys returns all the indices. The second one on the other hand, operates in the context of a single element.

In theory we're really just creating and composing a big function - pipeline, so to say - which gets applied to our JSON blob.

This may sound complicated, but I find it becomes intuitive quite quickly.


You can see that elem is the most used function, and in fact it's what you'll usually be using when munging data, so there's a shortcut. If you put a value in function name position, it implicitly converts it to an elem.

This way we can rewrite the previous query to be much shorter, and better match the shape of the data.

> cat test.json | jql '("countries" ((keys) ("name")))'
[
  "Poland",
  "United States",
  "Germany"
]

Little showcase

Remember when I said you can use integers, strings, arrays and objects as positions?

> cat test.json | jql '("countries" ((array (array 0 (array 0 (array 0 (array 0 2)))) 1 (object "key1" 1 "key2" (array 0 (object "key1" 1 "key2" (array 0 2))))) ("population")))'
[
  [
    38000000,
    [
      38000000,
      [
        38000000,
        [
          38000000,
          83000000
        ]
      ]
    ]
  ],
  327000000,
  {
    "key1": 327000000,
    "key2": [
      38000000,
      {
        "key1": 327000000,
        "key2": [
          38000000,
          83000000
        ]
      }
    ]
  }
]

🔥🔥🔥

Don't do this.


range

We can also select a range of elements, using the... you guessed it - range function.

> cat test.json | jql '("countries" ((range 1 3) ("name")))'
[
  "United States",
  "Germany"
]

You can use the array function in value position too obviously. If you want a list of name-population tuples you can just

> cat test.json | jql '("countries" ((keys) (array ("name") ("population"))))'
[
  [
    "Poland",
    38000000
  ],
  [
    "United States",
    327000000
  ],
  [
    "Germany",
    83000000
  ]
]

Here you can see that array passes the whole context given to it to each of its arguments. (Reminder: We're using "name" and "population" as elem shortcuts here.)

Most functions work like this. Only elem is the "context-cutdowner", so to say.

object

You can also use object to create objects, with arguments alternating keys and values.

> cat test.json | jql '(object
                            "names" ("countries" ((keys) ("name")))
                            "populations" ("countries" ((array 0 0 1) ("population"))))'
{
  "names": [
    "Poland",
    "United States",
    "Germany"
  ],
  "populations": [
    38000000,
    38000000,
    327000000
  ]
}

zip

Occasionally you might want to combine multiple arrays, you can use zip to do that. For example, to create a list of key-value entries from an object, you can do the following:

> cat test.json | jql '("countries" 
                           (0 (zip 
                                 (keys) 
                                 ((keys)))))'
[
  [
     "eu_since",
     "2004"
  ],
  [
     "european",
     true
  ],
  [
     "name",
     "Poland"
  ],
  [
     "population",
     38000000
  ]
]

Now we're done with the core functionality of jql. The stuff so far will probably suffice for most use cases and even very complex data structures.

However, here come more functions:

String manipulation 🎻

join

If you ever need to join an array of expressions into a string, join's the mate you're looking for! join will also stringify anything it meets. Without separator:

> cat test.json | jql '("countries" ((keys) (join (array ("name") ("population") ("european")))))'
[
  "Poland3.8e+07true",
  "United States3.27e+08false",
  "Germany8.3e+07true"
]

With separator:

> cat test.json | jql '("countries" ((keys) (join (array ("name") ("population") ("european")) ", ")))'
[
  "Poland, 3.8e+07, true",
  "United States, 3.27e+08, false",
  "Germany, 8.3e+07, true"
]

sprintf

Whenever I learn a new language, I feel much more comfortable when I know there's a sprintf function. (and how to use it)

Anyways, here you go, the syntax is the same as that of the go standard library fmt.Sprintf function:

> cat test.json | jql '("countries" ((keys) (sprintf "%s population: %.0f" ("name") ("population"))))'
[
  "Poland population: 38000000",
  "United States population: 327000000",
  "Germany population: 83000000"
]

Hope you're feeling comfortable 🛋 now :)

error

There's a little helper function - error - for those times when you're debugging your queries.

It's an expression which errors on evaluation and gives you a stack trace. It can also evaluate and print any expression you want in it's context.

> cat test.json | jql '("countries" ((keys) (sprintf "%s population: %.0f" ("name") (error "test message"))))'
2019/12/26 00:17:04 error getting expression value for object: couldn't get transformed value for field countries with value [map[eu_since:2004 european:true name:Poland population:3.8e+07] map[european:false name:United States population:3.27e+08] map[eu_since:1993 european:true name:Germany population:8.3e+07]]: couldn't get element using position at array index 0: couldn't get transformed value for index 0 with value map[eu_since:2004 european:true name:Poland population:3.8e+07]: couldn't evaluate sprintf argument with index 1: Message: test message
goroutine 1 [running]:
runtime/debug.Stack(0xc0000723f0, 0x1114e00, 0xc0000744e0)
        /usr/local/Cellar/go/1.13.4/libexec/src/runtime/debug/stack.go:24 +0x9d
github.com/cube2222/jql/jql.Error.Get(0x1164680, 0xc0000723f0, 0x1114e00, 0xc0000744e0, 0x110c5c0, 0xc000072440, 0x0, 0x0)
        /Users/jakub/Projects/Go/src/github.com/cube2222/jql/jql/functions.go:715 +0x102
github.com/cube2222/jql/jql.Sprintf.Get(0x1164680, 0xc0000723b0, 0xc000074460, 0x2, 0x2, 0x1114e00, 0xc0000744e0, 0x100b136, 0xc000072490, 0x10, ...)
        /Users/jakub/Projects/Go/src/github.com/cube2222/jql/jql/functions.go:299 +0x16a
github.com/cube2222/jql/jql.GetElement(0x110bb80, 0x12479e0, 0x1109ec0, 0xc00008a1e0, 0x1164a20, 0xc000074480, 0x122c560, 0x128b6d0, 0x0, 0x100ba08)
        /Users/jakub/Projects/Go/src/github.com/cube2222/jql/jql/functions.go:101 +0xa96
github.com/cube2222/jql/jql.GetElement(0x1109ec0, 0xc00008a220, 0x1109ec0, 0xc00008a1e0, 0x1164a20, 0xc000074480, 0x0, 0x15fffff, 0xc0000afba0, 0x194)
        /Users/jakub/Projects/Go/src/github.com/cube2222/jql/jql/functions.go:71 +0x46d
github.com/cube2222/jql/jql.Element.Get(0x1164980, 0x12470c8, 0x1164a20, 0xc000074480, 0x1109ec0, 0xc00008a1e0, 0xc0000c6018, 0x1ee, 0xc0000c6000, 0x103301c)
        /Users/jakub/Projects/Go/src/github.com/cube2222/jql/jql/functions.go:135 +0x12f
github.com/cube2222/jql/jql.GetElement(0x110c5c0, 0xc0000722b0, 0x1114e00, 0xc0000744b0, 0x11648c0, 0xc00008a160, 0x0, 0x10, 0xc0000724b0, 0xa)
        /Users/jakub/Projects/Go/src/github.com/cube2222/jql/jql/functions.go:118 +0x862
github.com/cube2222/jql/jql.Element.Get(0x1164680, 0xc0000723a0, 0x11648c0, 0xc00008a160, 0x1114e00, 0xc0000744b0, 0x10, 0x1114860, 0x1, 0xc0000724b0)
        /Users/jakub/Projects/Go/src/github.com/cube2222/jql/jql/functions.go:135 +0x12f
main.main()
        /Users/jakub/Projects/Go/src/github.com/cube2222/jql/main.go:39 +0x2b4

Logic

Yey, back to the basics!

You've got eq, lt and gt, all working as you'd probably expect:

> cat test.json | jql '(eq "test" "test")'
true
> cat test.json | jql '(eq "test" "test2")'
false
> cat test.json | jql '(lt "a" "b")'
true
> cat test.json | jql '(lt "b" "a")'
false
> cat test.json | jql '(gt 5 4)'
true

In case you're wondering, eq does a reflect.DeepEqual on both arguments.

You've also got and, or, not, to cover your back when tackling those primal and primitive (some would say fundamental) problems you may encounter:

> cat test.json | jql '(and true true true)'
true
> cat test.json | jql '(and true true false)'
false
> cat test.json | jql '(and true true null)'
false
> cat test.json | jql '(or true true false)'
true
> cat test.json | jql '(or)'
false
> cat test.json | jql '(and)'
true
> cat test.json | jql '(not true)'
false
> cat test.json | jql '(not false)'
true
> cat test.json | jql '(not null)'
true
> cat test.json | jql '(not (array false))'
false

and and or are both lazy.

Truthiness

This brings us to the topic of truthiness. What does and consider to be "true"? Well, it's quite simple actually.

  • null is not truthy.
  • false is not truthy.
  • anything else is truthy.

ifte

ifte sounds kinda fluffy, but unfortunately it only stands for If Then Else.

> cat test.json | jql '(ifte true "true" "false")'
"true"

It's lazy too. If it weren't, this would error:

l> cat test.json | jql '(ifte true "true" (error ":("))'
"true"

Fluffy and lazy. Like a cat. Who doesn't like cats? Who doesn't like ifte? 🐈

filter 🍰

Sometimes you want just part of the cake, the part with no <insert disliked fruit here>.

I've got no data set on cakes though, so let's get back to our beloved countries:

> cat test.json | jql '("countries" (filter (gt ("population") 50000000)))'
[
  {
    "european": false,
    "name": "United States",
    "population": 327000000
  },
  {
    "eu_since": "1993",
    "european": true,
    "name": "Germany",
    "population": 83000000
  }
]

Also, because null is not truthy, and elem returns false if it encounters missing fields or indices, you can use filter to get rid of wrong-schema data:

> cat test.json | jql '("countries" (filter ("eu_since")))'
[
  {
    "eu_since": "2004",
    "european": true,
    "name": "Poland",
    "population": 38000000
  },
  {
    "eu_since": "1993",
    "european": true,
    "name": "Germany",
    "population": 83000000
  }
]

pipe

pipe is a fairly useless function because you can just use a bash pipe. But if for some reason you want to save cpu cycles:

> cat test.json | jql '(pipe
                           ("countries")
                           ((range 2))
                           ((keys) ("name")))'
[
  "Poland",
  "United States"
]

is equal to

> cat test.json | jql '("countries")' | jql '((range 2))' | jql '((keys) ("name"))'
[
  "Poland",
  "United States"
]

recover

Finally, the one whose name shall not be spoken out loud. 👹 Needing him means you either encountered a bug in jql, or that your dataset is seriously botched.

He'll help you catch errors and panics, nullyfing the expression, leaving only void behind.

> cat test.json | jql '("countries" ((keys) (rec*** (ifte ("european") (id) (error "not european")))))'
[
  {
    "eu_since": "2004",
    "european": true,
    "name": "Poland",
    "population": 38000000
  },
  null,
  {
    "eu_since": "1993",
    "european": true,
    "name": "Germany",
    "population": 83000000
  }
]

Combine him with filter and even the void will be gone!

In practice you obviously have to spell out his name, otherwise it won't work, but that's on you!

Summary

Hope you enjoyed this incredible journey!

Moreover, I hope it's not the end of it! Hordes of JSON blobs still await and I hope jql will become your weapon of choice for dealing with them from now on! ⚔️

Issues, ⭐️stars⭐️, comments, messages, reviews, benchmarks, you name it! - all are very appreciated! 😉

Type Cheatsheet

JSON: Any value
Expression[T]: (JSON -> T)
Expression can be seen as a Continuation
elem:
    With one arg: (Expression[Position]) -> (Expression[JSON]) = (elem position (id))
    With two args: (Expression[Position] x Expression[T]) -> (Expression[T])
keys: () -> (Expression[JSON])
id: () -> (Expression[JSON])
array: (Expression[T]...) -> (Expression[Array[T]]) (T's can vary)
object: ((Expression x Expression[T])...) -> (Expression[T])
pipe: (Expression...) -> (Expression)
sprintf: (Expression[String] x Expression[T]...) -> (Expression[String])
join: (Expression[T]...) -> (Expression[String])
filter: (Expression[Bool]) -> (Expression[JSON])
eq,lt,gt: (Expression x Expression) -> (Expression[Bool])
range: 
    With one arg: (Expression[Int]) -> (Expression[Array[Int]])
    With two args: (Expression[Int] x Expression[Int]) -> (Expression[Array[Int]])
zip: (Expression[Array[A]], Expression[Array[B]], ...) -> (Expression[Array[Array[A | B | ...]]])
and,or: (Expression[Bool]...) -> (Expression[Bool])
not: (Expression[Bool]) -> (Expression[Bool])
ifte: (Expression[Bool] x Expression[A] x Expression[B]) -> (Expression[A|B])
error: (Expression[JSON]) -> (!)
recover: (Expression[JSON]) -> (Expression[JSON])

Benchmarks

This is a fairly synthetic benchmark (it's the first that came to my mind), so I won't say jql is faster than jq. Especially since jq has various much more advanced functionalities.

What I will say though, is that jql is definitely not slow and you can freely cut your way through gigabytes of data quickly, as in the following benchmarks it was 2-3 times faster than jq.

The benchbig.json file contains 2^20 repetitions of the json we've been using in the examples so far, and the benchmedium.json file contains 20.

Command Mean [s] Min [s] Max [s] Relative
cat benchbig.json | jql '("countries" ((keys) ("name")))' >> out.json 5.923 ± 0.034 5.890 6.003 797.90 ± 52.78
cat benchbig.json | jq '.countries[].name' >> out.json 14.960 ± 0.047 14.906 15.047 2015.24 ± 132.94
cat benchmedium.json | jql '("countries" ((keys) ("name")))' >> out.json 0.007 ± 0.000 0.007 0.010 1.00
cat benchmedium.json | jq '.countries[].name' >> out.json 0.024 ± 0.001 0.022 0.032 3.23 ± 0.26

The benchmarks were run using hyperfine on a preheated (very loud) Macbook Pro 13 mid-2019 i7 2.8GHz 16GB 256GB.

You can generate the benchmark data with benchmarks/gen.sh.

Special Thanks

A big thank you to the following people for reviewing the current or previous versions of this document:

  • Jan Chomiak ( @JasiekChomiak )
  • Hugo Dutka ( @hugodutka )
  • Katarzyna Gajos
  • Andrzej Głuszak ( @agluszak )
  • Wojciech Kuźmiński ( @woojtek )