JSON processing with Jq made simpler
Jq is a very convenient tool for handling JSON from the CLI or in scripts and programs.
Yet many find it complex and far from obvious to use.
Jq relies on a few core concepts. Once understood, they make using jq a lot easier.
In this post, I’ll discuss the 4 most important jq concepts. For each, I’ll give a few examples of common tasks.
1. Concepts
Jq allows you to do a few simple things, and to mix them together:
- formatting and conversion;
- filtering;
- modifying;
- building new objects.
Once each concept is clear, you’ll be able to decompose your problem along those 4 axes, and enjoy using jq.
2. Formatting and conversion
2.1. Definition and usage
Formatting is about converting any JSON into a standard, well-formatted shape.
This is usually the first step when receiving a JSON document, both to make it easier for people to read and to detect errors in a malformed document.
This is the simplest action, and the shortest: an empty filter (yes, just nothing) or . (a single dot).
tojson and fromjson allow converting a JSON value to a string and vice versa.
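As a minimal sketch of that round trip (the sample object is made up for illustration, and the -c flag is only used to keep the output on one line):

```shell
# Encode a JSON object as a JSON string with tojson.
encoded=$(printf '%s' '{"name": "Julia"}' | jq -c 'tojson')
printf '%s\n' "$encoded"   # "{\"name\":\"Julia\"}"

# Decode the string back into an object with fromjson.
decoded=$(printf '%s' "$encoded" | jq -c 'fromjson')
printf '%s\n' "$decoded"   # {"name":"Julia"}
```

This is handy when an API embeds a JSON document as a string inside another JSON document.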
2.2. Examples
2.2.1. Formatting a 1-line JSON
$ echo '{"name": "Julia", "age": "unknown"}' | jq
{
"name": "Julia",
"age": "unknown"
}
2.2.2. Ensuring the JSON is well-formed
$ echo '{"name": "Julia" "age": "unknown"}' | jq
parse error: Expected separator between values at line 1, column 22
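In a script, the same check can rely on jq's exit status rather than on the error message. The sketch below assumes a helper named is_valid_json, which is a hypothetical name and not part of jq; the empty filter makes jq parse the input without printing anything.

```shell
# is_valid_json is a hypothetical helper name, not a jq built-in.
is_valid_json() {
  # 'empty' produces no output; only the exit status matters here.
  printf '%s' "$1" | jq empty >/dev/null 2>&1
}

# The comma between the two fields is missing, so parsing fails.
if is_valid_json '{"name": "Julia" "age": "unknown"}'; then
  echo "valid"
else
  echo "malformed"
fi
```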
2.2.3. Piping from curl
curl outputs the body of the HTTP response on the standard output, and the rest (the verbose log and the progress meter) on the error output, allowing one to use jq directly after a curl call.
$ curl -v "http://worldtimeapi.org/api/timezone/Europe/Berlin" | jq
* Connected to worldtimeapi.org (34.253.22.180) port 80 (#0)
> GET /api/timezone/Europe/Berlin HTTP/1.1
> Host: worldtimeapi.org
> User-Agent: curl/7.70.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Connection: keep-alive
< Access-Control-Allow-Credentials: true
< Access-Control-Allow-Origin: *
< Access-Control-Expose-Headers:
< Cache-Control: max-age=0, private, must-revalidate
< Content-Length: 394
< Content-Type: application/json; charset=utf-8
< Cross-Origin-Window-Policy: deny
< Date: Thu, 11 Jun 2020 06:58:20 GMT
< Server: Cowboy
< X-Content-Type-Options: nosniff
< X-Download-Options: noopen
< X-Frame-Options: SAMEORIGIN
< X-Permitted-Cross-Domain-Policies: none
< X-Request-Id: 1e775e63-0da6-4a15-9993-98dfc6cdb855
< X-Runtime: 2ms
< X-Xss-Protection: 1; mode=block
< Via: 1.1 vegur
<
{ [394 bytes data]
100 394 100 394 0 0 4104 0 --:--:-- --:--:-- --:--:-- 4104
* Connection #0 to host worldtimeapi.org left intact
{
"abbreviation": "CEST",
"client_ip": "95.90.197.39",
"datetime": "2020-06-11T08:58:20.767744+02:00",
"day_of_week": 4,
"day_of_year": 163,
"dst": true,
"dst_from": "2020-03-29T01:00:00+00:00",
"dst_offset": 3600,
"dst_until": "2020-10-25T01:00:00+00:00",
"raw_offset": 3600,
"timezone": "Europe/Berlin",
"unixtime": 1591858700,
"utc_datetime": "2020-06-11T06:58:20.767744+00:00",
"utc_offset": "+02:00",
"week_number": 24
}
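The reason the pipe works can be sketched without any network access. Here fake_fetch is a hypothetical stand-in for curl -v: it writes diagnostics to the error output and a JSON body to the standard output, so only the body reaches jq.

```shell
# fake_fetch is a hypothetical stand-in for 'curl -v', for illustration only.
fake_fetch() {
  echo '< HTTP/1.1 200 OK' >&2                          # diagnostics: error output
  echo '{"timezone": "Europe/Berlin", "dst": true}'     # body: standard output
}

# Only the standard output goes through the pipe; the diagnostics are discarded here.
timezone=$(fake_fetch 2>/dev/null | jq -r '.timezone')
echo "$timezone"   # Europe/Berlin
```

With a real request, curl -s silences the progress meter if you don't want it on the terminal.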
3. Filtering
3.1. Definition and usage
Filtering allows extracting parts of a JSON. You can filter with almost anything you want: by field name, by index in an array, or by searching for a match.
3.2. Examples
3.2.1. Filtering by selecting a field
Syntax: .field.subfield.subsubfield
{
"blue": {
"rgb": {
"r": 124,
"g": 445,
"b": 777
},
"hex": "345F28"
},
"green": {
"rgb": {
"r": 845,
"g": 234,
"b": 99
},
"hex": "FFF445"
}
}
Query: .blue.rgb.b
777
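A runnable sketch of the same query, using a trimmed-down copy of the input stored in a shell variable (the variable names are only illustrative):

```shell
colours='{"blue": {"rgb": {"r": 124, "g": 445, "b": 777}, "hex": "345F28"}}'

# Walk down the object one field at a time.
blue_b=$(printf '%s' "$colours" | jq '.blue.rgb.b')
# -r prints strings raw, without the surrounding quotes.
blue_hex=$(printf '%s' "$colours" | jq -r '.blue.hex')
# Selecting a field that does not exist yields null rather than an error.
missing=$(printf '%s' "$colours" | jq '.red.rgb.b')

printf '%s\n' "$blue_b" "$blue_hex" "$missing"
```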
3.2.2. Filtering arrays by index
Syntax:
- .[index] or nth(index) (or first for .[0]).
- .[begin:end] for a slice.
- .[-1] or last for the last one.
[
{
"name": "blue",
"rgb": {
"r": 124,
"g": 445,
"b": 777
},
"hex": "345F28"
},
{
"name": "green",
"rgb": {
"r": 845,
"g": 234,
"b": 99
},
"hex": "FFF445"
}
]
Queries (equivalent):
- .[0]
- . | first
- first(.[])
{
"name": "blue",
"rgb": {
"r": 124,
"g": 445,
"b": 777
},
"hex": "345F28"
}
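The index forms can be compared side by side on a smaller array. This is a sketch with a made-up list of names, not the colour objects used above:

```shell
names='["blue", "green", "red"]'

# Three equivalent ways to get the first element.
firsts=$(printf '%s' "$names" | jq -c '[.[0], first, nth(0)]')
# Two equivalent ways to get the last element.
lasts=$(printf '%s' "$names" | jq -c '[.[-1], last]')
# A slice starts at index 0 and stops before index 2.
slice=$(printf '%s' "$names" | jq -c '.[0:2]')

printf '%s\n' "$firsts" "$lasts" "$slice"
```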
3.2.3. Filtering by finding a match
Syntax: | select(expression)
The expression can be complex, and use jq functions as well.
{
"colours":[
{
"name": "blue",
"rgb": {
"r": 124,
"g": 445,
"b": 777
},
"hex": "345F28"
},
{
"name": "green",
"rgb": {
"r": 845,
"g": 234,
"b": 99
},
"hex": "FFF445"
}
]
}
Queries (equivalent):
- .colours[] | select(.name=="blue")
- .colours[] | select(.name | test("blu.*")) (with regexps).
{
"name": "blue",
"rgb": {
"r": 124,
"g": 445,
"b": 777
},
"hex": "345F28"
}
Another example is to find all colours with enough green (same input, and same output):
Query: .colours[] | select(.rgb.g > 400)
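The three kinds of match can be tried on a reduced version of the input. In this sketch each query is followed by | .name so that only the name of the matching colour is printed:

```shell
input='{"colours": [{"name": "blue", "rgb": {"g": 445}}, {"name": "green", "rgb": {"g": 234}}]}'

# Exact match on a field.
by_name=$(printf '%s' "$input" | jq -r '.colours[] | select(.name == "blue") | .name')
# Regular expression match on a field.
by_regex=$(printf '%s' "$input" | jq -r '.colours[] | select(.name | test("^blu")) | .name')
# Numeric comparison on a nested field.
by_value=$(printf '%s' "$input" | jq -r '.colours[] | select(.rgb.g > 400) | .name')

printf '%s\n' "$by_name" "$by_regex" "$by_value"
```

All three print blue, since it is the only colour that satisfies each condition.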
4. Modifying
4.1. Definition and usage
You can modify part of a JSON, or construct a totally new JSON using sub-pieces.
4.2. Examples
4.2.1. Modifying values
Using the same input as before, we can decide to increase by 10 all the green components in the rgb values.
Queries (equivalent):
- .colours[].rgb.g |= . + 10
- .colours[].rgb.g += 10
{
"colours": [
{
"name": "blue",
"rgb": {
"r": 124,
"g": 455,
"b": 777
},
"hex": "345F28"
},
{
"name": "green",
"rgb": {
"r": 845,
"g": 244,
"b": 99
},
"hex": "FFF445"
}
]
}
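A short sketch to check that the update touches every green component, using a reduced input and extracting only the modified values afterwards:

```shell
input='{"colours": [{"name": "blue", "rgb": {"g": 445}}, {"name": "green", "rgb": {"g": 234}}]}'

# += rewrites each selected value in place and returns the whole document.
updated=$(printf '%s' "$input" | jq -c '.colours[].rgb.g += 10')
# Collect the green components to see the result at a glance.
greens=$(printf '%s' "$updated" | jq -c '[.colours[].rgb.g]')

printf '%s\n' "$greens"   # [455,244]
```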
4.2.2. Adding static keys
Using the same input as before, we decide that colours now have a new field owner with a fixed value.
Query: .colours[].owner = "everyone"
{
"colours": [
{
"name": "blue",
"rgb": {
"r": 124,
"g": 445,
"b": 777
},
"hex": "345F28",
"owner": "everyone"
},
{
"name": "green",
"rgb": {
"r": 845,
"g": 234,
"b": 99
},
"hex": "FFF445",
"owner": "everyone"
}
]
}
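In a script the fixed value often lives in a shell variable. One way to pass it in, sketched here on a reduced input, is jq's --arg option, which exposes the value as a jq variable (the names owner and $owner are only illustrative):

```shell
input='{"colours": [{"name": "blue"}, {"name": "green"}]}'
owner="everyone"

# --arg binds the shell value to the jq variable $owner as a string.
with_owner=$(printf '%s' "$input" | jq -c --arg owner "$owner" '.colours[].owner = $owner')
owners=$(printf '%s' "$with_owner" | jq -c '[.colours[].owner]')

printf '%s\n' "$owners"   # ["everyone","everyone"]
```

This avoids splicing shell variables into the jq program text, which gets fragile once values contain quotes.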
4.2.3. Modifying a key’s name
Using the same input as before, we decide that colours no longer have a name, but an id instead.
This is slightly more complicated: keys can’t be renamed in place, so we have to rebuild each object.
Query: .colours = [.colours[] | with_entries(if .key == "name" then .key = "id" else . end)]
{
"colours": [
{
"id": "blue",
"rgb": {
"r": 124,
"g": 445,
"b": 777
},
"hex": "345F28"
},
{
"id": "green",
"rgb": {
"r": 845,
"g": 234,
"b": 99
},
"hex": "FFF445"
}
]
}
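A possible variant, sketched on a reduced input, combines the update operator |= with map instead of rebuilding the array by hand. It is one alternative among several, using the same with_entries expression:

```shell
input='{"colours": [{"name": "blue", "hex": "345F28"}, {"name": "green", "hex": "FFF445"}]}'

# |= updates .colours in place; map applies with_entries to each object.
renamed=$(printf '%s' "$input" | jq -c '.colours |= map(with_entries(if .key == "name" then .key = "id" else . end))')
ids=$(printf '%s' "$renamed" | jq -c '[.colours[].id]')
still_named=$(printf '%s' "$renamed" | jq '[.colours[] | has("name")] | any')

printf '%s\n' "$ids" "$still_named"
```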
4.2.4. Adding dynamic keys
What if we want to add a new key whose value depends on the values of other keys in the same object?
Let’s build a new key address, from the name of the colour (same input):
Queries (equivalent):
- reduce .colours[] as $item ([]; . + [$item + {address: ("https://colours.com/" + $item.name)}])
- foreach .colours[] as $item ([]; . + [$item + {address: ("https://colours.com/" + $item.name)}]; if $item.name == "green" then . else empty end)
The foreach variant emits its accumulated array only when it reaches green, so it matches reduce only because green is the last colour.
[
{
"name": "blue",
"rgb": {
"r": 124,
"g": 445,
"b": 777
},
"hex": "345F28",
"address": "https://colours.com/blue"
},
{
"name": "green",
"rgb": {
"r": 845,
"g": 234,
"b": 99
},
"hex": "FFF445",
"address": "https://colours.com/green"
}
]
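When each new value depends only on its own object, as it does here, map is a shorter alternative worth considering. The sketch below uses a reduced input and produces an array like the one above:

```shell
input='{"colours": [{"name": "blue", "hex": "345F28"}, {"name": "green", "hex": "FFF445"}]}'

# Inside map, '.' is the current colour, so .name is that colour's name.
with_address=$(printf '%s' "$input" | jq -c '.colours | map(. + {address: ("https://colours.com/" + .name)})')
addresses=$(printf '%s' "$with_address" | jq -c '[.[].address]')

printf '%s\n' "$addresses"
```

reduce and foreach remain useful when a value depends on the items seen before it, such as a running total.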
5. Conclusion
With those concepts in mind, next time you want to do something with jq, try to identify which of them apply and decompose your problem accordingly.
I highly recommend building your query interactively with jq play.