Using Node.js Streams to Create a Toy Version of jq

To play around with Node.js streams, I made a simple ‘toy’ version of jq, a handy command-line JSON processor.

To be clear, the real jq utility is very lightweight and has a ton of functionality. In this post I’ll be mimicking some of its functionality using Node.js.

This will of course result in a bloated tool with way more bundled in than what we actually need.

That being said, it’ll be a useful exercise to go through to learn a little bit about Node.js streams.

In the GitHub repository, you’ll see how easy it is to hack together a very simple command-line tool that processes data streamed in from stdin.

[Image: Node.js stream transform example with PowerShell pipeline processing. JSON goes into the tool, gets filtered, and is then converted to an object in PowerShell through the pipeline.]

Node.js Streams

The Node.js stream documentation describes them as:

A stream is an abstract interface for working with streaming data in Node.js. The stream module provides an API for implementing the stream interface.

For this example I’ll be jumping straight into the Transform stream (stream.Transform). This is a duplex stream, where the input is usually related to the output in some way.

Transforming Input from stdin

The basic use of a Transform stream in Node.js (to process input from stdin) looks like this:

const { Transform } = require('stream');

// A Transform stream that uppercases whatever flows through it
const transformStream = new Transform({
  transform(chunk, encoding, callback) {
    // do something with the chunk; here, pass the uppercased text downstream
    callback(null, chunk.toString().toUpperCase());
  }
});

process.stdin.pipe(transformStream).pipe(process.stdout);
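
Saved as, say, uppercase.js (the filename is arbitrary), it can be tested straight from the shell:

echo "hello streams" | node uppercase.js

HELLO STREAMS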

Toy jq Utility With Transform Streams

I created a ‘toy’ version of jq using a Node.js Transform stream. It’s a very quickly hacked together example, so don’t expect it to do everything the real jq can do. I’m also fully aware that jq is a very lightweight tool and that doing this in the Node.js runtime adds a lot of unnecessary bloat!

This is purely for demonstration purposes.
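
To give a flavour of the approach, here is a minimal sketch of how the core of such a tool could be put together with a Transform stream. To be clear, this is not the actual code from the repository: the filter handling below only supports simple '.field.subfield' paths, and names like jsonFilter are my own placeholders.

const { Transform } = require('stream');
const util = require('util');

// Optional filter argument, e.g. '.type' (no argument means pretty-print everything)
const filter = process.argv[2];

let raw = '';

const jsonFilter = new Transform({
  // Buffer all of stdin; JSON can't reliably be parsed chunk-by-chunk
  transform(chunk, encoding, callback) {
    raw += chunk.toString();
    callback();
  },
  // Once input ends, parse the buffered text, apply the filter, pretty-print
  flush(callback) {
    let result = JSON.parse(raw);
    if (filter) {
      // Walk simple '.a.b' style paths down into the parsed object
      for (const key of filter.slice(1).split('.')) {
        result = result[key];
      }
    }
    this.push(util.inspect(result, { depth: null }) + '\n');
    callback();
  }
});

process.stdin.pipe(jsonFilter).pipe(process.stdout);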

Packaging up the Node.js app with pkg, we get a platform-specific binary called toyjq.
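
The packaging step looks something like this (assuming pkg is installed and index.js is the entry point; adjust the Node version in the targets to suit):

pkg index.js --targets node14-linux-x64,node14-win-x64 --output toyjq

With multiple targets, pkg appends a platform suffix to each binary, which is where the toyjq-linux and toyjq-win.exe names used below come from.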

Examples
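
For reference, the example.json file used throughout these examples contains something like the following (reconstructed from the pretty-printed output below):

{
  "name": "directoryobject",
  "path": "/path/to/directoryobject",
  "type": "Directory",
  "children": [{ "foo": "bar" }, { "foo": "bar1" }]
}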


Pretty print input JSON

cat ./example.json | toyjq-linux

{
  name: 'directoryobject',
  path: '/path/to/directoryobject',
  type: 'Directory',
  children: [ { foo: 'bar' }, { foo: 'bar1' } ]
}

Output the `type` field only from input JSON:


cat ./example.json | toyjq-linux '.type'

"Directory"

Output just the `name` and `children` fields in the input JSON:

cat ./example.json | toyjq-linux '{name, children}'

{
  name: 'directoryobject',
  children: [ { foo: 'bar' }, { foo: 'bar1' } ]
}

Switching now to the Windows build of toyjq, and bringing a PowerShell cmdlet into the mix…

Select the children array, convert it to an object in PowerShell and then select the last item in that object array:

(cat .\example.json | .\toyjq-win.exe '.children' | ConvertFrom-Json).foo | Select -Last 1

bar1

The above examples show how easily you can take data from one input (stdin in this case), process it, and send it along through the pipeline using Node.js streams.

You can find the example toyjq app on my GitHub repository.

Select, match and pipe output into another command with jq, grep, and xargs


This is a quick reference post if you’re looking to pipe output into another command on Linux using xargs.

The pipeline is immensely powerful, and you can leverage it to act on different stages of your full command to do specific selecting, matching, and manipulation.

Say you are running an executable that outputs a bunch of JSON, and you want to select a certain subset of this data, pattern match it, and then send the matched data into another command.

This is the perfect use case for a mixture of the jq, grep and xargs commands in the pipeline.

Practical example with xargs

Here is a practical example where you might want to list all your AWS CodePipeline pipelines, match only on certain ones, and then trigger a release change (start a pipeline execution) on each of them.

aws codepipeline list-pipelines | jq -r '.pipelines[].name' | grep project-xyz | xargs -n1 aws codepipeline start-pipeline-execution --name

The piped command above does the following:

  • Lists all AWS CodePipeline pipelines with the command aws codepipeline list-pipelines
  • Uses jq with the -r (raw output) flag to select the name from each pipeline object in the pipelines[] array that the above command outputs
  • Sends each pipeline name into grep to match only those containing the string “project-xyz”
  • Pipes the resulting pipeline names using xargs into the command aws codepipeline start-pipeline-execution --name. The -n1 flag tells xargs to pass at most one argument to each invocation of that command (see the quick demo below).
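
If the effect of -n1 isn’t obvious, you can see it in isolation with a harmless command like echo (the words here are just stand-ins for pipeline names):

echo "one two three" | xargs -n1 echo starting:

starting: one
starting: two
starting: three

Each input token triggers its own invocation of the command, which is exactly what we need here, since start-pipeline-execution takes a single --name per call.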