Add "jsonl" (JSON Lines) format option to condor_history

Description

For SOFTWARE-4312, I'd like to use condor_history to dump completed job ads as JSON, for ingestion into Elasticsearch.

Currently there is a -json option, but it writes a single JSON list object, with each ad pretty-printed across many lines. What I'd like instead, and what Elasticsearch takes as input, is each ad (record) on its own line. (This format is sometimes referred to as "JSON Lines" and has a suggested filename suffix of ".jsonl". See https://jsonlines.org/)
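To make the difference concrete, here is a minimal sketch in Python using only the standard json module. The two ads are hypothetical and heavily trimmed, and the exact whitespace of the real -json output may differ; only the overall shape (one enclosing list versus one object per line) is the point.

```python
import json

# Hypothetical, heavily trimmed job ads; real ads carry many more attributes.
ads = [
    {"ClusterId": 101, "ProcId": 0, "JobStatus": 4},
    {"ClusterId": 101, "ProcId": 1, "JobStatus": 4},
]

# A single pretty-printed list: the shape described above for -json.
as_json_list = json.dumps(ads, indent=2)

# JSON Lines: one compact JSON object per line, with no enclosing list.
as_json_lines = "".join(json.dumps(ad) + "\n" for ad in ads)

print(as_json_lines, end="")
```

Each line of the second form is a complete JSON document on its own, which is what a line-oriented consumer expects.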

This also allows simply appending JSON lines (one ad/record per line) to a log file, which doesn't quite work with the current -json format, because its top-level list has to enclose all the ad records.
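A rough illustration of the appending case, again in plain Python. The helper name append_ads, the file name history.jsonl, and the sample ads are made up for this sketch; nothing here is part of condor_history itself.

```python
import json
import os
import tempfile

def append_ads(path, ads):
    """Append each ad as one JSON line; earlier lines are left untouched."""
    with open(path, "a", encoding="utf-8") as f:
        for ad in ads:
            f.write(json.dumps(ad) + "\n")

# Hypothetical log file in a scratch directory.
log_path = os.path.join(tempfile.mkdtemp(), "history.jsonl")
append_ads(log_path, [{"ClusterId": 101, "ProcId": 0}])
append_ads(log_path, [{"ClusterId": 102, "ProcId": 0}])  # a later run

# The appended file is still readable, one record per line.
with open(log_path, encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

# By contrast, two JSON lists written one after another are not one valid document.
two_lists = json.dumps([{"ClusterId": 101}]) + "\n" + json.dumps([{"ClusterId": 102}])
try:
    json.loads(two_lists)
    concatenated_lists_parse = True
except json.JSONDecodeError:
    concatenated_lists_parse = False
```

With the list format, appending would mean rewriting the file to move the closing bracket; with JSON Lines, an append-mode write is enough.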

Besides Elasticsearch, any other consumer of JSON lines can read and process one line at a time in a pipeline, without needing to load the entire file or JSON list object into memory before iterating.
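A consumer along those lines might look like the following sketch. The function name iter_ads and the sample records are assumptions for illustration; the in-memory stream stands in for a pipe or a large .jsonl file.

```python
import io
import json

def iter_ads(stream):
    """Yield one ad (dict) per non-empty line, without loading the whole input."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

# Stand-in for a pipe or a large .jsonl file (hypothetical records).
sample = io.StringIO(
    '{"ClusterId": 101, "JobStatus": 4}\n'
    '{"ClusterId": 102, "JobStatus": 3}\n'
)

# Filter while streaming; only one ad is held in memory at a time.
completed = [ad["ClusterId"] for ad in iter_ads(sample) if ad["JobStatus"] == 4]
```

In a real pipeline the same generator could be fed sys.stdin, so memory use stays flat regardless of how many ads are produced upstream.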

I don't want to change the current -json output behavior, of course, but I would like to add a new -jsonl option for "JSON Lines" output, which should not take much work to support.

Other client utilities might benefit from this option also, but for now I am only concerned with condor_history, so that is where I plan to add support for it. (Of course, the main implementation change is in the classad library, so adding support to other client utilities should be fairly straightforward also.)

Activity

Jaime Frey
March 1, 2021, 4:32 PM

Code Review

After requested changes, this code is good.

Carl Edquist
October 24, 2020, 11:20 PM

- do I give this to you to assign for review?

Carl Edquist
October 24, 2020, 11:16 PM

I don't know if these days it's still preferred to make a branch on AFS, or a github PR, but for starters here's a PR:

https://github.com/htcondor/htcondor/pull/134

Due date

None

Time remaining

0m

Assignee

Jaime Frey

Is PATh development

None

Priority

Major

HTCondorCustomerGroup

CHTC

Reporter

Carl Edquist