Posted to issues@nifi.apache.org by "Matt Burgess (JIRA)" <ji...@apache.org> on 2018/04/13 23:06:00 UTC

[jira] [Commented] (NIFI-4456) Update JSON Record Reader / Writer to allow for 'json per line' format

    [ https://issues.apache.org/jira/browse/NIFI-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438062#comment-16438062 ] 

Matt Burgess commented on NIFI-4456:
------------------------------------

For the readers, we can relax the constraint that the incoming flow file must be either a JSON array or a single object (i.e. well-formed JSON). If we instead simply "jump into" JSON arrays and treat each element as a record, the existing parser will also handle formats that are not a single well-formed JSON document, such as object-per-line. It will handle odd cases as well, such as an array followed by whitespace followed by a JSON object. The rule of thumb will be that an incoming flow file "is expected to be composed of any combination of JSON arrays and objects separated by optional whitespace".
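
To make the reader rule concrete, here is a minimal sketch in Python using only the standard library. It is an illustration of the "jump into arrays" idea, not the NiFi reader itself (which is Java), and the function name iter_records is hypothetical. It consumes any combination of top-level JSON arrays and objects separated by optional whitespace and yields one record per object or array element.

```python
import json
from typing import Any, Iterator

# The four whitespace characters permitted between JSON values.
JSON_WHITESPACE = " \t\r\n"


def iter_records(text: str) -> Iterator[Any]:
    """Yield records from any mix of top-level JSON arrays and objects.

    Hypothetical helper for illustration; not part of NiFi.
    """
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < length and text[pos] in JSON_WHITESPACE:
            pos += 1
        if pos >= length:
            return
        # Parse exactly one top-level value and get the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        if isinstance(value, list):
            # "Jump into" the array: each element is its own record.
            yield from value
        else:
            yield value
```

With this sketch, a JSON array of two objects, two objects on separate lines, and an array followed by whitespace and a further object are all read as a flat sequence of records.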

For the writer, we can offer an "Output Grouping" property that defaults to "Array" (to maintain the current behavior of outputting JSON records as a JSON array) and also offers "One Object Per Line". From an implementation standpoint, we can use a MinimalPrettyPrinter with a newline as the record separator for that case. We would also not allow the Pretty Print property to be set to "true" when "One Object Per Line" is selected.
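
The writer behavior can be sketched the same way. The following Python example is only an illustration under stated assumptions: the function write_records and its parameters are hypothetical, and compact json.dumps output stands in for what a MinimalPrettyPrinter with a newline separator would produce in the Java implementation. It shows the two proposed groupings and the rule that pretty printing is rejected for "One Object Per Line".

```python
import json
from typing import Any, Iterable

# Property values taken from the proposal above.
ARRAY = "Array"
ONE_OBJECT_PER_LINE = "One Object Per Line"


def write_records(records: Iterable[Any],
                  output_grouping: str = ARRAY,
                  pretty_print: bool = False) -> str:
    """Serialize records according to the proposed Output Grouping.

    Hypothetical helper for illustration; not part of NiFi.
    """
    if output_grouping not in (ARRAY, ONE_OBJECT_PER_LINE):
        raise ValueError(f"Unknown Output Grouping: {output_grouping!r}")

    if output_grouping == ONE_OBJECT_PER_LINE:
        # Pretty printing would spread one record over several lines,
        # which defeats the one-object-per-line format.
        if pretty_print:
            raise ValueError(
                "Pretty Print cannot be true with One Object Per Line")
        # Compact separators keep each record on a single line.
        return "\n".join(
            json.dumps(record, separators=(",", ":")) for record in records)

    # Default grouping: all records wrapped in a single JSON array.
    return json.dumps(list(records), indent=2 if pretty_print else None)
```

Defaulting to "Array" keeps existing flows unchanged, while the validation check mirrors the proposed restriction on the Pretty Print property.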

> Update JSON Record Reader / Writer to allow for 'json per line' format
> ----------------------------------------------------------------------
>
>                 Key: NIFI-4456
>                 URL: https://issues.apache.org/jira/browse/NIFI-4456
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Mark Payne
>            Assignee: Matt Burgess
>            Priority: Major
>
> It is common, especially for archiving purposes, to have many JSON objects combined with newlines in between to delimit the records. It would be useful to allow record readers and writers to support this, instead of requiring that JSON records be elements in a JSON array.
> For example, the following JSON is considered two records:
> {code}
> [
>   { "greeting" : "hello", "id" : 1 },
>   { "greeting" : "good-bye", "id" : 2 }
> ]
> {code}
> It would be beneficial to also support the format:
> {code}
> { "greeting" : "hello", "id" : 1 }
> { "greeting" : "good-bye", "id" : 2 }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)