Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2019/04/20 15:17:21 UTC

[GitHub] [nifi] arenger commented on issue #3414: NIFI-5900 Add a SplitLargeJson processor

arenger commented on issue #3414: NIFI-5900 Add a SplitLargeJson processor
URL: https://github.com/apache/nifi/pull/3414#issuecomment-485135173
 
 
   No problem on the delays; I hope you had a good vacation.  As for adding detail to the HTML documentation, as you mentioned: yes, I could do that, but I don't know where to add it.  Did you take a look at the `@CapabilityDescription` that I added for `SplitLargeJson`?  I tried to make that wording clear but also succinct.  I could add more detail there, or is there a better place where I should expound on the function of the processor?
   
   Also, as I mentioned in [an above comment](https://github.com/apache/nifi/pull/3414#issuecomment-482096603), I think there are four roads we could take from here:
   
   1) Create a new `SplitJsonProcessor` that uses `javax.json` (this PR)
   2) Create a new `SplitJsonProcessor` that uses `JsonSurfer`
   3) Keep only `SplitJson` and optionally employ a streaming approach, backed by `javax.json`, when a new property is set
   4) Keep only `SplitJson` and optionally employ a streaming approach, backed by `JsonSurfer`, when a new property is set
   
   I looked briefly into the 2nd and 4th options but have yet to confirm whether the memory usage is comparable to that of the `javax.json` approach.  To use `JsonSurfer` in NiFi, it looks like we'd need to exclude the version of ANTLR that is pulled in by the [nifi-syslog-utils](https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-syslog-utils/pom.xml#L28) module (via `simple-syslog-5424`) and explicitly replace it with version `4.7.2` of `antlr4-runtime`.  After I did that, I was able to run `JsonSurfer` without a runtime error.
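   
   For reference, the ANTLR swap described above might look roughly like the fragment below in the pom of the module that would use `JsonSurfer`.  This is only a sketch: I'm assuming the exclusion is placed on that module's `nifi-syslog-utils` dependency and that `${project.version}` is the right version there, neither of which is confirmed here.  The `org.antlr:antlr4-runtime` coordinates and the `4.7.2` version are the only parts taken as given.

   ```xml
   <!-- Sketch of the consuming module's pom.xml; placement and the nifi-syslog-utils version are assumptions. -->
   <dependencies>
       <dependency>
           <groupId>org.apache.nifi</groupId>
           <artifactId>nifi-syslog-utils</artifactId>
           <!-- assumed: version aligned with the rest of the NiFi build -->
           <version>${project.version}</version>
           <exclusions>
               <!-- Drop the ANTLR runtime that arrives transitively via simple-syslog-5424 -->
               <exclusion>
                   <groupId>org.antlr</groupId>
                   <artifactId>antlr4-runtime</artifactId>
               </exclusion>
           </exclusions>
       </dependency>

       <!-- Explicitly declare the ANTLR runtime version that JsonSurfer ran against -->
       <dependency>
           <groupId>org.antlr</groupId>
           <artifactId>antlr4-runtime</artifactId>
           <version>4.7.2</version>
       </dependency>
   </dependencies>
   ```

   Running `mvn dependency:tree -Dincludes=org.antlr:antlr4-runtime` in that module should show which version actually gets resolved after a change like this.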
   
   `JsonSurfer` does have wider support for the JSON Path specification.  If we went that route, I'd suggest we create a new processor called `JsonExtract`, or something similar, that would simply receive a JSON file and a JSON Path.  It would output zero, one, or more JSON documents extracted from the incoming document.  The notion of "splitting" isn't really the best description at that point, since the full JSON Path specification can be used to specify any part -- or set of parts -- of a JSON document.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services