Posted to users@nifi.apache.org by Matt Burgess <ma...@apache.org> on 2017/02/04 03:10:05 UTC

Re: Validating an array of objects using ConvertJSONToAvro

Bas,

Sorry for the late reply; I should've mentioned sooner that I am
looking into this issue. From your description, it seems like
ConvertJSONToAvro should be able to handle this kind of thing. If I
can't find a schema that fits and instead confirm it is a
bug/improvement, I will write up a Jira; either way, I will inform
this list. Thank you for your question; IMO this is indeed a valid
use case that should be supported.
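
The reshaping under discussion (turning a JSON array of objects into
newline-separated objects) can be sketched in a few lines of plain Python,
assuming the input has the shape shown in Bas's example; the function name
is just illustrative:

```python
import json

def array_to_ndjson(flowfile_text: str) -> str:
    """Convert a JSON array of objects into newline-delimited JSON,
    the shape ConvertJSONToAvro can validate record by record."""
    records = json.loads(flowfile_text)
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)

ndjson = array_to_ndjson('[{"key1": "val1", "key2": "val2"},'
                         ' {"key1": "val3", "key2": "val4"}]')
print(ndjson)
```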

Regards,
Matt

On Tue, Jan 31, 2017 at 9:10 AM, Bas van Kortenhof
<ba...@sanoma.com> wrote:
> Hi all,
>
> Not completely sure whether this is a developer or user question, but I'm
> posting it here for now, as it currently relates to flow design.
>
> What I'm trying to achieve is to fetch a JSON response from an API, extract
> the relevant values, validate the data, and convert it to Avro. I can
> complete the first two steps with InvokeHTTP and JoltTransformJSON, after
> which my data is a JSON array of objects, so my flowfile looks like this:
>
> [
>   {"key1": "val1", "key2": "val2"},
>   {"key1": "val3", "key2": "val4"}
> ]
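>
> For records of this shape, a matching Avro schema might look something
> like the following (the record name "Example" and the string field types
> are assumptions for illustration):
>
> ```json
> {
>   "type": "record",
>   "name": "Example",
>   "fields": [
>     {"name": "key1", "type": "string"},
>     {"name": "key2", "type": "string"}
>   ]
> }
> ```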
>
> My idea was to feed this JSON into ConvertJSONToAvro together with the
> appropriate Avro schema. However, ConvertJSONToAvro cannot apply schema
> validation to the individual elements of an array. It can, however, apply
> schema validation to records that are not contained in an array but are
> separated by newlines, so it can handle the following flowfile (note that,
> at the file level, this is not valid JSON):
>
> {"key1": "val1", "key2": "val2"}
> {"key1": "val3", "key2": "val4"}
>
> I can achieve this in NiFi by splitting the JSON flowfile with SplitJson
> and immediately merging it back together with a MergeContent processor
> using '\n' as the demarcator. Both have to be applied before
> ConvertJSONToAvro, because otherwise invalid records would cause the merge
> step to fail. That means the split can't even be used to redistribute
> files in a cluster setting, so I don't really like this workaround.
>
> I was wondering if anyone knows a way to produce the second example format
> of JSON using a JOLT transformation, which would be an elegant fix. If not,
> I'd like to ask whether there is a reason that ConvertJSONToAvro can only
> handle newline-separated objects and not objects in an array (which, in my
> opinion, is the closest JSON representation of the Avro concept of
> records). If there is no such reason, I think this can be considered a
> bug, and I would propose adding an option to the ConvertJSONToAvro
> processor to apply schema validation to the whole file, to objects
> separated by newlines, or to objects in an array.
>
> Please let me know what you think!
>
> Regards,
> Bas
>
>
>
> --
> View this message in context: http://apache-nifi-users-list.2361937.n4.nabble.com/Validating-an-array-of-objects-using-ConvertJSONToAvro-tp832.html
> Sent from the Apache NiFi Users List mailing list archive at Nabble.com.

Re: Validating an array of objects using ConvertJSONToAvro

Posted by Bas van Kortenhof <ba...@sanoma.com>.
Matt,

Thanks for letting me know you're looking into this. While playing around a
bit more, I actually came across an interesting benefit of this behaviour:
if the input file is guaranteed to contain one JSON object per line, it can
be parsed line by line. This means the whole file doesn't have to be loaded
into memory, which could be beneficial for files of several MBs or even GBs.
It may be an interesting use case to keep in mind when looking into this.
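
A rough sketch of that streaming idea in Python (assuming well-formed input
with exactly one JSON object per non-blank line; the generator name is just
illustrative):

```python
import io
import json

def iter_records(stream):
    """Yield one parsed JSON object per line, without ever
    loading the whole file into memory."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)

ndjson = io.StringIO(
    '{"key1": "val1", "key2": "val2"}\n'
    '{"key1": "val3", "key2": "val4"}\n'
)
for record in iter_records(ndjson):
    print(record["key1"])
```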

Regards,
Bas



--
View this message in context: http://apache-nifi-users-list.2361937.n4.nabble.com/Validating-an-array-of-objects-using-ConvertJSONToAvro-tp832p862.html
Sent from the Apache NiFi Users List mailing list archive at Nabble.com.