You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Allen Tang <na...@gmail.com> on 2018/05/14 18:58:09 UTC

[DISCUSS] KIP-301 Schema Inferencing in JsonConverters

Hi,

I just opened a KIP to add Schema Inferencing in JsonConverters for
Kafka Connect.

The details of the proposal can be found here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-301%3A+Schema+Inferencing+for+JsonConverter

Also, I have created a -

1.) JIRA ticket: https://issues.apache.org/jira/browse/KAFKA-6895

2.) Provisional PR with initial discussion:
https://github.com/apache/kafka/pull/5001

Looking forward to the community's feedback! Cheers!

-Allen

Re: [DISCUSS] KIP-301 Schema Inferencing in JsonConverters

Posted by Randall Hauch <rh...@gmail.com>.
There are lots of potential details/gotchas here, with no obvious defaults.
Even if we did have optional configs for the type(s) to use for empty
arrays and null values, how likely is it that those would apply for all
occurrences? Two fields might have empty arrays, but the best schema for
each might be different.

Allowing the user to specify a schema makes sense as a powerful fallback,
though it wouldn't be the most usable for cases where you might just want
to correct one thing.

I think we should pick one approach that works. If we can't find a simple
way for a few configuration options to work in nearly all cases, then
perhaps the explicit schema override is the least complex approach.

I'd like to hear what other people think.

Randall

On Tue, May 15, 2018 at 4:37 PM, Allen Tang <na...@gmail.com> wrote:

> I've went through several iterations of back-and-forth with @rhauch on the
> PR and on Confluent's Slack Community. The current thinking is that
> assuming
> an empty array is a String array is not necessarily the best option, nor is
> assuming that all null values in a JSON node is a String.
>
> We might be able to account for these potentially false
> assumptions/inferences by introducing new task properties (with
> value.converter prefix) that explicitly define overrides for either
> specific json field keys, or give the option for Kafka Connect users to
> provide a full immutabl schema they know are true for the topics impacted
> by the Sink Connector.
>
> What do you think?
>
> - Allen
>
>
> On Mon, May 14, 2018 at 2:58 PM, Allen Tang <na...@gmail.com> wrote:
>
> > Hi,
> >
> > I just opened a KIP to add Schema Inferencing in JsonConverters for
> Kafka Connect.
> >
> > The details of the proposal can be found here: https://cwiki.apache.org/
> confluence/display/KAFKA/KIP-301%3A+Schema+Inferencing+for+JsonConverter
> >
> > Also, I have created a -
> >
> > 1.) JIRA ticket: https://issues.apache.org/jira/browse/KAFKA-6895
> >
> > 2.) Provisional PR with initial discussion: https://github.com/apache/
> kafka/pull/5001
> >
> > Looking forward to the community's feedback! Cheers!
> >
> > -Allen
> >
> >
>

Re: [DISCUSS] KIP-301 Schema Inferencing in JsonConverters

Posted by Allen Tang <na...@gmail.com>.
I've went through several iterations of back-and-forth with @rhauch on the
PR and on Confluent's Slack Community. The current thinking is that assuming
an empty array is a String array is not necessarily the best option, nor is
assuming that all null values in a JSON node is a String.

We might be able to account for these potentially false
assumptions/inferences by introducing new task properties (with
value.converter prefix) that explicitly define overrides for either
specific json field keys, or give the option for Kafka Connect users to
provide a full immutabl schema they know are true for the topics impacted
by the Sink Connector.

What do you think?

- Allen


On Mon, May 14, 2018 at 2:58 PM, Allen Tang <na...@gmail.com> wrote:

> Hi,
>
> I just opened a KIP to add Schema Inferencing in JsonConverters for Kafka Connect.
>
> The details of the proposal can be found here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-301%3A+Schema+Inferencing+for+JsonConverter
>
> Also, I have created a -
>
> 1.) JIRA ticket: https://issues.apache.org/jira/browse/KAFKA-6895
>
> 2.) Provisional PR with initial discussion: https://github.com/apache/kafka/pull/5001
>
> Looking forward to the community's feedback! Cheers!
>
> -Allen
>
>