Posted to users@kafka.apache.org by Mike Cargal <mi...@cargal.net> on 2017/01/29 13:02:04 UTC

KafkaAvroSerializer to produce to a single topic with different schemas used for records

I was just looking into using KafkaAvroSerializer to produce records to a Kafka topic.  We are interested because the wire format has a reference to the schema, so we don’t have to look up schema information independently.

We plan to keep a single topic that contains records using many different schemas (it’s important to maintain the ordering of these records).

In looking at the code, it appears that it registers the schema with the registry with a topic+”-topic” subject.  This would seem to imply an assumption that a topic has a single schema associated with it (not many schemas that can vary from record to record).

Am I understanding this correctly?  It seems like a surprising constraint.
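
For readers following along, here is a minimal Python sketch of how a serializer might derive the registry subject from the topic name. The `-key`/`-value` suffixes reflect my understanding of the Confluent default naming rather than the “-topic” suffix mentioned above, and `subject_name` and the topic `events` are hypothetical names used only for illustration, not the library's API.

```python
def subject_name(topic: str, is_key: bool = False) -> str:
    """Derive a registry subject from a topic name (assumed default naming)."""
    # Assumption: key schemas and value schemas are registered under
    # separate subjects that share the topic name as a prefix.
    suffix = "key" if is_key else "value"
    return f"{topic}-{suffix}"


print(subject_name("events"))               # subject for record values
print(subject_name("events", is_key=True))  # subject for record keys
```

A subject is only the name under which schema versions are registered, so one subject per topic does not by itself mean one schema per topic.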

Re: KafkaAvroSerializer to produce to a single topic with different schemas used for records

Posted by Mike Cargal <mi...@icloud.com>.
If it comes to that we may consider it.  However, we will have a LOT of different schemas coming through, and new ones will be added frequently.

(We’ve also seen that the Schema Registry doesn’t seem to allow references to anything outside the same “file,” for lack of a better term, so a single bundled schema would become quite a choke point for changes.)

I’m playing with it now, but it looks like multiple independent schemas will work fine.

> On Jan 30, 2017, at 11:32 AM, Andy Chambers <ac...@gmail.com> wrote:
> 
> How about defining an Avro union type containing all the schemas you wish
> to put on this topic (the schemas themselves could be defined independently
> and then bundled into an "uber-schema" at build time)?
> 
> That means any messages you put on the topic must match one of the schemas
> defined in the union.
> 
> This would allow you to retain the compatibility features of the schema
> registry.
> 
> Cheers,
> Andy

Re: KafkaAvroSerializer to produce to a single topic with different schemas used for records

Posted by Andy Chambers <ac...@gmail.com>.
How about defining an Avro union type containing all the schemas you wish
to put on this topic (the schemas themselves could be defined independently
and then bundled into an "uber-schema" at build time)?

That means any messages you put on the topic must match one of the schemas
defined in the union.

This would allow you to retain the compatibility features of the schema
registry.

Cheers,
Andy
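
A minimal sketch of the build-time bundling step suggested above, in Python. An Avro union is written as a JSON array of schemas, so the "uber-schema" is just a list of the independently defined records. The record names, namespace, and the `build_union` helper are hypothetical and only illustrate the idea.

```python
import json

# Hypothetical record schemas; each could live in its own file.
ORDER_CREATED = {
    "type": "record",
    "name": "OrderCreated",
    "namespace": "com.example",
    "fields": [{"name": "order_id", "type": "string"}],
}
ORDER_SHIPPED = {
    "type": "record",
    "name": "OrderShipped",
    "namespace": "com.example",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "carrier", "type": "string"},
    ],
}


def build_union(schemas):
    """Bundle independent record schemas into one Avro union (a JSON array)."""
    # Named types in a union must have distinct full names.
    names = [(s.get("namespace"), s["name"]) for s in schemas]
    if len(set(names)) != len(names):
        raise ValueError("union branches must have distinct full names")
    return list(schemas)


uber_schema = build_union([ORDER_CREATED, ORDER_SHIPPED])
print(json.dumps(uber_schema, indent=2))
```

The trade-off is that every new record type means regenerating and re-registering the whole union.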

On Mon, Jan 30, 2017 at 1:48 PM, Mike Cargal <mi...@icloud.com>
wrote:

> This helps some.  We’re planning to write a non-homogeneous set of records
> to a single topic (to preserve order).  There would be no compatibility
> between records of different types.  I assume that if I set the schema
> compatibility for this subject to “none” this would not be a problem. (can
> you confirm?)
>
> Also of potential concern is deduplication.  If I write type R1, R2, R3,
> R4, R2, R1, … etc., will I only have 4 resulting schemas in the registry?
> I see that it’s using a caching class to access the registry, but this
> needs to be across many jobs.
>
> I suppose I’ll be sorting this out as I test, but any insight ahead of
> time is appreciated.

Re: KafkaAvroSerializer to produce to a single topic with different schemas used for records

Posted by Mike Cargal <mi...@icloud.com>.
This helps some.  We’re planning to write a non-homogeneous set of records to a single topic (to preserve order).  There would be no compatibility between records of different types.  I assume that if I set the schema compatibility for this subject to “none” this would not be a problem.  (Can you confirm?)
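
For reference, a hedged sketch of what setting a per-subject compatibility level could look like against the Schema Registry REST interface, which as far as I know accepts a `PUT` to `/config/{subject}`. The base URL and the subject name `events-value` are assumptions, and the request is only built here, not sent.

```python
import json
import urllib.request


def compatibility_request(base_url: str, subject: str, level: str = "NONE"):
    """Build (but do not send) a request that sets a subject's compatibility."""
    body = json.dumps({"compatibility": level}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/config/{subject}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


# Assumed registry address and subject name, for illustration only.
req = compatibility_request("http://localhost:8081", "events-value")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending it would be: urllib.request.urlopen(req)
```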

Also of potential concern is deduplication.  If I write types R1, R2, R3, R4, R2, R1, …, will I end up with only 4 schemas in the registry?  I see that it’s using a caching class to access the registry, but this needs to hold across many jobs.
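
To make the deduplication question concrete, here is a toy in-memory model in Python of the behaviour I would hope for: registering a schema that is already known returns its existing ID instead of creating a new entry. `ToyRegistry` is a hypothetical stand-in, not the real registry or its client, and the thread does not confirm that the real service behaves this way.

```python
import json


class ToyRegistry:
    """In-memory stand-in for a schema registry (NOT the real client)."""

    def __init__(self):
        self._ids = {}

    def register(self, schema: dict) -> int:
        # Canonicalise so that key order does not create spurious duplicates.
        canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
        if canonical not in self._ids:
            self._ids[canonical] = len(self._ids) + 1
        return self._ids[canonical]

    def __len__(self):
        return len(self._ids)


def record(name: str) -> dict:
    """Build a trivial record schema with the given name."""
    return {"type": "record", "name": name, "fields": []}


registry = ToyRegistry()
sequence = ["R1", "R2", "R3", "R4", "R2", "R1"]
ids = [registry.register(record(n)) for n in sequence]
print(ids)            # → [1, 2, 3, 4, 2, 1]
print(len(registry))  # → 4
```

In this model the deduplication lives in the shared registry rather than in any per-job cache, which is the property that would need to hold across many jobs.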

I suppose I’ll be sorting this out as I test, but any insight ahead of time is appreciated.

> On Jan 30, 2017, at 7:13 AM, Gerard Klijs <ge...@openweb.nl> wrote:
> 
> Not really, as you can update the schema, and have multiple of them at the
> same time. By default each schema has to be backwards compatible, so you do
> have to exclude the specific topic you use with different schemas. With
> every write, the 'id' of the schema used is also written, so when you
> deserialise the messages, you know which schema to use for which message.

Re: KafkaAvroSerializer to produce to a single topic with different schemas used for records

Posted by Gerard Klijs <ge...@openweb.nl>.
Not really, as you can update the schema, and have multiple of them at the
same time. By default each schema has to be backwards compatible, so you do
have to exclude the specific topic you use with different schemas. With
every write, the 'id' of the schema used is also written, so when you
deserialise the messages, you know which schema to use for which message.
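
A small Python sketch of how a per-message schema ID can be carried and read back. It assumes the framing I understand the Confluent serializer to use (one marker byte of 0, then a 4-byte big-endian schema ID, then the Avro payload). The `frame` and `unframe` helpers are hypothetical names for illustration.

```python
import struct

MAGIC_BYTE = 0   # assumed framing marker
HEADER_SIZE = 5  # 1 marker byte + 4-byte schema ID


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix an encoded payload with the marker byte and schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes):
    """Split a framed message into (schema_id, payload)."""
    if len(message) < HEADER_SIZE:
        raise ValueError("message too short to contain a schema ID")
    magic, schema_id = struct.unpack(">bI", message[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected marker byte")
    return schema_id, message[HEADER_SIZE:]


# Two messages on the same topic that reference different schemas.
for msg in (frame(7, b"first"), frame(12, b"second")):
    schema_id, payload = unframe(msg)
    print(schema_id, payload)
```

A consumer would use the recovered ID to fetch the matching schema before decoding the payload, which is what lets records with different schemas share one topic.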

On Sun, Jan 29, 2017 at 17:35, Mike Cargal <mi...@cargal.net> wrote:

> I was just looking into using KafkaAvroSerializer to produce records to a
> Kafka topic.  We are interested because the wire format has a reference to
> the schema, so we don’t have to look up schema information independently.
>
> We plan to keep a single topic that contains records using many different
> schemas (it’s important to maintain the ordering of these records).
>
> In looking at the code, it appears that it registers the schema with the
> registry with a topic+”-topic” subject.  This would seem to imply an
> assumption that a topic has a single schema associated with it (not many
> schemas that can vary from record to record).
>
> Am I understanding this correctly?  It seems like a surprising constraint.