Posted to users@kafka.apache.org by "Shajahan, Nishanth" <ns...@visa.com> on 2017/08/16 22:12:56 UTC

Different Schemas on same Kafka Topic

Hello,

Does Kafka support writing different Avro record types (with very different schemas) to the same topic? I guess we would have to write our own Avro serializer and deserializer to do this. Is there a preferred way to do this? It would be great if someone could point me in the right direction.

Thanks,
Nishanth


Re: Different Schemas on same Kafka Topic

Posted by Svante Karlsson <sv...@csi.se>.
Well, the purpose of the schema registry is to map an id (a 32-bit integer
in Confluent's wire format) to an Avro schema, with or without rules on how
you may update a schema with a given name. To decode Avro you need a schema.
Either you "know" what's in a given topic and can hardcode it, or you prepend
something to each message, i.e. the schema id. But you could also hardcode
your possible schemas in the consumers and prepend something else
(fingerprint, UUID, ...).
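A minimal sketch of the "prepend an id" approach, assuming a header laid out like Confluent's wire format: one magic byte (0), then a 4-byte big-endian schema id, then the Avro-encoded body. The class, its helper methods and the hardcoded id-to-schema table are hypothetical; producing the Avro bytes themselves is out of scope here.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Map;

/** Hypothetical helper: frames an Avro payload with a magic byte and a schema id. */
final class SchemaIdFraming {
    static final byte MAGIC_BYTE = 0;
    static final int HEADER_SIZE = 1 + Integer.BYTES;

    // Hypothetical consumer-side table of the schemas this consumer knows about.
    static final Map<Integer, String> KNOWN_SCHEMAS = Map.of(1, "com.example.A", 2, "com.example.B");

    /** Prepends the magic byte and a 4-byte schema id to an already-encoded payload. */
    static byte[] frame(int schemaId, byte[] avroPayload) {
        return ByteBuffer.allocate(HEADER_SIZE + avroPayload.length)
                .put(MAGIC_BYTE)
                .putInt(schemaId) // ByteBuffer writes big-endian by default
                .put(avroPayload)
                .array();
    }

    /** Reads the schema id, rejecting messages without the expected header. */
    static int schemaIdOf(byte[] message) {
        if (message.length < HEADER_SIZE || message[0] != MAGIC_BYTE) {
            throw new IllegalArgumentException("unknown magic byte or message too short");
        }
        return ByteBuffer.wrap(message, 1, Integer.BYTES).getInt();
    }

    /** Returns the Avro body that follows the header. */
    static byte[] payloadOf(byte[] message) {
        return Arrays.copyOfRange(message, HEADER_SIZE, message.length);
    }

    /** Looks up which hardcoded schema a message was written with. */
    static String schemaNameOf(byte[] message) {
        int id = schemaIdOf(message);
        String name = KNOWN_SCHEMAS.get(id);
        if (name == null) {
            throw new IllegalStateException("no schema known for id " + id);
        }
        return name;
    }
}
```

A consumer would read the id with `schemaIdOf`, pick the matching reader schema, and decode `payloadOf(message)` with it. Using a fingerprint or UUID instead of a registry id only changes the header size.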

That said, a schema registry is by far the easiest way forward. You really
should stick to either a text encoding or Avro with a schema registry. I've
been down the road of managing lots of binary schemas myself, and it works
fine for a while; it's when your schemas change that you feel the pain.
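As a rough illustration of the registry route, these are the client settings typically involved when using Confluent's Avro serializer and deserializer. The property keys and class names are the standard Kafka/Confluent ones; the broker and registry addresses are placeholder assumptions and the helper class is hypothetical. Nothing is sent here; the properties would be passed to a `KafkaProducer` or `KafkaConsumer`.

```java
import java.util.Properties;

/** Hypothetical helper building client settings for Confluent's Avro SerDes. */
final class RegistryClientConfig {
    static Properties producer(String bootstrapServers, String registryUrl) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", registryUrl);
        // The serializer registers unseen schemas and embeds the returned id in each message.
        props.put("auto.register.schemas", "true");
        return props;
    }

    static Properties consumer(String bootstrapServers, String registryUrl, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("schema.registry.url", registryUrl);
        // false (the default): records come back as GenericRecord, whatever their schema.
        props.put("specific.avro.reader", "false");
        return props;
    }
}
```

With `specific.avro.reader` left false, one consumer can receive several record types from the same topic and inspect each record's schema at runtime.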

Even if you use the NONE compatibility type, you still benefit from the schema registry.

regards
svante

2017-08-17 20:10 GMT+02:00 Sreejith S <sr...@gmail.com>:

> Hi Stephen,
>
> Thank you very much.
>
> Please clarify this statement:
>
> "each unique Avro schema has a unique id associated with it. That id
> can be used across multiple different topics. The enforcement of which
> schemas are allowed in a particular topic comes down to the combination of
> the subject (usually topic-name-key/value) and version (the version itself
> starts at 1 inside the subject, and itself has an id that ties to the
> globally unique schema id)."
>
> How? You always register a schema against a topic using the topic name,
> and the schema registry assigns an id that is unique across the registry
> cluster. Where is the globally unique schema id here?
>
> I think with the Producer/Consumer API you have more freedom to pass a
> schema id of your choice and ask Avro to serialize/deserialize, but in the
> Connect framework all of this is abstracted away.
>
> It's a good pointer to use the NONE compatibility type, so that even if the
> schema registry holds the same id for a topic, each schema version under it
> can be an entirely different schema. Is my understanding correct?
>
> But when NONE is defined, the purpose of the schema registry itself is
> lost, right?
>
> Regards
> Sreejith
>
> On 17-Aug-2017 11:03 pm, "Stephen Durfey" <sj...@gmail.com> wrote:
>
> > There is a little nuance to this topic (hehe). When it comes down to it,
> > yes, each unique Avro schema has a unique id associated with it. That id
> > can be used across multiple different topics. The enforcement of which
> > schemas are allowed in a particular topic comes down to the combination
> > of the subject (usually topic-name-key/value) and version (the version
> > itself starts at 1 inside the subject, and itself has an id that ties to
> > the globally unique schema id). So, yes, you can have multiple schemas
> > within the same topic, and that's perfectly fine, so long as you're
> > correctly configuring the schema registry.
> >
> > Whether or not a schema is allowed to be registered for a particular
> > subject depends on the type of Avro compatibility enforced. There are
> > 4 types: BACKWARD, FORWARD, FULL (combines forward and backward), and
> > NONE. The schema registry evaluates the schema being published against
> > the history of schemas it already knows about for that subject. If the
> > schema has evolved correctly according to the type configured in the
> > schema registry, it will be allowed.
> >
> > So, if you select NONE as the compatibility type, the schema registry
> > will allow any schema to be registered, even if it is not compatible,
> > because you've told the registry not to care. So you should really
> > choose among BACKWARD, FORWARD, and FULL. I use FULL in production
> > because the data being written is long-lived, will have multiple readers
> > and writers, and needs to be passively evolved. BACKWARD and FORWARD can
> > be fine too, depending on the needs of the data being produced and
> > consumed.
> >
> > On Thu, Aug 17, 2017 at 12:22 PM, Tauzell, Dave <
> > Dave.Tauzell@surescripts.com> wrote:
> >
> > > Hmm, I think you are right that you cannot have multiple schemas on the
> > > same topic.
> > >
> > > -Dave
> > >
> > >
> > > -----Original Message-----
> > > From: Sreejith S [mailto:srssreejith@gmail.com]
> > > Sent: Thursday, August 17, 2017 11:42 AM
> > > To: users@kafka.apache.org
> > > Subject: RE: Different Schemas on same Kafka Topic
> > >
> > > Hi Dave,
> > >
> > > I would like clarity on one thing. If I register more than one
> > > schema for a topic, I am providing topic-key and topic-value to the
> > > schema registry.
> > >
> > > The id is created by the schema registry, and it will create different
> > > versions for the different schemas. Still, all the schemas have the
> > > same id. Am I right?
> > >
> > > If so, all Avro messages hold the same id. Then how are multiple
> > > schemas on the same topic possible?
> > >
> > > Please clarify
> > >
> > > Thanks,
> > > Sreejith
> > >
> > > On 17-Aug-2017 9:49 pm, "Tauzell, Dave" <Da...@surescripts.com>
> > > wrote:
> > >
> > > > > How does the consumer know A is the Avro class when there could be
> > > > > other classes like B, C and D denoting different schemas?
> > > >
> > > > There isn't a good way. One option is to have an Avro wrapper that
> > > > contains type, version and data fields, and then wrap everything.
> > > > Another option is to do what the Confluent Avro serializer does and
> > > > prepend a fixed-length value to every message that identifies the
> > > > schema and version used for that message.
> > > >
> > > > -Dave
> > > >
> > > > -----Original Message-----
> > > > From: Shajahan, Nishanth [mailto:nshajaha@visa.com]
> > > > Sent: Thursday, August 17, 2017 11:02 AM
> > > > To: users@kafka.apache.org
> > > > Subject: RE: Different Schemas on same Kafka Topic
> > > >
> > > > Thanks, Dave. We may not want to start using the schema registry
> > > > immediately. We would have Java producers and consumers. I might also
> > > > go with byte messages, but when the consumer deserializes, how can it
> > > > map the byte[] to the correct Avro object? For example:
> > > >
> > > > KafkaConsumer<String, A> consumer = new KafkaConsumer<>(consumerConfig,
> > > >         new StringDeserializer(), new AvroDeserializer<>(A.class));
> > > >
> > > > How does the consumer know A is the Avro class when there could be
> > > > other classes like B, C and D denoting different schemas?
> > > >
> > > >
> > > > -Nishanth
> > > >
> > > > -----Original Message-----
> > > > From: Tauzell, Dave [mailto:Dave.Tauzell@surescripts.com]
> > > > Sent: Thursday, August 17, 2017 8:30 AM
> > > > To: users@kafka.apache.org
> > > > Subject: RE: Different Schemas on same Kafka Topic
> > > >
> > > > It does. The way it works is that the Avro serializer prefixes each
> > > > message with a magic byte and a four-byte integer that references a
> > > > schema id in the Confluent schema registry. The Avro deserializer
> > > > reads this value to determine which schema to deserialize with. For
> > > > this to work you need to use the Java client on both ends and have the
> > > > schema registry set up.
> > > >
> > > > We have some slightly different needs (including non-Java languages),
> > > > so we are just using byte messages and having our applications do the
> > > > serialization and deserialization.
> > > >
> > > > -Dave
> > > >
> > > > -----Original Message-----
> > > > From: Shajahan, Nishanth [mailto:nshajaha@visa.com]
> > > > Sent: Wednesday, August 16, 2017 5:13 PM
> > > > To: users@kafka.apache.org
> > > > Subject: Different Schemas on same Kafka Topic
> > > >
> > > > Hello,
> > > >
> > > > Does Kafka support writing different Avro record types (with very
> > > > different schemas) to the same topic? I guess we would have to write
> > > > our own Avro serializer and deserializer to do this. Is there a
> > > > preferred way to do this? It would be great if someone could point me
> > > > in the right direction.
> > > >
> > > > Thanks,
> > > > Nishanth
> > > >
> > > > This e-mail and any files transmitted with it are confidential, may
> > > > contain sensitive information, and are intended solely for the use of
> > > > the individual or entity to whom they are addressed. If you have
> > > > received this e-mail in error, please notify the sender by reply
> > > > e-mail immediately and destroy all copies of the e-mail and any
> > > > attachments.
> > > >
> > > >
> > > >
> > >
> >
>
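Dave's quoted wrapper suggestion (type, version and data fields) also addresses Nishanth's question about classes A, B, C and D: consume raw bytes and dispatch on the type tag instead of fixing the consumer's value type to `A`. Below is a minimal JDK-only sketch, assuming the envelope has already been decoded from its own wrapper schema; `Envelope`, `EnvelopeDispatcher` and the decoder functions are hypothetical stand-ins for real Avro readers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical envelope mirroring the wrapper fields: type, version, data. */
final class Envelope {
    final String type;
    final int version;
    final byte[] data;

    Envelope(String type, int version, byte[] data) {
        this.type = type;
        this.version = version;
        this.data = data;
    }
}

/** Routes each envelope to the decoder registered for its (type, version) pair. */
final class EnvelopeDispatcher {
    private final Map<String, Function<byte[], Object>> decoders = new HashMap<>();

    private static String key(String type, int version) {
        return type + "#v" + version;
    }

    void register(String type, int version, Function<byte[], Object> decoder) {
        decoders.put(key(type, version), decoder);
    }

    Object decode(Envelope envelope) {
        String k = key(envelope.type, envelope.version);
        Function<byte[], Object> decoder = decoders.get(k);
        if (decoder == null) {
            throw new IllegalArgumentException("no decoder registered for " + k);
        }
        return decoder.apply(envelope.data);
    }
}
```

In practice each registered decoder would wrap an Avro reader for that record type; keying on version as well lets old and new versions of a type coexist on the topic.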

Re: Different Schemas on same Kafka Topic

Posted by Stephen Durfey <sj...@gmail.com>.
You're welcome. I'm glad it was helpful. I think it is a good idea to maintain one evolvable schema per topic and to configure the schema registry with the Avro evolution rules that fit your use case. While it is possible to have many different, incompatible schemas per topic, it's much easier for both consumers and producers to reason about if only one is maintained. It's also much easier to develop against if you provide that guarantee to consumers.
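To make the evolution rules in this thread concrete, here is a toy model of the four compatibility types over flat records (field name mapped to a type plus a has-default flag). It assumes the basic Avro resolution rule that a reader field missing from the written data must have a default and that extra writer fields are ignored; it deliberately ignores type promotion, unions, aliases and nested types, so it is a sketch, not the registry's actual checker.

```java
import java.util.Map;

/** One field of a flat record in this toy model. */
final class FieldSpec {
    final String type;
    final boolean hasDefault;

    FieldSpec(String type, boolean hasDefault) {
        this.type = type;
        this.hasDefault = hasDefault;
    }
}

/** Toy version of the registry's compatibility modes. */
final class ToyCompatibility {
    /** True if data written with writer can be read with reader. */
    static boolean canRead(Map<String, FieldSpec> reader, Map<String, FieldSpec> writer) {
        for (Map.Entry<String, FieldSpec> field : reader.entrySet()) {
            FieldSpec written = writer.get(field.getKey());
            if (written == null) {
                // The data lacks this field, so the reader must fall back to a default.
                if (!field.getValue().hasDefault) {
                    return false;
                }
            } else if (!written.type.equals(field.getValue().type)) {
                // Simplification: no type promotion (such as int to long) is modelled.
                return false;
            }
        }
        // Fields that exist only in the writer schema are skipped by the reader.
        return true;
    }

    /** Checks a proposed schema against the previous one under the given mode. */
    static boolean isCompatible(String mode, Map<String, FieldSpec> proposed, Map<String, FieldSpec> previous) {
        switch (mode) {
            case "BACKWARD": return canRead(proposed, previous); // new readers, old data
            case "FORWARD":  return canRead(previous, proposed); // old readers, new data
            case "FULL":     return canRead(proposed, previous) && canRead(previous, proposed);
            case "NONE":     return true;                        // no check at all
            default: throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }
}
```

For example, adding an optional field with a default passes FULL, while adding a field without a default fails BACKWARD but still passes FORWARD. NONE accepts anything, which is why it lets unrelated schemas into one subject.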

________________________________
From: Sreejith S <sr...@gmail.com>
Sent: Thursday, August 17, 2017 11:13:59 PM
To: users@kafka.apache.org
Subject: Re: Different Schemas on same Kafka Topic

Thank you, Stephen, for a very detailed write-up. Really helpful.

I was stuck on the concept of one schema per topic. Let me try this in my
use case.

Thank you very much. And thank you Svante.

Regards,
Sreejith

On 18-Aug-2017 12:17 am, "Stephen Durfey" <sj...@gmail.com> wrote:

> There's a lot to unpack here, so I'll do my best to answer.
>
> > How? You always register a schema against a topic using the topic name,
> > and the schema registry assigns an id that is unique across the registry
> > cluster. Where is the globally unique schema id here?
>
>
> When I say globally unique, I mean that the id for a particular schema
> (when I say schema I'm referring to each unique version of the schema as it
> evolves) is unique across all schemas that the registry knows about. The id
> for a particular schema can appear in many different topics, but will
> always refer to one and only one schema (so long as you don't lose your
> _schemas topic, but that's a different discussion). Schemas are stored
> uniquely, but can be used by many subjects (a subject usually being
> <topic-name>-key or <topic-name>-value) [1]. So, you can have one schema
> appearing inside many different topics with the same id re-used, since the
> ids are unique per schema, not per subject.
>
> > I think with the Producer/Consumer API you have more freedom to pass a
> > schema id of your choice and ask Avro to serialize/deserialize, but in the
> > Connect framework all of this is abstracted away.
> >
>
> I disagree with this. When using the schema registry, it is up to the
> serializer to interact with it. For this part I'm specifically talking
> about the Confluent Kafka SerDes. If those are being used, the behavior
> will be the same regardless of whether they are used in a plain
> KafkaProducer or in Kafka Connect. The serializer will interact with the
> schema registry (if configured to do so) and will register schemas on
> behalf of the producer. The schema registry must be in control of all
> schema IDs (see [2]); this cannot be delegated to the producer.
> Otherwise it would be possible for multiple producers to generate the same
> ID, and during deserialization the consumer wouldn't know which schema
> to deserialize with. In Kafka Connect, serialization and deserialization
> are carried out by the converters specified in the worker properties
> (key.converter and value.converter). In the quickstart configuration these
> default to the AvroConverter, which uses the Confluent SerDes.
>
> > It's a good pointer to use the NONE compatibility type, so that even if the
> > schema registry holds the same id for a topic, each schema version under it
> > can be an entirely different schema. Is my understanding correct?
> >
> > But when NONE is defined, the purpose of the schema registry itself is
> > lost, right?
> >
>
> I don't recommend using NONE. I've only ever used NONE during testing, to
> allow a non-passive change to a schema in order to correct a previous
> mistake. This was done because deleting schemas wasn't an option (I believe
> in Confluent 3.3.0 you can delete the association between a schema and a
> subject, but still cannot delete the schema itself). So, as you mention,
> setting the value to NONE (mostly) defeats the purpose of the schema
> registry. If you only ever plan on dealing with the data as generic
> records, NONE is fine, but you still need a way of dealing with the
> multitude of types in your topic.
>
> [1]
> https://github.com/confluentinc/schema-registry/blob/8eb664dbc84b1c2db3666fa0771eeb0e0909f892/avro-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaAvroSerDe.java#L83-L89
>
> [2]
> https://github.com/confluentinc/schema-registry/blob/master/avro-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaAvroSerializer.java#L74
>
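Stephen's distinction between global schema ids and per-subject versions can be illustrated with a toy in-memory registry. This is not Confluent's implementation: it assumes ids are handed out sequentially per distinct schema text and that versions start at 1 within each subject, and it skips the compatibility check a real registry would run before accepting a new version. The class and subject names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy registry: ids are global per distinct schema, versions are per subject. */
final class ToySchemaRegistry {
    private final Map<String, Integer> idBySchema = new HashMap<>();
    private final Map<Integer, String> schemaTextById = new HashMap<>();
    private final Map<String, List<Integer>> versionsBySubject = new HashMap<>();
    private int nextId = 1;

    /** Registers a schema under a subject and returns its global id. */
    synchronized int register(String subject, String schema) {
        Integer id = idBySchema.get(schema);
        if (id == null) {
            // First time this exact schema is seen anywhere: mint a new global id.
            id = nextId++;
            idBySchema.put(schema, id);
            schemaTextById.put(id, schema);
        }
        List<Integer> versions = versionsBySubject.computeIfAbsent(subject, s -> new ArrayList<>());
        if (!versions.contains(id)) {
            versions.add(id); // version number = list index + 1
        }
        return id;
    }

    /** What a deserializer does: resolve the id found in a message to a schema. */
    synchronized String schemaById(int id) {
        return schemaTextById.get(id);
    }

    /** 1-based version of the given id within a subject, or -1 if absent. */
    synchronized int versionOf(String subject, int id) {
        List<Integer> versions = versionsBySubject.getOrDefault(subject, List.of());
        int index = versions.indexOf(id);
        return index < 0 ? -1 : index + 1;
    }
}
```

Registering the same schema under a second subject reuses its id, while two different schemas in one subject get different ids (as versions 1 and 2), so messages on one topic can carry different ids.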

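For choosing a compatibility type per subject, as Stephen recommends, the Schema Registry REST API exposes `PUT /config/{subject}` with a JSON body such as `{"compatibility": "FULL"}`. A sketch using `java.net.http` (Java 11+), assuming a registry at `http://localhost:8081` and a subject named `orders-value` (both placeholders); the helper class is hypothetical and it only builds the request.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper for the registry's per-subject config endpoint. */
final class CompatibilityConfig {
    /** Builds (but does not send) a request setting one subject's compatibility level. */
    static HttpRequest setCompatibility(String registryUrl, String subject, String level) {
        String body = "{\"compatibility\": \"" + level + "\"}";
        return HttpRequest.newBuilder(URI.create(registryUrl + "/config/" + subject))
                .header("Content-Type", "application/vnd.schemaregistry.v1+json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

The request could be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Subject names containing reserved URL characters would need to be encoded first.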
Re: Different Schemas on same Kafka Topic

Posted by Sreejith S <sr...@gmail.com>.
Thank you, Stephen, for a very detailed write-up. Really helpful.

I was stuck on the concept of one schema per topic. Let me try this in my
use case.

Thank you very much. And thank you Svante.

Regards,
Sreejith

On 18-Aug-2017 12:17 am, "Stephen Durfey" <sj...@gmail.com> wrote:

> There's a lot to unpack here, so I'll do my best to answer.
>
> How ?. You are always registering a schema against a topic using the
> > topicname and schema registry is assiging a unique id across the registry
> > cluster. Where is the global unique schema id here ?
>
>
> When I say globally unique, I mean that the id for a particular schema
> (when I say schema I'm referring to each unique version of the schema as it
> evolves) is unique across all schemas that the registry knows about. The id
> for a particular schema can appear in many different topics, but will
> always refer to one and only one schema (so long as you dont lose your
> _schemas topic, but thats a different discussion). Schemas are stored
> uniquely, but can be used by many subjects (a subject being usually the
> <topic-name>-key/value) [1]. So, you can have one schema appearing inside
> many different topics, and have the same id re-used, since the ids are
> unique per schema, not per subject.
>
> I think in Producer Consumer API you will have more freedom to pass a
> > schema id of ur choice and ask avro serialize/deserialize. But in connect
> > framework all these things are abstracted.
> >
>
> I disagree with this. When using the schema registry it is up to the
> serializer used to interact with it. For this part I'm specifically talking
> about the confluent kafka SerDe's. If those are being used, the behavior
> will be the same regardless of whether it is used in a generic
> KafkaProducer or in Kafka Connect. That serializer will interact with the
> schema registry (if configured to do so), and will register schemas on
> behalf of the producer. The schema registry must be in control of all
> schema IDs (see here: [2]) and it cannot be delegated to the producer.
> Otherwise it would be possible for multple producers to generate the same
> ID, and thus during deserialization the consumer wouldn't know which schema
> to deserialize with. In kafka connect, the SerDr operations are carried out
> by the specified DataConverter in the worker properties. In the quickstart
> version it defaults to using the AvroConverter, which uses the confluent
> SerDe's
>
> Its a good pointer on using NONE compatibility type so that even if schema
> > registry holds same id for a topic, each schema version under it is
> > entirely different schema. Is my understanding correct ?
> >
> > But,  when defines NONE,  the purpose of the schema registry itself
> > lost.Rght ?
> >
>
> I don't recommend using NONE. I've only ever used NONE during testing to
> allow a non-passive change to a schema to correct a previous mistake in a
> schema. This was done because deleting schemas wasn't an option (I believe
> in confluent 3.3.0 you can delete the association between a schema and a
> subject, but still cannot delete the schema itself). So, as you mention
> setting the value to NONE defeats the purpose (mostly) of the schema
> registry. If you only ever plan on dealing with the data in terms of
> generic records, NONE is fine, but you need a way of dealing with the
> multitude of types in your topic.
>
> [1]
> https://github.com/confluentinc/schema-registry/blob/
> 8eb664dbc84b1c2db3666fa0771eeb0e0909f892/avro-serializer/
> src/main/java/io/confluent/kafka/serializers/AbstractKafkaAvroSerDe.java#
> L83-L89
>
> [2]
> https://github.com/confluentinc/schema-registry/
> blob/master/avro-serializer/src/main/java/io/confluent/kafka/serializers/
> AbstractKafkaAvroSerializer.java#L74
>
> On Thu, Aug 17, 2017 at 1:10 PM, Sreejith S <sr...@gmail.com> wrote:
>
> > Hi Stephen,
> >
> > Thank you very much.
> >
> > Please give clarity on the statement.
> >
> > "each unique avro schema has a unique id associated with it. That id
> > can be used across multiple different topics. The enforcement of which
> > schemas are allowed in a particular topic comes down to the combination
> of
> > the subject (usually topic-name-key/value) and version (the version
> itself
> > starts at 1 inside the subject, and itself has an id that ties to the
> > globally unique schema id). ".
> >
> > How ?. You are always registering a schema against a topic using the
> > topicname and schema registry is assiging a unique id across the registry
> > cluster. Where is the global unique schema id here ?
> >
> > I think in Producer Consumer API you will have more freedom to pass a
> > schema id of ur choice and ask avro serialize/deserialize. But in connect
> > framework all these things are abstracted.
> >
> > Its a good pointer on using NONE compatibility type so that even if
> schema
> > registry holds same id for a topic, each schema version under it is
> > entirely different schema. Is my understanding correct ?
> >
> > But,  when defines NONE,  the purpose of the schema registry itself
> > lost.Rght ?
> >
> > Regards
> > Sreejith
> >
> > On 17-Aug-2017 11:03 pm, "Stephen Durfey" <sj...@gmail.com> wrote:
> >
> > > There is a little nuance to this topic (hehe). When it comes down to
> it,
> > > yes, each unique avro schema has a unique id associated with it. That
> id
> > > can be used across multiple different topics. The enforcement of which
> > > schemas are allowed in a particular topic comes down to the combination
> > of
> > > the subject (usually topic-name-key/value) and version (the version
> > itself
> > > starts at 1 inside the subject, and itself has an id that ties to the
> > > globally unique schema id). . So, yes, you can have multiple schemas
> > > within
> > > the same topic, and thats perfectly fine, so long as you're correctly
> > > configuring the schema registry.
> > >
> > > Whether or not a schema is allowed to be registered for a particular
> > > subject is dependent upon the type of avro compatilibty enforced. There
> > are
> > > 4 types: BACKWARD, FORWARD, FULL (combines forward and backward), and
> > NONE.
> > > The schema registry is going to evaluate the schema being published to
> > the
> > > history of schemas it knows about in the past for that subject +
> version
> > > combination. If the schema is evolved correctly according to the
> > particular
> > > type configured in the schema registry, it will be allowed.
> > >
> > > So, if you select NONE as the compatibility type the schema registry
> will
> > > allow any schema to be registered, even if they are not compatible
> > because
> > > you've informed the registry not to care. So, you should really choose
> > > amongst backward, forward, and full. I use FULL in production because
> the
> > > data being written is long lived, and will have multiple readers and
> > > writers of the data, and the data needs to be passively evolved.
> Backward
> > > and forward can be fine too, just depending upon the needs of the data
> > > being produced and consumed.
> > >
> > > On Thu, Aug 17, 2017 at 12:22 PM, Tauzell, Dave <
> > > Dave.Tauzell@surescripts.com> wrote:
> > >
> > > > Hmm, I think you are right that you cannot have multiple schemas on
> the
> > > > same topic.
> > > >
> > > > -Dave
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Sreejith S [mailto:srssreejith@gmail.com]
> > > > Sent: Thursday, August 17, 2017 11:42 AM
> > > > To: users@kafka.apache.org
> > > > Subject: RE: Different Schemas on same Kafka Topic
> > > >
> > > > Hi Dave,
> > > >
> > > > Would like to get a clarity on one thing.  If i register more than
> one
> > > > schema for a topic, i am providing topic-key, topic-value to the
> schema
> > > > registry.
> > > >
> > > > Id is created by schema registry and it will create different version
> > of
> > > > different schema. Still all schema have same id.  Am i right ?
> > > >
> > > > If so, all avro messages holds same id. Then how multiple schemas on
> > same
> > > > topic possble ?
> > > >
> > > > Please clarify
> > > >
> > > > Thanks,
> > > > Sreejith
> > > >
> > > > On 17-Aug-2017 9:49 pm, "Tauzell, Dave" <
> Dave.Tauzell@surescripts.com>
> > > > wrote:
> > > >
> > > > > > How does consumer  know A is the avro class when there could be
> > > > > > other
> > > > > classes like B,C and D denoting different schemas?.
> > > > >
> > > > > There isn't a good way.   One option is to have an avro wrapper
> that
> > > > > contains type, version and data fields.  Then you wrap everything.
> > > > >  Another option is to do what Kafka is doing and prepend some sort
> of
> > > > > fixed length value to all messages that have the schema and version
> > > > > you are using for that message.
> > > > >
> > > > > -Dave
> > > > >
> > > > > -----Original Message-----
> > > > > From: Shajahan, Nishanth [mailto:nshajaha@visa.com]
> > > > > Sent: Thursday, August 17, 2017 11:02 AM
> > > > > To: users@kafka.apache.org
> > > > > Subject: RE: Different Schemas on same Kafka Topic
> > > > >
> > > > > Thanks Dave. We may not want to start using the schema registry
> > > > > immediately. We would have Java producers and consumers. I might also
> > > > > go with byte messages, but when the consumer deserializes, how can it
> > > > > map the byte[] to the correct Avro object? For example:
> > > > >
> > > > > KafkaConsumer<String, A> consumer = new KafkaConsumer<>(consumerConfig,
> > > > >         new StringDeserializer(), new AvroDeserializer<>(A.class));
> > > > >
> > > > > How does the consumer know A is the Avro class when there could be
> > > > > other classes like B, C and D denoting different schemas?
> > > > >
> > > > >
> > > > > -Nishanth
> > > > >
> > > > > -----Original Message-----
> > > > > From: Tauzell, Dave [mailto:Dave.Tauzell@surescripts.com]
> > > > > Sent: Thursday, August 17, 2017 8:30 AM
> > > > > To: users@kafka.apache.org
> > > > > Subject: RE: Different Schemas on same Kafka Topic
> > > > >
> > > > > It does. The way it works is that the Avro serializer precedes each
> > > > > message with a two-byte integer that references a schema id in the
> > > > > Confluent schema registry. The Avro deserializer looks at this value
> > > > > to determine which schema to deserialize with. In order for this to
> > > > > work you need to use the Java client on both ends and have the schema
> > > > > registry set up.
> > > > >
> > > > > We have some slightly different needs (including non-Java languages),
> > > > > so we are just using byte messages and then have our applications do
> > > > > the serialization and deserialization.
> > > > >
> > > > > -Dave
> > > > >
> > > > > -----Original Message-----
> > > > > From: Shajahan, Nishanth [mailto:nshajaha@visa.com]
> > > > > Sent: Wednesday, August 16, 2017 5:13 PM
> > > > > To: users@kafka.apache.org
> > > > > Subject: Different Schemas on same Kafka Topic
> > > > >
> > > > > Hello,
> > > > >
> > > > > Does Kafka support writing different Avro record types (very different
> > > > > schemas) to the same topic? I guess we would have to write our own
> > > > > Avro serializer and deserializer to do this? Is there a preferred way
> > > > > to do this? It would be great if someone could point me in the right
> > > > > direction.
> > > > >
> > > > > Thanks,
> > > > > Nishanth
> > > > >
> > > > > This e-mail and any files transmitted with it are confidential, may
> > > > > contain sensitive information, and are intended solely for the use of
> > > > > the individual or entity to whom they are addressed. If you have
> > > > > received this e-mail in error, please notify the sender by reply
> > > > > e-mail immediately and destroy all copies of the e-mail and any
> > > > > attachments.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>
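
The schema-id header mentioned in this thread can be made concrete. In the Confluent wire format, each serialized record starts with one magic byte (0), followed by a four-byte, big-endian schema id, and then the Avro binary data; so the id is a 32-bit integer rather than the two-byte value described above. The same kind of fixed-length prefix is also one way to answer the question of how a consumer holding raw byte[] can tell A from B, C or D. The following is a minimal sketch using only the JDK, not Confluent's code: `WireFormat`, `frame`, `schemaId`, `payload` and the id-to-decoder map are hypothetical, and the decoders are stubs standing in for real Avro decoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper (not Confluent's classes): frames a payload with a
// 1-byte magic value and a 4-byte big-endian schema id, mirroring the
// layout the Confluent Avro serializer writes in front of each record.
public class WireFormat {
    static final byte MAGIC_BYTE = 0x0;
    static final int HEADER_SIZE = 1 + Integer.BYTES; // 5 bytes in total

    static byte[] frame(int schemaId, byte[] payload) {
        return ByteBuffer.allocate(HEADER_SIZE + payload.length)
                .put(MAGIC_BYTE)
                .putInt(schemaId) // ByteBuffer writes big-endian by default
                .put(payload)
                .array();
    }

    static int schemaId(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        if (message.length < HEADER_SIZE || buf.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("Short message or unknown magic byte");
        }
        return buf.getInt();
    }

    static byte[] payload(byte[] message) {
        byte[] out = new byte[message.length - HEADER_SIZE];
        System.arraycopy(message, HEADER_SIZE, out, 0, out.length);
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical id -> decoder table; real decoders would hold the
        // Avro schema for classes A, B, C, D and perform the Avro decode.
        Map<Integer, Function<byte[], String>> decoders = new HashMap<>();
        decoders.put(1, bytes -> "A:" + new String(bytes, StandardCharsets.UTF_8));
        decoders.put(2, bytes -> "B:" + new String(bytes, StandardCharsets.UTF_8));

        byte[] msg = frame(2, "hello".getBytes(StandardCharsets.UTF_8));
        int id = schemaId(msg);
        System.out.println(msg.length + " " + id + " " + decoders.get(id).apply(payload(msg)));
    }
}
```

A consumer built this way would read values as byte[] (for example with Kafka's ByteArrayDeserializer) and call the decoder looked up by id, instead of binding a single Avro class in the consumer's type parameter.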

Re: Different Schemas on same Kafka Topic

Posted by Stephen Durfey <sj...@gmail.com>.
There's a lot to unpack here, so I'll do my best to answer.

> How? You are always registering a schema against a topic using the topic
> name, and the schema registry assigns an id that is unique across the
> registry cluster. Where is the globally unique schema id here?


When I say globally unique, I mean that the id for a particular schema
(when I say schema I'm referring to each unique version of the schema as it
evolves) is unique across all schemas that the registry knows about. The id
for a particular schema can appear in many different topics, but will
always refer to one and only one schema (so long as you don't lose your
_schemas topic, but that's a different discussion). Schemas are stored
uniquely, but can be used by many subjects (a subject usually being the
<topic-name>-key/value) [1]. So, you can have one schema appearing inside
many different topics, and have the same id re-used, since the ids are
unique per schema, not per subject.
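
A toy, in-memory model may make the "unique per schema, not per subject" point clearer. It is not the Schema Registry's implementation: `ToyRegistry` is hypothetical, a schema is represented by its raw JSON text, and compatibility checks are left out entirely. Registering the same schema under two subjects returns the same global id, while each distinct schema gets a new id and each subject keeps its own version counter starting at 1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, not Confluent code) of how global schema ids
// relate to per-subject versions. Compatibility checks are omitted.
public class ToyRegistry {
    private final Map<String, Integer> idsBySchema = new HashMap<>();             // schema text -> global id
    private final Map<String, List<Integer>> versionsBySubject = new HashMap<>(); // subject -> ids; index = version - 1
    private int nextId = 1;

    /** Registers a schema under a subject and returns its global id. */
    public int register(String subject, String schema) {
        int id = idsBySchema.computeIfAbsent(schema, s -> nextId++); // same schema text, same id everywhere
        List<Integer> versions = versionsBySubject.computeIfAbsent(subject, s -> new ArrayList<>());
        if (!versions.contains(id)) {
            versions.add(id); // becomes version versions.size() of this subject
        }
        return id;
    }

    /** Returns the global id behind a subject's version (versions start at 1). */
    public int idFor(String subject, int version) {
        return versionsBySubject.get(subject).get(version - 1);
    }

    public static void main(String[] args) {
        ToyRegistry registry = new ToyRegistry();
        String user = "{\"type\":\"record\",\"name\":\"User\",\"fields\":[]}";
        String order = "{\"type\":\"record\",\"name\":\"Order\",\"fields\":[]}";

        int a = registry.register("orders-value", user);
        int b = registry.register("orders-value", order); // second schema, same subject
        int c = registry.register("audit-value", user);   // same schema, another subject
        System.out.println(a + " " + b + " " + c + " " + registry.idFor("audit-value", 1));
    }
}
```

In this model a topic carrying two different schemas shows two different ids in its messages, which is how a consumer can tell the records apart. A real registry would only accept the second schema under the same subject if it passed that subject's compatibility check.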

> I think in the Producer/Consumer API you will have more freedom to pass a
> schema id of your choice and ask Avro to serialize/deserialize. But in the
> Connect framework all these things are abstracted.
>

I disagree with this. When using the schema registry, it is up to the
serializer to interact with it. For this part I'm specifically talking
about the Confluent Kafka SerDes. If those are being used, the behavior
will be the same regardless of whether they are used in a generic
KafkaProducer or in Kafka Connect. That serializer will interact with the
schema registry (if configured to do so), and will register schemas on
behalf of the producer. The schema registry must be in control of all
schema IDs (see [2]), and this cannot be delegated to the producer.
Otherwise it would be possible for multiple producers to generate the same
ID, and thus during deserialization the consumer wouldn't know which schema
to deserialize with. In Kafka Connect, the SerDe operations are carried out
by the converter specified in the worker properties (key.converter and
value.converter). In the quickstart configuration it defaults to the
AvroConverter, which uses the Confluent SerDes.
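
Because the serializer, not the application, talks to the registry, switching a plain producer or consumer to the Confluent SerDes is mostly configuration. The sketch below only builds `java.util.Properties`, so it runs without the Kafka jars. The class names and keys (`KafkaAvroSerializer`, `KafkaAvroDeserializer`, `schema.registry.url`, `specific.avro.reader`, and the Connect worker's `value.converter`) are the Confluent/Kafka ones; the localhost addresses and the group id are placeholders, and the keys are worth checking against the client version you use.

```java
import java.util.Properties;

// Sketch: configuration that routes (de)serialization through the
// schema-registry-aware Confluent SerDes. It only builds Properties; they
// would be passed to new KafkaProducer<>(props) / new KafkaConsumer<>(props).
public class AvroClientConfig {
    static final String REGISTRY_URL = "http://localhost:8081"; // assumed registry address

    static Properties producerProps() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092"); // assumed broker
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        p.put("schema.registry.url", REGISTRY_URL); // serializer registers schemas / fetches ids here
        return p;
    }

    static Properties consumerProps() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("group.id", "example-group"); // hypothetical group name
        p.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        p.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        p.put("schema.registry.url", REGISTRY_URL); // deserializer looks schemas up by id here
        // true: return generated SpecificRecord classes; false (default): GenericRecord
        p.put("specific.avro.reader", "true");
        return p;
    }

    static Properties connectWorkerProps() {
        Properties p = new Properties();
        // In Connect, the converter plays the role of the SerDes above.
        p.put("value.converter", "io.confluent.connect.avro.AvroConverter");
        p.put("value.converter.schema.registry.url", REGISTRY_URL);
        return p;
    }

    public static void main(String[] args) {
        System.out.println(producerProps().getProperty("value.serializer"));
    }
}
```

With `specific.avro.reader` set to true, the deserializer should return an instance of the generated class matching each record's writer schema, so a `KafkaConsumer<String, Object>` could branch on `instanceof A`, `instanceof B`, and so on; this is worth verifying against your Confluent version.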

> It's a good pointer on using the NONE compatibility type so that, even if
> the schema registry holds the same id for a topic, each schema version
> under it is an entirely different schema. Is my understanding correct?
>
> But when NONE is defined, the purpose of the schema registry itself is
> lost, right?
>

I don't recommend using NONE. I've only ever used NONE during testing, to
allow a non-passive change to a schema to correct a previous mistake in a
schema. This was done because deleting schemas wasn't an option (I believe
in Confluent 3.3.0 you can delete the association between a schema and a
subject, but still cannot delete the schema itself). So, as you mention,
setting the value to NONE (mostly) defeats the purpose of the schema
registry. If you only ever plan on dealing with the data in terms of
generic records, NONE is fine, but you need a way of dealing with the
multitude of types in your topic.
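
If NONE is needed temporarily, as in the testing scenario above, it can be scoped to one subject instead of the whole registry through the registry's REST API (`PUT /config/{subject}` with a `compatibility` field). A sketch with the JDK 11+ `java.net.http` client: the registry address and subject name are assumptions, and the request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: build a request that sets the compatibility level of one subject.
// level is expected to be one of BACKWARD, FORWARD, FULL or NONE.
public class SubjectCompatibility {
    static HttpRequest setCompatibility(String registryUrl, String subject, String level) {
        String body = "{\"compatibility\": \"" + level + "\"}";
        return HttpRequest.newBuilder(URI.create(registryUrl + "/config/" + subject))
                .header("Content-Type", "application/vnd.schemaregistry.v1+json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = setCompatibility("http://localhost:8081", "my-topic-value", "NONE");
        System.out.println(req.method() + " " + req.uri());
        // Sending it requires a running registry, e.g.:
        // java.net.http.HttpClient.newHttpClient()
        //     .send(req, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```

Sending the same request with FULL (or whatever level the subject normally uses) afterwards restores the check for that subject.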

[1]
https://github.com/confluentinc/schema-registry/blob/8eb664dbc84b1c2db3666fa0771eeb0e0909f892/avro-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaAvroSerDe.java#L83-L89

[2]
https://github.com/confluentinc/schema-registry/blob/master/avro-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaAvroSerializer.java#L74

Re: Different Schemas on same Kafka Topic

Posted by Sreejith S <sr...@gmail.com>.
Hi Stephen,

Thank you very much.

Please clarify this statement:

"each unique avro schema has a unique id associated with it. That id
can be used across multiple different topics. The enforcement of which
schemas are allowed in a particular topic comes down to the combination of
the subject (usually topic-name-key/value) and version (the version itself
starts at 1 inside the subject, and itself has an id that ties to the
globally unique schema id)."

How? You are always registering a schema against a topic using the topic
name, and the schema registry assigns an id that is unique across the
registry cluster. Where is the globally unique schema id here?

I think in the Producer/Consumer API you will have more freedom to pass a
schema id of your choice and ask Avro to serialize/deserialize. But in the
Connect framework all these things are abstracted.

It's a good pointer on using the NONE compatibility type so that, even if
the schema registry holds the same id for a topic, each schema version
under it is an entirely different schema. Is my understanding correct?

But when NONE is defined, the purpose of the schema registry itself is
lost, right?

Regards
Sreejith

Re: Different Schemas on same Kafka Topic

Posted by Stephen Durfey <sj...@gmail.com>.
There is a little nuance to this topic (hehe). When it comes down to it,
yes, each unique avro schema has a unique id associated with it. That id
can be used across multiple different topics. The enforcement of which
schemas are allowed in a particular topic comes down to the combination of
the subject (usually topic-name-key/value) and version (the version itself
starts at 1 inside the subject, and itself has an id that ties to the
globally unique schema id). So, yes, you can have multiple schemas within
the same topic, and that's perfectly fine, so long as you're correctly
configuring the schema registry.
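
Since the subject is what scopes the compatibility check, the way subjects are named decides how freely several record types can share a topic. At the time of this thread the serializers derived the subject from the topic (`<topic>-key` / `<topic>-value`); later Confluent releases added configurable strategies (`TopicNameStrategy`, `RecordNameStrategy`, `TopicRecordNameStrategy`, selected with `value.subject.name.strategy`) so that different record types on one topic can get their own subjects. The sketch below only illustrates the naming rules; `SubjectNames` and its methods are hypothetical, not the library code.

```java
// Sketch of the three subject-naming rules (hypothetical helper, not the
// Confluent implementation). recordFullName is the Avro namespace + name.
public class SubjectNames {
    static String topicName(String topic, boolean isKey) {
        return topic + (isKey ? "-key" : "-value"); // one subject per topic (the 2017 behavior)
    }

    static String recordName(String recordFullName) {
        return recordFullName; // one subject per record type, shared across topics
    }

    static String topicRecordName(String topic, String recordFullName) {
        return topic + "-" + recordFullName; // one subject per (topic, record type)
    }

    public static void main(String[] args) {
        String topic = "payments";
        for (String type : new String[] {"com.example.A", "com.example.B"}) {
            System.out.println(topicName(topic, false) + " | "
                    + recordName(type) + " | " + topicRecordName(topic, type));
        }
    }
}
```

With the topic-based rule, A and B land in the same subject, so each new schema must pass that subject's compatibility check against the previous one (or compatibility must be NONE); with the record-based rules, each type evolves under its own check.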

Whether or not a schema is allowed to be registered for a particular
subject depends upon the type of Avro compatibility enforced. There are
4 types: BACKWARD, FORWARD, FULL (combines forward and backward), and NONE.
The schema registry evaluates the schema being published against the
schema history it already has for that subject. If the schema has evolved
correctly according to the compatibility type configured in the schema
registry, it will be allowed.

So, if you select NONE as the compatibility type, the schema registry will
allow any schema to be registered, even if it is not compatible, because
you've told the registry not to care. So you should really choose among
BACKWARD, FORWARD, and FULL. I use FULL in production because the data
being written is long-lived, will have multiple readers and writers, and
needs to be passively evolved. BACKWARD and FORWARD can be fine too,
depending upon the needs of the data being produced and consumed.
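
To illustrate what BACKWARD, FORWARD and FULL mean, here is a deliberately simplified check that only looks at record field names and whether each field has a default. It ignores type changes, unions, aliases and nested records, all of which the real Avro resolution rules cover; `ToyCompatibility`, `Field` and the method names are hypothetical. BACKWARD: a reader using the new schema can read data written with the old one, so any added field needs a default. FORWARD: a reader using the old schema can read data written with the new one, so any removed field must have had a default. FULL: both.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: a record schema is a list of fields; only names and the
// presence of a default are considered (real Avro resolution does far more).
public class ToyCompatibility {
    static final class Field {
        final String name;
        final boolean hasDefault;
        Field(String name, boolean hasDefault) { this.name = name; this.hasDefault = hasDefault; }
    }

    enum Level { BACKWARD, FORWARD, FULL, NONE }

    /** Can a reader using readerFields decode data written with writerFields? */
    static boolean canRead(List<Field> readerFields, List<Field> writerFields) {
        Map<String, Field> written = new HashMap<>();
        for (Field f : writerFields) written.put(f.name, f);
        for (Field f : readerFields) {
            // A field the writer never wrote must be filled from the reader's default.
            if (!written.containsKey(f.name) && !f.hasDefault) return false;
        }
        return true; // fields only the writer has are simply skipped by the reader
    }

    static boolean isCompatible(Level level, List<Field> oldSchema, List<Field> newSchema) {
        switch (level) {
            case BACKWARD: return canRead(newSchema, oldSchema); // new reader, old data
            case FORWARD:  return canRead(oldSchema, newSchema); // old reader, new data
            case FULL:     return canRead(newSchema, oldSchema) && canRead(oldSchema, newSchema);
            default:       return true;                          // NONE: no check at all
        }
    }

    public static void main(String[] args) {
        List<Field> v1 = List.of(new Field("id", false), new Field("amount", false));
        List<Field> v2 = List.of(new Field("id", false), new Field("amount", false),
                new Field("currency", true)); // added with a default
        List<Field> v3 = List.of(new Field("id", false)); // dropped "amount", which had no default

        for (Level level : Level.values()) {
            System.out.println(level + " v1->v2=" + isCompatible(level, v1, v2)
                    + " v1->v3=" + isCompatible(level, v1, v3));
        }
    }
}
```

In this model, adding `currency` with a default passes every level, while dropping `amount` (no default) passes BACKWARD but fails FORWARD and FULL, which is why FULL is the stricter choice for long-lived data with many readers and writers.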

RE: Different Schemas on same Kafka Topic

Posted by "Tauzell, Dave" <Da...@surescripts.com>.
Hmm, I think you are right that you cannot have multiple schemas on the same topic.

-Dave


-----Original Message-----
From: Sreejith S [mailto:srssreejith@gmail.com]
Sent: Thursday, August 17, 2017 11:42 AM
To: users@kafka.apache.org
Subject: RE: Different Schemas on same Kafka Topic

Hi Dave,

I would like some clarity on one thing. If I register more than one schema for a topic, I provide topic-key and topic-value to the schema registry.

The id is created by the schema registry, and it creates a different version for each different schema. Still, all the schemas have the same id. Am I right?

If so, all Avro messages hold the same id. Then how are multiple schemas on the same topic possible?

Please clarify

Thanks,
Sreejith

RE: Different Schemas on same Kafka Topic

Posted by Sreejith S <sr...@gmail.com>.
Hi Dave,

I would like to get clarity on one thing. If I register more than one
schema for a topic, I am providing the topic-key and topic-value subjects
to the schema registry.

The id is created by the schema registry, and it creates a different
version for each different schema. Still, all the schemas have the same
id. Am I right?

If so, all Avro messages hold the same id. Then how are multiple schemas
on the same topic possible?
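To illustrate the distinction the thread draws between ids and versions (each distinct schema gets its own globally unique id, while versions are numbered from 1 inside a subject such as "<topic>-value"), here is a toy in-memory model. It is a sketch under stated assumptions, not the Confluent implementation: the class `ToyRegistry` and its methods are hypothetical, schemas are compared as raw strings, and the compatibility checks the real registry applies (unless the level is set to NONE) are skipped entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of subject/version/id bookkeeping; NOT the Confluent Schema Registry.
public class ToyRegistry {
    // One global id per distinct schema text (the real registry normalizes schemas first).
    private final Map<String, Integer> idsBySchema = new HashMap<>();
    // Per subject, an ordered list of global ids; version N is the element at index N - 1.
    private final Map<String, List<Integer>> versionsBySubject = new HashMap<>();
    private int nextId = 1;

    /** Registers a schema under a subject and returns its global id. No compatibility checks. */
    int register(String subject, String schema) {
        int id = idsBySchema.computeIfAbsent(schema, s -> nextId++);
        List<Integer> versions = versionsBySubject.computeIfAbsent(subject, s -> new ArrayList<>());
        if (!versions.contains(id)) {
            versions.add(id); // becomes the next version number in this subject
        }
        return id;
    }

    /** Latest version number in a subject, or 0 if nothing is registered. */
    int latestVersion(String subject) {
        return versionsBySubject.getOrDefault(subject, List.of()).size();
    }

    public static void main(String[] args) {
        ToyRegistry r = new ToyRegistry();
        int a = r.register("orders-value", "{schema A}");
        int b = r.register("orders-value", "{schema B}");      // different schema, different id
        int reused = r.register("payments-value", "{schema A}"); // same schema, same id
        System.out.println(a + " " + b + " " + reused + " " + r.latestVersion("orders-value"));
    }
}
```

In this model two different schemas registered under the same subject receive different ids, so messages written with each schema would carry different ids in their headers.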

Please clarify.

Thanks,
Sreejith

On 17-Aug-2017 9:49 pm, "Tauzell, Dave" <Da...@surescripts.com>
wrote:

> > How does the consumer know A is the Avro class when there could be other
> classes like B, C and D denoting different schemas?
>
> There isn't a good way. One option is to have an Avro wrapper that
> contains type, version and data fields, and then wrap everything.
> Another option is to do what the Confluent serializer does and prepend a
> fixed-length value to every message that identifies the schema and
> version used for that message.
>
> -Dave

RE: Different Schemas on same Kafka Topic

Posted by "Tauzell, Dave" <Da...@surescripts.com>.
> How does the consumer know A is the Avro class when there could be other classes like B, C and D denoting different schemas?

There isn't a good way. One option is to have an Avro wrapper that contains type, version and data fields, and then wrap everything. Another option is to do what the Confluent serializer does and prepend a fixed-length value to every message that identifies the schema and version used for that message.
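The second option (a fixed-length prefix) can be sketched with plain `java.nio.ByteBuffer`. This is a minimal illustration under assumptions, not a standard format: the 2-byte type id plus 2-byte version header, the class `TypedEnvelope` and its methods are all hypothetical, and the registered decoders are string stand-ins for the per-class Avro decoders (A, B, C, D) a real consumer would plug in. With Kafka, the consumer would read values with `ByteArrayDeserializer` and pass each `byte[]` to `unwrap`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical envelope: [2-byte type id][2-byte version][payload bytes], big-endian.
public class TypedEnvelope {
    static final int HEADER_SIZE = 4;

    // Producer side: prefix the already-serialized payload with type id and version.
    static byte[] wrap(short typeId, short version, byte[] payload) {
        return ByteBuffer.allocate(HEADER_SIZE + payload.length)
                .putShort(typeId).putShort(version).put(payload).array();
    }

    // Consumer side: decoders keyed by "typeId:version".
    private final Map<String, Function<byte[], Object>> decoders = new HashMap<>();

    void register(short typeId, short version, Function<byte[], Object> decoder) {
        decoders.put(typeId + ":" + version, decoder);
    }

    // Reads the header, picks the matching decoder, and hands it only the payload bytes.
    Object unwrap(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        short typeId = buf.getShort();
        short version = buf.getShort();
        Function<byte[], Object> decoder = decoders.get(typeId + ":" + version);
        if (decoder == null) {
            throw new IllegalStateException("no decoder for type " + typeId + " version " + version);
        }
        return decoder.apply(Arrays.copyOfRange(message, HEADER_SIZE, message.length));
    }

    public static void main(String[] args) {
        TypedEnvelope env = new TypedEnvelope();
        // Stand-in decoders returning strings instead of Avro records, for illustration only.
        env.register((short) 1, (short) 1, bytes -> "A:" + new String(bytes, StandardCharsets.UTF_8));
        env.register((short) 2, (short) 1, bytes -> "B:" + new String(bytes, StandardCharsets.UTF_8));
        byte[] msg = wrap((short) 2, (short) 1, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(env.unwrap(msg)); // prints B:hello
    }
}
```

Keying on both type and version lets a consumer keep decoding older messages after a schema change, at the cost of maintaining the decoder table by hand, which is the bookkeeping a schema registry otherwise does for you.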

-Dave


RE: Different Schemas on same Kafka Topic

Posted by "Shajahan, Nishanth" <ns...@visa.com>.
Thanks Dave. We may not want to start using the schema registry immediately. We would have Java producers and consumers. I might also go with byte messages, but when the consumer deserializes, how can it map the byte[] to the correct Avro object? For example:

KafkaConsumer<String, A> consumer = new KafkaConsumer<>(consumerConfig, new StringDeserializer(), new AvroDeserializer<>(A.class));

How does the consumer know A is the Avro class when there could be other classes like B, C and D denoting different schemas?


-Nishanth


RE: Different Schemas on same Kafka Topic

Posted by "Tauzell, Dave" <Da...@surescripts.com>.
It does. The way it works is that the Avro serializer precedes each message with a magic byte and a four-byte integer that references a schema id in the Confluent Schema Registry. The Avro deserializer reads this value to determine which schema to deserialize with. For this to work you need to use the Java client on both ends and have the Schema Registry set up.
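For reference, the Confluent serializer's framing is commonly documented as one magic byte (0), then the schema id as a four-byte big-endian integer, then the Avro binary body. The sketch below frames and unframes that layout using only `java.nio.ByteBuffer`; it assumes that layout, the class `WireFormat` and its `frame`/`unframe` methods are hypothetical, and it does not decode the Avro body itself (that still requires looking up the schema for the id).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper for the assumed layout: [magic byte 0][4-byte schema id][Avro body].
public class WireFormat {
    static final byte MAGIC_BYTE = 0;

    record Framed(int schemaId, byte[] avroBody) {}

    // Prepends the magic byte and schema id (ByteBuffer writes big-endian by default).
    static byte[] frame(int schemaId, byte[] avroBody) {
        return ByteBuffer.allocate(1 + 4 + avroBody.length)
                .put(MAGIC_BYTE).putInt(schemaId).put(avroBody).array();
    }

    // Validates the header, then splits the message into schema id and Avro body.
    static Framed unframe(byte[] message) {
        if (message.length < 5 || message[0] != MAGIC_BYTE) {
            throw new IllegalArgumentException("not in the expected wire format");
        }
        ByteBuffer buf = ByteBuffer.wrap(message);
        buf.get(); // skip the magic byte
        int schemaId = buf.getInt();
        byte[] body = Arrays.copyOfRange(message, 5, message.length);
        return new Framed(schemaId, body);
    }

    public static void main(String[] args) {
        byte[] msg = frame(42, new byte[] {7, 8, 9});
        Framed f = unframe(msg);
        System.out.println(f.schemaId() + " " + Arrays.toString(f.avroBody())); // prints 42 [7, 8, 9]
    }
}
```

Because the id travels with every message, a consumer can fetch (and cache) the writer schema per id, which is how records written with different schemas can share one topic.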

We have slightly different needs (including non-Java languages), so we are just using byte messages and having our applications do the serialization and deserialization.

-Dave
