Posted to dev@pulsar.apache.org by ro...@gmail.com, ro...@gmail.com on 2020/04/14 15:35:24 UTC

Re: [DISCUSS] PIP: Producer Send Message with Different Schema

Now that PIP-43 is released in 2.5.0, I wanted to follow up on the messages below.

What remains to be done to support multiple different message types on one topic in Pulsar? Yi indicates below that PIP-43 sets the stage for this, but that the schema compatibility implementation would still need some work.

Would this require another PIP, or just an issue to track the work?

Regards,
Raman

On 2019/09/16 01:32:39, Yi Tang <ss...@gmail.com> wrote: 
> Hi Raman,
> 
> It's a great and important feature, I think. This PIP only requires the
> compatibility check from the underlying registry and doesn't touch its
> implementation details. I think we should address this feature in the
> future, and this PIP provides the essential ability to implement it.
> 
> Thanks,
> Yi
> 
> rocketraman@gmail.com <ro...@gmail.com> wrote on Sun, Sep 15, 2019 at 22:36:
> 
> > I see a mention of compatibility in the PIP but with no details.  The docs
> > about schema compatibility state this:
> >
> > > Consequently, those events need to go in the same Pulsar partition to
> > > maintain order. This application can use ALWAYS_COMPATIBLE to allow
> > > different kinds of events co-exist in the same topic.
> >
> > With this PIP, this limitation can be relaxed and schema compatibility
> > checking can be made stricter: each type of message on a topic can have
> > its own schema, and compatibility can then be checked only against other
> > schemas for the same type. Kafka does this via the concept of "subjects"
> > in the schema registry. Subjects default to just the topic name (plus a
> > "-key" or "-value" suffix, since keys and values can each have their own
> > schemas), but can also include the message type via an injectable
> > strategy. Compatibility is managed at the subject level.
> >
> > Is this something that should be addressed in this PIP, or in future
> > follow-on work? This is critical to supporting ordering across different
> > message types, with schema compatibility verification by Pulsar.
> >
> > Regards,
> > Raman
> >
> >
> >
> > On 2019/09/03 05:12:32, 唐谊 <ss...@gmail.com> wrote:
> > > Hi all;
> > >
> > > I am drafting a proposal to let a producer send messages with
> > > different schemas.
> > >
> > > ## Motivation
> > > For now, a Pulsar producer can only produce messages of one schema type,
> > > which is determined by the user when the producer is created, or by
> > > fetching the latest schema version from the registry if the
> > > AUTO_PRODUCE_BYTES type is specified. The schema, however, can be updated
> > > by an external system after the producer has started, which would lead to
> > > inconsistency between the message payload and the schema version
> > > metadata. Also, some scenarios, like replicating from Kafka, require a
> > > single producer to replicate messages of different schemas from one
> > > Kafka partition to one Pulsar partition, to guarantee ordering and no
> > > duplicates.
> > >
> > > The proposal is that each message can indicate its associated schema
> > > itself. For more detail, see
> > > https://gist.github.com/yittg/56c6dedf7509f634ec7effc4f6f3631d#file-pip-md
> > >
> > > Looking forward to any feedback.
> > >
> > > Thanks,
> > > Yi
> > >
> >
> 

Re: [DISCUSS] PIP: Producer Send Message with Different Schema

Posted by ro...@gmail.com, ro...@gmail.com.
> On Wed, Apr 15, 2020 at 4:25 AM Shivji Kumar Jha <sh...@gmail.com> wrote:
> 
> > Hi Sijie,
> >
> > I second Raman. Apart from PIP-43 and PIP-44, which ease schema
> > management, in my opinion we should also loosen the coupling between a
> > topic and its schema (or, more precisely, the *type of data* on the
> > topic), which is 1 to 1 as of now.

That's it exactly.

On 2020/04/15 22:23:03, Sijie Guo <gu...@gmail.com> wrote: 
> I see. I wasn't sure that Raman is looking for this capability based on his
> previous email.

Sorry for the confusion. You would have had to look further back in the email thread -- in a previous message I describe Kafka's decoupling of topic/schema association via the "subject" concept, and the capability this provides in terms of tracking compatibility for multiple message types on one topic. Thanks Shivji for picking this up -- I look forward to reading your PIP....
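
To make the "subject" idea concrete, here is a minimal sketch of two subject naming strategies, loosely modeled on how Kafka's schema registry derives a subject. The function names, the shared signature, and the example topic are assumptions for illustration only; none of this is a Pulsar or Kafka API.

```python
from typing import Callable

# Hypothetical "injectable strategy" type: (topic, record_type, is_key) -> subject.
SubjectStrategy = Callable[[str, str, bool], str]


def topic_name_subject(topic: str, record_type: str, is_key: bool = False) -> str:
    """Default-style strategy: the subject depends only on the topic,
    so every message type on the topic shares one schema lineage."""
    return f"{topic}-{'key' if is_key else 'value'}"


def topic_record_name_subject(topic: str, record_type: str, is_key: bool = False) -> str:
    """Type-aware strategy: the subject includes the message type,
    so each type on the topic gets its own schema lineage."""
    return f"{topic}-{record_type}"


def subject_for(strategy: SubjectStrategy, topic: str, record_type: str) -> str:
    """Resolve the subject for a message value using the injected strategy."""
    return strategy(topic, record_type, False)
```

With the first strategy, customerCreated and customerAddressChanged on the same topic map to one subject and are checked against each other; with the second, each maps to its own subject, which is the decoupling being asked for here.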

Regards,
Raman


> 
> I do agree that decoupling the relationship between topic and schema can
> drive more use cases. It is a great feature to add.
> 
> We will pick this up and come up with a PIP for introducing this capability.
> 
> Thanks,
> Sijie
> 
> On Wed, Apr 15, 2020 at 4:25 AM Shivji Kumar Jha <sh...@gmail.com> wrote:
> 
> > Hi Sijie,
> >
> > I second Raman. Apart from PIP-43 and PIP-44, which ease schema
> > management, in my opinion we should also loosen the coupling between a
> > topic and its schema (or, more precisely, the *type of data* on the
> > topic), which is 1 to 1 as of now.
> >
> >    1. The schema (or schema versions of one data type) could be grouped
> >    into what Kafka calls a *subject*.
> >    2. Schema compatibility should then be checked among schemas in the
> >    same subject only.
> >    3. One topic can be associated with multiple schema subjects, each
> >    with its own evolution path.
> >    4. Similarly, one subject can be associated with multiple topics.
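
The four points above can be sketched as a toy in-memory registry. This is only an illustration under stated assumptions: the class and method names are hypothetical, a schema is reduced to a set of field names, and the compatibility rule (a new version may add fields but must keep existing ones) is a stand-in for a real strategy, not how Pulsar or Kafka check compatibility.

```python
from collections import defaultdict


class SubjectRegistry:
    """Toy registry: schemas are versioned per subject, not per topic."""

    def __init__(self):
        self.versions = defaultdict(list)        # subject -> list of schema versions
        self.topic_subjects = defaultdict(set)   # topic -> subjects used on it

    @staticmethod
    def is_compatible(old: frozenset, new: frozenset) -> bool:
        # Stand-in rule: the new version must keep every existing field.
        return old <= new

    def register(self, topic: str, subject: str, fields) -> int:
        """Register a schema for a subject on a topic; return its version."""
        schema = frozenset(fields)
        history = self.versions[subject]
        # Point 2: compare only against the latest schema of the same subject.
        if history and not self.is_compatible(history[-1], schema):
            raise ValueError(f"incompatible schema for subject {subject!r}")
        if not history or history[-1] != schema:
            history.append(schema)
        # Points 3 and 4: topics and subjects are associated many-to-many.
        self.topic_subjects[topic].add(subject)
        return len(history)


registry = SubjectRegistry()
registry.register("customer-events", "customerCreated", ["id", "name"])
registry.register("customer-events", "customerAddressChanged", ["id", "address"])
# customerCreated evolves on its own path, unaffected by the other subject.
registry.register("customer-events", "customerCreated", ["id", "name", "email"])
```

Under a single per-topic lineage, the second registration would have been compared against the customerCreated schema and rejected; here it starts its own lineage.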
> >
> > *Use case:*
> > This feature would be handy when different business models need to be
> > kept in strict order while each also has its own evolution path. As an
> > example, an event sourcing system could have events like customerCreated,
> > customerAddressChanged, and customerInvoicePaid, all of which are
> > required in order.
> >
> > The ideas presented above are picked from here
> > <https://martin.kleppmann.com/2018/01/18/event-types-in-kafka-topic.html>.
> >
> > Regards,
> > Shivji Kumar Jha
> > http://www.shivjijha.com/
> > +91 8884075512
> >
> >
> > On Wed, Apr 15, 2020 at 2:27 AM Sijie Guo <gu...@gmail.com> wrote:
> >
> > > Hi Raman,
> > >
> > > The schema compatibility strategies were already there prior to PIP-43.
> > >
> > > PIP-44 enhances the schema compatibility strategy support.
> > >
> > > Both changes have already landed in the 2.5.0 release.
> > >
> > > Did you see any issues when you tried out this feature?
> > >
> > > - Sijie

Re: [DISCUSS] PIP: Producer Send Message with Different Schema

Posted by Shivji Kumar Jha <sh...@gmail.com>.
Sure Sijie.

Regards,
Shivji Kumar Jha
http://www.shivjijha.com/
+91 8884075512


On Thu, Apr 16, 2020 at 2:22 PM Sijie Guo <gu...@gmail.com> wrote:

> Yeah!
>
> I don't think there is anyone picking this up yet. You are very welcome to
> contribute to this feature. Can you start putting up a PIP for it?
>
> Thanks,
> Sijie
>
> On Wed, Apr 15, 2020 at 9:35 PM Shivji Kumar Jha <sh...@gmail.com>
> wrote:
>
> > Hi Sijie,
> >
> > If no one has picked this up, I would like to volunteer for this feature.
> >
> > Regards,
> > Shivji Kumar Jha
> > http://www.shivjijha.com/
> > +91 8884075512

Re: [DISCUSS] PIP: Producer Send Message with Different Schema

Posted by Sijie Guo <gu...@gmail.com>.
Yeah!

I don't think there is anyone picking this up yet. You are very welcome to
contribute to this feature. Can you start putting up a PIP for it?

Thanks,
Sijie

On Wed, Apr 15, 2020 at 9:35 PM Shivji Kumar Jha <sh...@gmail.com> wrote:

> Hi Sijie,
>
> If no one has picked this up, I would like to volunteer for this feature.
>
> Regards,
> Shivji Kumar Jha
> http://www.shivjijha.com/
> +91 8884075512

Re: [DISCUSS] PIP: Producer Send Message with Different Schema

Posted by Shivji Kumar Jha <sh...@gmail.com>.
Hi Sijie,

If no one has picked this up, I would like to volunteer for this feature.

Regards,
Shivji Kumar Jha
http://www.shivjijha.com/
+91 8884075512


On Thu, Apr 16, 2020 at 3:53 AM Sijie Guo <gu...@gmail.com> wrote:

> I see. I wasn't sure that Raman is looking for this capability based on his
> previous email.
>
> I do agree that decoupling the relationship between topic and schema can
> drive more use cases. It is a great feature to add.
>
> We will pick this up and come up with a PIP for introducing this capability.
>
> Thanks,
> Sijie

Re: [DISCUSS] PIP: Producer Send Message with Different Schema

Posted by Sijie Guo <gu...@gmail.com>.
I see. From his previous email, I wasn't sure that Raman was looking for
this capability.

I do agree that decoupling the relationship between topic and schema can
drive more use cases. It is a great feature to add.

We will pick this up and come up with a PIP for introducing this capability.

Thanks,
Sijie

On Wed, Apr 15, 2020 at 4:25 AM Shivji Kumar Jha <sh...@gmail.com> wrote:

> Hi Sijie,
>
> I second Raman. Apart from PIP-43 and PIP-44, which ease schema
> management, I think we should also loosen the coupling between a topic
> and its schema (or, more precisely, the *type of data* on the topic), which
> is 1 to 1 as of now.
>
>    1. The schema (or schema versions of one data type) could be grouped
>    into what Kafka calls a *subject*.
>    2. Schema compatibility should then be checked only among schemas in
>    the same subject.
>    3. One topic can be associated with multiple schema subjects, each with
>    its own evolution path.
>    4. Similarly, one subject can be associated with multiple topics.
>
> *Use case:*
> This feature would be handy when different business models must be
> consumed in strict order while each model still has its own evolution
> path. For example, an event sourcing system could require events such as
> customerCreated, customerAddressChanged, and customerInvoicePaid to be
> delivered in order.
>
> The ideas presented above are picked from here
> <https://martin.kleppmann.com/2018/01/18/event-types-in-kafka-topic.html>.
>
> Regards,
> Shivji Kumar Jha
> http://www.shivjijha.com/
> +91 8884075512
>
>
> On Wed, Apr 15, 2020 at 2:27 AM Sijie Guo <gu...@gmail.com> wrote:
>
> > Hi Raman,
> >
> > The schema compatibility strategies were already there prior to PIP-43.
> >
> > PIP-44 enhances the schema compatibility strategy support.
> >
> > Both changes have already landed in the 2.5.0 release.
> >
> > Did you see any issues when you tried out this feature?
> >
> > - Sijie
> >
> > On Tue, Apr 14, 2020 at 8:35 AM rocketraman@gmail.com <
> > rocketraman@gmail.com>
> > wrote:
> >
> > > Now that PIP-43 is released in 2.5.0, I wanted to follow up on the
> > > messages below.
> > >
> > > What is remaining to be done in Pulsar to support having multiple
> > > different types on one topic in Pulsar? Yi indicates below that PIP-43
> > sets
> > > the stage for this, but that the schema compatibility implementation
> > still
> > > would need some work.
> > >
> > > Would this require another PIP, or just an issue to track the work?
> > >
> > > Regards,
> > > Raman
> > >
> > > On 2019/09/16 01:32:39, Yi Tang <ss...@gmail.com> wrote:
> > > > Hi rarma,
> > > >
> > > > It's a great and important feature, I think. This PIP requires the
> > > > compatibility check from bottom registry only and doesn't touch the
> > > > implementation detail. I think we should address this feature in the
> > > > future, and this PIP provides the essential ability to implement it.
> > > >
> > > > Thanks,
> > > > Yi
> > > >
> > > > > rocketraman@gmail.com <ro...@gmail.com> wrote on Sun, Sep 15, 2019 at 22:36:
> > > >
> > > > > I see a mention of compatibility in the PIP but with no details.
> The
> > > docs
> > > > > about schema compatibility state this:
> > > > >
> > > > > > Consequently, those events need to go in the same Pulsar
> partition
> > to
> > > > > maintain order. This application can use ALWAYS_COMPATIBLE to allow
> > > > > different kinds of events co-exist in the same topic.
> > > > >
> > > > > With this PIP, this limitation can be relaxed, and schema
> > compatibility
> > > > > should be able to be strengthened, since each type of message on a
> > > topic
> > > > > can have its own schema, and compatibility can then be checked
> > against
> > > only
> > > > > other schemas for the same type. Kafka does this via the concept of
> > > > > "subjects" in the schema registry, and subjects default to just the
> > > topic
> > > > > name (plus a "-key" or "-value" suffix since keys and values can
> both
> > > have
> > > > > their own schemas), but can also include (via an injectable
> strategy)
> > > the
> > > > > message type. Compatibility is managed at the subject level.
> > > > >
> > > > > Is this something that should be addressed in this PIP, or in
> future
> > > > > follow-on work? This is critical to supporting ordering across
> > > different
> > > > > message types, with schema compatibility verification by Pulsar.
> > > > >
> > > > > Regards,
> > > > > Raman
> > > > >
> > > > >
> > > > >
> > > > > On 2019/09/03 05:12:32, 唐谊 <ss...@gmail.com> wrote:
> > > > > > Hi all;
> > > > > >
> > > > > > I am drafting a proposal to support the producer to send messages
> > > with
> > > > > > different schema.
> > > > > >
> > > > > > ## Motivation
> > > > > > For now, Pulsar producer can only produce messages of one type of
> > > schema
> > > > > > > which is determined by user when it is created, or by fetching
> the
> > > latest
> > > > > > version of schema from registry if AUTO_PRODUCE_BYTES type is
> > > specified.
> > > > > > Schema, however, can be updated by external system after producer
> > > > > started,
> > > > > > > which would lead to inconsistency between message payload and
> > schema
> > > > > > > version metadata. Also some scenarios like replicating from Kafka
> > > require
> > > > > a
> > > > > > single producer for replicating messages of different schemas
> from
> > > one
> > > > > > Kafka partition to one Pulsar partition to guarantee the order
> and
> > no
> > > > > > duplicates.
> > > > > >
> > > > > > Here proposing that messages can indicate the associated schema
> by
> > > > > itself,
> > > > > > for more detail,
> > > > > >
> > > > >
> > >
> >
> https://gist.github.com/yittg/56c6dedf7509f634ec7effc4f6f3631d#file-pip-md
> > > > > >
> > > > > > Looking forward to any feedback.
> > > > > >
> > > > > > Thanks,
> > > > > > Yi
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] PIP: Producer Send Message with Different Schema

Posted by Shivji Kumar Jha <sh...@gmail.com>.
Hi Sijie,

I second Raman. Apart from PIP-43 and PIP-44, which ease schema
management, I think we should also loosen the coupling between a topic
and its schema (or, more precisely, the *type of data* on the topic), which
is 1 to 1 as of now.

   1. The schema (or schema versions of one data type) could be grouped
   into what Kafka calls a *subject*.
   2. Schema compatibility should then be checked only among schemas in
   the same subject.
   3. One topic can be associated with multiple schema subjects, each with
   its own evolution path.
   4. Similarly, one subject can be associated with multiple topics.
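
The four points above can be sketched in plain Java. This is only an
illustration under stated assumptions, not an existing Pulsar or Kafka API:
the class SubjectRegistry, its methods, and the toy rule "a new version may
only add fields" are all hypothetical names and choices made for this example.
A schema version is modeled as nothing more than a set of field names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: schema versions are grouped by subject, not by topic. */
public class SubjectRegistry {
    // subject -> ordered schema versions (each version is a set of field names)
    private final Map<String, List<Set<String>>> versionsBySubject = new HashMap<>();
    // topic -> subjects allowed on it; one subject may appear under many topics
    private final Map<String, Set<String>> subjectsByTopic = new HashMap<>();

    /** Toy compatibility rule (an assumption): a new version may only add fields. */
    static boolean isCompatible(Set<String> previous, Set<String> candidate) {
        return candidate.containsAll(previous);
    }

    /** Convenience helper for building a field set. */
    public static Set<String> fieldSet(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /** Registers a version, checked only against the latest version of the SAME subject. */
    public boolean register(String subject, Set<String> fields) {
        List<Set<String>> versions =
                versionsBySubject.computeIfAbsent(subject, s -> new ArrayList<>());
        if (!versions.isEmpty()
                && !isCompatible(versions.get(versions.size() - 1), fields)) {
            return false; // rejected: incompatible within its own subject
        }
        versions.add(new HashSet<>(fields));
        return true;
    }

    /** Associates a subject with a topic (many-to-many). */
    public void associate(String topic, String subject) {
        subjectsByTopic.computeIfAbsent(topic, t -> new HashSet<>()).add(subject);
    }

    public boolean accepts(String topic, String subject) {
        return subjectsByTopic.getOrDefault(topic, new HashSet<>()).contains(subject);
    }

    public int versionCount(String subject) {
        return versionsBySubject.getOrDefault(subject, new ArrayList<>()).size();
    }

    public static void main(String[] args) {
        SubjectRegistry registry = new SubjectRegistry();
        // Two unrelated event types evolve independently of each other.
        System.out.println(registry.register("CustomerCreated", fieldSet("id", "name")));
        System.out.println(registry.register("CustomerAddressChanged", fieldSet("id", "address")));
        // Dropping "name" breaks the toy rule, but only within CustomerCreated.
        System.out.println(registry.register("CustomerCreated", fieldSet("id")));
        // Both subjects share one topic so their events stay ordered together.
        registry.associate("customers", "CustomerCreated");
        registry.associate("customers", "CustomerAddressChanged");
        System.out.println(registry.accepts("customers", "CustomerAddressChanged"));
    }
}
```

The point of the design is that CustomerAddressChanged is never compared
against CustomerCreated, so a topic can carry both without falling back to
ALWAYS_COMPATIBLE.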

*Use case:*
This feature would be handy when different business models must be
consumed in strict order while each model still has its own evolution
path. For example, an event sourcing system could require events such as
customerCreated, customerAddressChanged, and customerInvoicePaid to be
delivered in order.

The ideas presented above are picked from here
<https://martin.kleppmann.com/2018/01/18/event-types-in-kafka-topic.html>.

Regards,
Shivji Kumar Jha
http://www.shivjijha.com/
+91 8884075512


On Wed, Apr 15, 2020 at 2:27 AM Sijie Guo <gu...@gmail.com> wrote:

> Hi Raman,
>
> The schema compatibility strategies were already there prior to PIP-43.
>
> PIP-44 enhances the schema compatibility strategy support.
>
> Both changes have already landed in the 2.5.0 release.
>
> Did you see any issues when you tried out this feature?
>
> - Sijie
>
> On Tue, Apr 14, 2020 at 8:35 AM rocketraman@gmail.com <
> rocketraman@gmail.com>
> wrote:
>
> > Now that PIP-43 is released in 2.5.0, I wanted to follow up on the
> > messages below.
> >
> > What is remaining to be done in Pulsar to support having multiple
> > different types on one topic in Pulsar? Yi indicates below that PIP-43
> sets
> > the stage for this, but that the schema compatibility implementation
> still
> > would need some work.
> >
> > Would this require another PIP, or just an issue to track the work?
> >
> > Regards,
> > Raman
> >
> > On 2019/09/16 01:32:39, Yi Tang <ss...@gmail.com> wrote:
> > > Hi rarma,
> > >
> > > It's a great and important feature, I think. This PIP requires the
> > > compatibility check from bottom registry only and doesn't touch the
> > > implementation detail. I think we should address this feature in the
> > > future, and this PIP provides the essential ability to implement it.
> > >
> > > Thanks,
> > > Yi
> > >
> > > > rocketraman@gmail.com <ro...@gmail.com> wrote on Sun, Sep 15, 2019 at 22:36:
> > >
> > > > I see a mention of compatibility in the PIP but with no details.  The
> > docs
> > > > about schema compatibility state this:
> > > >
> > > > > Consequently, those events need to go in the same Pulsar partition
> to
> > > > maintain order. This application can use ALWAYS_COMPATIBLE to allow
> > > > different kinds of events co-exist in the same topic.
> > > >
> > > > With this PIP, this limitation can be relaxed, and schema
> compatibility
> > > > should be able to be strengthened, since each type of message on a
> > topic
> > > > can have its own schema, and compatibility can then be checked
> against
> > only
> > > > other schemas for the same type. Kafka does this via the concept of
> > > > "subjects" in the schema registry, and subjects default to just the
> > topic
> > > > name (plus a "-key" or "-value" suffix since keys and values can both
> > have
> > > > their own schemas), but can also include (via an injectable strategy)
> > the
> > > > message type. Compatibility is managed at the subject level.
> > > >
> > > > Is this something that should be addressed in this PIP, or in future
> > > > follow-on work? This is critical to supporting ordering across
> > different
> > > > message types, with schema compatibility verification by Pulsar.
> > > >
> > > > Regards,
> > > > Raman
> > > >
> > > >
> > > >
> > > > On 2019/09/03 05:12:32, 唐谊 <ss...@gmail.com> wrote:
> > > > > Hi all;
> > > > >
> > > > > I am drafting a proposal to support the producer to send messages
> > with
> > > > > different schema.
> > > > >
> > > > > ## Motivation
> > > > > For now, Pulsar producer can only produce messages of one type of
> > schema
> > > > > > which is determined by user when it is created, or by fetching the
> > latest
> > > > > version of schema from registry if AUTO_PRODUCE_BYTES type is
> > specified.
> > > > > Schema, however, can be updated by external system after producer
> > > > started,
> > > > > > which would lead to inconsistency between message payload and
> schema
> > > > > > version metadata. Also some scenarios like replicating from Kafka
> > require
> > > > a
> > > > > single producer for replicating messages of different schemas from
> > one
> > > > > Kafka partition to one Pulsar partition to guarantee the order and
> no
> > > > > duplicates.
> > > > >
> > > > > Here proposing that messages can indicate the associated schema by
> > > > itself,
> > > > > for more detail,
> > > > >
> > > >
> >
> https://gist.github.com/yittg/56c6dedf7509f634ec7effc4f6f3631d#file-pip-md
> > > > >
> > > > > Looking forward to any feedback.
> > > > >
> > > > > Thanks,
> > > > > Yi
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] PIP: Producer Send Message with Different Schema

Posted by Sijie Guo <gu...@gmail.com>.
Hi Raman,

The schema compatibility strategies were already there prior to PIP-43.

PIP-44 enhances the schema compatibility strategy support.

Both changes have already landed in the 2.5.0 release.

Did you see any issues when you tried out this feature?

- Sijie

On Tue, Apr 14, 2020 at 8:35 AM rocketraman@gmail.com <ro...@gmail.com>
wrote:

> Now that PIP-43 is released in 2.5.0, I wanted to follow up on the
> messages below.
>
> What is remaining to be done in Pulsar to support having multiple
> different types on one topic in Pulsar? Yi indicates below that PIP-43 sets
> the stage for this, but that the schema compatibility implementation still
> would need some work.
>
> Would this require another PIP, or just an issue to track the work?
>
> Regards,
> Raman
>
> On 2019/09/16 01:32:39, Yi Tang <ss...@gmail.com> wrote:
> > Hi rarma,
> >
> > It's a great and important feature, I think. This PIP requires the
> > compatibility check from bottom registry only and doesn't touch the
> > implementation detail. I think we should address this feature in the
> > future, and this PIP provides the essential ability to implement it.
> >
> > Thanks,
> > Yi
> >
> > rocketraman@gmail.com <ro...@gmail.com> wrote on Sun, Sep 15, 2019 at 22:36:
> >
> > > I see a mention of compatibility in the PIP but with no details.  The
> docs
> > > about schema compatibility state this:
> > >
> > > > Consequently, those events need to go in the same Pulsar partition to
> > > maintain order. This application can use ALWAYS_COMPATIBLE to allow
> > > different kinds of events co-exist in the same topic.
> > >
> > > With this PIP, this limitation can be relaxed, and schema compatibility
> > > should be able to be strengthened, since each type of message on a
> topic
> > > can have its own schema, and compatibility can then be checked against
> only
> > > other schemas for the same type. Kafka does this via the concept of
> > > "subjects" in the schema registry, and subjects default to just the
> topic
> > > name (plus a "-key" or "-value" suffix since keys and values can both
> have
> > > their own schemas), but can also include (via an injectable strategy)
> the
> > > message type. Compatibility is managed at the subject level.
> > >
> > > Is this something that should be addressed in this PIP, or in future
> > > follow-on work? This is critical to supporting ordering across
> different
> > > message types, with schema compatibility verification by Pulsar.
> > >
> > > Regards,
> > > Raman
> > >
> > >
> > >
> > > On 2019/09/03 05:12:32, 唐谊 <ss...@gmail.com> wrote:
> > > > Hi all;
> > > >
> > > > I am drafting a proposal to support the producer to send messages
> with
> > > > different schema.
> > > >
> > > > ## Motivation
> > > > For now, Pulsar producer can only produce messages of one type of
> schema
> > > > > which is determined by user when it is created, or by fetching the
> latest
> > > > version of schema from registry if AUTO_PRODUCE_BYTES type is
> specified.
> > > > Schema, however, can be updated by external system after producer
> > > started,
> > > > > which would lead to inconsistency between message payload and schema
> > > > > version metadata. Also some scenarios like replicating from Kafka
> require
> > > a
> > > > single producer for replicating messages of different schemas from
> one
> > > > Kafka partition to one Pulsar partition to guarantee the order and no
> > > > duplicates.
> > > >
> > > > Here proposing that messages can indicate the associated schema by
> > > itself,
> > > > for more detail,
> > > >
> > >
> https://gist.github.com/yittg/56c6dedf7509f634ec7effc4f6f3631d#file-pip-md
> > > >
> > > > Looking forward to any feedback.
> > > >
> > > > Thanks,
> > > > Yi
> > > >
> > >
> >
>