Posted to dev@pulsar.apache.org by Asaf Mesika <as...@gmail.com> on 2023/03/01 13:17:59 UTC

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <li...@gmail.com> wrote:

> >
> > I read it and they look identical. What's the difference between them?
>
> Currently, the AVRO, JSON, and PROTOBUF schemas are all implemented on top of Avro.
> > What do you mean, they are all implemented based on Avro? You mean the
> > protobuf schema is converted into an Avro Schema, and then you use Avro
> > compatibility validation?
>
>
> `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> They all extend `AvroSchemaBasedCompatibilityCheck`, so `checkCompatible()`
> has the same implementation as for `AVRO`.
>

Can you please explain how a Protobuf Schema descriptor can be validated
for a backward compatibility check using Avro-based compatibility rules?
Doesn't it expect the schema to be Avro, but it is actually a Protobuf
descriptor?
Is there some translation happening?



>
>
> > I think you should structure the validation rules differently:
>
>
> The compatibility check strategies are described on the website:
>
> https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> 1. BACKWARD (CanReadExistingStrategy): Consumers using schema V3 can process
> data written by producers using the last schema version, V2. So V2 is the
> "writtenSchema" and V3 is the "readSchema".
> 2. FORWARD (CanBeReadByExistingStrategy): Consumers using the last schema
> version, V2, can process data written by producers using a new schema, V3,
> even though they may not be able to use the full capabilities of the new
> schema. So V3 is the "writtenSchema" and V2 is the "readSchema".
> 3. FULL (CanBeReadMutualStrategy): Schemas are both backward and forward
> compatible. The schema can evolve, and data written with either the old or
> the new schema version can be read with the other.
>
>
I'm sorry - I don't understand.
I understand the different compatibility check strategies.
If you just spell them out here, then as you say, just translate the
Protobuf Descriptor into an Avro schema and run the Avro
compatibility validation, no?
I believe the answer is no, since you may want to verify different things
when it comes to Protobuf than you would for Avro.

At the current state, I can't understand your design at all. Please help
clarify that.





>
> > So each strategy should have its own section.
>
>
> The arguments of `canRead()` are writtenSchema and readSchema. As just
> described, we only need to change the order of the arguments we pass to
> `canRead()`.
>
>
>
> Thanks,
> sinan
>
>
> Asaf Mesika <as...@gmail.com> wrote on Mon, Feb 27, 2023 at 20:49:
>
> > >
> > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > >
> > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > >
> > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > >
> >  I read it and they look identical. What's the difference between them?
> >
> > Current avro, json, and protobuf schemas are all implemented based on AVRO.
> >
> > What do you mean, they are all implemented based on Avro? You mean the
> > protobuf schema is converted into an Avro Schema, and then you use Avro
> > compatibility validation?
> >
> >
> > > *Here are the basic compatibility rules we've defined:*
> >
> >
> > I think you should structure the validation rules differently:
> >
> > * Backward checks
> > ** List the rules, using newSchema (the schema used by the producer or
> > consumer) and existingSchema (the last schema used)
> > * Forward checks
> > ** List the rules, using newSchema (the schema used by the producer or
> > consumer) and existingSchema (the last schema used)
> >
> > So each strategy should have its own section.
> >
> > I'm saying this because you used the word "writtenSchema", but it
> > represents something completely different in a backward check than in a
> > forward check.
> >
> > Once it's structured like that, I will personally be able to read and
> > understand it.
> >
> >
> > The motivation and problem statement are now good - thanks for improving
> > it.
> >
> > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <li...@gmail.com> wrote:
> >
> > > Hi! I updated the PIP issue again. This time I've added some background
> > > and some explanations.
> > >
> > > The compatibility check rules are already written in the Implementation
> > > section. ProtoBufNative implements the same canRead method as Apache
> > > Avro: it checks whether the schema used for writing and the schema used
> > > for reading are compatible. I also indicate whether the writtenSchema and
> > > readSchema of the Backward, Forward, and Full strategies are the old or
> > > the new version of the schema.
> > >
> > > Thanks,
> > > sinan
> > >
> > > Asaf Mesika <as...@gmail.com> wrote on Sun, Feb 26, 2023 at 23:24:
> > >
> > > > I'm sorry, but IMO this PIP lacks a lot of background knowledge that you
> > > > need to add for people to understand it. You don't need to explain all
> > > > of Pulsar in this PIP, but at the very least add a few paragraphs
> > > > detailing everything the reader needs to know to be put in context:
> > > >
> > > >
> > > >    - Start by saying Pulsar has a built-in schema registry inside the
> > > >    Pulsar broker.
> > > >       - Every time the client updates the schema, it uploads it to the
> > > >       broker. When that happens, the broker validates whether the new
> > > >       schema version is compatible with the previous versions. There
> > > >       are 4 types of compatibility: Full, ... (complete and explain each
> > > >       one briefly)
> > > >    - Also explain that the Pulsar schema registry supports various
> > > >    schema protocols: Avro, protobuf native, ... (complete the rest).
> > > >    Each protocol has a schema which dictates how to serialize and
> > > >    deserialize the message content into a typed object.
> > > >    - Explain in short what protobuf native is (compare it with protobuf
> > > >    non-native).
> > > >    - Please don't paste code instead of explaining.
> > > >       - Explain that the current protobuf native validation check only
> > > >       checks that the root message name is the same between the current
> > > >       schema version and the new version.
> > > >          - Explain briefly what a root message and its name are.
> > > >       - Explain the problem (list scenarios) that we have because the
> > > >       protobuf native schema only supports FULL compatibility validation.
> > > >
> > > >
> > > > Regarding the high level design, as in what you plan to do:
> > > > I suggest you add a "High Level Design" section and in it detail how you
> > > > plan to validate, per protobuf version, per compatibility check
> > > > (backward, forward, full, ...).
> > > > I tried reading the implementation; to me, it's all over the place. Can
> > > > you please list, in order, what I wrote above, and list the validation
> > > > rules with a good explanation of why you validate them that way?
> > > >
> > > > Lastly, once you have all the validation rules clearly stated, you can
> > > > use them to document the feature properly so users know what validation
> > > > to expect.
> > > >
> > > > Thanks,
> > > >
> > > > Asaf
> > > >
> > > >
> > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <li...@gmail.com> wrote:
> > > >
> > > > > Sorry, my mistake. I removed the code and described the design to
> > > > > improve the PROTOBUF_NATIVE schema compatibility checks. You can have
> > > > > a look. 😊
> > > > >
> > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Feb 22, 2023 at 21:16:
> > > > >
> > > > > > I read it, but you're almost directly diving into the code; it will
> > > > > > take me hours just to reverse engineer your design.
> > > > > >
> > > > > > Can you please include a "High Level Design" section in which you
> > > > > > explain how you plan to tackle each issue?
> > > > > > If I can read that section and explain to someone else how this
> > > > > > will work, it means the section is complete.
> > > > > >
> > > > > > Let's leave the code to the PRs.
> > > > > >
> > > > > >
> > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19565.
> > > > > > >
> > > > > > > We can talk about the current design here. In particular, please
> > > > > > > give your valuable advice on the field type change check rules.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Sinan
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
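Asaf's request to give each strategy its own section boils down to which schema version plays the writer and which plays the reader in each `canRead(writtenSchema, readSchema)` call. A minimal sketch of that mapping, assuming SiNan's convention that V2 is the existing (last) schema and V3 is the new one; `Strategy`, `ReadCheck`, and `StrategyRoles` are hypothetical names for illustration only, not Pulsar or Avro APIs, and schema versions are plain strings:

```java
import java.util.List;

// Hypothetical names for illustration only; not Pulsar or Avro APIs.
enum Strategy { BACKWARD, FORWARD, FULL }

/** One canRead(writtenSchema, readSchema) call that a strategy requires. */
record ReadCheck(String writtenSchema, String readSchema) {}

final class StrategyRoles {
    private StrategyRoles() {}

    static List<ReadCheck> checksFor(Strategy strategy, String existingSchema, String newSchema) {
        return switch (strategy) {
            // BACKWARD: consumers on the new schema read data written with the existing one.
            case BACKWARD -> List.of(new ReadCheck(existingSchema, newSchema));
            // FORWARD: consumers on the existing schema read data written with the new one.
            case FORWARD -> List.of(new ReadCheck(newSchema, existingSchema));
            // FULL: both directions must hold.
            case FULL -> List.of(new ReadCheck(existingSchema, newSchema),
                                 new ReadCheck(newSchema, existingSchema));
        };
    }
}
```

With this framing, each strategy section only needs the rules of a single `canRead` check plus the argument order above, which keeps "writtenSchema" unambiguous.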

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by SiNan Liu <li...@gmail.com>.
Hello Enrico. Thanks for your suggestion. Based on my understanding of the
"flag" you mentioned, how about we add a configuration in the next release:

protoBufNativeSchemaValidatorClassName=org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl

If the configuration is empty, the previous implementation is used (only the
name of the root message is checked). If there is a better third-party or
official solution in the future, a new validator implementation can be
developed alongside "ProtobufNativeSchemaBreakValidatorImpl" to give users a
choice.
What do you think of this design? If a better third-party or official
solution appears in the future, do you think the implementation in the
current PR should be retained or deleted?
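A minimal sketch of how such a setting could be resolved, assuming an empty value means "keep the root-message-name check" and any other value names a class with an accessible no-arg constructor. `MessageSchema`, `SchemaValidator`, `IncompatibleSchemaException`, and `ValidatorLoader` are hypothetical simplifications (a real validator would work on protobuf `Descriptors.Descriptor` objects); only the `protoBufNativeSchemaValidatorClassName` idea comes from the proposal above:

```java
import java.util.List;

// Hypothetical, simplified stand-in for a protobuf descriptor: only the root message's full name.
record MessageSchema(String rootFullName) {}

// Hypothetical checked exception; the PIP's code uses ProtoBufCanReadCheckException instead.
class IncompatibleSchemaException extends Exception {
    IncompatibleSchemaException(String message) {
        super(message);
    }
}

// Hypothetical validator contract, modeled on the ProtobufNativeSchemaValidator lambda in this thread.
@FunctionalInterface
interface SchemaValidator {
    void validate(List<MessageSchema> existing, MessageSchema proposed) throws IncompatibleSchemaException;
}

final class ValidatorLoader {
    private ValidatorLoader() {}

    /** Previous behavior: reject the new schema only if its root message name differs. */
    static final SchemaValidator DEFAULT = (existing, proposed) -> {
        for (MessageSchema schema : existing) {
            if (!schema.rootFullName().equals(proposed.rootFullName())) {
                throw new IncompatibleSchemaException("Protobuf root message changed: "
                        + schema.rootFullName() + " -> " + proposed.rootFullName());
            }
        }
    };

    /**
     * Resolves a class-name setting. An empty or missing value falls back to DEFAULT;
     * otherwise the class must implement SchemaValidator (else ClassCastException)
     * and have an accessible no-arg constructor.
     */
    static SchemaValidator load(String className) throws ReflectiveOperationException {
        if (className == null || className.isBlank()) {
            return DEFAULT;
        }
        return Class.forName(className)
                .asSubclass(SchemaValidator.class)
                .getDeclaredConstructor()
                .newInstance();
    }
}
```

Keeping the fallback inside the loader means an upgraded broker with no new setting behaves exactly like before, which is the point of Enrico's rollback flag.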


Thanks,
sinan



Enrico Olivelli <eo...@gmail.com> wrote on Thu, Mar 2, 2023 at 12:47 AM:

> (I apologise for top posting)
>
> Would it be possible to add a flag to fallback to the previous behaviour ?
> I know that adding such flags is a burden but if the upgrade breaks
> some workflows then users won't be able to upgrade.
> We can add the flag in the next release and drop it in the next major
> release
>
> Enrico
>
> On Wed, Mar 1, 2023 at 15:33 SiNan Liu <li...@gmail.com> wrote:
> >
> > >
> > > Can you please explain how a Protobuf Schema descriptor can be validated
> > > for a backward compatibility check using Avro-based compatibility rules?
> > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > descriptor?
> > > Is there some translation happening?
> >
> >
> > 1. *You can take a quick look at the previous design: PROTOBUF uses an Avro
> > structure to store the schema.*
> > https://github.com/apache/pulsar/pull/1954
> > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> >
> > 2. *On the broker side, both PROTOBUF and AVRO convert `SchemaData` to
> > `org.apache.avro.Schema`.*
> > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> >
> >
> >
> > > I'm sorry - I don't understand.
> > > I understand the different compatibility check strategies.
> > > If you just spell them out here, then as you say, just translate the
> > > Protobuf Descriptor into an Avro schema and run the Avro
> > > compatibility validation, no?
> > > I believe the answer is no, since you may want to verify different things
> > > when it comes to Protobuf than you would for Avro.
> >
> >
> > 1. *ProtobufSchema differs from ProtobufNativeSchema in that it uses
> > avro-protobuf.*
> > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > *ProtobufNativeSchema needs a native compatibility check, but there is no
> > official or third-party implementation. So this PIP does not use
> > avro-protobuf for protobuf compatibility checking.*
> >
> > 2. *By the way, this is implemented in much the same way that Apache Avro
> > does compatibility checking.*
> > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > `canReadStrategy`, `canBeReadStrategy`, `mutualReadStrategy`
> > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > writtenSchema and readSchema. We only need to change the order of the
> > arguments we pass to `canRead()`.*
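> > ```java
> > // Runs the canRead() check(s) required by the configured strategy.
> > // toValidate is the new schema; fromDescriptor is an existing schema version.
> > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> >         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
> >     switch (strategy) {
> >         // BACKWARD: the new schema must read data written with the existing one.
> >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> >         // FORWARD: the existing schema must read data written with the new one.
> >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> >         // FULL: both directions must hold.
> >         case CanBeReadMutualStrategy -> {
> >             canRead(toValidate, fromDescriptor);
> >             canRead(fromDescriptor, toValidate);
> >         }
> >     }
> > }
> >
> > // Checks whether data written with writtenSchema can be read with readSchema.
> > private void canRead(Descriptors.Descriptor writtenSchema,
> >         Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
> > }
> > ```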
> >
> >
> > Thanks,
> > sinan
> >
> >
> >
>
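The thread leaves the body of `ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema)` to the PR. As a rough illustration of what a native canRead check can verify without converting anything to Avro, here is a hypothetical sketch over a simplified field model (field number mapped to `Field`). The two rules are assumptions chosen for the example: a field number present on both sides must keep its type, and a field the reader marks as required must exist in the written schema. The type rule is deliberately stricter than the official protobuf "Updating A Message Type" guidance, which treats some changes (for example among int32, uint32, int64, uint64, and bool) as wire-compatible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified field model: not the protobuf Descriptors API.
record Field(String name, String type, boolean required) {}

final class FieldRuleSketch {
    private FieldRuleSketch() {}

    /**
     * Returns the problems found when data written with writtenFields is read with readFields.
     * An empty list means none of this sketch's rules were violated.
     */
    static List<String> canRead(Map<Integer, Field> writtenFields, Map<Integer, Field> readFields) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<Integer, Field> entry : readFields.entrySet()) {
            Field reader = entry.getValue();
            Field writer = writtenFields.get(entry.getKey());
            if (writer == null) {
                // The writer never sets this field; that only breaks a reader that requires it.
                if (reader.required()) {
                    problems.add("required field #" + entry.getKey() + " is missing from the written schema");
                }
            } else if (!writer.type().equals(reader.type())) {
                // Same field number, different type: treated as breaking in this sketch.
                problems.add("field #" + entry.getKey() + " changed type " + writer.type() + " -> " + reader.type());
            }
        }
        return problems;
    }
}
```

Because the rules are phrased in terms of writer and reader, the strategy-to-argument-order mapping in `validateWithStrategy` can reuse the same check for BACKWARD, FORWARD, and FULL.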

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by SiNan Liu <li...@gmail.com>.
Hi, Asaf.

1. I guess there's no right or wrong way to code. Your implementation also
uses concepts like canRead, writtenSchema, and readSchema, similar to mine.
You just got rid of the builder and kept the validator. There are some
duplicated blocks of code, but I think I can improve on that as well.
I feel that there is no hidden logic in my design: the logic of the builder
and the validator is very clear.
But let's address Bo's question first: whether to plug different validator
implementations into ProtobufNativeSchemaCompatibilityCheck at all. I can
accept both your design and mine. I don't mind how it's done, and I don't
need to find common behavior in a builder. The builder is not going to be
useful, so we can delete it.
2. Regarding the question I posed above: Bo thinks implementing a new
ProtobufNativeAdvancedSchemaCompatibilityCheck is better than extending
ProtobufNativeSchemaCompatibilityCheck with different validator
implementations.
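Point (2) of the quoted message below, where two checkers registered for PROTOBUF_NATIVE overwrite each other, is easy to reproduce in isolation. A minimal sketch with hypothetical `SchemaType`, `CheckerEntry`, and `CheckerRegistry` types standing in for Pulsar's real classes: `register` mirrors the quoted loop (one map slot per schema type, so the later entry silently wins), while `registerStrict` is an illustrative fail-fast variant that was not proposed in the thread.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; Pulsar's real SchemaType has many more values.
enum SchemaType { AVRO, JSON, PROTOBUF_NATIVE }

record CheckerEntry(SchemaType schemaType, String className) {}

final class CheckerRegistry {
    private CheckerRegistry() {}

    /** Mirrors the quoted loop: one map slot per schema type, so a later entry replaces an earlier one. */
    static Map<SchemaType, CheckerEntry> register(List<CheckerEntry> configured) {
        Map<SchemaType, CheckerEntry> checkers = new HashMap<>();
        for (CheckerEntry checker : configured) {
            checkers.put(checker.schemaType(), checker);
        }
        return checkers;
    }

    /** Illustrative fail-fast variant: reject a configuration that names two checkers for one type. */
    static Map<SchemaType, CheckerEntry> registerStrict(List<CheckerEntry> configured) {
        Map<SchemaType, CheckerEntry> checkers = new HashMap<>();
        for (CheckerEntry checker : configured) {
            CheckerEntry previous = checkers.putIfAbsent(checker.schemaType(), checker);
            if (previous != null) {
                throw new IllegalArgumentException("More than one checker configured for "
                        + checker.schemaType() + ": " + previous.className() + ", " + checker.className());
            }
        }
        return checkers;
    }
}
```

Either way, a broker can hold only one checker per schema type, so choosing between the two PROTOBUF_NATIVE checkers is a per-broker decision rather than a per-topic one.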

Sorry, I cannot use my computer or the network at work, so I am replying
from my mobile phone, and the formatting may be a bit messy. Please bear
with me.

Thanks,
sinan


SiNan Liu <li...@gmail.com> wrote on Wed, Mar 8, 2023 at 3:53 PM:

> Hi, Bo.
>
> 1. I understand what you're saying: develop a new
> `ProtobufNativeAdvancedSchemaCompatibilityCheck` rather than changing the
> existing `ProtobufNativeSchemaCompatibilityCheck`. But I found a few small
> problems:
>
> (1) Both ProtobufNativeAdvancedSchemaCompatibilityCheck and
> ProtobufNativeSchemaCompatibilityCheck have the schemaType PROTOBUF_NATIVE.
> Both checkers are for the "native" PROTOBUF implementation that does not
> use AVRO-PROTOBUF, which leads to some problems or "unreasonable" behavior
> and raises some further questions.
>
> (2) In broker.conf, if both ProtobufNativeSchemaCompatibilityCheck and
> ProtobufNativeAdvancedSchemaCompatibilityCheck are set in
> `schemaRegistryCompatibilityCheckers`, one will overwrite the other,
> because the checkers are stored in a map keyed by schema type:
>
>
> https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryService.java#L36-L44
>
> ```java
> Map<SchemaType, SchemaCompatibilityCheck> checkers = new HashMap<>();
> for (String className : checkerClasses) {
>     SchemaCompatibilityCheck schemaCompatibilityCheck = Reflections.createInstance(className,
>             SchemaCompatibilityCheck.class, Thread.currentThread().getContextClassLoader());
>     // Keyed by schema type: a second checker for PROTOBUF_NATIVE replaces the first.
>     checkers.put(schemaCompatibilityCheck.getSchemaType(), schemaCompatibilityCheck);
> }
> ```
>
> Is this a big problem or a small one? Does it need handling, or is that
> unnecessary? Maybe we can document that only one of the two protobufNative
> checkers can be chosen? Why have two checkers with different
> implementations for the same schemaType? Why not have the checker create
> different validators, so that we don't have to change
> schemaRegistryCompatibilityCheckers?
>
> (3) Also, consider what happens after switching to
> ProtobufNativeAdvancedSchemaCompatibilityCheck. For existing topics, only
> the name of the root message was checked before, not the content of the
> protobuf schema.
>
> What if the user wants both checkers?
>
> Will setting ProtobufNativeAdvancedSchemaCompatibilityCheck affect topics
> that already have a schema?
>
> Should older topics still use the old checker, while newer topics or
> certain older topics use the new advanced checker?
>
> (4) So should there be one checker per schemaType, with the protobufNative
> checker able to have as many different implementation classes as needed?
> As for the class-name configuration in the PIP, let's see whether it can
> be set at the topic level. In the current PIP design, I just load this
> parameter into the checker when the broker starts and the checkers map is
> set up. If I want to support the topic level, can I do it in a follow-up
> PR, or should I complete it here?
>
> Or should we add a schemaType called PROTOBUF_NATIVE_ADVANCE corresponding
> to ProtobufNativeAdvancedSchemaCompatibilityCheck? (That seems to be more
> trouble.)
>
> Sorry, I cannot use my computer or the network at work, so I am replying
> from my mobile phone, and the formatting may be a bit messy. Please bear
> with me.
>
> Thanks,
>
> sinan
>
>
> 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:39 PM:
>
>> SiNan Liu <li...@gmail.com> wrote on Tue, Mar 7, 2023 at 13:22:
>> >
>> > Great to see your comment, Bo!
>> >
>> > 1. Regarding the first way: the protobuf website describes the rules,
>> > but there are no plans to implement them.
>> > https://protobuf.dev/programming-guides/proto/#updating
>>
>> https://groups.google.com/g/protobuf
>> Maybe ask here.
>>
>> >
>> > 2. I think this PIP can be divided into two parts.
>> > (1) Add a flag (`ValidatorClassName`) and load it into
>> > `ProtobufNativeSchemaCompatibilityCheck` when the broker starts.
>> > `ValidatorClassName` is empty by default, in which case the
>> > implementation behaves as before, with no change for the user.
>>
>> `ProtobufNativeSchemaCompatibilityCheck` is a plugin in `broker.conf`
>> ```
>>
>> schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
>> ```
>> I do not recommend that we directly modify this plugin and keep adding
>> configuration items, which will cause trouble for users.
>> We have a lot of configs and it's getting very unwieldy.
>> In my opinion, we should not change
>> `org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`:
>> it is a simple implementation, it rarely goes wrong, and most users will
>> use it. We can add another ProtobufNative check named
>> `ProtobufNativeAdvancedSchemaCompatibilityCheck` or similar. That way, we
>> don't need to add this flag. There is no need to consider compatibility:
>> it is just a plugin and will not affect the current logic.
>> If users need it, they just switch the plugin to the new implementation.
>>
>> > ```java
>> > ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
>> >     for (Descriptors.Descriptor fromDescriptor : fromDescriptors) {
>> >         // The default implementation only checks if the root message has changed.
>> >         if (!fromDescriptor.getFullName().equals(toDescriptor.getFullName())) {
>> >             throw new ProtoBufCanReadCheckException("Protobuf root message isn't allow change!");
>> >         }
>> >     }
>> > };
>> > ```
>> > `ValidatorClassName` value also can be set to the current
>> implementation of
>> > PIP add
>> >
>> `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
>> >
>> > (2) Refactor `ProtobufNativeSchemaCompatibilityCheck` so that it builds a
>> > different `ProtobufNativeSchemaValidator` depending on the flag
>> > (`ValidatorClassName`).
>> > Isn't that just a plug-in? Users can develop and choose a different
>> > `ProtobufNativeSchemaValidator`. I don't think it changes the logic; it
>> > just allows users to extend it.
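A minimal, self-contained sketch of the `DEFAULT` validator quoted above, assuming a hypothetical `MessageDescriptor` record in place of protobuf's `Descriptors.Descriptor` (only the full name matters for this rule) and a hypothetical unchecked `CanReadCheckException` in place of `ProtoBufCanReadCheckException`. It illustrates that the default rule compares nothing but the root message name.

```java
import java.util.List;

public class DefaultValidatorSketch {

    // Hypothetical stand-in for Descriptors.Descriptor; only the full name is modelled.
    record MessageDescriptor(String fullName) {}

    // Hypothetical stand-in for ProtoBufCanReadCheckException.
    static class CanReadCheckException extends RuntimeException {
        CanReadCheckException(String message) {
            super(message);
        }
    }

    @FunctionalInterface
    interface ProtobufNativeSchemaValidator {
        void validate(Iterable<MessageDescriptor> fromDescriptors, MessageDescriptor toDescriptor);

        // Default behaviour: fail only if the root message's full name differs from an existing version.
        ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
            for (MessageDescriptor fromDescriptor : fromDescriptors) {
                if (!fromDescriptor.fullName().equals(toDescriptor.fullName())) {
                    throw new CanReadCheckException("Protobuf root message isn't allowed to change!");
                }
            }
        };
    }

    // Returns true when the DEFAULT validator accepts the candidate schema.
    static boolean passes(List<MessageDescriptor> existing, MessageDescriptor candidate) {
        try {
            ProtobufNativeSchemaValidator.DEFAULT.validate(existing, candidate);
            return true;
        } catch (CanReadCheckException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        MessageDescriptor v1 = new MessageDescriptor("example.SearchReq");
        MessageDescriptor renamed = new MessageDescriptor("example.SearchRequest");
        System.out.println(passes(List.of(v1), v1));      // true
        System.out.println(passes(List.of(v1), renamed)); // false
    }
}
```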
>> >
>> >
>> > I think this PIP is an enhancement and supplement to the existing
>> > functionality; it is not unnecessary or meaningless.
>> >
>> >
>> > Thanks,
>> > sinan
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Mar 7, 2023 at 11:53 AM 丛搏 <bo...@apache.org> wrote:
>> >
>> > > I think we have two ways to do that.
>> > >
>> > > First way: we push for improvements to protobuf's Java implementation,
>> > > and ask the protobuf maintainers whether they plan to improve it.
>> > >
>> > > Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should
>> > > be implemented as a plugin, without changing any existing plugin logic
>> > > (it's simple and already in use). I don't recommend adding flags for
>> > > rollback; it adds configuration and makes little sense.
>> > >
>> > > Thanks,
>> > > Bo
>> > >
>> > > On Mon, Mar 6, 2023 at 11:00 PM Asaf Mesika <as...@gmail.com> wrote:
>> > >
>> > > >
>> > > > Can you convert the code block, which is actually a quote, at the
>> > > > beginning of the PIP to something which doesn't require so much
>> > > > horizontal scrolling?
>> > > > Use
>> > > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
>> > > >
>> > > > Let's improve the clarity of what you wrote:
>> > > >
>> > > > "the PROTOBUF uses avro struct to store."
>> > > > -->
>> > > > When schema type PROTOBUF is used, the Pulsar client assumes the object
>> > > > given to it as message data is an auto-generated POJO containing the
>> > > > annotations encoding the schema. The client uses a converter, which
>> > > > converts a Protobuf schema descriptor into an Avro schema, and sends that
>> > > > as the schema of the producer/consumer.
>> > > >
>> > > > "On the broker side, protobuf and avro both use SchemaData converted to
>> > > > org.apache.avro.Schema."
>> > > > -->
>> > > > Since the schema is an Avro schema, the compatibility check on the broker
>> > > > side simply reuses the compatibility check of the AVRO schema type.
>> > > >
>> > > > "ProtobufSchema is different from ProtobufNativeSchema in schema
>> > > > compatibility check it uses avro-protobuf.
>> > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
>> > > > But the current implementation of ProtobufNative schema compatibility
>> > > > check only checked if the root message name is changed."
>> > > >
>> > > > -->
>> > > > The PROTOBUF_NATIVE schema type is different.
>> > > > The client actually uses the Protobuf Descriptor as the schema, as opposed
>> > > > to the Avro schema of the PROTOBUF schema type. In the broker, the
>> > > > PROTOBUF_NATIVE compatibility check doesn't actually implement any rule
>> > > > besides one: checking if the root message name has changed.
>> > > >
>> > > >
>> > > >
>> > > > >    1. For now, there is no official or third-party solution for ProtoBuf
>> > > > >    compatibility. If a better third-party or official solution appears
>> > > > >    in the future, we can develop a new ProtobufNativeSchemaValidator and
>> > > > >    use it; hence the flag.
>> > > > >
>> > > > Why do you need to make that configurable? Once you find a third-party
>> > > > solution, just switch to it. Who knows, maybe you never will. Introduce it
>> > > > when you find it, not now.
>> > > >
>> > > >
>> > > > > We improve the BACKWARD and FORWARD strategies in
>> > > > > ProtobufNativeSchemaCompatibilityCheck. As with the AVRO implementation,
>> > > > > protobuf compatibility checking needs to implement the canRead method.
>> > > > > *This will check that the writtenSchema can be read by the readSchema.*
>> > > >
>> > > >
>> > > > I completely disagree.
>> > > > Avro implementation is confusing for our use case. Don't copy that.
>> > > >
>> > > > You have
>> > > >
>> > > > public void checkCompatible(SchemaData from, SchemaData to,
>> > > >         SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {
>> > > >     Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
>> > > >     Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
>> > > >     switch (strategy) {
>> > > >         case BACKWARD_TRANSITIVE:
>> > > >         case BACKWARD:
>> > > >         case FORWARD_TRANSITIVE:
>> > > >         case FORWARD:
>> > > >         case FULL_TRANSITIVE:
>> > > >         case FULL:
>> > > >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
>> > > >             return;
>> > > >         case ALWAYS_COMPATIBLE:
>> > > >             return;
>> > > >         default:
>> > > >             throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
>> > > >     }
>> > > > }
>> > > >
>> > > > I would rename:
>> > > > from --> currentSchema
>> > > > to --> newSchema
>> > > >
>> > > > Use that switch case and have a method for each strategy, like:
>> > > > validateBackwardsCompatibility(currentSchema, newSchema)
>> > > >
>> > > > I dislike canRead and the usage of writtenSchema, since you have two
>> > > > completely different use cases: the producing side and the consuming side.
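A minimal sketch of the structure suggested here: one switch over the strategy, `currentSchema`/`newSchema` naming, and one plain validation method per direction. The schema model (a map from field number to a hypothetical `Field` record) and the two rules used (a reused field number must keep its type; backward must not add a required field, forward must not remove one) are simplified assumptions for illustration, not the PIP's rule set. Transitive strategies are assumed to be handled by the caller invoking this once per previous version.

```java
import java.util.Map;

public class StrategySwitchSketch {

    enum Strategy {
        BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE,
        FULL, FULL_TRANSITIVE, ALWAYS_COMPATIBLE
    }

    // Hypothetical, simplified field model; a schema maps field number -> field.
    record Field(String name, String type, boolean required) {}

    static class IncompatibleSchemaException extends Exception {
        IncompatibleSchemaException(String message) {
            super(message);
        }
    }

    static void checkCompatible(Map<Integer, Field> currentSchema, Map<Integer, Field> newSchema,
                                Strategy strategy) throws IncompatibleSchemaException {
        switch (strategy) {
            case BACKWARD, BACKWARD_TRANSITIVE -> validateBackwardsCompatibility(currentSchema, newSchema);
            case FORWARD, FORWARD_TRANSITIVE -> validateForwardsCompatibility(currentSchema, newSchema);
            case FULL, FULL_TRANSITIVE -> {
                validateBackwardsCompatibility(currentSchema, newSchema);
                validateForwardsCompatibility(currentSchema, newSchema);
            }
            case ALWAYS_COMPATIBLE -> { }
        }
    }

    // Consumers using newSchema must be able to read data written with currentSchema.
    static void validateBackwardsCompatibility(Map<Integer, Field> currentSchema, Map<Integer, Field> newSchema)
            throws IncompatibleSchemaException {
        validateSharedFieldTypes(currentSchema, newSchema);
        for (Map.Entry<Integer, Field> entry : newSchema.entrySet()) {
            if (entry.getValue().required() && !currentSchema.containsKey(entry.getKey())) {
                throw new IncompatibleSchemaException("new schema adds required field " + entry.getValue().name());
            }
        }
    }

    // Consumers still using currentSchema must be able to read data written with newSchema.
    static void validateForwardsCompatibility(Map<Integer, Field> currentSchema, Map<Integer, Field> newSchema)
            throws IncompatibleSchemaException {
        validateSharedFieldTypes(currentSchema, newSchema);
        for (Map.Entry<Integer, Field> entry : currentSchema.entrySet()) {
            if (entry.getValue().required() && !newSchema.containsKey(entry.getKey())) {
                throw new IncompatibleSchemaException("new schema removes required field " + entry.getValue().name());
            }
        }
    }

    // Simplification: any type change on a reused field number is rejected.
    static void validateSharedFieldTypes(Map<Integer, Field> currentSchema, Map<Integer, Field> newSchema)
            throws IncompatibleSchemaException {
        for (Map.Entry<Integer, Field> entry : newSchema.entrySet()) {
            Field existing = currentSchema.get(entry.getKey());
            if (existing != null && !existing.type().equals(entry.getValue().type())) {
                throw new IncompatibleSchemaException("field number " + entry.getKey() + " changed type");
            }
        }
    }

    static boolean isCompatible(Map<Integer, Field> currentSchema, Map<Integer, Field> newSchema, Strategy strategy) {
        try {
            checkCompatible(currentSchema, newSchema, strategy);
            return true;
        } catch (IncompatibleSchemaException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<Integer, Field> v1 = Map.of(1, new Field("query", "string", false));
        Map<Integer, Field> v2 = Map.of(1, new Field("query", "string", false),
                2, new Field("page_number", "int32", true));
        System.out.println(isCompatible(v1, v2, Strategy.BACKWARD)); // false: v1 data lacks required field 2
        System.out.println(isCompatible(v1, v2, Strategy.FORWARD));  // true: v1 readers skip field 2
    }
}
```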
>> > > >
>> > > > > schemaValidatorBuilder
>> > > >
>> > > > I dislike this proposal. IMO the Avro implementation is way too complicated.
>> > > > Why not have a simple validation function for each switch case above?
>> > > > Why do we need a strategy, a builder, and all this complexity?
>> > > >
>> > > >
>> > > > > *Here are the basic compatibility rules we've defined:*
>> > > >
>> > > >
>> > > > IMO it's impossible to read the validation rules as you described them.
>> > > > I wrote how they should be structured numerous times above.
>> > > > I can't validate them.
>> > > >
>> > > >
>> > > > IMO, the current design is very hard to read.
>> > > > Please try to avoid jumping into code sections.
>> > > > Write a high-level design section in which you describe in words what you
>> > > > plan to do.
>> > > > Write the validation rules in a structure that is easy to understand:
>> > > > rules for each compatibility check, using precise terms (current schema,
>> > > > new schema), since the new schema can be used once for reading and once
>> > > > for writing.
>> > > >
>> > > > In its current form it takes too much time to understand the design, and
>> > > > that shouldn't be the case.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Asaf
>> > > >
>> > > >
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <li...@gmail.com> wrote:
>> > > >
>> > > > > Hi! I updated the explanation of some things in the PIP issue. I also
>> > > > > added a new "flag" in the conf to select a different
>> > > > > ProtobufNativeSchemaValidator implementation, and set the default
>> > > > > ProtobufNativeSchemaValidator to only check whether the name of the
>> > > > > root message is the same.
>> > > > >
>> > > > >
>> > > > > Thanks,
>> > > > > sinan
>> > > > >
>> > > > >
>> > > > > On Sun, Mar 5, 2023 at 8:21 PM Asaf Mesika <as...@gmail.com> wrote:
>> > > > >
>> > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <liusinan1998@gmail.com> wrote:
>> > > > > >
>> > > > > > > >
>> > > > > > > > Can you please explain how a Protobuf Schema descriptor can be
>> > > > > > > > validated for backward compatibility check using Avro based
>> > > > > > > > compatibility rules?
>> > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
>> > > > > > > > descriptor?
>> > > > > > > > Is there some translation happening?
>> > > > > > >
>> > > > > > >
>> > > > > > > 1. *You can take a quick look at the previous design, the PROTOBUF uses
>> > > > > > > avro struct to store.*
>> > > > > > > https://github.com/apache/pulsar/pull/1954
>> > > > > > >
>> > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
>> > > > > > >
>> > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
>> > > > > >
>> > > > > >
>> > > > > > Ok. So to summarize your code (easier to write it out than to send links):
>> > > > > > * Pulsar Client, when used with Protobuf Schema, actually converts the
>> > > > > > Protobuf descriptor into an Avro Schema (using code found inside the Avro
>> > > > > > library) and saves that Avro schema as the schema. It's not saving the
>> > > > > > protobuf descriptor at all. Very confusing, I have to add - I never
>> > > > > > expected that.
>> > > > > > This explains why ProtobufSchemaCompatibilityCheck just extends the Avro
>> > > > > > check without doing any translation.
>> > > > > >
>> > > > > > Thanks for that.
>> > > > > >
>> > > > > > Now that I finally understand this, I can say that you *must* explain
>> > > > > > that in the motivation part of your PIP.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > 2. *On the broker side, protobuf and avro both use `SchemaData`
>> > > > > > > converted to `org.apache.avro.Schema`.*
>> > > > > > >
>> > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
>> > > > > > >
>> > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
>> > > > > > >
>> > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
>> > > > > >
>> > > > > >
>> > > > > > Actually, those links don't really help.
>> > > > > > The main link that helps is:
>> > > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
>> > > > > >
>> > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > I'm sorry - I don't understand.
>> > > > > > > > I understand the different compatibility check strategies.
>> > > > > > > > If you just spell them out here, then as you say, just translate the
>> > > > > > > > Protobuf Descriptor into an Avro schema and run the Avro
>> > > > > > > > compatibility validation, no?
>> > > > > > > > I believe the answer is no, since you may want to verify different
>> > > > > > > > things when it comes to Protobuf, which are different from Avro.
>> > > > > > >
>> > > > > > >
>> > > > > > > 1. *ProtobufSchema is different from ProtobufNativeSchema in that it
>> > > > > > > uses avro-protobuf.*
>> > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
>> > > > > > > *ProtobufNativeSchema needs a native compatibility check, but there is
>> > > > > > > no official or third-party implementation. So this PIP does not use
>> > > > > > > avro-protobuf for protobuf compatibility checking.*
>> > > > > > >
>> > > > > > > 2. *By the way, this is implemented in much the same way that Apache
>> > > > > > > Avro does compatibility checking.*
>> > > > > > >
>> > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
>> > > > > > > `canReadStrategy`, `canBeReadStrategy`, `mutualReadStrategy`
>> > > > > > >
>> > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
>> > > > > > >
>> > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
>> > > > > > >
>> > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
>> > > > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
>> > > > > > > writtenSchema and readSchema. We only need to change the order of the
>> > > > > > > arguments we pass to `canRead()`.*
>> > > > > > > ```java
>> > > > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
>> > > > > > >         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
>> > > > > > >     switch (strategy) {
>> > > > > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
>> > > > > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
>> > > > > > >         case CanBeReadMutualStrategy -> {
>> > > > > > >             canRead(toValidate, fromDescriptor);
>> > > > > > >             canRead(fromDescriptor, toValidate);
>> > > > > > >         }
>> > > > > > >     }
>> > > > > > > }
>> > > > > > >
>> > > > > > > private void canRead(Descriptors.Descriptor writtenSchema,
>> > > > > > >         Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
>> > > > > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
>> > > > > > > }
>> > > > > > > ```
>> > > > > > >
>> > > > > > >
>> > > > > > I get that you want to take inspiration from the existing Avro schema
>> > > > > > compatibility check for your code design.
>> > > > > > I also understand you *won't* use any existing Avro code for that.
>> > > > > > I also understand you have to write the validation check on your own,
>> > > > > > since there is no third-party implementation to follow.
>> > > > > >
>> > > > > > The only thing I can't understand is the actual rules, as I wrote before,
>> > > > > > since they are written confusingly.
>> > > > > > So, I repeat what I asked before:
>> > > > > >
>> > > > > > I think you should structure the validation rules differently:
>> > > > > >
>> > > > > > * Backward checks
>> > > > > > ** List the rules, using newSchema (the schema used by the producer or
>> > > > > > consumer) and existingSchema (the last schema used)
>> > > > > > * Forward checks
>> > > > > > ** List the rules, using newSchema (the schema used by the producer or
>> > > > > > consumer) and existingSchema (the last schema used)
>> > > > > >
>> > > > > > Once that's done, I will be able to understand the different
>> > > > > > validation rules for each compatibility check.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > sinan
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > On Wed, Mar 1, 2023 at 9:19 PM Asaf Mesika <as...@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > I read it and they look identical. What's the difference between
>> > > > > > > > > > them?
>> > > > > > > > >
>> > > > > > > > > Current avro, json, and protobuf schemas are all implemented based
>> > > > > > > > > on AVRO.
>> > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean
>> > > > > > > > > > the protobuf schema is converted into an Avro Schema, and then you
>> > > > > > > > > > use Avro compatibility validation?
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > >
>> > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
>> > > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
>> > > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
>> > > > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, so `checkCompatible()`
>> > > > > > > > > is the same implementation as for `AVRO`.
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > Can you please explain how a Protobuf Schema descriptor can be
>> > > > > > > > validated for backward compatibility check using Avro based
>> > > > > > > > compatibility rules?
>> > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
>> > > > > > > > descriptor?
>> > > > > > > > Is there some translation happening?
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > I think you should structure the validation rules differently:
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > The compatibility check strategies are described on the website:
>> > > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
>> > > > > > > > > 1. BACKWARD (CanReadExistingStrategy): Consumers using schema V3 can
>> > > > > > > > > process data written by producers using the last schema version V2. So
>> > > > > > > > > V2 is the "writtenSchema" and V3 is the "readSchema".
>> > > > > > > > > 2. FORWARD (CanBeReadByExistingStrategy): Consumers using the last
>> > > > > > > > > schema version V2 can process data written by producers using a new
>> > > > > > > > > schema V3, even though they may not be able to use the full
>> > > > > > > > > capabilities of the new schema. So V3 is the "writtenSchema" and V2 is
>> > > > > > > > > the "readSchema".
>> > > > > > > > > 3. FULL (CanBeReadMutualStrategy): Schemas are both backward and
>> > > > > > > > > forward compatible. The schema can evolve, and the old and new schema
>> > > > > > > > > versions must each be able to read data written with the other.
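The writer/reader roles in this list can be written down as a small sketch that, for a given strategy, returns the (writtenSchema, readSchema) pairs a `canRead` check would need to cover. It is hypothetical and simplified: version names are plain strings, the existing versions are assumed non-empty and ordered oldest to newest, and the `_TRANSITIVE` variants are assumed to mean "check against all previous versions" as in the Pulsar documentation, rather than only the last one.

```java
import java.util.ArrayList;
import java.util.List;

public class StrategyPairsSketch {

    // Direction flags per strategy; the _TRANSITIVE variants compare against every previous version.
    enum Strategy {
        BACKWARD(true, false, false), BACKWARD_TRANSITIVE(true, false, true),
        FORWARD(false, true, false), FORWARD_TRANSITIVE(false, true, true),
        FULL(true, true, false), FULL_TRANSITIVE(true, true, true);

        final boolean backward;
        final boolean forward;
        final boolean transitive;

        Strategy(boolean backward, boolean forward, boolean transitive) {
            this.backward = backward;
            this.forward = forward;
            this.transitive = transitive;
        }
    }

    // One canRead(writtenSchema, readSchema) call that the checker would need to make.
    record Check(String writtenSchema, String readSchema) {}

    // existingVersions is assumed non-empty and ordered oldest to newest.
    static List<Check> checksFor(Strategy strategy, List<String> existingVersions, String newVersion) {
        List<String> targets = strategy.transitive
                ? existingVersions
                : existingVersions.subList(existingVersions.size() - 1, existingVersions.size());
        List<Check> checks = new ArrayList<>();
        for (String existing : targets) {
            if (strategy.backward) {
                checks.add(new Check(existing, newVersion)); // new consumers read old data
            }
            if (strategy.forward) {
                checks.add(new Check(newVersion, existing)); // old consumers read new data
            }
        }
        return checks;
    }

    public static void main(String[] args) {
        List<String> existing = List.of("V1", "V2");
        System.out.println(checksFor(Strategy.BACKWARD, existing, "V3"));
        System.out.println(checksFor(Strategy.FULL_TRANSITIVE, existing, "V3"));
    }
}
```

Listing the pairs explicitly makes the point raised in this thread visible: the same new schema is the reader in a backward check and the writer in a forward check.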
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > I'm sorry - I don't understand.
>> > > > > > > > I understand the different compatibility check strategies.
>> > > > > > > > If you just spell them out here, then as you say, just translate the
>> > > > > > > > Protobuf Descriptor into an Avro schema and run the Avro
>> > > > > > > > compatibility validation, no?
>> > > > > > > > I believe the answer is no, since you may want to verify different
>> > > > > > > > things when it comes to Protobuf, which are different from Avro.
>> > > > > > > >
>> > > > > > > > In its current state, I can't understand your design at all. Please
>> > > > > > > > help clarify that.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > >
>> > > > > > > > > So each strategy should have its own section.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As
>> > > > > > > > > we've just described, we only need to change the order of the
>> > > > > > > > > arguments we pass to `canRead()`.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > > sinan
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Mon, Feb 27, 2023 at 8:49 PM Asaf Mesika <as...@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
>> > > > > > > > > > >
>> > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
>> > > > > > > > > > >
>> > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
>> > > > > > > > > > >
>> > > > > > > > > > I read it and they look identical. What's the difference between them?
>> > > > > > > > > >
>> > > > > > > > > > Current avro, json, and protobuf schemas are all implemented based on
>> > > > > > > > > > AVRO.
>> > > > > > > > > >
>> > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the
>> > > > > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
>> > > > > > > > > > compatibility validation?
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > I think you should structure the validation rules differently:
>> > > > > > > > > >
>> > > > > > > > > > * Backward checks
>> > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
>> > > > > > > > > > consumer) and existingSchema (the last schema used)
>> > > > > > > > > > * Forward checks
>> > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
>> > > > > > > > > > consumer) and existingSchema (the last schema used)
>> > > > > > > > > >
>> > > > > > > > > > So each strategy should have its own section.
>> > > > > > > > > >
>> > > > > > > > > > I'm saying this because you used the word "writtenSchema", but it
>> > > > > > > > > > represents something completely different in a backward check than in
>> > > > > > > > > > a forward check.
>> > > > > > > > > >
>> > > > > > > > > > Once you have it structured like that, I personally will be able to
>> > > > > > > > > > read and understand it.
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > The motivation and problem statement are now good - thanks for
>> > > > > > > > > > improving it.
>> > > > > > > > > >
>> > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com> wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some
>> > > > > > > > > > > background and some explanations.
>> > > > > > > > > > >
>> > > > > > > > > > > The compatibility check rules are already written in the
>> > > > > > > > > > > Implementation section. ProtoBufNative implements the same canRead
>> > > > > > > > > > > method as Apache Avro. It does this by checking whether the schemas
>> > > > > > > > > > > for writing and reading are compatible. I also indicate whether the
>> > > > > > > > > > > writtenSchema and readSchema of the Backward, Forward, and Full
>> > > > > > > > > > > strategies are the old or the new version of the schema.
>> > > > > > > > > > >
>> > > > > > > > > > > Thanks,
>> > > > > > > > > > > sinan
>> > > > > > > > > > >
>> > > > > > > > > > > On Sun, Feb 26, 2023 at 11:24 PM Asaf Mesika <as...@gmail.com> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, which
>> > > > > > > > > > > > IMO you need to add for people to understand it. You don't need to
>> > > > > > > > > > > > explain all of Pulsar in this PIP, but at the very least add a few
>> > > > > > > > > > > > paragraphs detailing everything one needs to know to be put in
>> > > > > > > > > > > > context:
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > >    - Start by saying Pulsar has a built-in schema registry inside the
>> > > > > > > > > > > >    Pulsar broker.
>> > > > > > > > > > > >       - Every time the client updates the schema, it uploads it to the
>> > > > > > > > > > > >       broker. When that happens, a feature validates whether the new
>> > > > > > > > > > > >       schema version is compatible with the previous versions. There
>> > > > > > > > > > > >       are 4 types of compatibility: Full, ... (complete and explain
>> > > > > > > > > > > >       each one briefly)
>> > > > > > > > > > > >    - Also explain that the Pulsar schema registry supports various
>> > > > > > > > > > > >    schema protocols: Avro, protobuf native, ... (complete the rest);
>> > > > > > > > > > > >    each protocol has a schema which dictates how to serialize and
>> > > > > > > > > > > >    deserialize the message content into a typed object.
>> > > > > > > > > > > >    - Explain in short what protobuf native is (compared to protobuf
>> > > > > > > > > > > >    non-native).
>> > > > > > > > > > > >    - Please don't paste code instead of explaining.
>> > > > > > > > > > > >       - Explain that the current protobuf native validation check only
>> > > > > > > > > > > >       consists of checking that the root message name is the same
>> > > > > > > > > > > >       between the current schema version and the new version.
>> > > > > > > > > > > >          - Explain briefly what a root message and its name are.
>> > > > > > > > > > > >       - Explain the problem (list scenarios) that we have because the
>> > > > > > > > > > > >       protobuf native schema only supports FULL compatibility
>> > > > > > > > > > > >       validation.
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > Regarding the high-level design - as in what you plan to do:
>> > > > > > > > > > > > I suggest you add a "High Level Design" section and in it detail how
>> > > > > > > > > > > > you plan to validate, per protobuf version, per compatibility check
>> > > > > > > > > > > > (backward, forward, full, ...).
>> > > > > > > > > > > > I tried reading the implementation - for me, it's all over the
>> > > > > > > > > > > > place. Can you please list in order what I wrote above, and list the
>> > > > > > > > > > > > validation rules with a good explanation of why you validate it like
>> > > > > > > > > > > > that?
>> > > > > > > > > > > >
>> > > > > > > > > > > > Lastly, once you have all the validation rules clearly stated, you
>> > > > > > > > > > > > can use them to document the feature properly so users know what
>> > > > > > > > > > > > validation to expect.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thanks,
>> > > > > > > > > > > >
>> > > > > > > > > > > > Asaf
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1998@gmail.com> wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > > Sorry, my mistake. I removed the code and described the design
>> > > > > > > > > > > > > to improve the PROTOBUF_NATIVE schema compatibility checks. You
>> > > > > > > > > > > > > can have a look.
>> > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Wed, Feb 22, 2023 at 9:16 PM Asaf Mesika <as...@gmail.com> wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > I read it, but you're almost directly diving into the code - it
>> > > > > > > > > > > > > > will take me hours just to reverse engineer your design.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Can you please include a "High Level Design" section in which
>> > > > > > > > > > > > > > you explain how you plan to tackle the issue?
>> > > > > > > > > > > > > > If I can read that section and explain to someone else how this
>> > > > > > > > > > > > > > will work, it means the section is complete.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Let's leave the code to the PRs.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com> wrote:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Hi all,
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19565.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > We can talk about the current design here. Especially for the
>> > > > > > > > > > > > > > > field type change check rules, please give your valuable advice.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > > Sinan
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > >
>>
>

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by Asaf Mesika <as...@gmail.com>.
On Wed, Mar 15, 2023 at 5:46 PM SiNan Liu <li...@gmail.com> wrote:

> 1.
>
> > > Why? The root message name is not written over the wire to the best of
> > > my knowledge. I haven't found it written in the official doc.
>
>
> The root message name check is one of the rules in the previous
> `ProtobufNativeSchemaCompatibilityCheck`. If the root message has a
> different name, there is no need to check its contents: for the "same"
> schema, the names must be the same.
>
>
I understand the root message name check is in the current implementation.
In the PIP you're describing a new implementation with different validation
rules, the first of which is:

> (1) If the root message names of writtenSchema and readSchema are
> different, then incompatible.
> *This rule applies to both proto2 and proto3.*
>
The message names are not encoded into the binary data. Why does it matter if
I changed the message name? The name is only there for you as a human being
reading the schema.
Let me give you an example:

I write this as my protobuf in first version:

message SearchReq {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}



Then second version I use:

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}



I only change the root message name from SearchReq to SearchRequest and you
will render this not valid?
Why?
This is practically the same as renaming `string query = 1` to `string
queryString = 1`.
Names in protobuf exist purely to help a human being read the schema; they
don't matter to encoding and decoding, and therefore should not be used in
schema evolution validation rules.

Also if you search the guidelines you will not find anything about this. If
you do, please show me and I'll stand corrected and learn something new.
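To see why renames are invisible on the wire, here is a minimal hand-rolled sketch (not the protobuf-java API; `WireSketch` and its methods are hypothetical names) of how a varint field is written: the key is `(field_number << 3) | wire_type`, followed by the value, and neither the field name nor the message name appears anywhere in the bytes.

```java
import java.io.ByteArrayOutputStream;

/** Hand-rolled sketch of protobuf varint field encoding (illustrative, not protobuf-java). */
final class WireSketch {
    static final int WIRETYPE_VARINT = 0;

    /** Writes an unsigned varint: 7 bits per byte, high bit set on every byte except the last. */
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /**
     * Encodes one int32 field. Only the field number and wire type form the key;
     * the field name ("page_number" or "pageNumber") is never written.
     * Negative values are sign-extended to 64 bits, as protobuf does for int32.
     */
    static byte[] encodeInt32Field(int fieldNumber, int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, ((long) fieldNumber << 3) | WIRETYPE_VARINT);
        writeVarint(out, value);
        return out.toByteArray();
    }
}
```

For example, `encodeInt32Field(2, 150)` yields the bytes `10 96 01` whether the field is declared as `int32 page_number = 2` or `int32 pageNumber = 2`, inside `SearchReq` or `SearchRequest`.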




>
> 2.
>
> > > The writtenSchema can not change the field number of any field in
> > > readSchema (the field name is the same, but the field number is
> > > different).
> > You have to take into account field type as well when comparing.
>
>
> The first sentence on the website says that field numbers cannot be
> changed.
>
> > Don’t change the field numbers for any existing fields.
>

This is a guideline for a human making the change, not for software
validation, right?
When you write the code that checks whether a field number has changed, you
have to take into account both the field name AND the field type. If both are
equal but the field number is different, then fail it.

I also gave you an example to show what happens if you use *only* the name
to compare:


If I have in readSchema
int32 justAnID = 1
int32 customerId = 2

and in writeSchema I have
int32 justAnID = 1
string customerId = 3

This is valid.

Your check will fail this, since customerId has different field numbers (2
and 3), but I decided to remove field number 2 and add a new field with the
same name and a different type, as field number 3. When I read, field number
2 gets its default value of 0, and field number 3 is ignored.
Valid.

If you also compare by field type, then `int32 customerId = 2` versus `int32
customerId = 3` would fail, since that looks super strange, right? Why remove
a field and then add back exactly the same one?
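A minimal sketch of that comparison, assuming each schema has already been flattened into a list of a hypothetical `NamedField(name, type, number)` record (a real check would walk the protobuf descriptors): a field is reported only when the name AND the type match but the number differs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical flattened view of a protobuf field; not a protobuf-java type. */
record NamedField(String name, String type, int number) {}

final class FieldNumberRule {
    /**
     * Reports fields whose name and type are identical in both schemas but whose
     * field number differs. A name match with a different type is treated as a
     * different field (removed and re-added), so it is not reported.
     */
    static List<String> renumberedFields(List<NamedField> readSchema, List<NamedField> writtenSchema) {
        List<String> violations = new ArrayList<>();
        for (NamedField r : readSchema) {
            for (NamedField w : writtenSchema) {
                boolean sameField = r.name().equals(w.name()) && r.type().equals(w.type());
                if (sameField && r.number() != w.number()) {
                    violations.add(r.name() + ": " + r.number() + " -> " + w.number());
                }
            }
        }
        return violations;
    }
}
```

With the schemas above, `int32 customerId = 2` vs `string customerId = 3` yields no violation, while `int32 customerId = 2` vs `int32 customerId = 3` yields `customerId: 2 -> 3`.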



>
>
>
> 3.
>
> > > - The writtenSchema cannot add required fields, but optional or
> > > duplicate fields can be added (The field number must be new).
> > >
> > > That's not true.
> > You can have a required field in writeSchema, and not have that field in
> > readSchema (based on tag number).
>
>
> The required field must exist. In your case, where does readSchema go to
> read its required fields? It's not in writtenSchema.
> The second and third sentences on the official website say that adding and
> deleting fields must not touch required fields!
>


Let's take an example, ok?

Say my writeSchema is:

message SearchRequest {
  optional string query = 1;
  optional int32 page_number = 2;
  required int32 result_per_page = 3;
}


In the producer, it will fail if I don't provide `result_per_page`, right?
Since it is required.

Now say my read schema is:

message SearchRequest {
  optional string query = 1;
  optional int32 page_number = 2;
}


I'm the consumer, using the read schema.
I don't even know if a field is required or not based on the binary data,
right?
Nothing in the encoding itself tells me this is a required field.

So, I read field 1, read field 2, and that's it.
I ignore field 3.

Valid.

If you read sentence 2 of the guide, you'll see they write
"This means that any messages serialized by code using your “old” message
format can be parsed by your new generated code, as they won’t be missing
any required elements."

This is about the *other direction*: adding a required field to readSchema
that writeSchema doesn't have. That, of course, is not allowed and won't work.
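Since the direction is the whole point, here is a minimal sketch of the rule, assuming a hypothetical `LabeledField(number, label)` record and a `proto3` flag: only `required` fields of the readSchema are checked, by tag number, against the writtenSchema, and extra required fields on the writer side are ignored. Proto3 has no `required` label, so the check is skipped there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

enum FieldLabel { OPTIONAL, REQUIRED, REPEATED }

/** Hypothetical field summary: tag number plus proto2 label. */
record LabeledField(int number, FieldLabel label) {}

final class RequiredFieldRule {
    /** Tag numbers the reader requires but the writer never declares. Empty means compatible. */
    static List<Integer> missingRequired(List<LabeledField> readSchema,
                                         List<LabeledField> writtenSchema,
                                         boolean proto3) {
        List<Integer> missing = new ArrayList<>();
        if (proto3) {
            return missing; // proto3 dropped 'required', so there is nothing to check
        }
        Set<Integer> writtenNumbers = writtenSchema.stream()
                .map(LabeledField::number)
                .collect(Collectors.toSet());
        for (LabeledField f : readSchema) {
            if (f.label() == FieldLabel.REQUIRED && !writtenNumbers.contains(f.number())) {
                missing.add(f.number());
            }
        }
        return missing;
    }
}
```

With the SearchRequest example, the writer's required `result_per_page` is simply skipped by the reader; swapping the roles makes field 3 a missing required field.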



>
>
> 4.
>
> > > (4) The writtenSchema can not change the field name of any field in
> > > readSchema (the field number is the same, but the field name is
> > > different).
> > This is incorrect.
> > Field names are not encoded into the wire. I don't see this in any best
> > practice.
>
>
> The third sentence on the website:
>
> > You may want to rename the field instead, perhaps adding the prefix
> > “OBSOLETE_”, or make the field number reserved, so that future users of
> > your .proto can’t accidentally reuse the number.
>
> If you want to rename a field, add a new field with a new number and delete
> the old one!
>
>
You completely misunderstood protobuf and that specific section was quoted
out of context. Let me explain.

First, let's start with an example ok?

My write schema is

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}


my read schema is

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 resultPerPage = 3;
}


I can read it perfectly fine.
Why?
Because the encoded data contains the field number, not the name. The type
matches perfectly, so when I decode the value as int32 it works.
The whole concept of protobuf is the field numbers! They allow you to
rename your fields as much as you want, because what gets encoded is the
field number. You just need to make sure the same field number keeps the same
type; otherwise you'll break the decoding of that field's value.
I strongly suggest you read about the protobuf wire format to understand how
it encodes and decodes.
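Put as a check, this keys on the field number alone. A minimal sketch, assuming each schema is reduced to a hypothetical map from tag number to type name, so field names never take part in the comparison:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class TagTypeRule {
    /**
     * Tag numbers present in both schemas whose declared type differs. Names are not
     * part of the input, so renaming result_per_page to resultPerPage cannot fail.
     */
    static List<Integer> typeMismatches(Map<Integer, String> readTypes, Map<Integer, String> writtenTypes) {
        List<Integer> mismatches = new ArrayList<>();
        for (Map.Entry<Integer, String> read : readTypes.entrySet()) {
            String writtenType = writtenTypes.get(read.getKey());
            // Tags missing from the writer are left to the required/optional rules.
            if (writtenType != null && !writtenType.equals(read.getValue())) {
                mismatches.add(read.getKey());
            }
        }
        Collections.sort(mismatches); // deterministic order for reporting
        return mismatches;
    }
}
```

This comparison is deliberately strict: protobuf's "Updating A Message Type" guide lists some type changes as wire-compatible (for example among `int32`, `uint32`, `int64`, `uint64` and `bool`), which a real implementation might choose to allow or only warn about.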

Now regarding that best practice you quoted. Here's the full quote:


   - Non-required fields can be removed, as long as the field number is not
   used again in your updated message type. You may want to rename the field
   instead, perhaps adding the prefix “OBSOLETE_”, or make the field number
   reserved <https://protobuf.dev/programming-guides/proto2/#reserved>, so
   that future users of your .proto can’t accidentally reuse the number.

First, they are talking about removal of fields. They suggest you rename
the field to OBSOLETE_myField instead of actually deleting it, to make it
easier on compatibility. They don't talk about rules that forbid you to
rename a field. It's unrelated.




>
> 5.
>
> > > The writtenSchema does not change the field name and number, but it does
> > > change the field type.
> > Small correction: for the same field number you are not allowed to change
> > types. Name is irrelevant.
>
>
> Why is the name irrelevant?
> Here is the change in type, which is the rule stated in Alternatives. There
> is no check here, just a warning to the user.
> The other change is when the name of an ENUM or MESSAGE type is changed,
> which is the same as the root message name check in (1). Is the change
> still the same? This is not allowed to change!
>
> *Here is example:*
> readSchema(
>
> https://github.com/apache/pulsar/pull/19566/files#diff-a7006d73502e6064a80af02822f3a3072be498d8b677c4b838b0dafaea32dea4
> )
> writtenSchema(
>
> https://github.com/apache/pulsar/pull/19566/files#diff-e3e7543624edaf1e0a4fd47947a2cad6e4b816b93843f71a367042ba6c3ec53f
> )
>
>
>
I think my previous explanation in this email reply should be good enough
to explain why the name is irrelevant.
If not, please ping me on this.

Regarding your example, can you quote it? I don't understand the read and
write schema here.

Also, regarding message names and why they are irrelevant, the same applies.
Read https://protobuf.dev/programming-guides/encoding/
There you will see that names are not encoded, so they can be changed freely
and the binary data still decodes successfully.




> 6.
>
> > (6) The writtenSchema removes fields that do not have default values in
> > > readSchema. Then the schema is incompatible.
> > Protobuf gives you its own default if you don't supply one. This is
> > incorrect.
>
>
> (1) This rule only applies to proto2 fields that do not set a default value.
> Proto3 is not checked, since the default value will always be there.
>
> (2) In PIP issue:
>
> > Proto3 canceled the required field, so there is no need to check the
> > required field. We get the syntax(proto2 or proto3) of proto in the code,
> > and skip the check of the required field if it is proto3. All other
> > checking rules also apply to proto3.
>
>
> *I made a mistake here. This default value check is not needed in proto3. I
> will modify the rules later according to your suggestion.*
>
>
Let me quote from here (
https://protobuf.dev/programming-guides/proto2/#optional):

If the default value is not specified for an optional element, a
type-specific default value is used instead: for strings, the default value
is the empty string. For bytes, the default value is the empty byte string.
For bools, the default value is false. For numeric types, the default value
is zero. For enums, the default value is the first value listed in the
enum’s type definition. This means care must be taken when adding a value
to the beginning of an enum value list. See the Updating A Message Type
<https://protobuf.dev/programming-guides/proto2/#updating> section for
guidelines on how to safely change definitions.
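The quoted rule amounts to a small lookup. A minimal sketch, assuming the field type is given by its .proto type name and an enum's values are passed in declaration order (both hypothetical inputs, not a protobuf API):

```java
import java.util.List;

final class Proto2Defaults {
    /** Value a proto2 reader sees for an absent optional field with no explicit [default = ...]. */
    static Object implicitDefault(String protoType, List<String> enumValuesInOrder) {
        switch (protoType) {
            case "string": return "";
            case "bytes": return new byte[0];
            case "bool": return false;
            case "double": return 0.0d;
            case "float": return 0.0f;
            case "int64": case "uint64": case "sint64": case "fixed64": case "sfixed64": return 0L;
            case "int32": case "uint32": case "sint32": case "fixed32": case "sfixed32": return 0;
            case "enum": return enumValuesInOrder.get(0); // first value listed in the enum
            default: throw new IllegalArgumentException("not a scalar or enum type: " + protoType);
        }
    }
}
```

This is why deleting an optional field from the writtenSchema is harmless for a reader that still declares it: the reader just sees the type default (or its explicit `[default = ...]`).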

So if my write schema is:

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3;
}


and my read schema is

message SearchRequest {
  optional int32 page_number = 2;
  optional int32 result_per_page = 3;
}


You can see I removed a field from writeSchema that does not have a default
value.

First, I can read it without any problems: field 2 might exist, no
problem. Field 3 might exist, no problem. I ignore field 1.

So the validation rule you wrote is not correct at all, even before
considering optional fields.

What if I do the other way around?

My write schema is:

message SearchRequest {
  optional int32 page_number = 2;
  optional int32 result_per_page = 3;
}


My read schema is:

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3;
}


Here I will fail: field 1 is required, so I expect a value for it, but alas
it will never be there.

Again, nothing to do with default values. It falls under your rule:

The writtenSchema cannot remove required fields in the readSchema.



Now let's focus on the optional field, ok?


Write schema

message SearchRequest {
  optional int32 page_number = 2;
  optional int32 result_per_page = 3;
}


Read schema:

message SearchRequest {
  optional string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3;
}


When I read, I won't find field number 1 of course, so I will use the
default value dictated by the type, as I quoted above, which for a string
is the empty string.

So to summarize, you need to change the validation rule:

from

The writtenSchema removes fields that do not have default values in
> readSchema. Then the schema is incompatible.


to

nothing - it is ok to delete optional fields without default values from
the writeSchema.



> > I would remove the proto2/proto3 sections, since they only differ in 1
> > rule, and just mention that distinction inside that rule (less work for
> > the reader).
>
>
> (3) And add rules that look like they should be checked:
>
> > Rules that you don't have in the doc, but should IMO*
> > .......
>
> > There can't be a field in writerSchema, that exists in readerSchema (tag
> > number based), which in writerSchema its type is scalar, but in readSchema
> > its type is scalar, it's repeated but with packed=true.
>
>
> But I don't think rule number three needs to be added.
>
>
>
Read this and then please explain why rule number 3 should not be added:

https://protobuf.dev/programming-guides/dos-donts/#dont-go-from-repeated-to-scalar
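For reference, a minimal sketch of a repeated-to-scalar check, assuming each schema is reduced to a hypothetical map from tag number to whether the field is `repeated`: it flags tags that are repeated in the writtenSchema but singular in the readSchema, the change the linked guide warns will lose data on read.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class RepeatedToScalarRule {
    /** Tag numbers declared repeated by the writer but singular by the reader. */
    static List<Integer> repeatedBecameScalar(Map<Integer, Boolean> readRepeated,
                                              Map<Integer, Boolean> writtenRepeated) {
        List<Integer> violations = new ArrayList<>();
        for (Map.Entry<Integer, Boolean> w : writtenRepeated.entrySet()) {
            Boolean readIsRepeated = readRepeated.get(w.getKey());
            // Only tag numbers present in both schemas are compared.
            if (readIsRepeated != null && w.getValue() && !readIsRepeated) {
                violations.add(w.getKey());
            }
        }
        Collections.sort(violations); // deterministic order for reporting
        return violations;
    }
}
```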




> Thanks,
> sinan
>
>
>
> On Tue, Mar 14, 2023 at 10:33 PM Asaf Mesika <as...@gmail.com> wrote:
>
> > Hi Sinan,
> >
> > The doc looks much better!
> >
> > I have a few additional comments:
> >
> > Pasting comment from previous emails:
> >
> > Can you convert the code block which is actually a quote in the
> > beginning of the PIP to something which doesn't require to scroll
> > horizontally so much?
> > Use
> > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> >
> > *Validation Rules*
> >
> > (1) If the root message names of writtenSchema and readSchema are
> > > different, then incompatible.
> >
> > Why? The root message name is not written over the wire to the best of my
> > knowledge. I haven't found it written in the official doc.
> >
> >
> > >    - The writtenSchema cannot add required fields, but optional or
> > >    duplicate fields can be added (The field number must be new).
> > >
> > > That's not true.
> > You can have a required field in writeSchema, and not have that field in
> > readSchema (based on tag number).
> >
> > The writtenSchema can not change the field number of any field in
> > readSchema (the
> > > field name is the same, but the field number is different).
> >
> > You have to take into account field type as well when comparing.
> >
> > If I have in readSchema
> > int32 justAnID = 1
> > int32 customerId = 2
> >
> > and in writeSchema I have
> > int32 justAnID = 1
> > string customerId = 3
> >
> > This is valid.
> >
> > (4) The writtenSchema can not change the field name of any field in
> > > readSchema (the field number is the same, but the field name is
> > > different).
> >
> > This is incorrect.
> > Field names are not encoded into the wire. I don't see this in any best
> > practice.
> >
> > > (5) The writtenSchema does not change the field name and number, but it
> > > does change the field type.
> > >
> > >    - If the field type is ENUM or MESSAGE, the schema is not compatible
> > >    when the type name is changed
> > >    - If the type of the field is another type. The schemas under this
> > >    rule are not incompatible, but warn the user.(There is another way of
> > >    testing in PIP issue Alternatives)
> > >
> > Small correction: for the same field number you are not allowed to change
> > types. Name is irrelevant.
> >
> > (6) The writtenSchema removes fields that do not have default values in
> > > readSchema. Then the schema is incompatible.
> >
> > Protobuf gives you its own default if you don't supply one. This is
> > incorrect.
> >
> >
> > *Rules that you don't have in the doc, but should IMO*
> > * There can not be a field which exists both in readSchema and writeSchema,
> > with same tag number, having different default values
> > * There can't be a field in writerSchema, that exists in readerSchema (tag
> > number based), which in writerSchema is repeated and its type is Scalar (
> > https://protobuf.dev/programming-guides/proto/#scalar) but in readSchema it
> > is not repeated anymore.
> > * There can't be a field in writerSchema, that exists in readerSchema (tag
> > number based), which in writerSchema its type is scalar, but in readSchema
> > its type is scalar, it's repeated but with packed=true.
> >
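The first proposed rule (different default values for the same tag number) can be sketched as follows, assuming a hypothetical `FieldDefault(number, explicitDefault)` record where `explicitDefault` is the literal from `[default = ...]`, or `null` when none is declared. Comparing literals as strings is a simplification (`10` vs `10.0` would be flagged).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical field summary: explicitDefault is null when no [default = ...] is declared. */
record FieldDefault(int number, String explicitDefault) {}

final class DefaultValueRule {
    /** Tag numbers present in both schemas whose declared defaults differ, in readSchema order. */
    static List<Integer> conflictingDefaults(List<FieldDefault> readSchema, List<FieldDefault> writtenSchema) {
        Map<Integer, FieldDefault> written = new HashMap<>();
        for (FieldDefault f : writtenSchema) {
            written.put(f.number(), f);
        }
        List<Integer> conflicts = new ArrayList<>();
        for (FieldDefault r : readSchema) {
            FieldDefault w = written.get(r.number());
            // Objects.equals treats "no explicit default" on both sides as equal.
            if (w != null && !Objects.equals(r.explicitDefault(), w.explicitDefault())) {
                conflicts.add(r.number());
            }
        }
        return conflicts;
    }
}
```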
> > *Rules you have , but I would phrase a bit differently*
> >
> > I would remove the proto2/proto3 sections, since they only differ in 1
> > rule, and just mention that distinction inside that rule (less work for
> > the reader).
> >
> > * readSchema has a field which doesn't exist in writerSchema (based on tag
> > number).
> >    * Proto v2:
> >       * That field must be `optional` or `repeated` (must not be
> > `required`)
> >    * Proto v3:
> >       * No problem.
> > * There can not be a field which exists both in readSchema and writeSchema,
> > with the same tag number, but having different types.
> >
> > *Motivation*
> >
> > Basically in the motivation section you want people to understand the
> > following:
> >
> > Pulsar has built-in support for typed messages. It allows specifying an
> > encoding scheme and its matching schema.
> > For example, it supports Avro. You specify a schema for a given topic,
> > using Avro Schema Definition (i.e. a JSON describing the schema). Every time
> > you produce a message, you first declare the schema definition you wish to
> > use for your messages. The message data should be avro-encoded binary
> > data (which the client in some SDKs helps encode from a given
> > data-structure/object).
> > The same applies when you consume a message. You first specify the schema
> > you use to read the messages, and the client in some SDKs helps by decoding
> > the message binary data into an object/data-structure.
> >
> > Each time you specify a schema to be used, either by a producer or a
> > consumer, the schema is persisted in Pulsar and given an increasing version
> > number. If the schema was the same as the previous version, it is not
> > saved. When the message is persisted, the version number is encoded in the
> > message headers.
> >
> > Pulsar provides a very useful feature named Schema Evolution
> > <https://pulsar.apache.org/docs/2.11.x/schema-understand/#schema-evolution>.
> > It allows us to check if a new schema version is compatible with the
> > previous version or versions. When you configure the schema for the topic,
> > you decide the strategy to use for doing the validation check. The
> > strategies validate the following:
> >
> >    - BACKWARD strategy
> >       - A consumer with newSchema can read a message written using
> >       existingSchema
> >    - BACKWARD_TRANSITIVE strategy
> >       - A consumer with newSchema can read messages written using all
> >       existingSchema
> >    - FORWARD
> >       - A consumer with existingSchema can read messages written using
> >       newSchema
> >    - FORWARD_TRANSITIVE
> >       - A consumer defined with any of the existingSchema can read messages
> >       written using newSchema
> >    - FULL
> >       - A consumer defined with newSchema can read messages written using
> >       existingSchema
> >       - A consumer defined with existingSchema can read messages written
> >       using newSchema
> >    - FULL_TRANSITIVE
> >       - A consumer defined with newSchema can read messages written using
> >       any of the existingSchema
> >       - A consumer defined with any of the existingSchema can read messages
> >       written using newSchema
> >
> >
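The strategy list above boils down to which (writtenSchema, readSchema) pairs a `canRead`-style check has to accept. A minimal sketch, assuming a hypothetical `CompatStrategy` enum that mirrors the six names above and existing versions ordered oldest to newest (this is not Pulsar's actual API):

```java
import java.util.ArrayList;
import java.util.List;

enum CompatStrategy { BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE, FULL, FULL_TRANSITIVE }

/** One required check: a consumer using 'read' must decode data produced with 'written'. */
record ReadCheck(String written, String read) {}

final class StrategyPairs {
    static List<ReadCheck> checksFor(CompatStrategy strategy, List<String> existing, String newSchema) {
        List<ReadCheck> checks = new ArrayList<>();
        if (existing.isEmpty()) {
            return checks; // first version: nothing to be compatible with
        }
        String name = strategy.name();
        boolean transitive = name.endsWith("_TRANSITIVE");
        boolean backward = name.startsWith("BACKWARD") || name.startsWith("FULL");
        boolean forward = name.startsWith("FORWARD") || name.startsWith("FULL");
        // Non-transitive strategies compare only against the latest existing version.
        List<String> targets = transitive ? existing : existing.subList(existing.size() - 1, existing.size());
        for (String old : targets) {
            if (backward) {
                checks.add(new ReadCheck(old, newSchema)); // new consumer reads old data
            }
            if (forward) {
                checks.add(new ReadCheck(newSchema, old)); // old consumer reads new data
            }
        }
        return checks;
    }
}
```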
> > Aside from Avro, Pulsar also has two additional supported encodings:
> > PROTOBUF and PROTOBUF_NATIVE.
> >
> > PROTOBUF is a bit strange. It encodes the messages using Protobuf encoding,
> > but the schema that is persisted to Pulsar is *not* a Protobuf Descriptor as
> > you would have expected. The saved schema is a translation of the Protobuf
> > Descriptor to an Avro Schema, so in fact an Avro schema definition is saved
> > as the schema.
> >
> > PROTOBUF_NATIVE was created to fix that shortcoming, by actually persisting
> > the Protobuf Descriptor and using Protobuf for encoding.
> > The problem is that the authors of PROTOBUF_NATIVE haven't completed it
> > fully, and the backward compatibility validation code almost does not
> > exist: It only checks if the root message name is the same between
> > versions.
> >
> > GOALS
> > The goal of this PIP is to amend PROTOBUF_NATIVE by adding a fully
> > functional validation for any of the defined Schema Compatibility
> > Strategies.
> > A secondary goal is to allow the user to choose between different
> > implementations: The new fully functional validation or the existing
> > barebones validation.
> >
> > -------- END
> >
> > I'm ok with having links in the Motivation, as *further reading*.
> > I'm against stacking up work for the reader to go read 5-6 different links
> > just to understand the motivation and background knowledge required to
> > understand the feature.
> >
> > I'm against putting code in the Motivation, especially if it is supposed to
> > replace a description in plain English that makes the design easy to
> > understand.
> > Leave the code out of the motivation.
> > Paste code only if you absolutely can't use plain old descriptions to
> > explain.
> >
> >
> >
> > On Sat, Mar 11, 2023 at 11:46 AM SiNan Liu <li...@gmail.com> wrote:
> >
> > > *I guess that's right, too! *
> > >
> > > But the name `ProtobufNativeAdvancedSchemaCompatibilityCheck` is better,
> > > because we don't know whether there will be a V2 or V3 in the future. An
> > > official solution could be called
> > > `ProtobufNativeOfficialSchemaCompatibilityCheck`, and a good third-party
> > > solution could be `ProtobufNativeXXXXXXXXSchemaCompatibilityCheck`.
> > >
> > > I've updated my design in the PIP issue.
> > > 1. A new ProtobufNativeSchemaAdvanceCompatibilityCheck, rather than a
> > > different validator implementation inside
> > > ProtobufNativeSchemaCompatibilityCheck.
> > > 2. Remove the 'builder'
> > > 3. Clarify the relationship between newSchema, existingSchema, and
> > > writtenSchema in canRead.
> > >
> > > Please help check whether the description is comprehensive and what
> > > changes and improvements need to be made.
> > >
> > > Thanks,
> > > sinan
> > >
> > >
> > >
> > > On Thu, Mar 9, 2023 at 5:35 PM Asaf Mesika <as...@gmail.com> wrote:
> > >
> > > > I like Bo's suggestion - I hadn't realized each schema type
> > > > compatibility check is actually a plugin.
> > > > It makes sense for any schema type checks to evolve, sometimes in a
> > > > non-backward compatible way hence having two plugins like
> > > > protobufNativeSchemaCompatabilityCheckV1 and then
> > > > protobufNativeSchemaCompatabilityCheckV2 and then
> > > > protobufNativeSchemaCompatabilityCheckV3 makes sense to me.
> > > >
> > > >
> > > >
> > > > On Thu, Mar 9, 2023 at 5:49 AM 丛搏 <bo...@apache.org> wrote:
> > > >
> > > > >  Hi siNan:
> > > > >
> > > > > From my point of view, it is just a plug-in. I don't think it is
> > > > > necessary to add configuration for the plugin.
> > > > > This is meaningless, and it will make it harder for users to use.
> > > > >
> > > > >
> > > > > On Wed, Mar 8, 2023 at 3:54 PM SiNan Liu <li...@gmail.com> wrote:
> > > > > >
> > > > > > Hi, bo.
> > > > > >
> > > > > > 1. I understand what you say: develop a new
> > > > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck`, rather than changing
> > > > > > the existing `ProtobufNativeSchemaCompatibilityCheck`. But I found a few
> > > > > > small problems:
> > > > > >
> > > > > > (1) ProtobufNativeAdvancedSchemaCompatibilityCheck and
> > > > > > ProtobufNativeSchemaCompatibilityCheck both have schemaType
> > > > > > PROTOBUF_NATIVE. It looks like both checkers are for PROTOBUF not using
> > > > > > AVRO-PROTOBUF's "native" implementation, which leads to some problems or
> > > > > > "unreasonable" behavior, and gives me some extended thinking and
> > > > > > questions.
> > > > > >
> > > > > `CompatibilityCheck` is only a plugin.
> > > > > `ProtobufNativeSchemaCompatibilityCheck` may sooner or later leave the
> > > > > stage; when `ProtobufNativeAdvancedSchemaCompatibilityCheck` is
> > > > > stable, we can make it the default Checker.
> > > > >
> > > > > It is just a plug-in; users can change it at will and ensure that it
> > > > > is used correctly
> > > > > > (2) In broker.conf, `schemaRegistryCompatibilityCheckers`: if
> > > > > > ProtobufNativeSchemaCompatibilityCheck and
> > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck are both set, they are
> > > > > > going to overwrite each other, because this is a map:
> > > > > >
> > > > > > https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryService.java#L36-L44
> > > > > >
> > > > > > ```java
> > > > > > Map<SchemaType, SchemaCompatibilityCheck> checkers = new HashMap<>();
> > > > > > for (String className : checkerClasses) {
> > > > > >     SchemaCompatibilityCheck schemaCompatibilityCheck = Reflections.createInstance(className,
> > > > > >             SchemaCompatibilityCheck.class, Thread.currentThread().getContextClassLoader());
> > > > > >     // Keyed by schema type: a later checker for the same type replaces an earlier one.
> > > > > >     checkers.put(schemaCompatibilityCheck.getSchemaType(), schemaCompatibilityCheck);
> > > > > > }
> > > > > > ```
> > > > > >
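The overwrite is easy to reproduce in isolation. A minimal sketch with hypothetical stand-ins (`SchemaKind`, `Checker` and `CheckerRegistry` are not Pulsar classes) that mirrors the quoted loop: because the map is keyed by schema type, configuring two PROTOBUF_NATIVE checkers keeps only one of them.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

enum SchemaKind { AVRO, JSON, PROTOBUF_NATIVE }

/** Hypothetical stand-in for a compatibility checker plugin. */
interface Checker {
    SchemaKind schemaKind();
    String name();
}

final class CheckerRegistry {
    /** Mirrors the quoted loop: one map entry per schema type, so later entries win. */
    static Map<SchemaKind, Checker> build(List<Checker> configured) {
        Map<SchemaKind, Checker> checkers = new HashMap<>();
        for (Checker c : configured) {
            checkers.put(c.schemaKind(), c); // silently replaces an earlier checker of the same kind
        }
        return checkers;
    }

    static Checker of(SchemaKind kind, String name) {
        return new Checker() {
            public SchemaKind schemaKind() { return kind; }
            public String name() { return name; }
        };
    }
}
```

In this sketch the checker that comes later in the input list wins without any error, so only one PROTOBUF_NATIVE checker can be active at a time with this loading scheme.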
> > > > > > Is this a big problem or a small one? Is it possible or unnecessary?
> > > > > > Maybe we can write in the documentation that only one of the two
> > > > > > protobufNative checkers can be chosen? Why are there two Checkers for
> > > > > > different implementations of the same schemaType? Why not have the
> > > > > > checker create different validators, so we do not have to change
> > > > > > schemaRegistryCompatibilityCheckers.
> > > > >
> > > > > users can only use one, not two, which will bring complexity to
> users
> > > > >
> > > > > >
> > > > > > (3) And after updating to ProtobufNativeAdvancedSchemaCompatibilityCheck:
> > > > > > existing topics previously only checked the name of the root message,
> > > > > > not the content of protobuf.
> > > > > >
> > > > > > What if the user wants both Checkers?
> > > > > >
> > > > > > If it is set to ProtobufNativeAdvancedSchemaCompatibilityCheck, does it
> > > > > > affect topics with an existing schema?
> > > > > >
> > > > > > Older topics still use the old checker, and newer topics or certain
> > > > > > older topics use the new advanced checker.
> > > > > >
> > > > > when `ProtobufNativeAdvancedSchemaCompatibilityCheck` is stable,
> > > > > users will not choose `ProtobufNativeSchemaCompatibilityCheck`,
> > > > > because it is not a complete checker.
> > > > > > (4) So should we have one schemaType per checker? protobufNativeChecker
> > > > > > can have as many different implementation classes as needed. As for
> > > > > > the classname configuration in the PIP, let's see if it can be set at
> > > > > > the topic level. In the current PIP design I just load this parameter
> > > > > > into the checker when the broker is started and the checkers map is set
> > > > > > up. Can I do this in a new normal PR if I want to support the topic
> > > > > > level? Or perfect it here?
> > > > > >
> > > > > > Add a PROTOBUF_NATIVE_ADVANCE schemaType corresponding to
> > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck? (Seems to be more
> > > > > > trouble.)
> > > > > >
> > > > > > Sorry, I can not use the computer and network at the company. I use my
> > > > > > mobile phone to reply to the email, so the format may be a bit messy.
> > > > > > Please understand.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > sinan
> > > > > >
> > > > > >
> > > > > > On Tue, Mar 7, 2023 at 11:39 PM 丛搏 <bo...@apache.org> wrote:
> > > > > >
> > > > > > > On Tue, Mar 7, 2023 at 1:22 PM SiNan Liu <li...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Great to see your comment, bo!
> > > > > > > >
> > > > > > > > 1. The first way. The protobuf website has a description of the
> > > > > > > > rules, but no plans to implement them.
> > > > > > > > https://protobuf.dev/programming-guides/proto/#updating
> > > > > > >
> > > > > > > https://groups.google.com/g/protobuf
> > > > > > > maybe ask here
> > > > > > >
> > > > > > > >
> > > > > > > > 2. I think this PIP can be divided into two parts.
> > > > > > > > (1) Add a flag (`ValidatorClassName`), and load it into
> > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` when the broker starts.
> > > > > > > > ValidatorClassName is empty by default, and the implementation
> > > > > > > > continues as before, with no change for the user.
> > > > > > >
> > > > > > > `ProtobufNativeSchemaCompatibilityCheck` is a plugin in `broker.conf`
> > > > > > > ```
> > > > > > > schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
> > > > > > > ```
> > > > > > > I do not recommend that we directly modify this plugin and continue
> > > > > > > to add configuration items, which will cause trouble for users.
> > > > > > > We have a lot of configs and it's getting very unwieldy.
> > > > > > > In my opinion, we don't change
> > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`;
> > > > > > > it is a simple implementation, it doesn't go wrong very often, and
> > > > > > > most users will use it. We can add another ProtobufNativeCheck named
> > > > > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck` or similar. This
> > > > > > > way, we don't need to add this flag. There is no need to consider
> > > > > > > compatibility; it is just a plug-in and will not affect the current
> > > > > > > logic. If the user needs it, they just change the plugin to the new
> > > > > > > implementation.
> > > > > > >
> > > > > > > > ```java
> > > > > > > >     ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
> > > > > > > >         for (Descriptors.Descriptor fromDescriptor : fromDescriptors) {
> > > > > > > >             // The default implementation only checks if the root message has changed.
> > > > > > > >             if (!fromDescriptor.getFullName().equals(toDescriptor.getFullName())) {
> > > > > > > >                 throw new ProtoBufCanReadCheckException(
> > > > > > > >                         "Protobuf root message isn't allowed to change!");
> > > > > > > >             }
> > > > > > > >         }
> > > > > > > >     };
> > > > > > > > ```
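The plug-in idea can be sketched without protobuf-java. In this minimal sketch the hypothetical `RootNameValidator` takes message full names (what `Descriptors.Descriptor.getFullName()` returns) instead of descriptor objects, and its `DEFAULT` mirrors the quoted lambda by rejecting only a changed root message name; any other implementation of the same interface could be swapped in.

```java
import java.util.List;

/** Hypothetical stand-in for the proposed validator; full names replace Descriptor objects. */
@FunctionalInterface
interface RootNameValidator {
    /** Throws IllegalStateException if toFullName cannot replace the fromFullNames. */
    void validate(List<String> fromFullNames, String toFullName);

    /** Mirrors the quoted DEFAULT: only the root message full name is compared. */
    RootNameValidator DEFAULT = (fromFullNames, toFullName) -> {
        for (String from : fromFullNames) {
            if (!from.equals(toFullName)) {
                throw new IllegalStateException("Protobuf root message isn't allowed to change!");
            }
        }
    };
}
```

A stricter validator would just be another implementation of the same interface; how it gets selected (a separate checker class or a `ValidatorClassName` flag) is the open design question being debated here.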
> > > > > > > > The `ValidatorClassName` value can also be set to the implementation
> > > > > > > > added by the current PIP,
> > > > > > > > `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
> > > > > > > >
> > > > > > > > (2) Recode `ProtobufNativeSchemaCompatibilityCheck`, using the flag
> > > > > > > > (`ValidatorClassName`) to build different
> > > > > > > > `ProtobufNativeSchemaValidator`s.
> > > > > > > > Isn't it just a plug-in? The user can develop and choose a different
> > > > > > > > `ProtobufNativeSchemaValidator`. I think it doesn't change the logic;
> > > > > > > > it just allows users to extend it.
> > > > > > > >
> > > > > > > >
> > > > > > > > I think this PIP should be an enhancement and supplement to the
> > > > > > > > function; it is not unnecessary or meaningless.
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > sinan
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Mar 7, 2023 at 11:53 AM 丛搏 <bo...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > I think we have two ways to do that.
> > > > > > > > >
> > > > > > > > > First way: We need to push forward improvements to protobuf's Java
> > > > > > > > > implementation. Ask if they have plans to improve it.
> > > > > > > > >
> > > > > > > > > Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should
> > > > > > > > > be implemented as a plugin; don't change any existing plugin logic
> > > > > > > > > (it's simple and already used). I don't recommend adding flags for
> > > > > > > > > rollback; it adds configuration and makes little sense.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Bo
> > > > > > > > >
> > > > > > > > > On Mon, Mar 6, 2023 at 11:00 PM Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Can you convert the code block which is actually a quote in the
> > > > > > > > > > beginning of the PIP to something which doesn't require scrolling
> > > > > > > > > > horizontally so much?
> > > > > > > > > > Use
> > > > > > > > > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > > > > > > > > >
> > > > > > > > > > Let's improve the clarity of what you wrote:
> > > > > > > > > >
> > > > > > > > > > "the PROTOBUF uses avro struct to store."
> > > > > > > > > > -->
> > > > > > > > > > When Schema type PROTOBUF is used, Pulsar Client assumes the object given to it as message data is an auto-generated POJO containing the annotations encoding the schema. The client uses a converter, which converts a Protobuf schema descriptor into an Avro schema and sends that as the Schema of the producer/consumer.
> > > > > > > > > >
> > > > > > > > > > "On the broker side, protobuf and avro both use SchemaData converted to org.apache.avro.Schema."
> > > > > > > > > > -->
> > > > > > > > > > Since the schema is an Avro schema, the compatibility check on the broker side simply re-uses the compatibility check of the AVRO schema type.
> > > > > > > > > >
> > > > > > > > > > "ProtobufSchema is different from ProtobufNativeSchema in schema compatibility check it uses avro-protobuf.
> > > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > > But the current implementation of ProtobufNative schema compatibility check only checked if the root message name is changed."
> > > > > > > > > >
> > > > > > > > > > -->
> > > > > > > > > > The PROTOBUF_NATIVE schema type is different. The client actually uses the Protobuf Descriptor as the schema, as opposed to the Avro schema used by the PROTOBUF schema type. In the broker, the PROTOBUF_NATIVE compatibility check doesn't actually implement any rule except one: checking whether the root message name has changed.
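Here is a minimal sketch of a check that, like the one described above, only compares root message names. It shows what that check misses. `ProtoMessage` and `RootNameOnlyCheck` are hypothetical simplifications; a real check would operate on Protobuf `Descriptor`s. Note that a field changing its type is still accepted.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a Protobuf Descriptor: the root message's full
// name plus a map from field number to field type name.
record ProtoMessage(String fullName, Map<Integer, String> fieldTypes) {}

final class RootNameOnlyCheck {
    private RootNameOnlyCheck() {}

    // The single rule described above: the root message name must not change.
    // Fields are never inspected, so field-level changes pass unnoticed.
    static void check(ProtoMessage currentSchema, ProtoMessage newSchema) {
        if (!Objects.equals(currentSchema.fullName(), newSchema.fullName())) {
            throw new IllegalArgumentException("Root message changed from "
                    + currentSchema.fullName() + " to " + newSchema.fullName());
        }
    }
}
```

For example, this check accepts a change of field 1 of `demo.User` from `string` to `int64`, but rejects renaming the root message to `demo.Account`.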
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >    1. For now, there is no official or third-party solution for ProtoBuf compatibility checks. If a better official or third-party solution appears in the future, we can develop a new ProtobufNativeSchemaValidator and use it, so we add a flag.
> > > > > > > > > > >
> > > > > > > > > > Why do you need to make that configurable? Once you find a third party, just switch to it. Who knows, maybe you never will. Introduce it when you find it, not now.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > In ProtobufNativeSchemaCompatibilityCheck, we improve the BACKWARD and FORWARD strategies. As with the AVRO implementation, protobuf compatibility checking needs to implement the canRead method. *This will check that the writtenSchema can be read by the readSchema.*
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I completely disagree.
> > > > > > > > > > The Avro implementation is confusing for our use case. Don't copy it.
> > > > > > > > > >
> > > > > > > > > > You have
> > > > > > > > > >
> > > > > > > > > > public void checkCompatible(SchemaData from, SchemaData to,
> > > > > > > > > >         SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {
> > > > > > > > > >     Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
> > > > > > > > > >     Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
> > > > > > > > > >     switch (strategy) {
> > > > > > > > > >         case BACKWARD_TRANSITIVE:
> > > > > > > > > >         case BACKWARD:
> > > > > > > > > >         case FORWARD_TRANSITIVE:
> > > > > > > > > >         case FORWARD:
> > > > > > > > > >         case FULL_TRANSITIVE:
> > > > > > > > > >         case FULL:
> > > > > > > > > >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
> > > > > > > > > >             return;
> > > > > > > > > >         case ALWAYS_COMPATIBLE:
> > > > > > > > > >             return;
> > > > > > > > > >         default:
> > > > > > > > > >             throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
> > > > > > > > > >     }
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > I would rename:
> > > > > > > > > > from --> currentSchema
> > > > > > > > > > to --> newSchema
> > > > > > > > > >
> > > > > > > > > > Use that switch statement and have a method for each case, like:
> > > > > > > > > > validateBackwardsCompatibility(currentSchema, newSchema)
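The structure suggested here could look roughly like the sketch below: one switch over the strategy and one validation method per direction, phrased in terms of the current and new schema. `SimpleSchema`, `SchemaIncompatible` and the required-field rules are hypothetical placeholders, not the PIP's actual rules. The strategy names come from the switch above. The sketch assumes that the transitive variants check the new schema against every stored version, not only the latest one.

```java
import java.util.List;
import java.util.Set;

// Strategy names taken from the switch above; the types below are hypothetical.
enum Strategy {
    ALWAYS_COMPATIBLE,
    BACKWARD, FORWARD, FULL,
    BACKWARD_TRANSITIVE, FORWARD_TRANSITIVE, FULL_TRANSITIVE
}

// Hypothetical simplified schema: only the names of its required fields.
record SimpleSchema(Set<String> requiredFields) {}

final class SchemaIncompatible extends RuntimeException {
    SchemaIncompatible(String message) {
        super(message);
    }
}

final class ProtoCompatibility {
    private ProtoCompatibility() {}

    // history holds the stored versions, oldest first; its last entry is the current schema.
    static void checkCompatible(List<SimpleSchema> history, SimpleSchema newSchema, Strategy strategy) {
        SimpleSchema currentSchema = history.get(history.size() - 1);
        switch (strategy) {
            case ALWAYS_COMPATIBLE -> { }
            case BACKWARD -> validateBackwardsCompatibility(currentSchema, newSchema);
            case FORWARD -> validateForwardsCompatibility(currentSchema, newSchema);
            case FULL -> {
                validateBackwardsCompatibility(currentSchema, newSchema);
                validateForwardsCompatibility(currentSchema, newSchema);
            }
            // Transitive variants repeat the same checks against every stored version.
            case BACKWARD_TRANSITIVE -> history.forEach(s -> validateBackwardsCompatibility(s, newSchema));
            case FORWARD_TRANSITIVE -> history.forEach(s -> validateForwardsCompatibility(s, newSchema));
            case FULL_TRANSITIVE -> history.forEach(s -> {
                validateBackwardsCompatibility(s, newSchema);
                validateForwardsCompatibility(s, newSchema);
            });
        }
    }

    // Consumers on newSchema read data produced with currentSchema. That data is only
    // guaranteed to contain currentSchema's required fields, so newSchema may not require more.
    static void validateBackwardsCompatibility(SimpleSchema currentSchema, SimpleSchema newSchema) {
        for (String field : newSchema.requiredFields()) {
            if (!currentSchema.requiredFields().contains(field)) {
                throw new SchemaIncompatible("New schema requires a field the current schema does not: " + field);
            }
        }
    }

    // Consumers on currentSchema read data produced with newSchema, so newSchema
    // must still require every field that currentSchema requires.
    static void validateForwardsCompatibility(SimpleSchema currentSchema, SimpleSchema newSchema) {
        for (String field : currentSchema.requiredFields()) {
            if (!newSchema.requiredFields().contains(field)) {
                throw new SchemaIncompatible("New schema no longer requires field: " + field);
            }
        }
    }
}
```

Each rule is phrased with `currentSchema` and `newSchema`, so the reader never has to work out which one counts as the "written" schema for a given direction.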
> > > > > > > > > >
> > > > > > > > > > I dislike canRead and the usage of writtenSchema, since you have two completely different use cases: the producing side and the consuming side.
> > > > > > > > > >
> > > > > > > > > > > schemaValidatorBuilder
> > > > > > > > > > >
> > > > > > > > > > I dislike this proposal. IMO the Avro implementation is way too complicated. Why not have a simple validation function for each switch case above? Why do we need a strategy, a builder, and all this complexity?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > IMO it's impossible to read the validation rules as you described them.
> > > > > > > > > > I wrote how they should be structured numerous times above.
> > > > > > > > > > I can't validate them.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > IMO, the current design is very hard to read.
> > > > > > > > > > Please try to avoid jumping into code sections.
> > > > > > > > > > Write a high-level design section, in which you describe in words what you plan to do.
> > > > > > > > > > Write the validation rules in a structure that is easy to understand: rules per compatibility check, using proper terms (current schema, new schema), since the new schema is sometimes used for reading and sometimes for writing.
> > > > > > > > > >
> > > > > > > > > > In its current form, it takes too much time to understand the design, and that shouldn't be the case.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Asaf
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi! I updated the explanation of some things in the PIP issue. I also added a new "flag" in the conf to select a different ProtobufNativeSchemaValidator implementation, and made the default ProtobufNativeSchemaValidator only check whether the name of the root message is the same.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > sinan
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Mar 5, 2023 at 20:21:
> > > > > > > > > > >
> > > > > > > > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be validated for backward compatibility check using Avro based compatibility rules?
> > > > > > > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf descriptor?
> > > > > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. *You can take a quick look at the previous design, the PROTOBUF uses avro struct to store.*
> > > > > > > > > > > > > https://github.com/apache/pulsar/pull/1954
> > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Ok. So to summarize your code (easier to write it than send links):
> > > > > > > > > > > > * Pulsar Client, when used with Protobuf Schema, actually converts the Protobuf descriptor into an Avro Schema (using code found inside the Avro library) and saves that Avro schema as the schema. It's not saving the protobuf descriptor at all. Very confusing, I have to add; I never expected that.
> > > > > > > > > > > > This explains why ProtobufSchemaCompatibilityCheck just extends the Avro check without doing any translation.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for that.
> > > > > > > > > > > >
> > > > > > > > > > > > Now that I finally understand this, I can say that you *must* explain it in the motivation part of your PIP.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. *On the broker side, protobuf and avro both use `SchemaData` converted to `org.apache.avro.Schema`.*
> > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Actually, those links don't really help.
> > > > > > > > > > > > The main link that helps is:
> > > > > > > > > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I'm sorry - I don't understand.
> > > > > > > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > > > > > > If you just spell them out here, then as you say, just translate the Protobuf Descriptor into an Avro schema and run the Avro compatibility validation, no?
> > > > > > > > > > > > > > I believe the answer is no, since you may want to verify different things when it comes to Protobuf, which are different from Avro.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. *ProtobufSchema is different from ProtobufNativeSchema in that it uses avro-protobuf.*
> > > > > > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > > > > > *ProtobufNativeSchema needs a native compatibility check, but there is no official or third-party implementation. So this PIP does not use avro-protobuf for protobuf compatibility checking.*
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. *By the way, this is implemented in much the same way that Apache Avro does compatibility checking.*
> > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > > > > > > > > > `canReadStrategy`, `canBeReadStrategy`, `mutualReadStrategy`
> > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > > > > > > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are writtenSchema and readSchema. We only need to change the order of the arguments we pass to `canRead()`.*
> > > > > > > > > > > > > ```java
> > > > > > > > > > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > > > > > > > > > > > >         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
> > > > > > > > > > > > >     switch (strategy) {
> > > > > > > > > > > > >         // BACKWARD: the new schema (toValidate) must read data written with the existing one.
> > > > > > > > > > > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > > > > > > > > > > > >         // FORWARD: the existing schema must read data written with the new one.
> > > > > > > > > > > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> > > > > > > > > > > > >         // FULL: both directions must hold.
> > > > > > > > > > > > >         case CanBeReadMutualStrategy -> {
> > > > > > > > > > > > >             canRead(toValidate, fromDescriptor);
> > > > > > > > > > > > >             canRead(fromDescriptor, toValidate);
> > > > > > > > > > > > >         }
> > > > > > > > > > > > >     }
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > > > > > > > > > > >         Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> > > > > > > > > > > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
> > > > > > > > > > > > > }
> > > > > > > > > > > > > ```
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > I get that you want to take inspiration from the existing Avro Schema compatibility check for your code design.
> > > > > > > > > > > > I also understand you *won't* use any existing Avro code for that.
> > > > > > > > > > > > I also understand that you have to write the validation check on your own, since there is no third-party implementation to rely on.
> > > > > > > > > > > >
> > > > > > > > > > > > The only thing I can't understand is the actual rules, as I wrote before, since they are written confusingly.
> > > > > > > > > > > > So, I repeat what I asked before:
> > > > > > > > > > > > So, I repeat what I asked before:
> > > > > > > > > > > >
> > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > >
> > > > > > > > > > > > * Backward checks
> > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > * Forward checks
> > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or consumer) and existingSchema (the last schema used)
> > > > > > > > > > > >
> > > > > > > > > > > > Once that's accomplished, I will be able to understand the different validation rules for each compatibility check.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > sinan
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Mar 1, 2023 at 21:19:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Current Avro, JSON, and Protobuf schemas are all implemented based on AVRO.
> > > > > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the protobuf schema is converted into an Avro Schema, and then you use Avro compatibility validation?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, so `checkCompatible()` has the same implementation as for `AVRO`.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be validated for backward compatibility check using Avro based compatibility rules?
> > > > > > > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf descriptor?
> > > > > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The compatibility check strategies are described on the website:
> > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > > > > > > > > > 1. BACKWARD (CanReadExistingStrategy): Consumers using schema V3 can process data written by producers using the last schema version, V2. So V2 is the "writtenSchema" and V3 is the "readSchema".
> > > > > > > > > > > > > > > 2. FORWARD (CanBeReadByExistingStrategy): Consumers using the last schema version, V2, can process data written by producers using a new schema, V3, even though they may not be able to use the full capabilities of the new schema. So V3 is the "writtenSchema" and V2 is the "readSchema".
> > > > > > > > > > > > > > > 3. FULL (CanBeReadMutualStrategy): Schemas are both backward and forward compatible. The schema can evolve, and the old and new schema versions must each be able to read data written with the other.
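The three definitions above differ only in which version is passed to `canRead(writtenSchema, readSchema)` as the writer and which as the reader. This minimal sketch makes that argument order explicit. It uses hypothetical `Direction`, `ReadCheck` and `CanReadPlan` names, with plain strings standing in for V2 and V3.

```java
import java.util.List;

// Hypothetical names: which schema is passed as writtenSchema and which as
// readSchema to canRead(writtenSchema, readSchema) for each strategy.
enum Direction { BACKWARD, FORWARD, FULL }

record ReadCheck(String writtenSchema, String readSchema) {}

final class CanReadPlan {
    private CanReadPlan() {}

    static List<ReadCheck> plan(Direction direction, String existingSchema, String newSchema) {
        return switch (direction) {
            // Consumers on the new schema read data written with the existing one.
            case BACKWARD -> List.of(new ReadCheck(existingSchema, newSchema));
            // Consumers on the existing schema read data written with the new one.
            case FORWARD -> List.of(new ReadCheck(newSchema, existingSchema));
            // Both directions must hold.
            case FULL -> List.of(
                    new ReadCheck(existingSchema, newSchema),
                    new ReadCheck(newSchema, existingSchema));
        };
    }
}
```

`CanReadPlan.plan(Direction.BACKWARD, "V2", "V3")` yields a single check, with V2 as `writtenSchema` and V3 as `readSchema`, which matches item 1.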
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > I'm sorry - I don't understand.
> > > > > > > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > > > > > > If you just spell them out here, then as you say, just translate the Protobuf Descriptor into an Avro schema and run the Avro compatibility validation, no?
> > > > > > > > > > > > > > I believe the answer is no, since you may want to verify different things when it comes to Protobuf, which are different from Avro.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In its current state, I can't understand your design at all. Please help clarify it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As we've just described, we only need to change the order of the arguments we pass to `canRead()`.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Feb 27, 2023 at 20:49:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Current Avro, JSON, and Protobuf schemas are all implemented based on AVRO.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the protobuf schema is converted into an Avro Schema, and then you use Avro compatibility validation?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > * Backward checks
> > > > > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > > > > > * Forward checks
> > > > > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I'm saying this because you used the word "writtenSchema", but it represents something completely different depending on whether it's a backward or a forward check.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Once you have it structured like that, I personally will be able to read and understand it.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The motivation and problem statement are now good - thanks for improving it.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some background and some explanations.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The compatibility check rules are already written in the Implementation section. ProtoBufNative implements the same canRead method as Apache Avro. It does this by checking whether the schemas for writing and reading are compatible. I also indicate whether the writtenSchema and readSchema of the Backward, Forward, and Full strategies are the old or the new version of the schema.
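The canRead idea described above can be shown on a toy model. The sketch uses a hypothetical `MessageShape` (field numbers, type names and required fields) instead of Protobuf `Descriptor`s. It applies two example rules that are assumptions, not taken from the PIP: a field number present in both schemas must keep its type, and the reader may only require fields that the writer also requires. The type rule is deliberately strict. It flags even changes that the Protobuf documentation lists as compatible, such as int32 to int64.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simplified message: field number -> type name, plus the numbers
// of fields marked required (proto2 style). Not Protobuf's Descriptor API.
record MessageShape(Map<Integer, String> fieldTypes, Set<Integer> requiredFields) {}

final class CanReadCheck {
    private CanReadCheck() {}

    // Lists the problems found when data written with writtenSchema is decoded
    // with readSchema; an empty list means "can read" under these two rules.
    static List<String> canRead(MessageShape writtenSchema, MessageShape readSchema) {
        List<String> problems = new ArrayList<>();
        // Rule 1: a field number present in both schemas must keep its type.
        for (Map.Entry<Integer, String> field : readSchema.fieldTypes().entrySet()) {
            String writtenType = writtenSchema.fieldTypes().get(field.getKey());
            if (writtenType != null && !writtenType.equals(field.getValue())) {
                problems.add("field " + field.getKey() + " changed type from "
                        + writtenType + " to " + field.getValue());
            }
        }
        // Rule 2: every field the reader requires must also be required by the writer.
        for (Integer number : readSchema.requiredFields()) {
            if (!writtenSchema.requiredFields().contains(number)) {
                problems.add("required field " + number + " may be missing from written data");
            }
        }
        return problems;
    }
}
```

Which version goes into which argument is the per-strategy question raised in this thread. For BACKWARD the existing version is `writtenSchema`; for FORWARD the new version is.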
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Feb 26, 2023 at 23:24:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, which IMO you need to add for people to understand it. You don't need to explain all of Pulsar in this PIP, but add at the very least a few paragraphs detailing everything the reader needs to know to be put in context:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >    - Start by saying Pulsar as a built-in
> > > > schema
> > > > > > > registry
> > > > > > > > > > > > inside
> > > > > > > > > > > > > > > Pulsar
> > > > > > > > > > > > > > > > > >    broker.
> > > > > > > > > > > > > > > > > >       - Every time the client updates the
> > > > > schema, it
> > > > > > > > > uploads
> > > > > > > > > > > it
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > >       broker. When that happens, it has a
> > > > feature
> > > > > > > which
> > > > > > > > > > > > validates
> > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > new
> > > > > > > > > > > > > > > > > >       schema version is compatible with
> the
> > > > > previous
> > > > > > > > > > > versions.
> > > > > > > > > > > > > There
> > > > > > > > > > > > > > > > > > are 4 types
> > > > > > > > > > > > > > > > > >       of compatibility: Full, ...
> (complete
> > > and
> > > > > > > explain
> > > > > > > > > each
> > > > > > > > > > > > one
> > > > > > > > > > > > > > > > briefly)
> > > > > > > > > > > > > > > > > >    - Also explain Pulsar Schema registry
> > > > supports
> > > > > > > various
> > > > > > > > > > > > schema
> > > > > > > > > > > > > > > > > >    protocols:  Avro, protobuf native, ...
> > > > > (complete
> > > > > > > the
> > > > > > > > > > > rest),
> > > > > > > > > > > > > each
> > > > > > > > > > > > > > > > > > protocol
> > > > > > > > > > > > > > > > > >    has a schema which dictates how to
> > > serialize
> > > > > and
> > > > > > > > > > > deserialize
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > message
> > > > > > > > > > > > > > > > > >    content into typed object.
> > > > > > > > > > > > > > > > > >    - Explain in short what is protobuf
> > native
> > > > > > > (compare
> > > > > > > > > > > protobuf
> > > > > > > > > > > > > > > > > non-native)
> > > > > > > > > > > > > > > > > >    - Please don't paste code instead of
> > > > > explaining.
> > > > > > > > > > > > > > > > > >       - Explain that protobuf native
> > current
> > > > > > > validation
> > > > > > > > > check
> > > > > > > > > > > > is
> > > > > > > > > > > > > only
> > > > > > > > > > > > > > > > > >       composed of checking the root
> message
> > > > name
> > > > > is
> > > > > > > the
> > > > > > > > > same
> > > > > > > > > > > > > between
> > > > > > > > > > > > > > > > > > the current
> > > > > > > > > > > > > > > > > >       schema version and the new version.
> > > > > > > > > > > > > > > > > >          - Explain briefly what is a root
> > > > message
> > > > > > > and its
> > > > > > > > > > > name.
> > > > > > > > > > > > > > > > > >       - Explain the problem (list
> > scenarios)
> > > > > that we
> > > > > > > have
> > > > > > > > > > > > because
> > > > > > > > > > > > > > > > > protobuf
> > > > > > > > > > > > > > > > > >       native schema only supports FULL
> > > > > compatibility
> > > > > > > > > > > > validation.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Regarding high level design - as in what
> > you
> > > > > plan to
> > > > > > > do.
> > > > > > > > > > > > > > > > > > I suggest you add "High Level Design" and
> > in
> > > it
> > > > > > > detail
> > > > > > > > > how
> > > > > > > > > > > you
> > > > > > > > > > > > > plan
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > validate, per protobuf version, per
> > > > compatibility
> > > > > > > check
> > > > > > > > > > > > > (backward,
> > > > > > > > > > > > > > > > > forward,
> > > > > > > > > > > > > > > > > > full,...).
> > > > > > > > > > > > > > > > > > I tried reading the implementation - for
> > me ,
> > > > > it's
> > > > > > > all
> > > > > > > > > over
> > > > > > > > > > > the
> > > > > > > > > > > > > > > place.
> > > > > > > > > > > > > > > > > Can
> > > > > > > > > > > > > > > > > > you please list in order what I wrote
> > above,
> > > > and
> > > > > > > list the
> > > > > > > > > > > > > validation
> > > > > > > > > > > > > > > > > rules
> > > > > > > > > > > > > > > > > > with a good explanation why you validate
> it
> > > > like
> > > > > > > that?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Lastly, one you have all the validation
> > rules
> > > > > clearly
> > > > > > > > > stated,
> > > > > > > > > > > > you
> > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > use
> > > > > > > > > > > > > > > > > > it to document it properly so users can
> > know
> > > > what
> > > > > > > > > validation
> > > > > > > > > > > to
> > > > > > > > > > > > > > > expect.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Asaf
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Sorry, my mistake. I removed the code and described the design to improve the PROTOBUF_NATIVE schema compatibility checks. You can have a look.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 9:16 PM Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I read it, but you're almost directly diving into the code - it will take me hours just to reverse engineer your design.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Can you please include a "High Level Design" section in which you explain how you plan to tackle each issue?
> > > > > > > > > > > > > > > > > > > > If I can read that section and explain to someone else how this will work, it means the section is complete.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19565.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > We can talk about the current design here. Especially for the field type change check rules, please give your valuable advice.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > Sinan

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by SiNan Liu <li...@gmail.com>.
Hi, all.

Are there any other additions and modifications to this PIP?

Thanks,
sinan


On Sun, Mar 26, 2023 at 2:25 PM SiNan Liu <li...@gmail.com> wrote:

> *Hi, Asaf! This week I was researching `SchemaRegistryServiceImpl`
> optimizations, sorry for the late reply.*
>
> 1.
>
>> Well, this is exactly the argument we had in this thread. This is why you
>> make 2 implementation classes (I called them v1, v2 if you remember). The
>> user can choose between them. This is exactly why I liked the _v1 _v2
>> naming, so they will know they are not backward compatible.
>> Also, you're talking about improving, right? Checking the message name is
>> outright wrong, so you're actually fixing a big mistake someone made back
>> then.
>
>
> This is just a very simple check of the root message names of the two
> protos; does it really matter?
> This PIP introduces stricter checks, so the previous implementation's rule
> that checks the root message name is still needed.
> This is a very small issue, and I don't think it needs such a long
> discussion.
>
>
>
> 2.
>
>> Your link to Stackoverflow will fail *my* validation: Since `int foo = 1`
>> --> `int foo = 2;` : same name, same type, different field number -->
>> fail
>> validation.
>> So it doesn't prove anything, or help this argument.
>
>
>> If I have in readSchema
>> int32 justAnID = 1
>> int32 customerId = 2
>> and in writeSchema I have
>> int32 justAnID = 1
>> string customerId = 3
>> This is valid.
>> You will fail this validation check, since customerId has different field
>> numbers (2 and 3), but I decided to remove field number 2 and add a new
>> field with the same name, a different type, and field number 3. When I
>> read, field number 2 will get the default value of 0, and field number 3
>> will be ignored. Valid.
>> If you compare by field type, then `int32 customerId = 2` vs. `int32
>> customerId = 3` would fail, since that looks super strange, right? Why
>> remove and re-add exactly the same field?
>
>
> If a field named customerId has number 2 in readSchema and a field named
> customerId has number 3 in writeSchema, then the two schemas are
> incompatible. This is because when reading data encoded with writeSchema,
> the reader will look for customerId under number 2, but it was written
> under number 3, which will cause the read to fail.
>
>> > > > (4) The writtenSchema cannot change the field name of any field in
>> > > > readSchema (the field number is the same, but the field name is
>> > > > different).
>> > > This is incorrect.
>> > > Field names are not encoded into the wire. I don't see this in any best
>> > > practice.
>
>
> If the writer and reader use the same field number, Protobuf will
> correctly serialize and deserialize the data, even if the field names are
> different, because serialization and deserialization use the field number
> to identify the field.
> *So you're right. We can rename it.*
>
> *Renaming a field while keeping the same number is compatible.*
> *However, if the field numbers differ, the field number used when the
> message is written will not match the field number expected when the
> message is read, so the data will not be parsed correctly when the message
> is read.*
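To make the renaming-versus-renumbering point concrete, here is a minimal, hypothetical Java 17 sketch of the protobuf wire format (varint and length-delimited fields only; protobuf-java is not used, and the class and method names are illustrative). It encodes the customerId example discussed above. The writer sends `string customerId = 3`. In this simplified model, a reader that declares `int32 customerId = 2` finds no field 2, falls back to the type default 0, and skips field 3. Field names never appear in the bytes, which is why a rename that keeps the number is safe.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, minimal protobuf wire-format helper (varint and length-delimited fields only).
// Not part of Pulsar or protobuf-java.
final class WireFormatSketch {

    // A field's tag on the wire is (field_number << 3) | wire_type; the field NAME is never written.
    static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // low 7 bits plus continuation bit
            value >>>= 7;
        }
        out.write((int) value);
    }

    static void writeInt32Field(ByteArrayOutputStream out, int fieldNumber, int value) {
        writeTag(out, fieldNumber, 0); // wire type 0 = varint
        writeVarint(out, value);
    }

    static void writeStringField(ByteArrayOutputStream out, int fieldNumber, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        writeTag(out, fieldNumber, 2); // wire type 2 = length-delimited
        writeVarint(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Returns field number -> varint value. Length-delimited fields are skipped, the way a
    // reader skips a field number that its own schema does not declare as a varint field.
    static Map<Integer, Long> readVarintFields(byte[] data) {
        Map<Integer, Long> fields = new HashMap<>();
        int[] pos = {0};
        while (pos[0] < data.length) {
            long tag = readVarint(data, pos);
            int fieldNumber = (int) (tag >>> 3);
            int wireType = (int) (tag & 0x7);
            if (wireType == 0) {
                fields.put(fieldNumber, readVarint(data, pos));
            } else if (wireType == 2) {
                int length = (int) readVarint(data, pos); // read the length first...
                pos[0] += length;                         // ...then skip the payload
            } else {
                throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Writer schema: int32 justAnID = 1; string customerId = 3;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeInt32Field(out, 1, 7);
        writeStringField(out, 3, "abc");

        // Reader schema: int32 justAnID = 1; int32 customerId = 2;
        Map<Integer, Long> fields = readVarintFields(out.toByteArray());
        long justAnId = fields.getOrDefault(1, 0L);
        long customerId = fields.getOrDefault(2, 0L); // field 2 absent -> type default 0
        System.out.println("justAnID=" + justAnId + ", customerId=" + customerId);
    }
}
```

Running `main` prints `justAnID=7, customerId=0`. Whether that outcome should count as "compatible" or as a silent failure is exactly the policy question being debated here. The sketch only shows what the decoder does.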
>
>
> 3.
>
>> > > > - The writtenSchema cannot add required fields, but optional or
>> > > > repeated fields can be added (the field number must be new).
>> > > >
>> > > That's not true.
>> > > You can have a required field in writeSchema, and not have that field in
>> > > readSchema (based on tag number).
>> >
>> >
>> > The required field must exist. In your case, where does readSchema go to
>> > read its required fields? It's not in writtenSchema.
>> > The second and third sentences on the official website say that adding
>> > and deleting fields must not touch required fields!
>
>
> The official website explains why you should not add a required field.
> You can check it out.
>
> https://protobuf.dev/programming-guides/dos-donts/#dont-add-a-required-field
>
> Required fields are considered so harmful that they were removed from proto3.
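One way to pin down the required-field rule being debated is to state it per field number and per direction. The hypothetical Java 17 sketch below uses a simplified field model, not Pulsar's descriptor classes. It encodes the direction-aware reading quoted in point 3 ("You can have a required field in writeSchema, and not have that field in readSchema"). A proto2 reader fails only when one of its own required fields may be missing from the writer's data. Fields that only the writer declares are skipped as unknown.

```java
import java.util.List;

// Hypothetical model of proto2 fields; names are illustrative, not Pulsar APIs.
final class RequiredFieldRuleSketch {

    enum Label { REQUIRED, OPTIONAL, REPEATED }

    record Field(int number, String name, Label label) {}

    // A proto2 reader fails to parse a message that lacks one of ITS required fields.
    // Fields that only the writer has (required or not) are skipped as unknown by the reader.
    // So every reader-required field must also be REQUIRED, by number, in the writer schema;
    // if the writer only has it as optional, the writer may legally omit it.
    static boolean readerRequiredFieldsAlwaysWritten(List<Field> reader, List<Field> writer) {
        return reader.stream()
                .filter(r -> r.label() == Label.REQUIRED)
                .allMatch(r -> writer.stream().anyMatch(
                        w -> w.number() == r.number() && w.label() == Label.REQUIRED));
    }

    public static void main(String[] args) {
        List<Field> withQuery = List.of(
                new Field(1, "query", Label.REQUIRED),
                new Field(2, "page_number", Label.OPTIONAL),
                new Field(3, "result_per_page", Label.OPTIONAL));
        List<Field> withoutQuery = List.of(
                new Field(2, "page_number", Label.OPTIONAL),
                new Field(3, "result_per_page", Label.OPTIONAL));

        // Writer has required field 1, reader does not: the reader ignores field 1.
        System.out.println(readerRequiredFieldsAlwaysWritten(withoutQuery, withQuery)); // true
        // Reader requires field 1, writer never sends it: parsing fails.
        System.out.println(readerRequiredFieldsAlwaysWritten(withQuery, withoutQuery)); // false
    }
}
```

Stating the rule this way also makes it clear which schema plays the reader for each strategy. It does not settle whether the PIP should additionally forbid adding required fields as a best practice.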
>
>
> 4.
>
>> > > So if my write schema is
>> > > message SearchRequest { required string query = 1; optional int32
>> > > page_number = 2; optional int32 result_per_page = 3;}
>> > > and my read schema is
>> > > message SearchRequest { optional int32 page_number = 2; optional
>> > > int32 result_per_page = 3;}
>> > > You can see I removed a field from writeSchema that does not have a
>> > > default value.
>> > > First, I can read it without any problems: Field 2 might exist, no
>> > > problem. Field 3 might exist, no problem. I ignore field 1.
>> > > So the validation you wrote is not correct at all, without any regard to
>> > > optional value.
>> >
>> >
>> > *It looks like you've added a required field here, which is not allowed
>> > (I said that in 3).*
>> > *It also uses the number 1 of the deleted field.*
>> >
>> I removed the required field "required string query = 1" from writeSchema
>> compared with readSchema.
>> Of course you can say I added "required string query = 1" to writeSchema
>> compared with readSchema.
>> The main question: can I use readSchema to read a message written by
>> writeSchema?
>> Yes, I can.
>> I wrote why above.
>> I didn't reuse number 1 of the deleted field. The read schema is using
>> numbers 2 and 3.
>
>
> This is the same problem as in 3: adding a required field is not allowed.
>
>
> 5.
>
>> > > What if I do the other way around?
>> > > My write schema is:
>> > > message SearchRequest { optional int32 page_number = 2; optional
>> > > int32 result_per_page = 3;}
>> > > My read schema is:
>> > > message SearchRequest { required string query = 1; optional int32
>> > > page_number = 2; optional int32 result_per_page = 3;}
>> > > Here I will fail: since field 1 is required, I expect a value for it, but
>> > > alas it will never be there.
>> >
>> >
>> > Yes, the required field has been removed, which is incompatible.
>> > The check for changes to required fields runs first, so an
>> > incompatibility exception is thrown at that point and the remaining
>> > compatibility checks are not performed.
>> >
>> You can't say "The required field " - it's plain wrong.
>> Look at your rule:
>> The writtenSchema cannot remove required fields in the readSchema.
>> You can say: the writeSchema has removed a required field compared with
>> readSchema.
>> This rule will fail the validation.
>> *not* the default rule we're discussing.
>
>
> Here we are talking about changing the default value. But in this case the
> writtenSchema removes the required field, so it is not compatible. In the
> implementation, the rules are checked in order, and changes to required
> fields are checked first. So in this case the default-value rule is never
> reached, and there is no need to discuss it.
>
>
> 6.
>
>> > *It looks like sixth item in PIP should be removed.*
>> > *And Rule 7 in PIP should be removed:*
>> >
>> > Delete rule 6.
>> I would keep rule 7. See:
>>
>> https://protobuf.dev/programming-guides/dos-donts/#dont-change-the-default-value-of-a-field
>> It sounds like a good best practice. We can omit it if we don't want to be
>> overly strict.
>
>
> I agree we should keep rule 7 (in the PIP issue it is now `(5)`), following
> the guidance on the website.
> ```
> (7) There cannot be a field which exists in both `readSchema` and
> `writtenSchema` with the same field number but different default values.
> **This rule applies to proto2 only. In proto3, custom default values cannot
> be set, so this rule is not checked for proto3 schemas.**
> ```
> By the way, the rule in `3` that disallows adding or removing required
> fields should be kept as well.
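Rule 7 as quoted above could be checked roughly as follows. This is a hypothetical Java 17 sketch over a simplified field model, not the PIP's implementation. The assumed `explicitDefault` field holds the text of a `[default = ...]` option, or null when none is declared. The sketch compares fields by number and skips the check entirely for proto3, as the rule text says.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical model of rule (7); explicitDefault is null when no [default = ...] is declared.
final class DefaultValueRuleSketch {

    enum Syntax { PROTO2, PROTO3 }

    record Field(int number, String name, String explicitDefault) {}

    static boolean defaultsUnchanged(Syntax syntax, List<Field> reader, List<Field> writer) {
        if (syntax == Syntax.PROTO3) {
            return true; // proto3 has no custom defaults, so the rule is skipped
        }
        for (Field r : reader) {
            for (Field w : writer) {
                // Same field number in both schemas, but a different declared default.
                if (r.number() == w.number() && !Objects.equals(r.explicitDefault(), w.explicitDefault())) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // e.g. optional int32 page_number = 2 [default = 1];  vs  [default = 10];
        List<Field> reader = List.of(new Field(2, "page_number", "1"));
        List<Field> writer = List.of(new Field(2, "page_number", "10"));
        System.out.println(defaultsUnchanged(Syntax.PROTO2, reader, writer)); // false
        System.out.println(defaultsUnchanged(Syntax.PROTO2, reader, reader)); // true
    }
}
```

Matching by field number rather than by name keeps this rule consistent with the rename discussion above, because only numbers identify fields on the wire.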
>
>
> 7.
>
>> > > Changing a default value is generally OK, as long as you remember that
>> > > default values are never sent over the wire.
>> > > Thus, if a program receives a message in which a particular field isn’t
>> > > set, the program will see the default value as it was defined in that
>> > > program’s version of the protocol.
>> > > It will NOT see the default value that was defined in the sender’s code.
>> >
>> > This is item 11 in the official document:
>> > https://protobuf.dev/programming-guides/proto2/#updating
>> >
>> >
>> >
>> So what do you wish to state with this quote?
>
>
> Oh, rule 7 has been retained. Let's ignore this.
>
>
>
> *I also took into account that proto2 is a really problematic version,
> and the Pulsar website seems to say that only proto3 is used:*
> https://pulsar.apache.org/docs/next/schema-understand/#struct-schema
>
> *ProtobufNativeSchema is based on protobuf native descriptor.*
> *This allows Pulsar to:*
> *- use native protobuf-v3 to serialize or deserialize data.*
> *- use AutoConsume to deserialize data.*
>
> *In my opinion, the check rules for proto2 and proto3 differ somewhat, and
> this PIP also supports proto2. So for this PIP, we may also need to change
> the description of "ProtoBufNative" on the official website.*
>
>
>
> Thanks,
> sinan
>
>
>
>
>
> On Mon, Mar 20, 2023 at 5:46 PM Asaf Mesika <as...@gmail.com> wrote:
>
>> On Sun, Mar 19, 2023 at 4:47 PM SiNan Liu <li...@gmail.com> wrote:
>>
>> > 1.
>> >
>> > > message SearchReq { string query = 1; int32 page_number = 2; int32
>> > > result_per_page = 3;}
>> > > Then second version I use:
>> > > message SearchRequest { string query = 1; int32 page_number = 2;
>> > > int32 result_per_page = 3;}
>> >
>> >
>> > The rules in the PIP improve on the previous implementation, so the
>> > previous implementation's check needs to be kept.
>> > If a user switches from the previous implementation to the one in this
>> > PIP, and the new implementation doesn't check for root message name
>> > changes, isn't that changing the old behavior? This PIP is meant to make
>> > compatibility checking more stringent.
>> >
>> >
>> Well, this is exactly the argument we had in this thread. This is why you
>> make 2 implementation classes (I called them v1, v2 if you remember). The
>> user can choose between them. This is exactly why I liked the _v1 _v2
>> naming, so they will know they are not backward compatible.
>>
>> Also, you're talking about improving, right? Checking the message name is
>> outright wrong, so you're actually fixing a big mistake someone made back
>> then.
>>
>>
>> >
>> > 2.
>> >
>> > > This is a guideline for a human making the change, not for software
>> > > validation, right?
>> > > When you write the code that does the comparison to know if the field
>> > > number has changed, you have to take into account both the field name
>> > > AND the field type. If both are equal but the field number is different,
>> > > then fail it.
>> >
>> >
>> > I match fields by name here: if the name is the same but the number is
>> > different, the change is incompatible.
>> >
>> > https://github.com/apache/pulsar/pull/19566/files#diff-14b840259375cc8fcae564586fc1c2a188cb71822ab5995ed6807f7dab30c210R129-R132
>> > The PIP description here is the rule, but how to implement it is a
>> > matter for the PR. I have considered the example you gave here, and it is
>> > also incompatible in this case.
>> >
>> > Your changes are also described here:
>> >
>> > https://stackoverflow.com/questions/65230623/safeness-of-changing-proto-field-number
>> >
>> >
>> This sentence "the PIP description here is the rule, but how to implement
>> it is a matter in PR" is a completely invalid argument.
>> For compatibility rules you specify the exact validation.
>> We argue here about the rule.
>> You can't say in the PIP that you will match by name and then match by
>> name and type in the PR. You have to be exact in the PIP.
>>
>> Regarding "I have considered the example you gave here, which is also
>> incompatible" - please explain? I find it compatible, meaning you can use
>> readSchema to read a message written using writeSchema.
>>
>> Your link to Stack Overflow will fail *my* validation: `int foo = 1`
>> --> `int foo = 2` is same name, same type, different field number --> fail
>> validation.
>> So it doesn't prove anything, or help this argument.
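Asaf's proposed rule is small enough to state as code: fail only when name and type are equal but the field number differs. Below is a hypothetical Java 17 sketch with simplified `Field` records, not Pulsar code. It is applied to the two examples from this exchange, with `int32` standing in for the `int foo` example.

```java
import java.util.List;

// Hypothetical encoding of the rule proposed in this thread; not Pulsar code.
final class RenumberRuleSketch {

    record Field(String name, String type, int number) {}

    // Fails only when a field with the same name AND the same type has a different number.
    static boolean passesRenumberRule(List<Field> reader, List<Field> writer) {
        for (Field r : reader) {
            for (Field w : writer) {
                boolean sameNameAndType = r.name().equals(w.name()) && r.type().equals(w.type());
                if (sameNameAndType && r.number() != w.number()) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // customerId example: the type changed too, so the rule does not fire.
        List<Field> reader = List.of(new Field("justAnID", "int32", 1), new Field("customerId", "int32", 2));
        List<Field> writer = List.of(new Field("justAnID", "int32", 1), new Field("customerId", "string", 3));
        System.out.println(passesRenumberRule(reader, writer)); // true

        // foo example: same name and type, number 1 -> 2, so the rule fails it.
        System.out.println(passesRenumberRule(
                List.of(new Field("foo", "int32", 1)), List.of(new Field("foo", "int32", 2)))); // false
    }
}
```

Writing the rule this precisely is what makes the PIP and the PR checkable against each other. Matching by name alone would reject the first example as well.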
>>
>>
>>
>>
>>
>> >
>> > 3.
>> >
>> > > If you read sentence 2 from the guide, you'll see they write
>> > > "This means that any messages serialized by code using your “old” message
>> > > format can be parsed by your new generated code, as they won’t be missing
>> > > any required elements."
>> > > This is for the *other side*: when you add a required field to readSchema
>> > > compared with writeSchema. This of course is not allowed and won't work.
>> >
>> >
>> > Note that in proto2: **Any new fields that you add should be optional or
>> > repeated.**
>> > https://protobuf.dev/programming-guides/proto2/#updating
>> >
>> > I also stated in the PIP that this rule about required field changes is
>> > not checked for proto3. It applies to proto2 only, because proto3 removed
>> > `required`.
>> >
>> > I'm talking about proto2.
>>
>> Listen.
>>
>> This is what you wrote in the PIP, ok?
>>
>>
>> >    - The writtenSchema cannot add required fields, but optional or
>> >    repeated fields can be added (the field number must be new).
>> >
>> I answered:
>>
>> > That's not true.
>> > You can have a required field in writeSchema, and not have that field in
>> > readSchema (based on tag number).
>>
>> I also gave you an example showing that what you wrote is not true (see
>> previous mails), in which I add a required field to writeSchema compared
>> with readSchema and I can still read it with readSchema.
>>
>> You mention a link to a guideline saying you can only add optional or
>> repeated without reading the rest of the paragraph and without
>> understanding the protocol buffer encoding and decoding.
>>
>> I'm not sure how to move forward here.
>> This validation rule is invalid.
>>
>>
>>
>>
>> >
>> > 4.
>> >
>> > > My write schema is
>> > > message SearchRequest { string query = 1; int32 page_number = 2;
>> > > int32 result_per_page = 3;}
>> > > my read schema is
>> > > message SearchRequest { string query = 1; int32 page_number = 2;
>> > > int32 resultPerPage = 3;}
>> > > - Non-required fields can be removed, as long as the field number is not
>> > > used again in your updated message type. You may want to rename the field
>> > > instead, perhaps adding the prefix “OBSOLETE_”, or make the field number
>> > > reserved <https://protobuf.dev/programming-guides/proto2/#reserved>, so
>> > > that future users of your .proto can’t accidentally reuse the number.
>> > > First, they are talking about removal of fields. They suggest you rename
>> > > the field to OBSOLETE_myField instead of actually deleting it, to make it
>> > > easier on compatibility. They don't talk about rules that forbid you to
>> > > rename a field. It's unrelated.
>> >
>> >
>> > Maybe you are right that changing the field name is compatible.
>> >
>> >
>> https://stackoverflow.com/questions/45431685/protocol-buffer-does-changing-field-name-break-the-message/45431953#comment84548234_45431953
>> >
>> > Interestingly, I tested the `avro-protobuf`-based
>> > `ProtobufSchemaCompatibilityCheck` compatibility check
>> > (`org.apache.pulsar.broker.service.schema.BaseAvroSchemaCompatibilityTest`):
>> >
>> > private static final String schemaJson1 =
>> >         "{\"type\":\"record\",\"name\":\"DefaultTest\",\"namespace\":\"org.apache.pulsar.broker.service.schema"
>> >         + ".AvroSchemaCompatibilityCheckTest\",\"fields\":[{\"name\":\"field1\",\"type\":\"string\"}]}";
>> >
>> > private static final String schemaJson4 =
>> >         "{\"type\":\"record\",\"name\":\"DefaultTest\",\"namespace\":\"org.apache.pulsar.broker.service.schema"
>> >         + ".AvroSchemaCompatibilityCheckTest\",\"fields\":[{\"name\":\"field1_v2\",\"type\":\"string\","
>> >         + "\"aliases\":[\"field1\"]}]}";
>> >
>> > schemaCompatibilityCheck.isCompatible(schemaData1, schemaData4,
>> >         SchemaCompatibilityStrategy.BACKWARD);
>> > This is compatible.
>> >
>> > But that's not compatible:
>> > schemaCompatibilityCheck.isCompatible(schemaData1, schemaData4,
>> >         SchemaCompatibilityStrategy.FORWARD);
>> >
>> > So, what do you think of the Avro-based implementation in
>> > `ProtobufSchemaCompatibilityCheck`? Are its rules about renaming wrong?
>> > It looks that way.
>> >
>> >
>> You're asking why Avro doesn't allow it when the read schema is schema 4
>> and the write schema is schema 1, and the difference between them is a
>> single field rename?
>> I don't know - I need to debug it to understand.
>> I do know that Avro encoding is completely different. They don't have a
>> field identifier like protobuf does. They rely completely on the schemas of
>> the writer and the reader.
>> But it is beside the point, since, to my knowledge, they are supposed to
>> allow field renames.
>> You can debug and find out.
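The following is one possible explanation for the BACKWARD/FORWARD asymmetry in the test above. It is offered as an assumption, not a debugged answer. The Avro specification describes aliases as something the reader's schema declares in order to match the writer's names. Under FORWARD (the old schema reads new data), schemaJson1 is the reader, has no alias for `field1_v2`, and `field1` has no default. The hypothetical Java 17 sketch below models only that name-matching step. It also assumes that the first `isCompatible` argument is the existing schema and the second is the new one.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical, heavily simplified model of Avro field resolution; not the Avro library.
final class AliasDirectionSketch {

    enum Strategy { BACKWARD, FORWARD, FULL }

    // field name -> aliases declared on that field; in this model no field has a default value
    record RecordSchema(Map<String, Set<String>> fieldAliases) {}

    // Assumption: aliases are looked up on the READER side only. Each reader field must match
    // a writer field either by its own name or by one of its own aliases.
    static boolean canRead(RecordSchema reader, RecordSchema writer) {
        Set<String> writerNames = writer.fieldAliases().keySet();
        return reader.fieldAliases().entrySet().stream().allMatch(e ->
                writerNames.contains(e.getKey())
                        || e.getValue().stream().anyMatch(writerNames::contains));
    }

    // existing = last registered schema, incoming = new schema.
    static boolean isCompatible(RecordSchema existing, RecordSchema incoming, Strategy strategy) {
        return switch (strategy) {
            case BACKWARD -> canRead(incoming, existing); // new schema reads old data
            case FORWARD -> canRead(existing, incoming);  // old schema reads new data
            case FULL -> canRead(incoming, existing) && canRead(existing, incoming);
        };
    }

    public static void main(String[] args) {
        RecordSchema schema1 = new RecordSchema(Map.of("field1", Set.<String>of()));
        RecordSchema schema4 = new RecordSchema(Map.of("field1_v2", Set.of("field1")));
        System.out.println(isCompatible(schema1, schema4, Strategy.BACKWARD)); // true
        System.out.println(isCompatible(schema1, schema4, Strategy.FORWARD));  // false
    }
}
```

If this assumption holds, the Avro result reflects how Avro resolves aliases rather than a general rule about renaming. It would not carry over to protobuf, where fields are matched by number.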
>>
>>
>>
>> >
>> > 5.
>> >
>> > > I think my previous explanation in this email reply should be good enough
>> > > to explain why the name is irrelevant.
>> > > If not, please ping me on this.
>> > > Regarding your example, can you quote it? I don't understand the read and
>> > > write schema here.
>> > > Also, regarding message names and why they are irrelevant. Same thing:
>> > > Read https://protobuf.dev/programming-guides/encoding/
>> > > In there you will see the names are not encoded, thus they can be changed
>> > > freely and binary data is still decoded successfully.
>> >
>> >
>> > Reader.proto
>> > ```protobuf
>> > syntax = "proto2";
>> > package proto.reader;
>> > import "ExternalReader.proto";
>> > option java_package = "org.apache.pulsar.client.schema.proto.reader";
>> > option java_outer_classname = "Reader";
>> >
>> > enum WeekEnum {
>> > ......
>> > }
>> >
>> > message ProtobufMessage {
>> > required string protobufFoo = 1;
>> > required double protobufBar = 2;
>> > }
>> >
>> > message ProtobufSchema {
>> > ......
>> > optional ProtobufMessage messageField = 12;
>> > ......
>> > optional WeekEnum enumField = 15;
>> > }
>> > ```
>> >
>> > WriterWithTypeNameChange.proto:
>> > ```protobuf
>> > syntax = "proto2";
>> > package proto.writerWithTypeNameChange;
>> > import "ExternalReader.proto";
>> > option java_package =
>> > "org.apache.pulsar.client.schema.proto.writerWithTypeNameChange";
>> > option java_outer_classname = "WriterWithTypeNameChange";
>> >
>> > enum WeekEnum {
>> > ......
>> > }
>> >
>> > message ProtobufMessage_V2 {
>> > required string protobufFoo = 1;
>> > required double protobufBar = 2;
>> > }
>> >
>> > message ProtobufSchema {
>> > ......
>> > optional ProtobufMessage_V2 messageField = 12;
>> > ......
>> > optional WeekEnum enumField = 15;
>> > ......
>> > }
>> > ```
>> >
>> > *Doesn't that change the field type? Should that be compatible?*
>> >
>> >
>> It doesn't change the field type since you haven't changed the field types
>> of the fields inside ProtobufMessage - they are still field 1 (string) and
>> field 2 (double).
>> It is compatible since the message type name is not encoded in the binary.
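Following the point that type names are not encoded, a compatibility check could compare nested message types structurally instead of by name. The hypothetical Java 17 sketch below uses simplified records, not protobuf `Descriptor`s, and assumes there are no recursive message types. It compares by field number and field type only, so `ProtobufMessage` and `ProtobufMessage_V2` match. It tests for an identical shape, which is stricter than a real compatibility check would need to be.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical model: a field is either a scalar (scalarType set) or a nested message (messageType set).
final class StructuralMatchSketch {

    record Field(String scalarType, Message messageType) {}

    // typeName is kept only for display: it is never written to the wire.
    record Message(String typeName, Map<Integer, Field> fields) {}

    // Compares two message types by field numbers and field types, recursing into nested
    // messages and ignoring type names. Assumes no recursive (self-referencing) message types.
    static boolean sameWireShape(Message a, Message b) {
        if (!a.fields().keySet().equals(b.fields().keySet())) {
            return false;
        }
        for (Map.Entry<Integer, Field> entry : a.fields().entrySet()) {
            Field fa = entry.getValue();
            Field fb = b.fields().get(entry.getKey());
            if (!Objects.equals(fa.scalarType(), fb.scalarType())) {
                return false;
            }
            boolean aNested = fa.messageType() != null;
            boolean bNested = fb.messageType() != null;
            if (aNested != bNested) {
                return false;
            }
            if (aNested && !sameWireShape(fa.messageType(), fb.messageType())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Message inner = new Message("ProtobufMessage",
                Map.of(1, new Field("string", null), 2, new Field("double", null)));
        Message innerV2 = new Message("ProtobufMessage_V2",
                Map.of(1, new Field("string", null), 2, new Field("double", null)));
        Message reader = new Message("ProtobufSchema", Map.of(12, new Field(null, inner)));
        Message writer = new Message("ProtobufSchema", Map.of(12, new Field(null, innerV2)));
        System.out.println(sameWireShape(reader, writer)); // true: only the nested type name differs
    }
}
```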
>>
>>
>> >
>> > 6.
>> >
>> > > So if my write schema is
>> > > message SearchRequest { required string query = 1; optional int32
>> > > page_number = 2; optional int32 result_per_page = 3;}
>> > > and my read schema is
>> > > message SearchRequest { optional int32 page_number = 2; optional
>> > > int32 result_per_page = 3;}
>> > > You can see I removed a field from writeSchema that does not have a
>> > > default value.
>> > > First, I can read it without any problems: Field 2 might exist, no
>> > > problem. Field 3 might exist, no problem. I ignore field 1.
>> > > So the validation you wrote is not correct at all, without any regard to
>> > > optional value.
>> >
>> >
>> > *It looks like you've added a required field here, which is not allowed
>> > (I said that in 3).*
>> > *It also uses the number 1 of the deleted field.*
>> >
>> I removed the required field "required string query = 1" from writeSchema
>> compared with readSchema.
>> Of course you can say I added "required string query = 1" to writeSchema
>> compared with readSchema.
>> The main question: can I use readSchema to read a message written by
>> writeSchema?
>> Yes, I can.
>> I wrote why above.
>> I didn't reuse number 1 of the deleted field. The read schema is using
>> numbers 2 and 3.
>>
>>
>>
>>
>>
>> > > What if I do the other way around?
>> > > My write schema is:
>> > > message SearchRequest { optional int32 page_number = 2; optional
>> > > int32 result_per_page = 3;}
>> > > My read schema is:
>> > > message SearchRequest { required string query = 1; optional int32
>> > > page_number = 2; optional int32 result_per_page = 3;}
>> > > Here I will fail: since field 1 is required, I expect a value for it, but
>> > > alas it will never be there.
>> >
>> >
>> > Yes, the required field has been removed, which is incompatible.
>> > The check for changes to required fields runs first, so an
>> > incompatibility exception is thrown at that point and the remaining
>> > compatibility checks are not performed.
>> >
>>
>> You can't say "The required field " - it's plain wrong.
>>
>> Look at your rule:
>>
>> The writtenSchema cannot remove required fields in the readSchema.
>>
>>
>> You can say: the writeSchema has removed a required field compared with
>> readSchema.
>> This rule will fail the validation.
>> *not* the default rule we're discussing.
>>
>>
>>
>>
>> >
>> > Now let's focus on the optional field, ok?
>> > > Write schema
>> > > message SearchRequest { optional int32 page_number = 2; optional
>> > > int32 result_per_page = 3;}
>> > > Read schema:
>> > > message SearchRequest { optional string query = 1; optional int32
>> > > page_number = 2; optional int32 result_per_page = 3;}
>> > > When I read, I won't find field number 1 of course, so I will use the
>> > > default value dictated by the type, as I quoted above, which for string
>> > > is the empty string.
>> >
>> >
>> > *It looks like sixth item in PIP should be removed.*
>> > *And Rule 7 in PIP should be removed:*
>> >
>> > Delete rule 6.
>> I would keep rule 7. See:
>>
>> https://protobuf.dev/programming-guides/dos-donts/#dont-change-the-default-value-of-a-field
>> It sounds like a good best practice. We can omit it if we don't want to be
>> overly strict.
>>
>>
>>
>> > > Changing a default value is generally OK, as long as you remember that
>> > > default values are never sent over the wire.
>> > > Thus, if a program receives a message in which a particular field isn’t
>> > > set, the program will see the default value as it was defined in that
>> > > program’s version of the protocol.
>> > > It will NOT see the default value that was defined in the sender’s code.
>> >
>> > This is item 11 in the official documentation:
>> > https://protobuf.dev/programming-guides/proto2/#updating
>> >
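The quoted rule can be sketched in Java. This is a minimal illustration, not Pulsar or protobuf code: `ReaderDefaults`, its nested `Field` record and `valueForMissingField` are hypothetical names, and only a few scalar types are modeled. The lookup consults the reader's own field definition only: an explicit proto2 `[default = ...]` wins, otherwise the type's built-in default applies. The sender's declared default never enters the picture, because it is not sent over the wire.

```java
import java.util.Map;
import java.util.Optional;

final class ReaderDefaults {
    /** Hypothetical field model: scalar type name plus an optional explicit proto2 default. */
    record Field(int number, String type, Optional<Object> explicitDefault) {}

    // Built-in protobuf defaults for a few scalar types (not exhaustive).
    private static final Map<String, Object> TYPE_DEFAULTS = Map.of(
            "int32", 0,
            "int64", 0L,
            "bool", false,
            "string", "");

    /**
     * Value the reader sees when the field is absent from the received message.
     * Only the reader's own definition is consulted: the sender's default is
     * never transmitted, so it cannot influence the result.
     */
    static Object valueForMissingField(Field readerField) {
        return readerField.explicitDefault()
                .orElseGet(() -> TYPE_DEFAULTS.get(readerField.type()));
    }
}
```

For the SearchRequest example above, a reader declaring `optional string query = 1;` gets `""` for a message without field 1; a reader declaring `[default = "none"]` gets `"none"`, whatever the writer declared.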
>> >
>> >
>> So what do you wish to state with this quote?
>>
>>
>> > 7.
>> >
>> > > Read this and then please explain why rule number 3 should not be
>> > > added:
>> > >
>> > > https://protobuf.dev/programming-guides/dos-donts/#dont-go-from-repeated-to-scalar
>> >
>> >
>> > It looks like this rule needs to be added.
>> >
>> > Thanks
>>
>>
>> >
>> >
>> > Thanks,
>> > sinan
>> >
>> >
>> >
>> > On Sun, Mar 19, 2023 at 7:11 PM Asaf Mesika <as...@gmail.com> wrote:
>> >
>> > > Also:
>> > >
>> > > > (2) PROTOBUF_NATIVE was designed so as not to use avro-protobuf for
>> > > > protobuf schema compatibility checking.
>> > >
>> > > > The root message name is the class name we pass in when we create the
>> > > > producer or consumer. ProtoBuf has many nested messages or
>> > > > dependencies. The current implementation only checks if the passed
>> > > > class name is the same. It does not check if the fields in the file
>> > > > change in a way that is compatible with older versions of the schema.
>> > >
>> > >
>> > > You missed the most important thing there: PROTOBUF_NATIVE uses the
>> > > Protobuf Descriptor when persisting the schema. It's not using an Avro
>> > > schema definition. As I wrote you in a previous email:
>> > >
>> > > > PROTOBUF_NATIVE was created to fix that shortcoming, by actually
>> > > > persisting the Protobuf Descriptor and using Protobuf for encoding.
>> > > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Wed, Mar 15, 2023 at 5:46 PM SiNan Liu <li...@gmail.com> wrote:
>> > >
>> > > > 1.
>> > > >
>> > > > > > Why? The root message name is not written over the wire to the
>> > > > > > best of my knowledge. I haven't found it written in the official doc.
>> > > >
>> > > >
>> > > > The root message name check is one of the rules in the previous
>> > > > `ProtobufNativeSchemaCompatibilityCheck`. If the root message has a
>> > > > different name, there is no need to check its contents: for the "same"
>> > > > schema, the names must be the same.
>> > > >
>> > > >
>> > > > 2.
>> > > >
>> > > > > > The writtenSchema can not change the field number of any field in
>> > > > > > readSchema (the field name is the same, but the field number is
>> > > > > > different).
>> > > > > > You have to take into account field type as well when comparing.
>> > > >
>> > > >
>> > > > The first sentence on the website says that field numbers cannot be
>> > > > changed:
>> > > >
>> > > > > Don’t change the field numbers for any existing fields.
>> > > >
>> > > >
>> > > >
>> > > > 3.
>> > > >
>> > > > > > - The writtenSchema cannot add required fields, but optional or
>> > > > > > repeated fields can be added (the field number must be new).
>> > > > > >
>> > > > > > That's not true.
>> > > > > You can have a required field in writeSchema, and not have that field
>> > > > > in readSchema (based on tag number).
>> > > >
>> > > >
>> > > > A required field must exist. In your case, where does readSchema go to
>> > > > read its required fields? They are not in writtenSchema.
>> > > > The second and third sentences on the official website say that you
>> > > > must not add or delete required fields!
>> > > >
>> > > > 4.
>> > > >
>> > > > > > (4) The writtenSchema can not change the field name of any field in
>> > > > > > readSchema (the field number is the same, but the field name is
>> > > > > > different).
>> > > > > This is incorrect.
>> > > > > Field names are not encoded into the wire. I don't see this in any
>> > > > > best practice.
>> > > >
>> > > >
>> > > > The third sentence on the website:
>> > > >
>> > > > > You may want to rename the field instead, perhaps adding the prefix
>> > > > > “OBSOLETE_”, or make the field number reserved, so that future users
>> > > > > of your .proto can’t accidentally reuse the number.
>> > > >
>> > > > If you want to rename a field or add a new field, delete the old one and
>> > > > use a new number!
>> > > >
>> > > >
>> > > > 5.
>> > > >
>> > > > > The writtenSchema does not change the field name and number, but it
>> > > > > does change the field type.
>> > > > > > Small correction: for the same field number you are not allowed to
>> > > > > > change types. Name is irrelevant.
>> > > >
>> > > >
>> > > > Why is the name irrelevant?
>> > > > Here the type changes, which is the rule stated in Alternatives. There
>> > > > is no check here, just a warning to the user.
>> > > > The other change is when the name of an enum or the name of a MESSAGE
>> > > > is changed, which is the same as the root message name check in 1: is
>> > > > it still the same type after the change? This is not allowed to change!
>> > > >
>> > > > *Here is an example:*
>> > > > readSchema (https://github.com/apache/pulsar/pull/19566/files#diff-a7006d73502e6064a80af02822f3a3072be498d8b677c4b838b0dafaea32dea4)
>> > > > writtenSchema (https://github.com/apache/pulsar/pull/19566/files#diff-e3e7543624edaf1e0a4fd47947a2cad6e4b816b93843f71a367042ba6c3ec53f)
>> > > >
>> > > >
>> > > > 6.
>> > > >
>> > > > > > (6) The writtenSchema removes fields that do not have default values
>> > > > > > in readSchema. Then the schema is incompatible.
>> > > > > Protobuf gives you its own default if you don't supply one. This is
>> > > > > incorrect.
>> > > >
>> > > >
>> > > > (1) This rule only applies to proto2 fields that do not declare a
>> > > > default value. Proto3 does not need this check, since a default value
>> > > > is always there.
>> > > >
>> > > > (2) In the PIP issue:
>> > > >
>> > > > > Proto3 removed required fields, so there is no need to check required
>> > > > > fields. We get the syntax (proto2 or proto3) of the proto in the code,
>> > > > > and skip the required-field check if it is proto3. All other checking
>> > > > > rules also apply to proto3.
>> > > >
>> > > >
>> > > > *I made a mistake here. This default value check is not needed in
>> > > > proto3. I will modify the rules later according to your suggestion.*
>> > > >
>> > > > > I would remove the proto2/proto3 sections, since they only differ in
>> > > > > 1 rule, and just mention that distinction inside that rule (less work
>> > > > > for the reader).
>> > > >
>> > > >
>> > > > (3) And add rules that look like they should be checked:
>> > > >
>> > > > > Rules that you don't have in the doc, but should IMO
>> > > > > .......
>> > > >
>> > > > > There can't be a field in writerSchema that exists in readerSchema
>> > > > > (tag number based) whose type in writerSchema is scalar, but in
>> > > > > readSchema is a repeated scalar with packed=true.
>> > > >
>> > > >
>> > > > But I don't think rule number three needs to be added.
>> > > >
>> > > >
>> > > > Thanks,
>> > > > sinan
>> > > >
>> > > >
>> > > >
>> > > > On Tue, Mar 14, 2023 at 10:33 PM Asaf Mesika <as...@gmail.com> wrote:
>> > > >
>> > > > > Hi Sinan,
>> > > > >
>> > > > > The doc looks much better!
>> > > > >
>> > > > > I have a few additional comments:
>> > > > >
>> > > > > Pasting comment from previous emails:
>> > > > >
>> > > > > Can you convert the code block which is actually a quote in the
>> > > > > beginning of the PIP to something which doesn't require scrolling
>> > > > > horizontally so much?
>> > > > > Use
>> > > > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
>> > > > >
>> > > > > *Validation Rules*
>> > > > >
>> > > > > > (1) If the root message names of writtenSchema and readSchema are
>> > > > > > different, then incompatible.
>> > > > >
>> > > > > Why? The root message name is not written over the wire to the best
>> > > > > of my knowledge. I haven't found it written in the official doc.
>> > > > >
>> > > > >
>> > > > > >    - The writtenSchema cannot add required fields, but optional or
>> > > > > >    repeated fields can be added (the field number must be new).
>> > > > > >
>> > > > > That's not true.
>> > > > > You can have a required field in writeSchema, and not have that field
>> > > > > in readSchema (based on tag number).
>> > > > >
>> > > > > > The writtenSchema can not change the field number of any field in
>> > > > > > readSchema (the field name is the same, but the field number is
>> > > > > > different).
>> > > > >
>> > > > > You have to take into account field type as well when comparing.
>> > > > >
>> > > > > If I have in readSchema
>> > > > > int32 justAnID = 1
>> > > > > int32 customerId = 2
>> > > > >
>> > > > > and in writeSchema I have
>> > > > > int32 justAnID = 1
>> > > > > string customerId = 3
>> > > > >
>> > > > > This is valid.
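The example above can be checked mechanically by matching fields on tag number and ignoring names. A minimal sketch, assuming a hypothetical `TagNumberCheck.Field` model rather than protobuf's own descriptor classes: tag 1 has the same type on both sides, tag 2 exists only in the reader (it falls back to a default), and tag 3 exists only in the writer (the reader skips it as an unknown field), so no conflict is reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class TagNumberCheck {
    /** Hypothetical simplified field: the name is kept only for readability. */
    record Field(String name, String type) {}

    /**
     * Describes every tag number present in both schemas whose type differs.
     * Field names are ignored, since only tag numbers go over the wire.
     */
    static List<String> typeConflicts(Map<Integer, Field> reader, Map<Integer, Field> writer) {
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<Integer, Field> entry : reader.entrySet()) {
            Field written = writer.get(entry.getKey());
            if (written != null && !written.type().equals(entry.getValue().type())) {
                conflicts.add("tag " + entry.getKey() + ": "
                        + written.type() + " -> " + entry.getValue().type());
            }
        }
        return conflicts;
    }
}
```

Renumbering `customerId` from 2 to 3 is what keeps this valid; keeping tag 2 and changing its type to `string` would be reported as a conflict.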
>> > > > >
>> > > > > > (4) The writtenSchema can not change the field name of any field in
>> > > > > > readSchema (the field number is the same, but the field name is
>> > > > > > different).
>> > > > >
>> > > > > This is incorrect.
>> > > > > Field names are not encoded into the wire. I don't see this in any
>> > > > > best practice.
>> > > > >
>> > > > > > (5) The writtenSchema does not change the field name and number, but
>> > > > > > it does change the field type.
>> > > > > >
>> > > > > >    - If the field type is ENUM or MESSAGE, the schema is not
>> > > > > >    compatible when the type name is changed.
>> > > > > >    - If the field is of another type, the schemas under this rule are
>> > > > > >    not considered incompatible, but the user is warned. (There is
>> > > > > >    another way of testing in the PIP issue Alternatives.)
>> > > > >
>> > > > > Small correction: for the same field number you are not allowed to
>> > > > > change types. Name is irrelevant.
>> > > > >
>> > > > > > (6) The writtenSchema removes fields that do not have default values
>> > > > > > in readSchema. Then the schema is incompatible.
>> > > > >
>> > > > > Protobuf gives you its own default if you don't supply one. This is
>> > > > > incorrect.
>> > > > >
>> > > > >
>> > > > > *Rules that you don't have in the doc, but should IMO*
>> > > > > * There can not be a field which exists both in readSchema and
>> > > > >   writeSchema, with the same tag number, having different default
>> > > > >   values.
>> > > > > * There can't be a field in writerSchema that exists in readerSchema
>> > > > >   (tag number based) which in writerSchema is repeated and whose type is
>> > > > >   scalar (https://protobuf.dev/programming-guides/proto/#scalar), but
>> > > > >   in readSchema it is not repeated anymore.
>> > > > > * There can't be a field in writerSchema that exists in readerSchema
>> > > > >   (tag number based) whose type in writerSchema is scalar, but in
>> > > > >   readSchema is a repeated scalar with packed=true.
>> > > > >
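The three proposed rules can be expressed as a per-tag check. This sketch encodes the rules as proposed in this email, without taking a position on whether each one is necessary. The `ExtraRules.Field` record is a hypothetical view of one tag number in one schema version (not a protobuf class), with `defaultValue` set to null when no explicit default is declared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class ExtraRules {
    /** Hypothetical per-field view of one tag number in one schema version. */
    record Field(boolean scalar, boolean repeated, boolean packed, String defaultValue) {}

    /** Applies the three proposed rules to the same tag number in both schemas. */
    static List<String> violations(Field writer, Field reader) {
        List<String> out = new ArrayList<>();
        // Rule 1: same tag number, different explicit default values.
        if (!Objects.equals(writer.defaultValue(), reader.defaultValue())) {
            out.add("default value changed");
        }
        // Rule 2: repeated scalar in writerSchema, no longer repeated in readSchema.
        if (writer.scalar() && writer.repeated() && !reader.repeated()) {
            out.add("repeated scalar became non-repeated");
        }
        // Rule 3: plain scalar in writerSchema, packed repeated scalar in readSchema.
        if (writer.scalar() && !writer.repeated()
                && reader.scalar() && reader.repeated() && reader.packed()) {
            out.add("scalar became packed repeated");
        }
        return out;
    }
}
```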
>> > > > > *Rules you have , but I would phrase a bit differently*
>> > > > >
>> > > > > I would remove the proto2/proto3 sections, since they only differ in 1
>> > > > > rule, and just mention that distinction inside that rule (less work for
>> > > > > the reader).
>> > > > >
>> > > > > * readSchema has a field which doesn't exist in writerSchema (based on
>> > > > >   tag number).
>> > > > >    * Proto v2:
>> > > > >       * That field must be `optional` or `repeated` (must not be
>> > > > >         `required`).
>> > > > >    * Proto v3:
>> > > > >       * No problem.
>> > > > > * There can not be a field which exists both in readSchema and
>> > > > >   writeSchema, with the same tag number, but having different types.
>> > > > >
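A minimal sketch of the two consolidated rules, assuming a hypothetical field model keyed by tag number (`ReaderFieldRules` and its nested types are illustrative, not part of Pulsar): a readSchema field missing from writerSchema is only an error in proto2 when it is `required`, and a tag present in both schemas must keep its type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ReaderFieldRules {
    enum Syntax { PROTO2, PROTO3 }
    enum Label { OPTIONAL, REPEATED, REQUIRED }

    /** Hypothetical field model; the maps below are keyed by tag number. */
    record Field(String type, Label label) {}

    static List<String> check(Syntax syntax, Map<Integer, Field> reader, Map<Integer, Field> writer) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<Integer, Field> entry : reader.entrySet()) {
            int tag = entry.getKey();
            Field read = entry.getValue();
            Field written = writer.get(tag);
            if (written == null) {
                // Missing from writerSchema: only a problem for a proto2 required field
                // (proto3 has no required label, so there is nothing to check).
                if (syntax == Syntax.PROTO2 && read.label() == Label.REQUIRED) {
                    errors.add("tag " + tag + " is required but never written");
                }
            } else if (!written.type().equals(read.type())) {
                // The same tag number on both sides must keep the same type.
                errors.add("tag " + tag + " changed type");
            }
        }
        return errors;
    }
}
```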
>> > > > > *Motivation*
>> > > > >
>> > > > > Basically, in the motivation section you want people to understand the
>> > > > > following:
>> > > > >
>> > > > > Pulsar has built-in support for typed messages. It allows specifying an
>> > > > > encoding scheme and its matching schema.
>> > > > > For example, it supports Avro. You specify a schema for a given topic
>> > > > > using an Avro Schema Definition (i.e. a JSON describing the schema).
>> > > > > Every time you produce a message, you first declare the schema
>> > > > > definition you wish to use for your messages. The message data should
>> > > > > be Avro-encoded binary data (which the client in some SDKs helps encode
>> > > > > from a given data-structure/object).
>> > > > > The same applies when you consume a message. You first specify the
>> > > > > schema you use to read the messages, and the client in some SDKs helps
>> > > > > by decoding the message binary data into an object/data-structure.
>> > > > >
>> > > > > Each time you specify a schema to be used, either by a producer or a
>> > > > > consumer, the schema is persisted in Pulsar and given an increasing
>> > > > > version number. If the schema was the same as the previous version, it
>> > > > > is not saved. When the message is persisted, the version number is
>> > > > > encoded in the message headers.
>> > > > >
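The versioning behaviour described above can be sketched with an in-memory list. This is a hypothetical illustration (`TopicSchemaVersions` is not Pulsar's schema registry API), assuming versions start at 0 and that, as the paragraph says, only the previous version is compared; the returned number stands in for the version recorded with each message.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory illustration of per-topic schema versioning. */
final class TopicSchemaVersions {
    private final List<String> versions = new ArrayList<>();

    /** Returns the version number the given schema definition is registered under. */
    long register(String schemaDefinition) {
        int last = versions.size() - 1;
        // Same as the previous version: nothing new is stored.
        if (last >= 0 && versions.get(last).equals(schemaDefinition)) {
            return last;
        }
        versions.add(schemaDefinition);
        return versions.size() - 1;
    }

    int storedVersionCount() {
        return versions.size();
    }
}
```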
>> > > > > Pulsar provides a very useful feature named Schema Evolution
>> > > > > <https://pulsar.apache.org/docs/2.11.x/schema-understand/#schema-evolution>.
>> > > > > It allows us to check if a new schema version is compatible with
>> > > > > previous versions. When you configure the schema for the topic, you
>> > > > > decide the strategy to use for doing the validation check. The
>> > > > > strategies validate the following:
>> > > > >
>> > > > >    - BACKWARD strategy
>> > > > >       - A consumer with newSchema can read a message written using
>> > > > >       existingSchema
>> > > > >    - BACKWARD_TRANSITIVE strategy
>> > > > >       - A consumer with newSchema can read messages written using all
>> > > > >       existingSchema
>> > > > >    - FORWARD
>> > > > >       - A consumer with existingSchema can read messages written using
>> > > > >       newSchema
>> > > > >    - FORWARD_TRANSITIVE
>> > > > >       - A consumer defined with any of the existingSchema can read
>> > > > >       messages written using newSchema
>> > > > >    - FULL
>> > > > >       - A consumer defined with newSchema can read messages written
>> > > > >       using existingSchema
>> > > > >       - A consumer defined with existingSchema can read messages
>> > > > >       written using newSchema
>> > > > >    - FULL_TRANSITIVE
>> > > > >       - A consumer defined with newSchema can read messages written
>> > > > >       using any of the existingSchema
>> > > > >       - A consumer defined with any of the existingSchema can read
>> > > > >       messages written using newSchema
>> > > > >
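The strategy list translates into a set of reader/writer pairs that must be compatible. A minimal sketch, assuming schema versions are identified by plain strings and `existing` is ordered oldest first; `StrategyPairs` is a hypothetical helper, not a Pulsar class. Each `Pair(reader, writer)` means a consumer using `reader` must be able to read data written with `writer`; the non-transitive strategies only compare against the latest existing version.

```java
import java.util.ArrayList;
import java.util.List;

final class StrategyPairs {
    enum Strategy { BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE, FULL, FULL_TRANSITIVE }

    /** One obligation: a consumer using `reader` can read data written with `writer`. */
    record Pair(String reader, String writer) {}

    /** @param existing existing schema versions, oldest first (must be non-empty) */
    static List<Pair> requiredPairs(Strategy strategy, List<String> existing, String newSchema) {
        String latest = existing.get(existing.size() - 1);
        List<String> against = switch (strategy) {
            case BACKWARD, FORWARD, FULL -> List.of(latest);
            case BACKWARD_TRANSITIVE, FORWARD_TRANSITIVE, FULL_TRANSITIVE -> existing;
        };
        boolean backward = strategy != Strategy.FORWARD && strategy != Strategy.FORWARD_TRANSITIVE;
        boolean forward = strategy != Strategy.BACKWARD && strategy != Strategy.BACKWARD_TRANSITIVE;
        List<Pair> pairs = new ArrayList<>();
        for (String old : against) {
            if (backward) {
                pairs.add(new Pair(newSchema, old)); // new consumer reads old data
            }
            if (forward) {
                pairs.add(new Pair(old, newSchema)); // old consumer reads new data
            }
        }
        return pairs;
    }
}
```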
>> > > > >
>> > > > > Aside from Avro, Pulsar also has two additional supported encodings:
>> > > > > PROTOBUF and PROTOBUF_NATIVE.
>> > > > >
>> > > > > PROTOBUF is a bit strange. It encodes the messages using Protobuf
>> > > > > encoding, but the schema that is persisted to Pulsar is *not* a Protobuf
>> > > > > Descriptor as you would have expected. The saved schema is a translation
>> > > > > of the Protobuf Descriptor to an Avro Schema, so in fact an Avro schema
>> > > > > definition is saved as the schema.
>> > > > >
>> > > > > PROTOBUF_NATIVE was created to fix that shortcoming, by actually
>> > > > > persisting the Protobuf Descriptor and using Protobuf for encoding.
>> > > > > The problem is that the authors of PROTOBUF_NATIVE haven't completed it
>> > > > > fully, and the backward compatibility validation code almost does not
>> > > > > exist: It only checks if the root message name is the same between
>> > > > > versions.
>> > > > >
>> > > > > GOALS
>> > > > > The goal of this PIP is to amend PROTOBUF_NATIVE by adding a fully
>> > > > > functional validation for any of the defined Schema Compatibility
>> > > > > Strategies.
>> > > > > A secondary goal is to allow the user to choose between different
>> > > > > implementations: The new fully functional validation or the existing
>> > > > > barebones validation.
>> > > > >
>> > > > > -------- END
>> > > > >
>> > > > > I'm OK with having links in the Motivation, as *further reading*.
>> > > > > I'm against stacking up work for the reader to go read 5-6 different
>> > > > > links just to understand the motivation and background knowledge
>> > > > > required to understand the feature.
>> > > > >
>> > > > > I'm against putting code in the Motivation, especially if it is
>> > > > > supposed to replace a description in plain English that makes the
>> > > > > design easy to understand.
>> > > > > Leave the code out of the motivation.
>> > > > > Paste code only if you absolutely can't use plain old descriptions to
>> > > > > explain.
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Sat, Mar 11, 2023 at 11:46 AM SiNan Liu <liusinan1998@gmail.com> wrote:
>> > > > >
>> > > > > > *I guess that's right, too!*
>> > > > > >
>> > > > > > But the name `ProtobufNativeAdvancedSchemaCompatibilityCheck` is
>> > > > > > better, because we don't know whether there will be a V2 or V3 in
>> > > > > > the future. The official solution could be called
>> > > > > > `ProtobufNativeOfficialSchemaCompatibilityCheck`, and a good
>> > > > > > third-party solution `ProtobufNativeXXXXXXXXSchemaCompatibilityCheck`.
>> > > > > >
>> > > > > > I've updated my design in the PIP issue:
>> > > > > > 1. A new ProtobufNativeSchemaAdvanceCompatibilityCheck, rather than a
>> > > > > > different validator implementation inside
>> > > > > > ProtobufNativeSchemaCompatibilityCheck.
>> > > > > > 2. Remove the 'builder'.
>> > > > > > 3. Clarify the relationship between newSchema, existingSchema, and
>> > > > > > writtenSchema in canRead.
>> > > > > >
>> > > > > > Please help check whether the description is comprehensive and what
>> > > > > > changes and improvements need to be made.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > sinan
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Thu, Mar 9, 2023 at 5:35 PM Asaf Mesika <as...@gmail.com> wrote:
>> > > > > >
>> > > > > > > I like Bo's suggestion - I hadn't realized each schema type
>> > > > > > > compatibility check is actually a plugin.
>> > > > > > > It makes sense for any schema type check to evolve, sometimes in a
>> > > > > > > non-backward compatible way, hence having plugins like
>> > > > > > > protobufNativeSchemaCompatabilityCheckV1, then
>> > > > > > > protobufNativeSchemaCompatabilityCheckV2, then
>> > > > > > > protobufNativeSchemaCompatabilityCheckV3 makes sense to me.
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > On Thu, Mar 9, 2023 at 5:49 AM 丛搏 <bo...@apache.org> wrote:
>> > > > > > >
>> > > > > > > >  Hi siNan:
>> > > > > > > >
>> > > > > > > > From my point of view, it is just a plug-in. I don't think it is
>> > > > > > > > necessary to add configuration for the plugin.
>> > > > > > > > This is meaningless, and it will make the plugin harder for users
>> > > > > > > > to use.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On Wed, Mar 8, 2023 at 3:54 PM SiNan Liu <li...@gmail.com> wrote:
>> > > > > > > > >
>> > > > > > > > > Hi, bo.
>> > > > > > > > >
>> > > > > > > > > 1. I understand what you say: develop a new
>> > > > > > > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck`, rather than
>> > > > > > > > > changing the existing `ProtobufNativeSchemaCompatibilityCheck`.
>> > > > > > > > > But I found a few small problems:
>> > > > > > > > >
>> > > > > > > > > (1) ProtobufNativeAdvancedSchemaCompatibilityCheck and
>> > > > > > > > > ProtobufNativeSchemaCompatibilityCheck both have schemaType
>> > > > > > > > > PROTOBUF_NATIVE. Both checkers are the "native" PROTOBUF
>> > > > > > > > > implementation that does not use AVRO-PROTOBUF, which leads to
>> > > > > > > > > some problems or "unreasonable" behavior and gives me some
>> > > > > > > > > extended thinking and questions.
>> > > > > > > > >
>> > > > > > > > `CompatibilityCheck` is only a plugin.
>> > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` may sooner or later leave
>> > > > > > > > the stage; when `ProtobufNativeAdvancedSchemaCompatibilityCheck`
>> > > > > > > > is stable, we can make it the default Checker.
>> > > > > > > >
>> > > > > > > > It is just a plug-in; users can change it at will and ensure that
>> > > > > > > > it is used correctly.
>> > > > > > > >
>> > > > > > > > > (2) In broker.conf, `schemaRegistryCompatibilityCheckers`: if
>> > > > > > > > > ProtobufNativeSchemaCompatibilityCheck and
>> > > > > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck are both set, they
>> > > > > > > > > are going to overwrite each other, because this is a map:
>> > > > > > > > >
>> > > > > > > > > https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryService.java#L36-L44
>> > > > > > > > >
>> > > > > > > > > ```java
>> > > > > > > > > Map<SchemaType, SchemaCompatibilityCheck> checkers = new HashMap<>();
>> > > > > > > > > for (String className : checkerClasses) {
>> > > > > > > > >     SchemaCompatibilityCheck schemaCompatibilityCheck =
>> > > > > > > > >             Reflections.createInstance(className,
>> > > > > > > > >                     SchemaCompatibilityCheck.class,
>> > > > > > > > >                     Thread.currentThread().getContextClassLoader());
>> > > > > > > > >     // Keyed by schema type: a later checker with the same
>> > > > > > > > >     // schema type replaces the earlier one.
>> > > > > > > > >     checkers.put(schemaCompatibilityCheck.getSchemaType(),
>> > > > > > > > >             schemaCompatibilityCheck);
>> > > > > > > > > }
>> > > > > > > > > ```
>> > > > > > > > >
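The overwrite described here follows directly from keying the map by schema type. A self-contained sketch with stand-in types (`CheckerMapDemo.SchemaType` and `Checker` are hypothetical, not Pulsar's classes): when two configured checkers report `PROTOBUF_NATIVE`, only the one processed last survives.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CheckerMapDemo {
    /** Hypothetical stand-ins for Pulsar's SchemaType and SchemaCompatibilityCheck. */
    enum SchemaType { AVRO, JSON, PROTOBUF_NATIVE }
    record Checker(String className, SchemaType schemaType) {}

    /** Mirrors the loop quoted above: at most one map entry per schema type. */
    static Map<SchemaType, Checker> load(List<Checker> configured) {
        Map<SchemaType, Checker> checkers = new HashMap<>();
        for (Checker checker : configured) {
            // put() silently replaces any earlier checker registered for the same type.
            checkers.put(checker.schemaType(), checker);
        }
        return checkers;
    }
}
```

Assuming the configured class names are processed in the order they are listed, whichever PROTOBUF_NATIVE checker appears last in `schemaRegistryCompatibilityCheckers` is the one in effect.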
>> > > > > > > > > Is this a big problem or a small one? Is it possible or
>> > > > > > > > > unnecessary? Maybe we can write in the documentation that only one
>> > > > > > > > > of the two protobufNative checkers can be chosen? Why are there two
>> > > > > > > > > Checkers for different implementations of the same schemaType? Why
>> > > > > > > > > not have the checker create a different validator, so we do not
>> > > > > > > > > have to change schemaRegistryCompatibilityCheckers?
>> > > > > > > >
>> > > > > > > > Users can only use one, not two; allowing both would bring
>> > > > > > > > complexity to users.
>> > > > > > > >
>> > > > > > > > >
>> > > > > > > > > (3) And after the update to
>> > > > > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck: existing topics
>> > > > > > > > > previously only checked the name of the root message, not the
>> > > > > > > > > content of the protobuf.
>> > > > > > > > >
>> > > > > > > > > What if the user wants both Checkers?
>> > > > > > > > >
>> > > > > > > > > Does setting ProtobufNativeAdvancedSchemaCompatibilityCheck
>> > > > > > > > > affect topics with an existing schema?
>> > > > > > > > >
>> > > > > > > > > Older topics would still use the old checker, and newer topics or
>> > > > > > > > > certain older topics would use the new advanced checker.
>> > > > > > > > >
>> > > > > > > > When `ProtobufNativeAdvancedSchemaCompatibilityCheck` is stable,
>> > > > > > > > users will not choose `ProtobufNativeSchemaCompatibilityCheck`,
>> > > > > > > > because it is not a complete checker.
>> > > > > > > > >
>> > > > > > > > > (4) So should we have one schemaType per checker? The
>> > > > > > > > > protobufNative checker can have as many different implementation
>> > > > > > > > > classes as needed. As for the classname configuration in the PIP,
>> > > > > > > > > let's see if it can be set at the topic level. In the current PIP
>> > > > > > > > > design I just load this parameter into the checker when the broker
>> > > > > > > > > is started and the checkers map is set up. If I want to support
>> > > > > > > > > the topic level, can I do it in a new regular PR, or should I
>> > > > > > > > > complete it here?
>> > > > > > > > >
>> > > > > > > > > Or add a PROTOBUF_NATIVE_ADVANCE schemaType corresponding to
>> > > > > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck? (That seems to be
>> > > > > > > > > more trouble.)
>> > > > > > > > >
>> > > > > > > > > Sorry, I can't use the computer and network at the company, so I
>> > > > > > > > > am replying from my mobile phone; the format may be a bit messy.
>> > > > > > > > > Please understand.
>> > > > > > > > >
>> > > > > > > > > Thanks,
>> > > > > > > > >
>> > > > > > > > > sinan
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Tue, Mar 7, 2023 at 11:39 PM 丛搏 <bo...@apache.org> wrote:
>> > > > > > > > >
>> > > > > > > > > > On Tue, Mar 7, 2023 at 1:22 PM SiNan Liu <li...@gmail.com> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > Great to see your comment, bo!
>> > > > > > > > > > >
>> > > > > > > > > > > 1. The first way: the protobuf website has a description of the
>> > > > > > > > > > > rules, but there are no plans to implement them.
>> > > > > > > > > > > https://protobuf.dev/programming-guides/proto/#updating
>> > > > > > > > > >
>> > > > > > > > > > https://groups.google.com/g/protobuf
>> > > > > > > > > > Maybe ask here.
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > 2. I think this PIP can be divided into two parts.
>> > > > > > > > > > > (1) Add a flag (`ValidatorClassName`) and load it into
>> > > > > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` when the broker starts.
>> > > > > > > > > > > `ValidatorClassName` is empty by default, and the implementation
>> > > > > > > > > > > continues as before, with no change for the user.
>> > > > > > > > > >
>> > > > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` is a plugin in `broker.conf`:
>> > > > > > > > > > ```
>> > > > > > > > > > schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
>> > > > > > > > > > ```
>> > > > > > > > > > I do not recommend that we directly modify this plugin and keep
>> > > > > > > > > > adding configuration items, which will cause trouble for users.
>> > > > > > > > > > We have a lot of configs and it's getting very unwieldy.
>> > > > > > > > > > In my opinion, we shouldn't change
>> > > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`:
>> > > > > > > > > > it is a simple implementation, it doesn't go wrong very often, and
>> > > > > > > > > > most users will use it. We can add another ProtobufNative check
>> > > > > > > > > > named `ProtobufNativeAdvancedSchemaCompatibilityCheck` or similar.
>> > > > > > > > > > This way, we don't need to add this flag. There is no need to
>> > > > > > > > > > consider compatibility: it is just a plug-in and will not affect
>> > > > > > > > > > the current logic. If users need it, they just switch the plugin
>> > > > > > > > > > to the new implementation.
>> > > > > > > > > >
>> > > > > > > > > > > ```java
>> > > > > > > > > > >     ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
>> > > > > > > > > > >         for (Descriptors.Descriptor fromDescriptor : fromDescriptors) {
>> > > > > > > > > > >             // The default implementation only checks if the root message has changed.
>> > > > > > > > > > >             if (!fromDescriptor.getFullName().equals(toDescriptor.getFullName())) {
>> > > > > > > > > > >                 throw new ProtoBufCanReadCheckException(
>> > > > > > > > > > >                         "Protobuf root message isn't allowed to change!");
>> > > > > > > > > > >             }
>> > > > > > > > > > >         }
>> > > > > > > > > > >     };
>> > > > > > > > > > > ```
>> > > > > > > > > > > The `ValidatorClassName` value can also be set to the
>> > > > > > > > > > > implementation added by this PIP:
>> > > > > > > > > > > `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
>> > > > > > > > > > >
>> > > > > > > > > > > (2) Recode `ProtobufNativeSchemaCompatibilityCheck` so that the
>> > > > > > > > > > > flag (`ValidatorClassName`) builds a different
>> > > > > > > > > > > `ProtobufNativeSchemaValidator`.
>> > > > > > > > > > > Isn't it just a plug-in? The user can develop and choose a
>> > > > > > > > > > > different `ProtobufNativeSchemaValidator`. I think it doesn't
>> > > > > > > > > > > change the logic; it just allows users to extend it.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > I think this PIP is an enhancement and supplement to the
>> > > > > > > > > > > existing functionality, and there is nothing unnecessary or
>> > > > > > > > > > > meaningless about it.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > Thanks,
>> > > > > > > > > > > sinan
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > On Tue, Mar 7, 2023 at 11:53 AM 丛搏 <bo...@apache.org> wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > I think we have two ways to do that.
>> > > > > > > > > > > >
>> > > > > > > > > > > > First way: push for improvements to the Java implementation
>> > > > > > > > > > > > of protobuf. Ask if they have plans to improve it.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck`
>> > > > > > > > > > > > should be implemented as a plugin, without changing any
>> > > > > > > > > > > > existing plugin logic (it's simple and already used). I don't
>> > > > > > > > > > > > recommend adding flags for rollback; it adds configuration
>> > > > > > > > > > > > and makes little sense.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > Bo
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Mon, Mar 6, 2023 at 11:00 PM Asaf Mesika <as...@gmail.com> wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Can you convert the code block at the beginning of the PIP, which is
>> > > > > > > > > > > > > actually a quote, into something that doesn't require so much
>> > > > > > > > > > > > > horizontal scrolling?
>> > > > > > > > > > > > > Use
>> > > > > > > > > > > > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Let's improve the clarity of what you wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > "the PROTOBUF uses avro struct to store."
>> > > > > > > > > > > > > -->
>> > > > > > > > > > > > > When schema type PROTOBUF is used, the Pulsar client assumes the object
>> > > > > > > > > > > > > given to it as message data is an auto-generated Protobuf class, which
>> > > > > > > > > > > > > carries the descriptor of its schema. The client uses a converter that
>> > > > > > > > > > > > > turns the Protobuf schema descriptor into an Avro schema and sends that
>> > > > > > > > > > > > > as the schema of the producer/consumer.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > "On the broker side, protobuf and avro both use SchemaData converted to
>> > > > > > > > > > > > > org.apache.avro.Schema."
>> > > > > > > > > > > > > -->
>> > > > > > > > > > > > > Since the schema is an Avro schema, the broker-side compatibility check
>> > > > > > > > > > > > > simply reuses the compatibility check of the AVRO schema type.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > "ProtobufSchema is different from ProtobufNativeSchema in schema
>> > > > > > > > > > > > > compatibility check it uses avro-protobuf.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
>> > > > > > > > > > > > > But the current implementation of ProtobufNative schema compatibility
>> > > > > > > > > > > > > check only checked if the root message name is changed."
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > -->
>> > > > > > > > > > > > > The PROTOBUF_NATIVE schema type is different.
>> > > > > > > > > > > > > The client actually uses the Protobuf Descriptor as the schema, as
>> > > > > > > > > > > > > opposed to the Avro schema used by the PROTOBUF schema type. In the
>> > > > > > > > > > > > > broker, the PROTOBUF_NATIVE compatibility check implements no rule
>> > > > > > > > > > > > > except one: checking whether the root message name has changed.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >    1. For now, there is no official or third-party solution for ProtoBuf
>> > > > > > > > > > > > > >    compatibility. If in the future have better solutions of a third party or
>> > > > > > > > > > > > > >    the official, we develop new ProtobufNativeSchemaValidator and use, so
>> > > > > > > > > > > > > >    add a flag.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > Why do you need to make that configurable? Once you find a third-party
>> > > > > > > > > > > > > solution, why not just switch to it? Who knows, maybe you never will.
>> > > > > > > > > > > > > Introduce it when you find it, not now.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > We improve in ProtobufNativeSchemaCompatibilityCheck BACKWARD, FORWARD
>> > > > > > > > > > > > > > these strategies. As with the AVRO implementation, protobuf compatibility
>> > > > > > > > > > > > > > checking need implementing the canRead method. *This will check that
>> > > > > > > > > > > > > > the writtenschema can be read by readSchema.*
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > I completely disagree.
>> > > > > > > > > > > > > The Avro implementation is confusing for our use case. Don't copy that.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > You have
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > public void checkCompatible(SchemaData from, SchemaData to,
>> > > > > > > > > > > > >                             SchemaCompatibilityStrategy strategy)
>> > > > > > > > > > > > >         throws IncompatibleSchemaException {
>> > > > > > > > > > > > >     Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
>> > > > > > > > > > > > >     Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
>> > > > > > > > > > > > >     switch (strategy) {
>> > > > > > > > > > > > >         case BACKWARD_TRANSITIVE:
>> > > > > > > > > > > > >         case BACKWARD:
>> > > > > > > > > > > > >         case FORWARD_TRANSITIVE:
>> > > > > > > > > > > > >         case FORWARD:
>> > > > > > > > > > > > >         case FULL_TRANSITIVE:
>> > > > > > > > > > > > >         case FULL:
>> > > > > > > > > > > > >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
>> > > > > > > > > > > > >             return;
>> > > > > > > > > > > > >         case ALWAYS_COMPATIBLE:
>> > > > > > > > > > > > >             return;
>> > > > > > > > > > > > >         default:
>> > > > > > > > > > > > >             throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
>> > > > > > > > > > > > >     }
>> > > > > > > > > > > > > }
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > I would rename:
>> > > > > > > > > > > > > from --> currentSchema
>> > > > > > > > > > > > > to --> newSchema
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Use that switch case and have a method for each strategy, like:
>> > > > > > > > > > > > > validateBackwardsCompatibility(currentSchema, newSchema)
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > I dislike canRead and the use of writtenSchema, since you have two
>> > > > > > > > > > > > > completely different use cases: the producer side and the consumer side.
>> > > > > > > > > > > > >
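A minimal sketch of the structure suggested above: one switch over the strategy and one plainly named method per direction, with no validator builder. It makes several assumptions. `MessageSchema` is a hypothetical, simplified record that stands in for a Protobuf `Descriptor`. `Strategy` is a stand-in enum covering only the cases in the existing switch. The only rule applied is the one the broker enforces today: the root message name must not change. Any further rules would go inside the two validation methods.

```java
import java.util.Objects;

// Hypothetical stand-in for Pulsar's SchemaCompatibilityStrategy (only the cases used here).
enum Strategy {
    BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE,
    FULL, FULL_TRANSITIVE, ALWAYS_COMPATIBLE
}

// Hypothetical, simplified view of a Protobuf descriptor: just the root message name.
record MessageSchema(String rootMessageName) {}

class IncompatibleSchemaException extends Exception {
    IncompatibleSchemaException(String message) {
        super(message);
    }
}

final class ProtobufNativeCompatibilitySketch {

    // Transitive variants would run the same check against every stored version;
    // that loop is left to the caller in this sketch.
    static void checkCompatible(MessageSchema currentSchema, MessageSchema newSchema,
                                Strategy strategy) throws IncompatibleSchemaException {
        switch (strategy) {
            case BACKWARD, BACKWARD_TRANSITIVE ->
                    validateBackwardsCompatibility(currentSchema, newSchema);
            case FORWARD, FORWARD_TRANSITIVE ->
                    validateForwardsCompatibility(currentSchema, newSchema);
            case FULL, FULL_TRANSITIVE -> {
                validateBackwardsCompatibility(currentSchema, newSchema);
                validateForwardsCompatibility(currentSchema, newSchema);
            }
            case ALWAYS_COMPATIBLE -> {
                // Nothing to check.
            }
        }
    }

    // Backward: consumers on newSchema must be able to read data produced with currentSchema.
    static void validateBackwardsCompatibility(MessageSchema currentSchema, MessageSchema newSchema)
            throws IncompatibleSchemaException {
        requireSameRootMessage(currentSchema, newSchema);
    }

    // Forward: consumers still on currentSchema must be able to read data produced with newSchema.
    static void validateForwardsCompatibility(MessageSchema currentSchema, MessageSchema newSchema)
            throws IncompatibleSchemaException {
        requireSameRootMessage(currentSchema, newSchema);
    }

    private static void requireSameRootMessage(MessageSchema currentSchema, MessageSchema newSchema)
            throws IncompatibleSchemaException {
        if (!Objects.equals(currentSchema.rootMessageName(), newSchema.rootMessageName())) {
            throw new IncompatibleSchemaException("Root message changed from "
                    + currentSchema.rootMessageName() + " to " + newSchema.rootMessageName());
        }
    }
}
```

In this shape, each direction reads as its own list of rules about `currentSchema` and `newSchema`. That is the per-strategy layout asked for in the thread.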
>> > > > > > > > > > > > > schemaValidatorBuilder
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > I dislike this proposal. IMO the Avro implementation is way too
>> > > > > > > > > > > > > complicated. Why not have a simple validation function for each switch
>> > > > > > > > > > > > > case above? Why do we need a strategy, a builder, and all this complexity?
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > IMO it's impossible to read the validation rules as you described them.
>> > > > > > > > > > > > > I wrote how they should be structured numerous times above.
>> > > > > > > > > > > > > I can't validate them.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > IMO, the current design is very hard to read.
>> > > > > > > > > > > > > Please try to avoid jumping into code sections.
>> > > > > > > > > > > > > Write a high-level design section in which you describe in words what
>> > > > > > > > > > > > > you plan to do.
>> > > > > > > > > > > > > Write the validation rules in a structure that is easy to understand:
>> > > > > > > > > > > > > rules for each compatibility check, using precise terms (current
>> > > > > > > > > > > > > schema, new schema), since the new schema is used for reading in one
>> > > > > > > > > > > > > case and for writing in the other.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > In its current form, it takes too much time to understand the design,
>> > > > > > > > > > > > > and that shouldn't be the case.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Asaf
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <liusinan1998@gmail.com> wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > Hi! I updated the explanation of some things in the PIP issue. I also
>> > > > > > > > > > > > > > added a new “flag” in the conf to select among different
>> > > > > > > > > > > > > > ProtobufNativeSchemaValidator implementations, and set the default
>> > > > > > > > > > > > > > ProtobufNativeSchemaValidator to only check whether the name of the
>> > > > > > > > > > > > > > root message is the same.
>> > > > > > > > > > > > > >
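The configurable validator described above could be wired roughly as follows. This is a sketch under assumptions. The `ProtobufNativeSchemaValidator` name comes from the thread, but several pieces are hypothetical and are not Pulsar APIs: its single `validate` method, the `RootMessageNameValidator` default, the `ValidatorLoader` helper, and the idea that the configuration value is a fully qualified class name. Root message names are passed as plain strings to keep the example self-contained.

```java
import java.util.Objects;

// Interface name taken from the thread; the method signature is a hypothetical simplification.
interface ProtobufNativeSchemaValidator {
    // Throws IllegalStateException when the new schema is not compatible.
    void validate(String existingRootMessage, String newRootMessage);
}

// Default behaviour: only compare the root message names.
class RootMessageNameValidator implements ProtobufNativeSchemaValidator {
    public RootMessageNameValidator() {
    }

    @Override
    public void validate(String existingRootMessage, String newRootMessage) {
        if (!Objects.equals(existingRootMessage, newRootMessage)) {
            throw new IllegalStateException(
                    "Root message changed: " + existingRootMessage + " -> " + newRootMessage);
        }
    }
}

final class ValidatorLoader {
    // configuredClassName is a hypothetical config value; null or blank selects the default.
    static ProtobufNativeSchemaValidator load(String configuredClassName) {
        if (configuredClassName == null || configuredClassName.isBlank()) {
            return new RootMessageNameValidator();
        }
        try {
            Class<?> clazz = Class.forName(configuredClassName);
            return (ProtobufNativeSchemaValidator) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load validator " + configuredClassName, e);
        }
    }
}
```

Whether such a flag is worth its configuration cost is what the replies debate. The sketch only shows that the default can remain the current root-name check.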
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > sinan
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Mar 5, 2023 at 20:21:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <liusinan1998@gmail.com> wrote:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be validated
>> > > > > > > > > > > > > > > > > for backward compatibility check using Avro based compatibility rules?
>> > > > > > > > > > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
>> > > > > > > > > > > > > > > > > descriptor?
>> > > > > > > > > > > > > > > > > Is there some translation happening?
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > 1. *You can take a quick look at the previous design, the PROTOBUF uses
>> > > > > > > > > > > > > > > > avro struct to store.*
>> > > > > > > > > > > > > > > > https://github.com/apache/pulsar/pull/1954
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Ok. So to summarize your code (easier to write it than send links):
>> > > > > > > > > > > > > > > * The Pulsar client, when used with the PROTOBUF schema type, actually
>> > > > > > > > > > > > > > > converts the Protobuf descriptor into an Avro schema (using code found
>> > > > > > > > > > > > > > > inside the Avro library) and saves that Avro schema as the schema. It
>> > > > > > > > > > > > > > > doesn't save the Protobuf descriptor at all. Very confusing, I have to
>> > > > > > > > > > > > > > > add; I never expected that.
>> > > > > > > > > > > > > > > This explains why ProtobufSchemaCompatibilityCheck just extends the Avro
>> > > > > > > > > > > > > > > check without doing any translation.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Thanks for that.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Now that I finally understand this, I can say that you *must* explain it
>> > > > > > > > > > > > > > > in the motivation part of your PIP.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > 2. *On the broker side, protobuf and avro both use `SchemaData` converted
>> > > > > > > > > > > > > > > > to `org.apache.avro.Schema`.*
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Actually those links don't really help.
>> > > > > > > > > > > > > > > The main link that helps is:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > I'm sorry - I don't understand.
>> > > > > > > > > > > > > > > > > I understand the different compatibility check strategies.
>> > > > > > > > > > > > > > > > > If you just spell them out here, then, as you say, just translate the
>> > > > > > > > > > > > > > > > > Protobuf Descriptor into an Avro schema and run the Avro
>> > > > > > > > > > > > > > > > > compatibility validation, no?
>> > > > > > > > > > > > > > > > > I believe the answer is no, since you may want to verify different
>> > > > > > > > > > > > > > > > > things when it comes to Protobuf, which are different from Avro.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > 1. *ProtobufSchema is different from ProtobufNativeSchema in that it uses
>> > > > > > > > > > > > > > > > avro-protobuf.*
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
>> > > > > > > > > > > > > > > > *ProtobufNativeSchema needs a native compatibility check, but there is no
>> > > > > > > > > > > > > > > > official or third-party implementation. So this PIP does not use
>> > > > > > > > > > > > > > > > avro-protobuf for protobuf compatibility checking.*
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > 2. *By the way, this is implemented in much the same way that Apache Avro
>> > > > > > > > > > > > > > > > does compatibility checking.*
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > `canReadStrategy`, `canBeReadStrategy`, `mutualReadStrategy`
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
>> > > > > > > > > > > > > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
>> > > > > > > > > > > > > > > > writtenSchema and readSchema. We only need to change the order of
>> > > > > > > > > > > > > > > > arguments we pass to `canRead()`.*
>> > > > > > > > > > > > > > > > ```java
>> > > > > > > > > > > > > > > > // `strategy` is a field of the enclosing validator class.
>> > > > > > > > > > > > > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
>> > > > > > > > > > > > > > > >                                   Descriptors.Descriptor fromDescriptor)
>> > > > > > > > > > > > > > > >         throws ProtoBufCanReadCheckException {
>> > > > > > > > > > > > > > > >     switch (strategy) {
>> > > > > > > > > > > > > > > >         // BACKWARD: the new schema (toValidate) reads data written with the existing one.
>> > > > > > > > > > > > > > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
>> > > > > > > > > > > > > > > >         // FORWARD: the existing schema reads data written with the new one.
>> > > > > > > > > > > > > > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
>> > > > > > > > > > > > > > > >         // FULL: both directions must hold.
>> > > > > > > > > > > > > > > >         case CanBeReadMutualStrategy -> {
>> > > > > > > > > > > > > > > >             canRead(toValidate, fromDescriptor);
>> > > > > > > > > > > > > > > >             canRead(fromDescriptor, toValidate);
>> > > > > > > > > > > > > > > >         }
>> > > > > > > > > > > > > > > >     }
>> > > > > > > > > > > > > > > > }
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > private void canRead(Descriptors.Descriptor writtenSchema,
>> > > > > > > > > > > > > > > >                      Descriptors.Descriptor readSchema)
>> > > > > > > > > > > > > > > >         throws ProtoBufCanReadCheckException {
>> > > > > > > > > > > > > > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
>> > > > > > > > > > > > > > > > }
>> > > > > > > > > > > > > > > > ```
>> > > > > > > > > > > > > > > >
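For illustration only, here is what a `canRead(writtenSchema, readSchema)` body could look like once the rules are written down. The rules below are assumptions chosen to show the shape; they are not the PIP's rule set:

- the root message name must match;
- a field number present in both schemas must keep the same type;
- a field that the reader declares `required` (proto2) must exist in the writer's schema.

`FieldSpec` and `MessageSpec` are hypothetical simplified stand-ins for `Descriptors.FieldDescriptor` and `Descriptors.Descriptor`. The method returns a list of problems instead of throwing `ProtoBufCanReadCheckException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for Descriptors.FieldDescriptor and Descriptors.Descriptor.
record FieldSpec(String name, String type, boolean required) {}

record MessageSpec(String name, Map<Integer, FieldSpec> fieldsByNumber) {}

final class CanReadSketch {
    // Returns why data written with writtenSchema could not be read with readSchema.
    // An empty list means "readable" under these illustrative rules.
    static List<String> canRead(MessageSpec writtenSchema, MessageSpec readSchema) {
        List<String> problems = new ArrayList<>();
        if (!writtenSchema.name().equals(readSchema.name())) {
            problems.add("root message changed: " + writtenSchema.name() + " -> " + readSchema.name());
        }
        // Fields that only the writer has are skipped by the reader as unknown fields,
        // so only the reader's fields need to be examined.
        for (Map.Entry<Integer, FieldSpec> entry : readSchema.fieldsByNumber().entrySet()) {
            int number = entry.getKey();
            FieldSpec readField = entry.getValue();
            FieldSpec writtenField = writtenSchema.fieldsByNumber().get(number);
            if (writtenField == null) {
                // A missing field falls back to its default, unless the reader requires it.
                if (readField.required()) {
                    problems.add("required field #" + number + " (" + readField.name()
                            + ") is never written");
                }
            } else if (!writtenField.type().equals(readField.type())) {
                // Conservative: any type change is reported, even wire-compatible ones.
                problems.add("field #" + number + " changed type: "
                        + writtenField.type() + " -> " + readField.type());
            }
        }
        return problems;
    }
}
```

Plugged into `validateWithStrategy` above, the strategy only decides which descriptor is passed as `writtenSchema` and which as `readSchema`.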
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > I get that you want to take inspiration from the existing Avro schema
>> > > > > > > > > > > > > > > compatibility check for your code design.
>> > > > > > > > > > > > > > > I also understand that you *won't* use any existing Avro code for that.
>> > > > > > > > > > > > > > > I also understand that you have to write the validation check on your
>> > > > > > > > > > > > > > > own, since there is no third-party implementation to rely on.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > The only thing I can't understand is the actual rules, as I wrote
>> > > > > > > > > > > > > > > before, since they are written confusingly.
>> > > > > > > > > > > > > > > So, I repeat what I asked before:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > I think you should structure the validation rules differently:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > * Backward checks
>> > > > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
>> > > > > > > > > > > > > > > consumer) and existingSchema (the last schema used)
>> > > > > > > > > > > > > > > * Forward checks
>> > > > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
>> > > > > > > > > > > > > > > consumer) and existingSchema (the last schema used)
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Once that's accomplished, I will be able to understand the different
>> > > > > > > > > > > > > > > validation rules for each compatibility check.
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > > > sinan
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Mar 1, 2023 at 21:19:
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com> wrote:
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > I read it and they look identical. What's the difference between them?
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > The current AVRO, JSON, and PROTOBUF schemas are all implemented based
>> > > > > > > > > > > > > > > > > > on Avro.
>> > > > > > > > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the
>> > > > > > > > > > > > > > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
>> > > > > > > > > > > > > > > > > > > compatibility validation?
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
>> > > > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
>> > > > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
>> > > > > > > > > > > > > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, so `checkCompatible()`
>> > > > > > > > > > > > > > > > > > is the same implementation as for `AVRO`.
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be validated
>> > > > > > > > > > > > > > > > > for backward compatibility check using Avro based compatibility rules?
>> > > > > > > > > > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
>> > > > > > > > > > > > > > > > > descriptor?
>> > > > > > > > > > > > > > > > > Is there some translation happening?
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > I think you should structure the validation rules differently:
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > The compatibility check strategies are described on the website:
>> > > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
>> > > > > > > > > > > > > > > > > > 1. BACKWARD (CanReadExistingStrategy): Consumers using schema V3 can
>> > > > > > > > > > > > > > > > > > process data written by producers using the last schema version V2. So
>> > > > > > > > > > > > > > > > > > V2 is the "writtenSchema" and V3 is the "readSchema".
>> > > > > > > > > > > > > > > > > > 2. FORWARD (CanBeReadByExistingStrategy): Consumers using the last schema
>> > > > > > > > > > > > > > > > > > version V2 can process data written by producers using a new schema V3,
>> > > > > > > > > > > > > > > > > > even though they may not be able to use the full capabilities of the new
>> > > > > > > > > > > > > > > > > > schema. So V3 is the "writtenSchema" and V2 is the "readSchema".
>> > > > > > > > > > > > > > > > > > 3. FULL (CanBeReadMutualStrategy): Schemas are both backward and forward
>> > > > > > > > > > > > > > > > > > compatible. The schema can evolve, and the old and new schema versions
>> > > > > > > > > > > > > > > > > > must each be able to read data written with the other.
>> > > > > > > > > > > > > > > > > >
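The three definitions above boil down to which schema is the writer and which is the reader. The sketch below encodes that mapping as data and calls a supplied `canRead(written, read)` function once per required direction. The names `Role`, `ReadCheck`, `CompatibilityStrategy`, and `StrategyDispatch` are hypothetical, and only the non-transitive strategies are covered.

```java
import java.util.List;
import java.util.function.BiConsumer;

// Which of the two schemas plays which part in a single canRead(written, read) call.
enum Role { EXISTING, NEW }

record ReadCheck(Role writtenSchema, Role readSchema) {}

// Hypothetical enum mirroring the three strategy descriptions (non-transitive only).
enum CompatibilityStrategy {
    // BACKWARD: consumers on the new schema (V3) read data written with the existing one (V2).
    BACKWARD(List.of(new ReadCheck(Role.EXISTING, Role.NEW))),
    // FORWARD: consumers on the existing schema (V2) read data written with the new one (V3).
    FORWARD(List.of(new ReadCheck(Role.NEW, Role.EXISTING))),
    // FULL: both directions must hold.
    FULL(List.of(new ReadCheck(Role.EXISTING, Role.NEW), new ReadCheck(Role.NEW, Role.EXISTING)));

    final List<ReadCheck> checks;

    CompatibilityStrategy(List<ReadCheck> checks) {
        this.checks = checks;
    }
}

final class StrategyDispatch {
    // Only the argument order changes between strategies; the canRead logic stays the same.
    static <S> void run(CompatibilityStrategy strategy, S existingSchema, S newSchema,
                        BiConsumer<S, S> canRead) {
        for (ReadCheck check : strategy.checks) {
            S written = check.writtenSchema() == Role.EXISTING ? existingSchema : newSchema;
            S read = check.readSchema() == Role.EXISTING ? existingSchema : newSchema;
            canRead.accept(written, read);
        }
    }
}
```

For example, with FORWARD, existing schema `v2`, and new schema `v3`, the supplied function is called once as `canRead(v3, v2)`.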
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > I'm sorry - I don't understand.
>> > > > > > > > > > > > > > > > > I understand the different compatibility check strategies.
>> > > > > > > > > > > > > > > > > If you just spell them out here, then, as you say, just translate the
>> > > > > > > > > > > > > > > > > Protobuf Descriptor into an Avro schema and run the Avro
>> > > > > > > > > > > > > > > > > compatibility validation, no?
>> > > > > > > > > > > > > > > > > I believe the answer is no, since you may want to verify different
>> > > > > > > > > > > > > > > > > things when it comes to Protobuf, which are different from Avro.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > At the current state, I can't understand your design at all. Please help
>> > > > > > > > > > > > > > > > > clarify that.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > So each strategy should have its own section.
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As we've
>> > > > > > > > > > > > > > > > > > just described, we only need to change the order of arguments we pass to
>> > > > > > > > > > > > > > > > > > `canRead()`.
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > > > > > sinan
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Feb 27, 2023 at 20:49:
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > I read it and they look identical. What's the difference between them?
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > Current avro,json, and protobuf schemas are all implemented based on AVRO.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the
>> > > > > > > > > > > > > > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
>> > > > > > > > > > > > > > > > > > > compatibility validation?
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > I think you should structure the validation rules differently:
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > * Backward checks
>> > > > > > > > > > > > > > > > > > > ** List down rules, where use newSchema (the schema used by producer or
>> > > > > > > > > > > > > > > > > > > consumer) and existingSchema (last schema used)
>> > > > > > > > > > > > > > > > > > > * Forward
>> > > > > > > > > > > > > > > > > > > ** List down rules, where use newSchema (the schema used by producer or
>> > > > > > > > > > > > > > > > > > > consumer) and existingSchema (last schema used)
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > So each strategy should have its own section.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > I'm saying this since you used "writttenSchema" word but it represents
>> > > > > > > > > > > > > > > > > > > something completely different if it's backward or forward check.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > Once you'll have that structure like that, I personally will be able to
>> > > > > > > > > > > > > > > > > > > read and understand it.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > The motivation and problem statement are now good - thanks for improving
>> > > > > > > > > > > > > > > > > > > it.
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com> wrote:
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some background and
>> > > > > > > > > > > > > > > > > > > > some explanations.
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > The compatibility check rules are already written in the Implementation.
>> > > > > > > > > > > > > > > > > > > > ProtoBufNative implements the same canRead method as Apache Avro.
>> > > > > > > > > > > > > > > > > > > > It does this by checking whether the schema for writing and reading is
>> > > > > > > > > > > > > > > > > > > > compatible. I also indicate whether the writtenSchema and readSchema of the
>> > > > > > > > > > > > > > > > > > > > Backward, Forward, and Full strategies are the old or the new version of
>> > > > > > > > > > > > > > > > > > > > the schema.
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > > > > > > > sinan
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Feb 26, 2023 at 23:24:
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, so you need to
>> > > > > > > > > > > > > > > > > > > > > add IMO for people to understand it. You don't need to explain the entire
>> > > > > > > > > > > > > > > > > > > > > pulsar in this PIP, but at the very least a few paragraphs detailing all
>> > > > > > > > > > > > > > > > > > > > > you need to know, to put you in context:
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > >    - Start by saying Pulsar as a built-in schema registry inside Pulsar
>> > > > > > > > > > > > > > > > > > > > >    broker.
>> > > > > > > > > > > > > > > > > > > > >       - Every time the client updates the schema, it uploads it to the
>> > > > > > > > > > > > > > > > > > > > >       broker. When that happens, it has a feature which validates if the new
>> > > > > > > > > > > > > > > > > > > > >       schema version is compatible with the previous versions. There are 4 types
>> > > > > > > > > > > > > > > > > > > > >       of compatibility: Full, ... (complete and explain each one briefly)
>> > > > > > > > > > > > > > > > > > > > >    - Also explain Pulsar Schema registry supports various schema
>> > > > > > > > > > > > > > > > > > > > >    protocols:  Avro, protobuf native, ... (complete the rest), each protocol
>> > > > > > > > > > > > > > > > > > > > >    has a schema which dictates how to serialize and deserialize the message
>> > > > > > > > > > > > > > > > > > > > >    content into typed object.
>> > > > > > > > > > > > > > > > > > > > >    - Explain in short what is protobuf native (compare protobuf non-native)
>> > > > > > > > > > > > > > > > > > > > >    - Please don't paste code instead of explaining.
>> > > > > > > > > > > > > > > > > > > > >       - Explain that protobuf native current validation check is only
>> > > > > > > > > > > > > > > > > > > > >       composed of checking the root message name is the same between the current
>> > > > > > > > > > > > > > > > > > > > >       schema version and the new version.
>> > > > > > > > > > > > > > > > > > > > >          - Explain briefly what is a root message and its name.
>> > > > > > > > > > > > > > > > > > > > >       - Explain the problem (list scenarios) that we have because protobuf
>> > > > > > > > > > > > > > > > > > > > >       native schema only supports FULL compatibility validation.
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > Regarding high level design - as in what you plan to do.
>> > > > > > > > > > > > > > > > > > > > > I suggest you add "High Level Design" and in it detail how you plan to
>> > > > > > > > > > > > > > > > > > > > > validate, per protobuf version, per compatibility check (backward, forward,
>> > > > > > > > > > > > > > > > > > > > > full,...).
>> > > > > > > > > > > > > > > > > > > > > I tried reading the implementation - for me , it's all over the place. Can
>> > > > > > > > > > > > > > > > > > > > > you please list in order what I wrote above, and list the validation rules
>> > > > > > > > > > > > > > > > > > > > > with a good explanation why you validate it like that?
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > Lastly, one you have all the validation rules clearly stated, you can use
>> > > > > > > > > > > > > > > > > > > > > it to document it properly so users can know what validation to expect.
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > Asaf
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1998@gmail.com> wrote:
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > Sorry, my mistake. I removed the code and described the design to improve
>> > > > > > > > > > > > > > > > > > > > > > the PROTOBUF_NATIVE schema compatibility checks. You can have a look.
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mesika@gmail.com> wrote on Wed, Feb 22, 2023 at 21:16:
>> > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > I read it but you're almost directly diving into the code - it will take me
>> > > > > > > > > > > > > > > > > > > > > > > hours just to reverse engineer your design.
>> > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > Can you please include a "High Level Design" section in which you explain
>> > > > > > > > > > > > > > > > > > > > > > > how you plan to tackle any issue?
>> > > > > > > > > > > > > > > > > > > > > > > If I can read that section and explain to someone else how this will work,
>> > > > > > > > > > > > > > > > > > > > > > > it means the section is complete.
>> > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > Let's leave the code to the PRs.
>> > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com> wrote:
>> > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > Hi all,
>> > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19565.
>> > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > We can talk about the current design here. Especially for the field type
>> > > > > > > > > > > > > > > > > > > > > > > > change check rules, please give your valuable advice.
>> > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > > > Thanks,
>> > > > > > > > > > > > > > > > > > > > > > > > Sinan
>> > > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
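The quoted thread above describes the three strategies as differing only in the order of the arguments passed to canRead(): BACKWARD means the new schema reads data written with the last one, FORWARD is the reverse, and FULL requires both. Here is a minimal sketch of that mapping. It is an illustration, not Pulsar code. `can_read` is a hypothetical stand-in for canRead(writtenSchema, readSchema), and schemas are treated as opaque values.

```python
from typing import Callable

# Hypothetical stand-in for canRead(writtenSchema, readSchema): returns True if a
# reader using read_schema can decode data produced with written_schema.
CanRead = Callable[[object, object], bool]


def is_compatible(strategy: str, existing: object, new: object, can_read: CanRead) -> bool:
    """Choose the canRead argument order for a compatibility strategy."""
    if strategy == "BACKWARD":
        # Consumers on the new schema read data written with the existing one.
        return can_read(existing, new)
    if strategy == "FORWARD":
        # Consumers on the existing schema read data written with the new one.
        return can_read(new, existing)
    if strategy == "FULL":
        # Both directions must hold.
        return can_read(existing, new) and can_read(new, existing)
    raise ValueError(f"unsupported strategy: {strategy}")


# Toy can_read that only accepts data written with "V2".
print(is_compatible("BACKWARD", "V2", "V3", lambda written, read: written == "V2"))  # True
print(is_compatible("FULL", "V2", "V3", lambda written, read: written == "V2"))  # False
```

Only the argument order changes from one strategy to the next. The per-field rules under debate all live inside `can_read`.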

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by SiNan Liu <li...@gmail.com>.
*Hi, Asaf! This week I was busy researching `SchemaRegistryServiceImpl`
optimization, sorry for the late reply.*

1.

> Well, this is exactly the argument we had in this thread. This is why you
> make 2 implementation classes ( I called them v1, v2 if you remember). The
> user can choose between them. This is exactly while I liked the _v1 _v2
> naming, so they will know they are not backward compatible.
> Also, you're talking about improving right? Checking the message name is
> outright wrong, so you're actually fixing a big mistake someone made back
> then.


This is just a very simple check that compares the root message names of the
two protos; does it really matter?
This PIP makes the check more stringent, so the root message check from the
previous implementation is still needed.
This is a very small issue, and I don't think it's necessary to go into
such a discussion.
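The trade-off in this point can be made concrete with the `SearchReq` to `SearchRequest` example from Asaf's message quoted below. This sketch uses a hypothetical, simplified schema model (a root message name plus its fields), not real protobuf descriptors. A name-only check rejects the rename, while comparing field numbers and types, a rough proxy for what the wire format depends on, finds no difference.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Field:
    name: str
    number: int
    type_name: str


@dataclass(frozen=True)
class ProtoSchema:
    # Hypothetical, simplified model: root message name plus its fields.
    root_message: str
    fields: Tuple[Field, ...]


def root_name_matches(existing: ProtoSchema, new: ProtoSchema) -> bool:
    """Name-only rule: compatible only if the root message names are equal."""
    return existing.root_message == new.root_message


def wire_fields_match(existing: ProtoSchema, new: ProtoSchema) -> bool:
    """Compare (number, type) pairs; the field names themselves are ignored."""
    return {(f.number, f.type_name) for f in existing.fields} == {
        (f.number, f.type_name) for f in new.fields
    }


FIELDS = (
    Field("query", 1, "string"),
    Field("page_number", 2, "int32"),
    Field("result_per_page", 3, "int32"),
)
v1 = ProtoSchema("SearchReq", FIELDS)
v2 = ProtoSchema("SearchRequest", FIELDS)
print(root_name_matches(v1, v2), wire_fields_match(v1, v2))  # False True
```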



2.

> Your link to Stackoverflow will fail *my* validation: Since `int foo = 1`
> --> `int foo = 2;` : same name, same type, different field number --> fail
> validation.
> So it doesn't prove anything, or help this argument.


> If I have in readSchema
> int32 justAnID = 1
> int32 customerId = 2
> and in writeSchema I have
> int32 justAnID = 1
> string customerId = 3
> This is valid.
> You will fail this validation check, since customerId has different field
> numbers (2 and 3), but I decided to remove field number 2 and add a new
> field with the same name, different type, with field number 3. When I read,
> field number 2 will get default value of 0, and ignore field number 3.
> Valid.
> If you compare by field type, then `int32 customerId = 2` to `int32
> customerId = 3`, would fail, since that looks super strange right? Why
> remove and add the same field exactly?


If readSchema has a field named customerId with number 2 and writeSchema
has a field named customerId with number 3, then the two schemas are
incompatible. When reading data encoded with writeSchema, the reader will look
for customerId under number 2, but it was written under number 3, which will
cause the read to fail.
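The two matching rules under debate can be compared side by side. The test cases are the `customerId` example quoted above and the `foo = 1` to `foo = 2` case from the Stack Overflow link in the message quoted below. This is a sketch over a hypothetical `name -> (number, type)` map, not the PR's implementation. Each function returns True when it would flag the pair as incompatible.

```python
from typing import Dict, Tuple

# Hypothetical model of a message: field name -> (field number, field type).
Fields = Dict[str, Tuple[int, str]]


def renumbered_by_name(read: Fields, written: Fields) -> bool:
    """Flag a field that keeps its name but changes its number."""
    return any(
        name in written and written[name][0] != number
        for name, (number, _type) in read.items()
    )


def renumbered_by_name_and_type(read: Fields, written: Fields) -> bool:
    """Flag a field that keeps its name AND type but changes its number."""
    return any(
        name in written
        and written[name][1] == field_type
        and written[name][0] != number
        for name, (number, field_type) in read.items()
    )


read_schema = {"justAnID": (1, "int32"), "customerId": (2, "int32")}
write_schema = {"justAnID": (1, "int32"), "customerId": (3, "string")}
print(renumbered_by_name(read_schema, write_schema))  # True
print(renumbered_by_name_and_type(read_schema, write_schema))  # False
```

Matching by name alone flags the `customerId` change. Matching by name and type does not. Both rules flag the `foo` renumbering. Which of these behaviors the PIP should specify is the open question here.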

> > (4) The writtenSchema can not change the field name of any field in
> > > > readSchema (the field number is the same, but the field name is
> > > > different).
> > > This is incorrect.
> > > Fields names are encoded into the wire. I don't see this in any best
> > > practice.


If the writer and reader use the same field number, Protobuf will serialize
and deserialize the data correctly even if the field names are different,
because serialization and deserialization identify fields by number.
*So you're right. We can rename it.*

*Renaming a field while keeping its number is compatible.*
*However, if the field numbers differ, the field number used when the message
is written will not match the field number used when it is read,*
*so the data will not be parsed correctly when the message is read.*
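The point that field names never reach the wire can be checked with a small encoder for a single varint field. It follows the key layout in the protobuf encoding guide: the key is the field number shifted left by 3, OR'd with the wire type, which is 0 for varints. This is an illustrative sketch, not a protobuf library call. `result_per_page = 3` and a renamed `resultPerPage = 3` produce identical bytes because only the number is encoded. A reader that looks for a different number will not find the value there.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint encoding for non-negative integers."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: the key (field_number << 3 | wire type 0), then the value."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(value)


# No field name is passed in, so renaming a field cannot change these bytes.
print(encode_varint_field(3, 10).hex())   # 180a
print(encode_varint_field(2, 150).hex())  # 109601
```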


3.

> > > > - The writtenSchema cannot add required fields, but optional or
> > > > duplicate fields can be added (The field number must be new).
> > > >
> > > > That's not true.
> > > You can have a required field in writeSchema, and not have that field in
> > > readSchema (based on tag number).
> >
> >
> > The required field must exist. In your case, where does readSchema go to
> > read its required fields? It's not in writtenSchema.
> > The second and third sentences in the official website say, add and delete
> > do not operate required field!


The official website explains why you should not add a required field. You
can check it out:
https://protobuf.dev/programming-guides/dos-donts/#dont-add-a-required-field

Required fields are considered so harmful that they were removed from proto3.


4.

> > > So if my write schema is
> > > message SearchRequest { required string query = 1; optional int32
> > > page_number = 2; optional int32 result_per_page = 3;}
> > > and my read schema is
> > > message SearchRequest { optional int32 page_number = 2; optional
> > > int32 result_per_page = 3;}
> > > You can see I removed a field from writeSchema that does not have a default
> > > value.
> > > First, I can read it without any problems: Field 2 might exists, no
> > > problem. Field 3 might exists, no problem. I ignore field 1.
> > > So the validation you wrote is no correct at all, without any regard to
> > > optional value.
> >
> >
> > *It looks like you've added a required field here, which is not allowed (in
> > 3 i am said that).*
> > *It also uses the number 1 of the deleted field.*
> >
> I removed the required field "required string query = 1" from writeSchema
> compared with readSchema.
> Of course you can say I added "required string query = 1" to writeSchema
> compared with readSchema.
> The main question: can I use readSchema to read message written by
> writeSchema?
> Yes I can.
> I wrote why above.
> I didn't use the number of 1 of deleted field. The read schema is using
> number 2 and 3.


This is the problem covered in point 3: adding a required field is not allowed.
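For proto2, the two directions in points 4 and 5 can be made concrete with a reader-side check. A reader can only rely on a field it marks `required` if the writer also marks the same field number `required`. A required field that exists only on the writer side reaches the reader as an unknown field. This is a simplified sketch over a hypothetical `number -> label` map. It looks at labels only and ignores types, nested messages and the other rules.

```python
from typing import Dict, List

# Hypothetical proto2 model: field number -> label ("required", "optional", "repeated").
Labels = Dict[int, str]


def unguaranteed_required_fields(written: Labels, read: Labels) -> List[int]:
    """Field numbers the reader requires but the writer does not always set."""
    return sorted(
        number
        for number, label in read.items()
        if label == "required" and written.get(number) != "required"
    )


# Point 4: required field 1 exists only in the write schema.
write_v = {1: "required", 2: "optional", 3: "optional"}
read_v = {2: "optional", 3: "optional"}
print(unguaranteed_required_fields(write_v, read_v))  # []
# Point 5: the same schemas with the roles swapped.
print(unguaranteed_required_fields(read_v, write_v))  # [1]
```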


5.

> > What if I do the other way around?
> > > My write schema is:
> > > message SearchRequest { optional int32 page_number = 2; optional
> > > int32 result_per_page = 3;}
> > > My read schema is:
> > > message SearchRequest { required string query = 1; optional int32
> > > page_number = 2; optional int32 result_per_page = 3;}
> > > Here I will fail, since it is required I will provide value for field 1 but
> > > alas it will never be there.
> >
> >
> > Yes, the required field has been removed, which is incompatible.
> > The check for changes to the required field starts at the beginning, at
> > which point an incompatible exception has been thrown and the following
> > compatibility check is not performed.
> >
> You can't say "The required field " - it's plain wrong.
> Look at your rule:
> The writtenSchema cannot remove required fields in the readSchema.
> You can say: the writeSchema has removed a required field compared with
> readSchema.
> This rule will fail the validation.
> *not* the default rule we're discussing.


Here we are talking about changing the default value. But in this case the
writtenSchema removes a required field, so it is not compatible. In the
implementation, the rules are checked in order, and changes to required fields
are checked first. So there is no need to discuss required-field changes as
part of the default-value rule.


6.

> > *It looks like sixth item in PIP should be removed.*
> > *And Rule 7 in PIP should be removed:*
> >
> > Delete rule 6.
> I would keep rule 7. See:
>
> https://protobuf.dev/programming-guides/dos-donts/#dont-change-the-default-value-of-a-field
> It sounds like a good best practice. We can omit it if we don't want to be
> overly strict.


I think we should definitely keep rule 7 (it is now `(5)` in the PIP issue) and
follow the guidance on the website.
```
(7) There can not be a field which exists both in `readSchema` and
`writtenSchema`, with same field number, having different default values.
**This rule applies to proto2. But in proto3, the default value cannot be
set by yourself! If you use proto3, it will not check for this rule**
```
By the way, the rule in `3` that disallows changes to required fields should
be followed as well.
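Rule (7) as quoted above can be sketched like this. The sketch assumes a hypothetical `number -> explicit default` map for each schema, where None means no explicit default, plus a syntax string taken from the .proto file. The check is skipped for proto3, as the rule text says.

```python
from typing import Dict, List, Optional

# Hypothetical model: field number -> explicit default value (None if not declared).
Defaults = Dict[int, Optional[object]]


def conflicting_defaults(syntax: str, read: Defaults, written: Defaults) -> List[int]:
    """Field numbers present in both schemas whose declared defaults differ (proto2 only)."""
    if syntax == "proto3":
        # proto3 does not let you declare custom defaults, so the rule does not apply.
        return []
    return sorted(
        number
        for number in read.keys() & written.keys()
        if read[number] != written[number]
    )


print(conflicting_defaults("proto2", {2: 10, 3: None}, {2: 20, 3: None}))  # [2]
print(conflicting_defaults("proto3", {2: 10}, {2: 20}))  # []
```

Defaults are never sent over the wire (see point 7). So a differing default does not break parsing; it changes what each side sees for an unset field.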


7.

> > > Changing a default value is generally OK, as long as you remember that
> > > default values are never sent over the wire.
> > > Thus, if a program receives a message in which a particular field isn’t
> > > set, the program will see the default value as it was defined in that
> > > program’s version of the protocol.
> > > It will NOT see the default value that was defined in the sender’s code.
> >
> > This is in the official document 11:
> > https://protobuf.dev/programming-guides/proto2/#updating
> >
> >
> >
> So what do you wish to state with this quote?


Oh, rule 7 has been retained. Let's ignore this.



*I also took into account that proto2 is a really problematic version, and
the Pulsar website seems to say that only proto3 is used.*
https://pulsar.apache.org/docs/next/schema-understand/#struct-schema
```
ProtobufNativeSchema is based on protobuf native descriptor.
This allows Pulsar to:
- use native protobuf-v3 to serialize or deserialize data.
- use AutoConsume to deserialize data.
```
*In my opinion, the check rules differ somewhat between proto2 and proto3,
and this PIP also supports proto2. So with this PIP, we may need to update the
description of "ProtoBufNative" on the official website as well.*



Thanks,
sinan





Asaf Mesika <as...@gmail.com> wrote on Mon, Mar 20, 2023 at 17:46:

> On Sun, Mar 19, 2023 at 4:47 PM SiNan Liu <li...@gmail.com> wrote:
>
> > 1.
> >
> > > message SearchReq { string query = 1; int32 page_number = 2; int32
> > > result_per_page = 3;}
> > > Then second version I use:
> > > message SearchRequest { string query = 1; int32 page_number = 2;
> > > int32 result_per_page = 3;}
> >
> >
> > The rule in PIP improve the previous implementation, so the previous
> > implementation needs to be added.
> > If the user switches from the previous implementation to the current
> > implementation of PIP, and the current implementation doesn't check for
> > root message name changes, isn't that changing the old behavior? This PIP
> > is to make compatibility checking more stringent.
> >
> >
> Well, this is exactly the argument we had in this thread. This is why you
> make 2 implementation classes ( I called them v1, v2 if you remember). The
> user can choose between them. This is exactly while I liked the _v1 _v2
> naming, so they will know they are not backward compatible.
>
> Also, you're talking about improving right? Checking the message name is
> outright wrong, so you're actually fixing a big mistake someone made back
> then.
>
>
> >
> > 2.
> >
> > > This is a guideline for a human making the change, not for software
> > > validation right?
> > > When you write down the code for doing the comparison to know if the field
> > > number has changed, you have to take into account both the field name AND
> > > field type. If both are equal but the field number is different then fail
> > > it.
> >
> >
> > I use the field name here to match, the name is the same, but the number is
> > not the same will be incompatible.
> >
> > https://github.com/apache/pulsar/pull/19566/files#diff-14b840259375cc8fcae564586fc1c2a188cb71822ab5995ed6807f7dab30c210R129-R132
> > PIP description here is the rule, but how to implement it is a matter in
> > PR. I have considered the example you gave here, which is also incompatible
> > in this case.
> >
> > Your changes are also described here:
> >
> > https://stackoverflow.com/questions/65230623/safeness-of-changing-proto-field-number
> >
> >
> This sentence "the PIP description here is the rule, but how to implemented
> it is a matter in PR" is a completely invalid argument.
> For compatibility rules you specify the exact validation.
> We argue here about the rule.
> You can say in the PIP you will match by name and then in PR match by name
> and type. You have to be exact in the PIP.
>
> Regarding "I have considered the example you gave here, which is also
> incompatible" - please explain? I find it compatible - meaning you can read
> using the readSchema message written using writeSchema.
>
> Your link to Stackoverflow will fail *my* validation: Since `int foo = 1`
> --> `int foo = 2;` : same name, same type, different field number --> fail
> validation.
> So it doesn't prove anything, or help this argument.
>
>
>
>
>
> >
> > 3.
> >
> > > If you'll read sentence 2 from the guide, you'll see they write
> > > "This means that any messages serialized by code using your “old” message
> > > format can be parsed by your new generated code, as they won’t be missing
> > > any required elements."
> > > This is for the *other side*: when you add a required to readSchema
> > > compare to write schema. This of course is not allowed and won't work.
> >
> >
> > Note that in proto2: **Any new fields that you add should be optional or
> > repeated.**
> > https://protobuf.dev/programming-guides/proto2/#updating
> >
> > I also stated in the PIP that proto3 will not check this rule for required
> > field changes.
> > This rule applies to proto2 only. proto3 removed `required`, so if you use
> > proto3, changes to required fields are not checked.
> >
> > I'm talking about proto2.
>
> Listen.
>
> This is what you wrote in the PIP, ok?
>
>
> >    - The writtenSchema cannot add required fields, but optional or
> >    duplicate fields can be added (The field number must be new).
> >
> I answered:
>
> > That's not true.
> > You can have a required field in writeSchema, and not have that field in
> > readSchema (based on tag number).
>
> I also gave you an example showing what you wrote is not true (see previous
> mails), in which I add a required field to writeSchema compared with
> readSchema and I can still read it with readSchema.
>
> You cite a guideline saying you can only add optional or repeated fields,
> without reading the rest of the paragraph and without understanding how
> protocol buffers are encoded and decoded.
>
> I'm not sure how to move forward here.
> This validation rule is invalid.
>
>
>
>
> >
> > 4.
> >
> > > My write schema is
> > > message SearchRequest { string query = 1; int32 page_number = 2;
> > > int32 result_per_page = 3;}
> > > my read schema is
> > > message SearchRequest { string query = 1; int32 page_number = 2;
> > > int32 resultPerPage = 3;}
> > > - Non-required fields can be removed, as long as the field number is not
> > > used again in your updated message type. You may want to rename the field
> > > instead, perhaps adding the prefix “OBSOLETE_”, or make the field number
> > > reserved <https://protobuf.dev/programming-guides/proto2/#reserved>, so
> > > that future users of your .proto can’t accidentally reuse the number.
> > > First, they are talking about removal of fields. They suggest you rename
> > > the field to OBSOLETE_myField instead of actually deleting it, to make it
> > > easier on compatibility. They don't talk about rules that forbid you to
> > > rename a field. It's unrelated.
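The renaming argument rests on how the wire format identifies a field. As a minimal sketch (the class and method names below are illustrative, not part of Pulsar or protobuf-java), encoding an int32 field takes only its field number and value, so `int32 result_per_page = 3;` and `int32 resultPerPage = 3;` produce identical bytes:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Illustrative sketch, not protobuf-java: the field name is never an input
// to the encoding, only the field number and the value.
final class RenameDemo {
    static final int WIRETYPE_VARINT = 0;

    // Each field starts with a key: (field_number << 3) | wire_type.
    static int key(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    // Base-128 varint: 7 bits per byte, high bit set while more bytes follow.
    // A negative int32 is sign-extended to 64 bits first, as protobuf does.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    static byte[] encodeInt32Field(int fieldNumber, int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, key(fieldNumber, WIRETYPE_VARINT));
        writeVarint(out, value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Same bytes whether the .proto calls field 3 result_per_page or resultPerPage.
        System.out.println(Arrays.toString(encodeInt32Field(3, 10))); // [24, 10]
    }
}
```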
> >
> >
> > Maybe you are right that changing the field name is compatible.
> >
> >
> > https://stackoverflow.com/questions/45431685/protocol-buffer-does-changing-field-name-break-the-message/45431953#comment84548234_45431953
> >
> > Interestingly, I tested this with the `avro-proto`-based implementation,
> > the `ProtobufSchemaCompatibilityCheck` compatibility check
> > (`org.apache.pulsar.broker.service.schema.BaseAvroSchemaCompatibilityTest`):
> >
> > private static final String schemaJson1 =
> >         "{\"type\":\"record\",\"name\":\"DefaultTest\",\"namespace\":\"org.apache.pulsar.broker.service.schema"
> >         + ".AvroSchemaCompatibilityCheckTest\",\"fields\":[{\"name\":\"field1\",\"type\":\"string\"}]}";
> >
> > private static final String schemaJson4 =
> >         "{\"type\":\"record\",\"name\":\"DefaultTest\",\"namespace\":\"org.apache.pulsar.broker.service.schema"
> >         + ".AvroSchemaCompatibilityCheckTest\",\"fields\":[{\"name\":\"field1_v2\",\"type\":\"string\","
> >         + "\"aliases\":[\"field1\"]}]}";
> >
> > schemaCompatibilityCheck.isCompatible(schemaData1, schemaData4,
> >         SchemaCompatibilityStrategy.BACKWARD);
> > This is compatible.
> >
> > But this is not compatible:
> > schemaCompatibilityCheck.isCompatible(schemaData1, schemaData4,
> >         SchemaCompatibilityStrategy.FORWARD);
> >
> > So, what do you think of the Avro-based implementation,
> > `ProtobufSchemaCompatibilityCheck`? Are its rules about renaming wrong?
> > It looks something like that.
> >
> >
> You're asking why Avro doesn't allow it when the read schema is schema 4, the
> write schema is schema 1, and the only difference between them is a single
> field rename?
> I don't know - I'd need to debug to understand.
> I do know that Avro encoding is completely different. It doesn't have a
> field identifier like protobuf does. It relies completely on the schemas of
> the writer and the reader.
> But that is beside the point, since, to my knowledge, Avro is supposed to
> allow field renames.
> You can debug and find out.
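One plausible explanation, not verified by debugging: the Avro specification resolves aliases only on the reader's schema, and a reader field that matches no writer field must have a default. If Pulsar's BACKWARD check means the new schema (schema 4) reads data written with the existing one (schema 1), and FORWARD the reverse, as in the strategy definitions quoted at the top of this thread, the results above follow. The sketch below is a simplified model of that rule under those assumptions, not the Avro library's resolution code:

```java
import java.util.List;

// Simplified model of Avro record-field resolution (not the Avro API).
// Assumption: aliases are looked up on the reader's fields only, and a
// reader field with no matching writer field needs a default value.
final class AliasResolution {
    record Field(String name, List<String> aliases, boolean hasDefault) {}

    static boolean canRead(List<Field> reader, List<Field> writer) {
        for (Field r : reader) {
            boolean matched = writer.stream().anyMatch(
                    w -> w.name().equals(r.name()) || r.aliases().contains(w.name()));
            if (!matched && !r.hasDefault()) {
                return false; // nothing to fill this reader field with
            }
        }
        return true; // writer-only fields are ignored by the reader
    }

    public static void main(String[] args) {
        List<Field> schema1 = List.of(new Field("field1", List.of(), false));
        List<Field> schema4 = List.of(new Field("field1_v2", List.of("field1"), false));
        System.out.println(canRead(schema4, schema1)); // BACKWARD: true
        System.out.println(canRead(schema1, schema4)); // FORWARD: false
    }
}
```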
>
>
>
> >
> > 5.
> >
> > > I think my previous explanation in this email reply should be good enough
> > > to explain why the name is irrelevant.
> > > If not, please ping me on this.
> > > Regarding your example, can you quote it? I don't understand the read and
> > > write schema here.
> > > Also, regarding message names and why they are irrelevant, same thing: read
> > > https://protobuf.dev/programming-guides/encoding/
> > > In there you will see the names are not encoded, thus they can be changed
> > > freely and binary data can still be decoded successfully.
> >
> >
> > Reader.proto
> > ```protobuf
> > syntax = "proto2";
> > package proto.reader;
> > import "ExternalReader.proto";
> > option java_package = "org.apache.pulsar.client.schema.proto.reader";
> > option java_outer_classname = "Reader";
> >
> > enum WeekEnum {
> > ......
> > }
> >
> > message ProtobufMessage {
> > required string protobufFoo = 1;
> > required double protobufBar = 2;
> > }
> >
> > message ProtobufSchema {
> > ......
> > optional ProtobufMessage messageField = 12;
> > ......
> > optional WeekEnum enumField = 15;
> > }
> > ```
> >
> > WriterWithTypeNameChange.proto:
> > ```protobuf
> > syntax = "proto2";
> > package proto.writerWithTypeNameChange;
> > import "ExternalReader.proto";
> > option java_package =
> > "org.apache.pulsar.client.schema.proto.writerWithTypeNameChange";
> > option java_outer_classname = "WriterWithTypeNameChange";
> >
> > enum WeekEnum {
> > ......
> > }
> >
> > message ProtobufMessage_V2 {
> > required string protobufFoo = 1;
> > required double protobufBar = 2;
> > }
> >
> > message ProtobufSchema {
> > ......
> > optional ProtobufMessage_V2 messageField = 12;
> > ......
> > optional WeekEnum enumField = 15;
> > ......
> > }
> > ```
> >
> > *Doesn't that change the field type? Should that be compatible?*
> >
> >
> It doesn't change the field type, since you haven't changed the types of the
> fields inside ProtobufMessage: they are still field 1 (string) and field 2
> (double).
> It is compatible since the message type name is not encoded in the binary.
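To make this concrete, here is a sketch (illustrative names, not protobuf-java) that hand-encodes `optional ProtobufMessage messageField = 12;` from the example above. Its inputs are field numbers, wire types and values only; neither `ProtobufMessage` nor `ProtobufMessage_V2` appears anywhere, so both writer schemas produce the same bytes:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Illustrative sketch, not protobuf-java: encodes messageField = 12 whose
// type has { string protobufFoo = 1; double protobufBar = 2; }.
final class NestedEncoding {
    static final int FIXED64 = 1;
    static final int LENGTH_DELIMITED = 2;

    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    static void writeKey(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    // Inner message body: field 1 (string, length-delimited), field 2 (double, fixed64).
    static byte[] encodeInner(String foo, double bar) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] utf8 = foo.getBytes(StandardCharsets.UTF_8);
        writeKey(out, 1, LENGTH_DELIMITED);
        writeVarint(out, utf8.length);
        out.writeBytes(utf8);
        writeKey(out, 2, FIXED64);
        out.writeBytes(ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putDouble(bar).array());
        return out.toByteArray();
    }

    // Outer field 12: the nested message is embedded as a length-delimited blob.
    static byte[] encodeMessageField(byte[] inner) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeKey(out, 12, LENGTH_DELIMITED);
        writeVarint(out, inner.length);
        out.writeBytes(inner);
        return out.toByteArray();
    }
}
```

Decoding works the same way: a reader generated from either `.proto` file only sees field 12 as a length-delimited record containing fields 1 and 2.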
>
>
> >
> > 6.
> >
> > > So if my write schema is
> > > message SearchRequest { required string query = 1; optional int32
> > > page_number = 2; optional int32 result_per_page = 3;}
> > > and my read schema is
> > > message SearchRequest { optional int32 page_number = 2; optional
> > > int32 result_per_page = 3;}
> > > You can see I removed a field from writeSchema that does not have a
> > > default value.
> > > First, I can read it without any problems: Field 2 might exist, no
> > > problem. Field 3 might exist, no problem. I ignore field 1.
> > > So the validation you wrote is not correct at all, without any regard to
> > > optional value.
> >
> >
> > *It looks like you've added a required field here, which is not allowed
> > (I said that in 3).*
> > *It also reuses number 1 of the deleted field.*
> >
> I removed the required field "required string query = 1" from writeSchema
> compared with readSchema.
> Of course, you can also say I added "required string query = 1" to
> writeSchema compared with readSchema.
> The main question: can I use readSchema to read a message written by
> writeSchema?
> Yes, I can.
> I wrote why above.
> I didn't reuse number 1 of the deleted field. The read schema uses
> numbers 2 and 3.
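A minimal decoding sketch shows why: a reader walks the keys on the wire and skips any field number its schema does not declare. The decoder below is illustrative (not protobuf-java) and keeps only varint values, which is enough for the `int32` fields 2 and 3 of `SearchRequest`:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative decoder, not protobuf-java: keeps varint fields whose numbers the
// reader declares and skips everything else using only the wire type.
final class SkipUnknownFields {
    private final byte[] buf;
    private int pos;

    private SkipUnknownFields(byte[] buf) {
        this.buf = buf;
    }

    private long readVarint() {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint");
    }

    static Map<Integer, Long> decode(byte[] data, Set<Integer> readerFields) {
        SkipUnknownFields in = new SkipUnknownFields(data);
        Map<Integer, Long> values = new TreeMap<>();
        while (in.pos < data.length) {
            long key = in.readVarint();
            int fieldNumber = (int) (key >>> 3);
            int wireType = (int) (key & 0x7);
            switch (wireType) {
                case 0 -> { // varint
                    long value = in.readVarint();
                    if (readerFields.contains(fieldNumber)) {
                        values.put(fieldNumber, value);
                    }
                }
                case 1 -> in.pos += 8; // fixed64: skipped in this sketch
                case 2 -> { // length-delimited (string, bytes, nested message)
                    int length = (int) in.readVarint();
                    in.pos += length;
                }
                case 5 -> in.pos += 4; // fixed32: skipped in this sketch
                default -> throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Written with: required string query = 1 ("hi"); page_number = 2 (5); result_per_page = 3 (10)
        byte[] written = {0x0A, 0x02, 'h', 'i', 0x10, 0x05, 0x18, 0x0A};
        // Read with a schema that only declares fields 2 and 3.
        System.out.println(decode(written, Set.of(2, 3))); // {2=5, 3=10}
    }
}
```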
>
>
>
>
>
> > What if I do the other way around?
> > > My write schema is:
> > > message SearchRequest { optional int32 page_number = 2; optional
> > > int32 result_per_page = 3;}
> > > My read schema is:
> > > message SearchRequest { required string query = 1; optional int32
> > > page_number = 2; optional int32 result_per_page = 3;}
> > > Here I will fail, since it is required I will provide value for field 1
> > but
> > > alas it will never be there.
> >
> >
> > Yes, the required field has been removed, which is incompatible.
> > The check for changes to the required field starts at the beginning, at
> > which point an incompatible exception has been thrown and the following
> > compatibility check is not performed.
> >
>
> You can't say "The required field " - it's plain wrong.
>
> Look at your rule:
>
> The writtenSchema cannot remove required fields in the readSchema.
>
>
> You can say: the writeSchema has removed a required field compared with
> readSchema.
> This rule will fail the validation, *not* the default-value rule we're
> discussing.
>
>
>
>
> >
> > Now let's focus on the optional field, ok?
> > > Write schema
> > > message SearchRequest { optional int32 page_number = 2; optional
> > > int32 result_per_page = 3;}
> > > Read schema:
> > > message SearchRequest { optional string query = 1; optional int32
> > > page_number = 2; optional int32 result_per_page = 3;}
> > > When I read, I won't find field number 1 of course, so I will use the
> > > default value dictated by the type, as I quoted above, which for a string
> > > is the empty string.
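The same reasoning, sketched as a small model (illustrative types, not Pulsar or protobuf-java): after decoding, each field the reader declares but did not find falls back to the reader's own default, either the `[default = ...]` from the reader's .proto or the type's default, while a missing `required` field is an error. This also covers the reverse case above, where the read schema declares `required string query = 1`:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model (not Pulsar or protobuf-java) of how a proto2 reader fills in
// fields that were absent from the decoded message.
final class MissingFields {
    enum Label { OPTIONAL, REQUIRED }

    // declaredDefault is null when the reader's .proto has no [default = ...].
    record ReaderField(int number, String type, Label label, Object declaredDefault) {}

    // Type defaults for the few scalar types this sketch models.
    static Object typeDefault(String type) {
        return switch (type) {
            case "string" -> "";
            case "int32" -> 0;
            case "bool" -> false;
            default -> throw new IllegalArgumentException("type not modeled: " + type);
        };
    }

    static Map<Integer, Object> resolve(List<ReaderField> readerFields, Map<Integer, Object> decoded) {
        Map<Integer, Object> result = new LinkedHashMap<>();
        for (ReaderField f : readerFields) {
            if (decoded.containsKey(f.number())) {
                result.put(f.number(), decoded.get(f.number()));
            } else if (f.label() == Label.REQUIRED) {
                throw new IllegalStateException("missing required field " + f.number());
            } else {
                // The reader's default is used; the writer's default is never on the wire.
                result.put(f.number(), f.declaredDefault() != null ? f.declaredDefault() : typeDefault(f.type()));
            }
        }
        return result;
    }
}
```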
> >
> >
> > *It looks like the sixth item in the PIP should be removed.*
> > *And Rule 7 in the PIP should be removed:*
> >
> Delete rule 6.
> I would keep rule 7. See:
>
> https://protobuf.dev/programming-guides/dos-donts/#dont-change-the-default-value-of-a-field
> It sounds like a good best practice. We can omit it if we don't want to be
> overly strict.
>
>
>
> > > Changing a default value is generally OK, as long as you remember that
> > > default values are never sent over the wire.
> > > Thus, if a program receives a message in which a particular field isn’t
> > > set, the program will see the default value as it was defined in that
> > > program’s version of the protocol.
> > > It will NOT see the default value that was defined in the sender’s
> code.
> >
> > This is item 11 in the official documentation:
> > https://protobuf.dev/programming-guides/proto2/#updating
> >
> >
> >
> So what do you wish to state with this quote?
>
>
> > 7.
> >
> > > Read this and then please explain why rule number 3 should not be
> added:
> > >
> > >
> >
> https://protobuf.dev/programming-guides/dos-donts/#dont-go-from-repeated-to-scalar
> >
> >
> > It looks like this rule needs to be added.
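For numeric fields, the repeated-to-scalar rule guards against a wire-format mismatch. A sketch under simplifying assumptions (illustrative names; field numbers below 16 and values below 128 so every varint fits in one byte): a packed `repeated int32` is written as a single length-delimited record (wire type 2), while a reader that declares the same field number as a scalar `int32` expects a varint (wire type 0), so it does not get the values it expects.

```java
// Illustrative sketch, not protobuf-java.
final class PackedVsScalar {
    static final int VARINT = 0;
    static final int LENGTH_DELIMITED = 2;

    static int key(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    // Packed repeated int32: one key, one length, then the values back to back.
    // Restricted to field numbers < 16 and values in [0, 128) so each varint is one byte.
    static byte[] encodePacked(int fieldNumber, int... values) {
        if (fieldNumber < 1 || fieldNumber >= 16 || values.length >= 128) {
            throw new IllegalArgumentException("outside this sketch's one-byte limits");
        }
        byte[] out = new byte[2 + values.length];
        out[0] = (byte) key(fieldNumber, LENGTH_DELIMITED);
        out[1] = (byte) values.length;
        for (int i = 0; i < values.length; i++) {
            if (values[i] < 0 || values[i] >= 128) {
                throw new IllegalArgumentException("value outside one-byte varint range");
            }
            out[2 + i] = (byte) values[i];
        }
        return out;
    }

    // The key a scalar int32 reader expects for this field number.
    static int expectedScalarKey(int fieldNumber) {
        return key(fieldNumber, VARINT);
    }
}
```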
> >
> > Thanks
>
>
> >
> >
> > Thanks,
> > sinan
> >
> >
> >
> > On Sun, Mar 19, 2023 at 7:11 PM Asaf Mesika <as...@gmail.com> wrote:
> >
> > > Also:
> > >
> > > > (2) PROTOBUF_NATIVE was designed so as not to use avro-protobuf for
> > > > protobuf schema compatibility checking.
> > >
> > > > The root message name is the class name we pass in when we create the
> > > > producer or consumer. Protobuf has many nested messages or dependencies.
> > > > The current implementation only checks if the passed class name is the
> > > > same. It does not check if the fields in the file change in a way that is
> > > > compatible with older versions of the schema.
> > >
> > >
> > > You missed the most important thing there: PROTOBUF_NATIVE uses the
> > > Protobuf Descriptor when persisting the schema. It's not using an Avro
> > > Schema definition. As I wrote to you in a previous email:
> > >
> > > > PROTOBUF_NATIVE was created to fix that shortcoming, by actually persisting
> > > > the Protobuf Descriptor and using Protobuf for encoding.
> > > >
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Mar 15, 2023 at 5:46 PM SiNan Liu <li...@gmail.com> wrote:
> > >
> > > > 1.
> > > >
> > > > > > Why? The root message name is not written over the wire to the best of
> > > > > > my knowledge. I haven't found it written in the official doc.
> > > >
> > > >
> > > > The root message name check is the rule in the existing
> > > > `ProtobufNativeSchemaCompatibilityCheck`: if the root message has a
> > > > different name, there is no need to check its contents. For the "same"
> > > > schema, the names must be the same.
> > > >
> > > >
> > > > 2.
> > > >
> > > > > > The writtenSchema can not change the field number of any field in
> > > > > > readSchema (the field name is the same, but the field number is
> > > > > > different).
> > > > > You have to take into account field type as well when comparing.
> > > >
> > > >
> > > > The first sentence on the website says that field numbers cannot be
> > > > changed:
> > > >
> > > > > Don’t change the field numbers for any existing fields.
> > > >
> > > >
> > > >
> > > > 3.
> > > >
> > > > > > - The writtenSchema cannot add required fields, but optional or
> > > > > > duplicate fields can be added (The field number must be new).
> > > > >
> > > > > That's not true.
> > > > > You can have a required field in writeSchema, and not have that field in
> > > > > readSchema (based on tag number).
> > > >
> > > >
> > > > The required field must exist. In your case, where does readSchema go to
> > > > read its required fields? They are not in writtenSchema.
> > > > The second and third sentences on the official website say that you should
> > > > not add or delete required fields!
> > > >
> > > >
> > > > 4.
> > > >
> > > > > > (4) The writtenSchema can not change the field name of any field in
> > > > > > readSchema (the field number is the same, but the field name is
> > > > > > different).
> > > > > This is incorrect.
> > > > > Field names are not encoded into the wire. I don't see this in any best
> > > > > practice.
> > > >
> > > >
> > > > The third sentence on the website:
> > > >
> > > > > You may want to rename the field instead, perhaps adding the prefix
> > > > > “OBSOLETE_”, or make the field number reserved, so that future
> users
> > of
> > > > > your .proto can’t accidentally reuse the number.
> > > >
> > > > If you want to rename a field, add a new field with a new number and
> > > > delete the old one!
> > > >
> > > >
> > > > 5.
> > > >
> > > > > > The writtenSchema does not change the field name and number, but it does
> > > > > > change the field type.
> > > > > Small correction: for the same field number you are not allowed to change
> > > > > types. Name is irrelevant.
> > > >
> > > >
> > > > Why is the name irrelevant?
> > > > This is the change in type, which is the rule stated in Alternatives. There
> > > > is no check here, just a warning to the user.
> > > > Another change is when the name of an ENUM or MESSAGE type is changed. That
> > > > is the same as the root message name check in 1: is it still the same type
> > > > after the change? This is not allowed to change!
> > > >
> > > > *Here is an example:*
> > > > readSchema (https://github.com/apache/pulsar/pull/19566/files#diff-a7006d73502e6064a80af02822f3a3072be498d8b677c4b838b0dafaea32dea4)
> > > > writtenSchema (https://github.com/apache/pulsar/pull/19566/files#diff-e3e7543624edaf1e0a4fd47947a2cad6e4b816b93843f71a367042ba6c3ec53f)
> > > >
> > > >
> > > > 6.
> > > >
> > > > > > (6) The writtenSchema removes fields that do not have default values in
> > > > > > readSchema. Then the schema is incompatible.
> > > > > Protobuf gives you its own default if you don't supply one. This is
> > > > > incorrect.
> > > >
> > > >
> > > > (1) This rule only applies to proto2 fields that do not set a default
> > > > value. proto3 is not checked, because there is always a default value.
> > > >
> > > > (2) In PIP issue:
> > > >
> > > > > Proto3 canceled the required field, so there is no need to check
> the
> > > > > required field. We get the syntax(proto2 or proto3) of proto in the
> > > code,
> > > > > and skip the check of the required field if it is proto3. All other
> > > > > checking rules also apply to proto3.
> > > >
> > > >
> > > > *I made a mistake here. This default value check is not needed in
> > > > proto3. I will modify the rules later according to your suggestion.*
> > > >
> > > > > I would remove the proto2/proto3 sections, since they only differ in 1
> > > > > rule, and just mention that distinction inside that rule (less work for
> > > > > the reader).
> > > >
> > > >
> > > > (3) And the rules you suggest adding:
> > > >
> > > > > *Rules that you don't have in the doc, but should IMO*
> > > > > .......
> > > >
> > > > > There can't be a field in writerSchema, that exists in readerSchema (tag
> > > > > number based), which in writerSchema is a scalar, but in readSchema is a
> > > > > repeated scalar with packed=true.
> > > >
> > > >
> > > > But I don't think rule number three needs to be added.
> > > >
> > > >
> > > > Thanks,
> > > > sinan
> > > >
> > > >
> > > >
> > > > On Tue, Mar 14, 2023 at 10:33 PM Asaf Mesika <as...@gmail.com> wrote:
> > > >
> > > > > Hi Sinan,
> > > > >
> > > > > The doc looks much better!
> > > > >
> > > > > I have a few additional comments:
> > > > >
> > > > > Pasting comment from previous emails:
> > > > >
> > > > > Can you convert the code block which is actually a quote in the
> > > > > beginning of the PIP to something which doesn't require to scroll
> > > > > horizontally so much?
> > > > > Use
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > > > >
> > > > > *Validation Rules*
> > > > >
> > > > > > (1) If the root message names of writtenSchema and readSchema are
> > > > > > different, then incompatible.
> > > > >
> > > > > Why? The root message name is not written over the wire to the best of my
> > > > > knowledge. I haven't found it written in the official doc.
> > > > >
> > > > >
> > > > > >    - The writtenSchema cannot add required fields, but optional or
> > > > > >    duplicate fields can be added (The field number must be new).
> > > > >
> > > > > That's not true.
> > > > > You can have a required field in writeSchema, and not have that field in
> > > > > readSchema (based on tag number).
> > > > >
> > > > > > The writtenSchema can not change the field number of any field in
> > > > > > readSchema (the field name is the same, but the field number is
> > > > > > different).
> > > > >
> > > > > You have to take into account field type as well when comparing.
> > > > >
> > > > > If I have in readSchema
> > > > > int32 justAnID = 1
> > > > > int32 customerId = 2
> > > > >
> > > > > and in writeSchema I have
> > > > > int32 justAnID = 1
> > > > > string customerId = 3
> > > > >
> > > > > This is valid.
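The difference between matching by name (as the PIP rule is worded) and matching by tag number (as suggested here) can be sketched on this exact example. The record and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch comparing two ways of pairing readSchema and writeSchema fields.
final class FieldMatching {
    record Field(String name, int number, String type) {}

    // PIP wording: same name but different number -> conflict.
    static List<String> nameBasedConflicts(List<Field> read, List<Field> write) {
        List<String> conflicts = new ArrayList<>();
        for (Field r : read) {
            for (Field w : write) {
                if (r.name().equals(w.name()) && r.number() != w.number()) {
                    conflicts.add(r.name());
                }
            }
        }
        return conflicts;
    }

    // Tag-number matching: same number but different type -> conflict.
    static List<String> tagBasedConflicts(List<Field> read, List<Field> write) {
        List<String> conflicts = new ArrayList<>();
        for (Field r : read) {
            for (Field w : write) {
                if (r.number() == w.number() && !r.type().equals(w.type())) {
                    conflicts.add("field " + r.number());
                }
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        List<Field> read = List.of(new Field("justAnID", 1, "int32"), new Field("customerId", 2, "int32"));
        List<Field> write = List.of(new Field("justAnID", 1, "int32"), new Field("customerId", 3, "string"));
        System.out.println(nameBasedConflicts(read, write)); // [customerId]
        System.out.println(tagBasedConflicts(read, write));  // []
    }
}
```

Under tag matching, field 2 is simply absent from the written data (the reader falls back to its default) and field 3 is unknown to the reader (it is skipped), so nothing conflicts.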
> > > > >
> > > > > > (4) The writtenSchema can not change the field name of any field in
> > > > > > readSchema (the field number is the same, but the field name is
> > > > > > different).
> > > > >
> > > > > This is incorrect.
> > > > > Field names are not encoded into the wire. I don't see this in any best
> > > > > practice.
> > > > >
> > > > > > (5) The writtenSchema does not change the field name and number, but it
> > > > > > does change the field type.
> > > > > >
> > > > > >    - If the field type is ENUM or MESSAGE, the schema is not compatible
> > > > > >    when the type name is changed
> > > > > >    - If the type of the field is another type. The schemas under this
> > > > > >    rule are not incompatible, but warn the user. (There is another way of
> > > > > >    testing in PIP issue Alternatives)
> > > > >
> > > > > Small correction: for the same field number you are not allowed to change
> > > > > types. Name is irrelevant.
> > > > >
> > > > > > (6) The writtenSchema removes fields that do not have default values in
> > > > > > readSchema. Then the schema is incompatible.
> > > > >
> > > > > Protobuf gives you its own default if you don't supply one. This is
> > > > > incorrect.
> > > > >
> > > > >
> > > > > *Rules that you don't have in the doc, but should IMO*
> > > > > * There can not be a field which exists both in readSchema and
> > > > writeSchema,
> > > > > with same tag number, having different default values
> > > > > * There can't be a field in writerSchema, that exists in
> readerSchema
> > > > (tag
> > > > > number based), which in writerSchema is repeated and its type is
> > > Scalar (
> > > > > https://protobuf.dev/programming-guides/proto/#scalar) but in
> > > readSchema
> > > > > it
> > > > > is not repeated anymore.
> > > > > * There can't be a field in writerSchema, that exists in readerSchema (tag
> > > > > number based), which in writerSchema is a scalar, but in readSchema is a
> > > > > repeated scalar with packed=true.
> > > > >
> > > > > *Rules you have , but I would phrase a bit differently*
> > > > >
> > > > > I would remove the proto2/proto3 sections, since they only differ in 1
> > > > > rule, and just mention that distinction inside that rule (less work for
> > > > > the reader).
> > > > >
> > > > > * readSchema has a field which doesn't exist in writerSchema (based
> > on
> > > > tag
> > > > > number).
> > > > >    * Proto v2:
> > > > >       * That field must be `optional` or `repeated` (must not be
> > > > > `required`)
> > > > >    * Proto v3:
> > > > >       * No problem.
> > > > > * There can not be a field which exists both in readSchema and
> > > > writeSchema,
> > > > > with the same tag number, but having different types.
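As a hypothetical sketch (names are illustrative, not the PIP's implementation), the tag-number-based rules listed above, plus the suggested default-value rule, could be checked like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch of the tag-number-based rules suggested in this thread.
final class TagNumberRules {
    enum Label { OPTIONAL, REQUIRED, REPEATED }

    // defaultValue is null when no explicit default is declared.
    record Field(int number, String type, Label label, String defaultValue) {}

    static List<String> violations(boolean proto2, List<Field> read, List<Field> write) {
        Map<Integer, Field> writeByNumber = write.stream()
                .collect(Collectors.toMap(Field::number, Function.identity()));
        List<String> out = new ArrayList<>();
        for (Field r : read) {
            Field w = writeByNumber.get(r.number());
            if (w == null) {
                // Field only in readSchema: in proto2 it must not be required (proto3 has no required).
                if (proto2 && r.label() == Label.REQUIRED) {
                    out.add("field " + r.number() + " is required but never written");
                }
                continue;
            }
            if (!r.type().equals(w.type())) {
                out.add("field " + r.number() + " changed type");
            }
            if (!Objects.equals(r.defaultValue(), w.defaultValue())) {
                out.add("field " + r.number() + " changed default value");
            }
        }
        // Fields only in writeSchema need no check: the reader skips them.
        return out;
    }
}
```

With these rules, Asaf's earlier example (a required `query = 1` only in writeSchema) passes, while the reverse direction fails.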
> > > > >
> > > > > *Motivation*
> > > > >
> > > > > Basically in the motivation section you want people to understand
> the
> > > > > following:
> > > > >
> > > > > Pulsar has built-in support for typed messages. It allows specifying an
> > > > > encoding scheme and its matching schema.
> > > > > For example, it supports Avro. You specify a schema for a given topic,
> > > > > using Avro Schema Definition (i.e. a JSON describing the schema). Every
> > > > > time you produce a message, you first declare the schema definition you
> > > > > wish to use for your messages. The message data should be Avro-encoded
> > > > > binary data (in some SDKs, the client helps encode a given
> > > > > data-structure/object).
> > > > > The same applies when you consume a message. You first specify the schema
> > > > > you use to read the messages, and the client in some SDKs helps by
> > > > > decoding the message binary data into an object/data-structure.
> > > > >
> > > > > Each time you specify a schema to be used, either by a producer or
> a
> > > > > consumer, the schema is persisted in Pulsar and given an increasing
> > > > version
> > > > > number. If the schema was the same as the previous version, it is
> not
> > > > > saved. When the message is persisted, the version number is encoded
> > in
> > > > the
> > > > > message headers.
> > > > >
> > > > > Pulsar provides a very useful feature named Schema Evolution
> > > > > <https://pulsar.apache.org/docs/2.11.x/schema-understand/#schema-evolution>.
> > > > > It allows us to check if a new schema version is compatible with the
> > > > > previous version or all previous versions. When you configure the schema
> > > > > for the topic, you decide the strategy to use for doing the validation
> > > > > check. The strategies validate the following:
> > > > >
> > > > >    - BACKWARD strategy
> > > > >       - A consumer with newSchema can read a message written using
> > > > >       existingSchema
> > > > >    - BACKWARD_TRANSITIVE strategy
> > > > >       - A consumer with newSchema can read messages written using
> all
> > > > >       existingSchema
> > > > >    - FORWARD
> > > > >       - A consumer with existingSchema can read messages written
> > using
> > > > >       newSchema
> > > > >    - FORWARD_TRANSITIVE
> > > > >       - A consumer defined with any of the existingSchema can read
> > > > messages
> > > > >       written using newSchema
> > > > >    - FULL
> > > > >       - A consumer defined with newSchema can read messages written
> > > using
> > > > >       existingSchema
> > > > >       - A consumer defined with existingSchema can read messages
> > > written
> > > > >       using newSchema
> > > > >    - FULL_TRANSITIVE
> > > > >       - A consumer defined with newSchema can read messages written
> > > using
> > > > >       any of the existingSchema
> > > > >       - A consumer defined with any of the existingSchema can read
> > > > messages
> > > > >       written using newSchema
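The strategy definitions above reduce to deciding which (readSchema, writtenSchema) pairs a checker must validate. A hypothetical sketch (schema versions are plain labels; only the six strategies listed above are modeled; this is not Pulsar's validator API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: which (reader, writer) pairs each strategy has to check.
// "existing" is ordered from oldest to newest version.
final class StrategyChecks {
    enum Strategy { BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE, FULL, FULL_TRANSITIVE }

    record Check(String readSchema, String writtenSchema) {}

    static List<Check> checksFor(Strategy strategy, List<String> existing, String newSchema) {
        String name = strategy.name();
        boolean transitive = name.endsWith("_TRANSITIVE");
        boolean backward = name.startsWith("BACKWARD") || name.startsWith("FULL");
        boolean forward = name.startsWith("FORWARD") || name.startsWith("FULL");
        // Non-transitive strategies only look at the latest existing version.
        List<String> versions = transitive ? existing : List.of(existing.get(existing.size() - 1));
        List<Check> checks = new ArrayList<>();
        for (String old : versions) {
            if (backward) {
                checks.add(new Check(newSchema, old)); // newSchema reads data written with old
            }
            if (forward) {
                checks.add(new Check(old, newSchema)); // old reads data written with newSchema
            }
        }
        return checks;
    }
}
```

For example, BACKWARD with existing versions v1 and v2 and a new v3 yields a single check in which v3 is the readSchema and v2 the writtenSchema.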
> > > > >
> > > > >
> > > > > Aside from Avro, Pulsar also has two additional supported
> encodings:
> > > > > PROTOBUF and PROTOBUF_NATIVE.
> > > > >
> > > > > PROTOBUF is a bit strange. It encodes the messages using Protobuf
> > > > encoding,
> > > > > but the schema that is persisted to Pulsar is *not* Protobuf
> > Descriptor
> > > > as
> > > > > you would have expected. The saved schema is a translation of the
> > > > Protobuf
> > > > > Descriptor to an Avro Schema, so in fact an Avro schema definition
> is
> > > > saved
> > > > > as the schema.
> > > > >
> > > > > PROTOBUF_NATIVE was created to fix that shortcoming, by actually
> > > > persisting
> > > > > the Protobuf Descriptor and using Protobuf for encoding.
> > > > > The problem is that the authors of PROTOBUF_NATIVE haven't
> completed
> > it
> > > > > fully, and the backward compatibility validation code almost does
> not
> > > > > exist: It only checks if the root message name is the same between
> > > > > versions.
> > > > >
> > > > > GOALS
> > > > > The goal of this PIP is to amend PROTOBUF_NATIVE by adding a fully
> > > > > functional validation for any of the defined Schema Compatibility
> > > > > Strategies.
> > > > > A secondary goal is to allow the user to choose between different
> > > > > implementations: The new fully functional validation or the
> existing
> > > > > barebones validation.
> > > > >
> > > > > -------- END
> > > > >
> > > > > I'm ok with having links in the Motivation , as *further reading*.
> > > > > I'm against stacking up work for the reader to go read 5-6
> different
> > > > links
> > > > > just to understand the motivation and background knowledge required
> > to
> > > > > understand the feature.
> > > > >
> > > > > I'm against putting code in the Motivation, especially if it is supposed
> > > > > to replace a plain-English description that makes the design easy to
> > > > > understand.
> > > > > Leave the code to the design sections.
> > > > > Paste code only if you absolutely can't use plain old descriptions to
> > > > > explain.
> > > > >
> > > > >
> > > > >
> > > > > On Sat, Mar 11, 2023 at 11:46 AM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > >
> > > > > > *I guess that's right, too! *
> > > > > >
> > > > > > But the name `ProtobufNativeAdvancedSchemaCompatibilityCheck` is better,
> > > > > > because we don't know whether there will be a V2 or V3 in the future. An
> > > > > > official solution could be called
> > > > > > `ProtobufNativeOfficialSchemaCompatibilityCheck`, and a good third-party
> > > > > > solution `ProtobufNativeXXXXXXXXSchemaCompatibilityCheck`.
> > > > > >
> > > > > > I've updated my design in the PIP issue:
> > > > > > 1. A new ProtobufNativeSchemaAdvanceCompatibilityCheck, rather than a
> > > > > > different validator implementation inside
> > > > > > ProtobufNativeSchemaCompatibilityCheck.
> > > > > > 2. Remove the 'builder'.
> > > > > > 3. Clarify the relationship between newSchema, existingSchema, and
> > > > > > writtenSchema in canRead.
> > > > > >
> > > > > > Please help check whether the description is comprehensive and what
> > > > > > changes and improvements need to be made.
> > > > > >
> > > > > > Thanks,
> > > > > > sinan
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Mar 9, 2023 at 5:35 PM Asaf Mesika <as...@gmail.com> wrote:
> > > > > >
> > > > > > > I like Bo's suggestion - I hadn't realized each schema type
> > > > > > > compatibility check is actually a plugin.
> > > > > > > It makes sense for any schema type check to evolve, sometimes in a
> > > > > > > non-backward compatible way, hence having plugins like
> > > > > > > protobufNativeSchemaCompatabilityCheckV1 and then
> > > > > > > protobufNativeSchemaCompatabilityCheckV2 and then
> > > > > > > protobufNativeSchemaCompatabilityCheckV3 makes sense to me.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Mar 9, 2023 at 5:49 AM 丛搏 <bo...@apache.org> wrote:
> > > > > > >
> > > > > > > >  Hi siNan:
> > > > > > > >
> > > > > > > > From my point of view, it is just a plug-in. I don't think it is
> > > > > > > > necessary to add configuration for the plugin.
> > > > > > > > This is meaningless, and it will make it harder for users to use.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Mar 8, 2023 at 3:54 PM SiNan Liu <li...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi, bo.
> > > > > > > > >
> > > > > > > > > 1. I understand what you say: develop a new
> > > > > > > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck`, rather than changing
> > > > > > > > > the existing `ProtobufNativeSchemaCompatibilityCheck`. But I found a few
> > > > > > > > > small problems:
> > > > > > > > >
> > > > > > > > > (1) The schemaType of both ProtobufNativeAdvancedSchemaCompatibilityCheck
> > > > > > > > > and ProtobufNativeSchemaCompatibilityCheck is PROTOBUF_NATIVE. Both
> > > > > > > > > checkers are the "native" PROTOBUF implementation that doesn't use
> > > > > > > > > AVRO-PROTOBUF, which leads to some problems or "unreasonable" behavior,
> > > > > > > > > and gives me some further thoughts and questions.
> > > > > > > > >
> > > > > > > > `CompatibilityCheck` is only a plugin.
> > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` may sooner or later leave the
> > > > > > > > stage; when `ProtobufNativeAdvancedSchemaCompatibilityCheck` is
> > > > > > > > stable, we can make it the default checker.
> > > > > > > >
> > > > > > > > It is just a plug-in; users can change it at will and ensure that it
> > > > > > > > is used correctly.
> > > > > > > > > (2) In broker.conf, `schemaRegistryCompatibilityCheckers`: if both
> > > > > > > > > ProtobufNativeSchemaCompatibilityCheck and
> > > > > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck are set, they will
> > > > > > > > > overwrite each other, because this is a map:
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryService.java#L36-L44
> > > > > > > > >
> > > > > > > > > ```java
> > > > > > > > > Map<SchemaType, SchemaCompatibilityCheck> checkers = new HashMap<>();
> > > > > > > > > for (String className : checkerClasses) {
> > > > > > > > >     SchemaCompatibilityCheck schemaCompatibilityCheck = Reflections.createInstance(className,
> > > > > > > > >             SchemaCompatibilityCheck.class, Thread.currentThread().getContextClassLoader());
> > > > > > > > >     // Keyed by schema type: a later checker for the same type replaces an earlier one.
> > > > > > > > >     checkers.put(schemaCompatibilityCheck.getSchemaType(), schemaCompatibilityCheck);
> > > > > > > > > }
> > > > > > > > > ```
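A simplified model of the overwrite problem (hypothetical record and enum, not Pulsar's `SchemaCompatibilityCheck` or `SchemaType`): because the map is keyed by schema type, listing both PROTOBUF_NATIVE checkers leaves only the one that appears last in the configuration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model, not Pulsar code: checkers are registered in a map keyed by schema type.
final class CheckerRegistry {
    enum SchemaType { AVRO, JSON, PROTOBUF, PROTOBUF_NATIVE }

    record Checker(String className, SchemaType schemaType) {}

    static Map<SchemaType, Checker> load(List<Checker> configured) {
        Map<SchemaType, Checker> checkers = new HashMap<>();
        for (Checker checker : configured) {
            // put() silently replaces an earlier checker registered for the same type.
            checkers.put(checker.schemaType(), checker);
        }
        return checkers;
    }

    public static void main(String[] args) {
        Map<SchemaType, Checker> loaded = load(List.of(
                new Checker("ProtobufNativeSchemaCompatibilityCheck", SchemaType.PROTOBUF_NATIVE),
                new Checker("ProtobufNativeAdvancedSchemaCompatibilityCheck", SchemaType.PROTOBUF_NATIVE)));
        // prints ProtobufNativeAdvancedSchemaCompatibilityCheck
        System.out.println(loaded.get(SchemaType.PROTOBUF_NATIVE).className());
    }
}
```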
> > > > > > > > >
> > > > > > > > > Is this a big problem or a small one? Is it possible or unnecessary?
> > > > > > > > > Maybe we can state in the documentation that only one of the two
> > > > > > > > > protobufNative checkers can be chosen? Why are there two checkers for
> > > > > > > > > different implementations of the same schemaType? Why not let the
> > > > > > > > > checker create different validators, so we do not have to change
> > > > > > > > > schemaRegistryCompatibilityCheckers?
> > > > > > > >
> > > > > > > > Users can only use one, not two; using both would bring complexity to
> > > > > > > > users.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > (3) And after switching to
> > > > > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck: existing topics
> > > > > > > > > previously only had the name of the root message checked, not the
> > > > > > > > > content of the protobuf.
> > > > > > > > >
> > > > > > > > > What if the user wants both checkers?
> > > > > > > > >
> > > > > > > > > If it is set to ProtobufNativeAdvancedSchemaCompatibilityCheck, will
> > > > > > > > > that affect topics that already have a schema?
> > > > > > > > >
> > > > > > > > > Older topics would still use the old checker, while newer topics or
> > > > > > > > > certain older topics use the new advanced checker.
> > > > > > > > >
> > > > > > > > Once `ProtobufNativeAdvancedSchemaCompatibilityCheck` is stable,
> > > > > > > > users will not choose `ProtobufNativeSchemaCompatibilityCheck`,
> > > > > > > > because it is not a complete checker.
> > > > > > > > > (4) So should we have one checker per schemaType? The
> > > > > > > > > protobufNative checker could then have as many different
> > > > > > > > > implementation classes as needed. As for the classname
> > > > > > > > > configuration in the PIP, let's see whether it can be set at the
> > > > > > > > > topic level. In the current PIP design, I only load this parameter
> > > > > > > > > into the checker when the broker starts and the checkers map is
> > > > > > > > > set up. If I want to support the topic level, can I do it in a
> > > > > > > > > new, regular PR? Or should I perfect it here?
> > > > > > > > >
> > > > > > > > > Should we add a schemaType called PROTOBUF_NATIVE_ADVANCE,
> > > > > > > > > corresponding to ProtobufNativeAdvancedSchemaCompatibilityCheck?
> > > > > > > > > (That seems to be more trouble.)
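The alternative raised in point (4), one checker per schema type that delegates to a pluggable validator chosen by class name, could look roughly like this. It is a sketch under stated assumptions: descriptors are reduced to their full name, and `DescriptorSketch`, `ValidatorSketch`, `RootNameValidatorSketch` and `ProtobufNativeCheckerSketch.fromClassName` are hypothetical names, not Pulsar or protobuf APIs. An empty class name falls back to the existing root-message-name check, matching the `ValidatorClassName` default proposed in this thread.

```java
import java.util.List;

// Hypothetical, simplified model: a "descriptor" is reduced to its full name.
record DescriptorSketch(String fullName) {}

// Hypothetical validator SPI, loosely modeled on the ProtobufNativeSchemaValidator
// idea in this thread; it is not an actual Pulsar interface.
interface ValidatorSketch {
    void validate(List<DescriptorSketch> existing, DescriptorSketch candidate);
}

// Default behaviour: only the root message name must stay the same.
record RootNameValidatorSketch() implements ValidatorSketch {
    @Override
    public void validate(List<DescriptorSketch> existing, DescriptorSketch candidate) {
        for (DescriptorSketch from : existing) {
            if (!from.fullName().equals(candidate.fullName())) {
                throw new IllegalStateException("Protobuf root message isn't allowed to change!");
            }
        }
    }
}

// A single PROTOBUF_NATIVE checker that delegates to a validator chosen once,
// from a hypothetical class-name setting (empty means the default validator).
record ProtobufNativeCheckerSketch(ValidatorSketch validator) {

    static ProtobufNativeCheckerSketch fromClassName(String validatorClassName) {
        if (validatorClassName == null || validatorClassName.isEmpty()) {
            return new ProtobufNativeCheckerSketch(new RootNameValidatorSketch());
        }
        try {
            // Load the configured class and call its no-arg constructor.
            Object instance = Class.forName(validatorClassName)
                    .getDeclaredConstructor()
                    .newInstance();
            return new ProtobufNativeCheckerSketch((ValidatorSketch) instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load validator " + validatorClassName, e);
        }
    }

    void check(List<DescriptorSketch> existing, DescriptorSketch candidate) {
        validator.validate(existing, candidate);
    }
}
```

Resolving the validator once, when the checker is built, mirrors loading it at broker startup; a topic-level setting would need a per-topic lookup instead.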
> > > > > > > > >
> > > > > > > > > Sorry, I cannot use my company's computer and network, so I am
> > > > > > > > > replying from my mobile phone and the format may be a bit messy.
> > > > > > > > > Please understand.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > sinan
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:39 PM:
> > > > > > > > >
> > > > > > > > > > SiNan Liu <li...@gmail.com> wrote on Tue, Mar 7, 2023 at 13:22:
> > > > > > > > > > >
> > > > > > > > > > > Great to see your comment, bo!
> > > > > > > > > > >
> > > > > > > > > > > 1. The first way: the protobuf website describes the rules, but
> > > > > > > > > > > there are no plans to implement them.
> > > > > > > > > > > https://protobuf.dev/programming-guides/proto/#updating
> > > > > > > > > >
> > > > > > > > > > https://groups.google.com/g/protobuf
> > > > > > > > > > Maybe ask here.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > 2. I think this PIP can be divided into two parts.
> > > > > > > > > > > (1) Add a flag (`ValidatorClassName`) and load it into
> > > > > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` when the broker starts.
> > > > > > > > > > > `ValidatorClassName` is empty by default, and the implementation
> > > > > > > > > > > continues as before, with no change for the user.
> > > > > > > > > >
> > > > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` is a plugin in `broker.conf`:
> > > > > > > > > > ```
> > > > > > > > > > schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
> > > > > > > > > > ```
> > > > > > > > > > I do not recommend that we directly modify this plugin and keep
> > > > > > > > > > adding configuration items; that will cause trouble for users.
> > > > > > > > > > We have a lot of configs and it's getting very unwieldy.
> > > > > > > > > > In my opinion, we should not change
> > > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`:
> > > > > > > > > > it is a simple implementation, it rarely goes wrong, and most
> > > > > > > > > > users will use it. We can add another ProtobufNative check named
> > > > > > > > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck` or similar. That
> > > > > > > > > > way we don't need to add this flag, and there is no need to
> > > > > > > > > > consider compatibility: it is just a plug-in and will not affect
> > > > > > > > > > the current logic. If users need it, they just change the plugin
> > > > > > > > > > to the new implementation.
> > > > > > > > > >
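For illustration, switching plugins as suggested would only touch the `schemaRegistryCompatibilityCheckers` line in `broker.conf`. This fragment assumes the proposed `ProtobufNativeAdvancedSchemaCompatibilityCheck` lives in the same `org.apache.pulsar.broker.service.schema` package as the existing checkers; the class name comes from this thread, and the package is an assumption.

```properties
# Hypothetical: replace the PROTOBUF_NATIVE entry with the proposed advanced
# checker. The class name is from this discussion; its package is assumed.
schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeAdvancedSchemaCompatibilityCheck
```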
> > > > > > > > > > > ```java
> > > > > > > > > > >     ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
> > > > > > > > > > >         for (Descriptors.Descriptor fromDescriptor : fromDescriptors) {
> > > > > > > > > > >             // The default implementation only checks whether the root
> > > > > > > > > > >             // message has changed.
> > > > > > > > > > >             if (!fromDescriptor.getFullName().equals(toDescriptor.getFullName())) {
> > > > > > > > > > >                 throw new ProtoBufCanReadCheckException(
> > > > > > > > > > >                         "Protobuf root message isn't allowed to change!");
> > > > > > > > > > >             }
> > > > > > > > > > >         }
> > > > > > > > > > >     };
> > > > > > > > > > > ```
> > > > > > > > > > > The `ValidatorClassName` value can also be set to the implementation
> > > > > > > > > > > added by this PIP:
> > > > > > > > > > > `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
> > > > > > > > > > >
> > > > > > > > > > > (2) Rework `ProtobufNativeSchemaCompatibilityCheck` so that it builds a
> > > > > > > > > > > different `ProtobufNativeSchemaValidator` based on the flag
> > > > > > > > > > > (`ValidatorClassName`). Isn't that just a plug-in? Users can develop and
> > > > > > > > > > > choose a different `ProtobufNativeSchemaValidator`. I don't think this
> > > > > > > > > > > changes the logic; it just allows users to extend it.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I think this PIP is an enhancement and supplement to the existing
> > > > > > > > > > > functionality; it is neither unnecessary nor meaningless.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > sinan
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:53:
> > > > > > > > > > >
> > > > > > > > > > > > I think we have two ways to do that.
> > > > > > > > > > > >
> > > > > > > > > > > > First way: we need to push for improvements to protobuf's Java
> > > > > > > > > > > > library. Ask them whether they have plans to improve this.
> > > > > > > > > > > >
> > > > > > > > > > > > Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should
> > > > > > > > > > > > be implemented as a plugin, without changing any existing plugin
> > > > > > > > > > > > logic (it's simple and already in use). I don't recommend adding
> > > > > > > > > > > > flags for rollback; it adds configuration and makes little sense.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Bo
> > > > > > > > > > > >
> > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Mar 6, 2023 at 23:00:
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Can you convert the code block at the beginning of the PIP, which is
> > > > > > > > > > > > > actually a quote, into something that doesn't require so much
> > > > > > > > > > > > > horizontal scrolling?
> > > > > > > > > > > > > Use
> > > > > > > > > > > > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > > > > > > > > > > > >
> > > > > > > > > > > > > Let's improve the clarity of what you wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > "the PROTOBUF uses avro struct to store."
> > > > > > > > > > > > > -->
> > > > > > > > > > > > > When the PROTOBUF schema type is used, the Pulsar client assumes the
> > > > > > > > > > > > > object given to it as message data is an auto-generated POJO
> > > > > > > > > > > > > containing the annotations encoding the schema. The client uses a
> > > > > > > > > > > > > converter that turns the Protobuf schema descriptor into an Avro
> > > > > > > > > > > > > schema, and sends that as the schema of the producer/consumer.
> > > > > > > > > > > > >
> > > > > > > > > > > > > "On the broker side, protobuf and avro both use SchemaData converted to
> > > > > > > > > > > > > org.apache.avro.Schema."
> > > > > > > > > > > > > -->
> > > > > > > > > > > > > Since the schema is an Avro schema, the broker-side compatibility check
> > > > > > > > > > > > > simply reuses the compatibility check of the AVRO schema type.
> > > > > > > > > > > > >
> > > > > > > > > > > > > "ProtobufSchema is different from ProtobufNativeSchema in schema
> > > > > > > > > > > > > compatibility check it uses avro-protobuf.
> > > > > > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > > > > > But the current implementation of ProtobufNative schema compatibility
> > > > > > > > > > > > > check only checked if the root message name is changed."
> > > > > > > > > > > > >
> > > > > > > > > > > > > -->
> > > > > > > > > > > > > The PROTOBUF_NATIVE schema type is different.
> > > > > > > > > > > > > The client actually uses the Protobuf Descriptor as the schema, as
> > > > > > > > > > > > > opposed to the Avro schema used by the PROTOBUF schema type. In the
> > > > > > > > > > > > > broker, the PROTOBUF_NATIVE compatibility check doesn't actually
> > > > > > > > > > > > > implement any rule except one: checking whether the root message
> > > > > > > > > > > > > name has changed.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >    1. For now, there is no official or third-party solution for
> > > > > > > > > > > > > >    ProtoBuf compatibility. If a better third-party or official
> > > > > > > > > > > > > >    solution appears in the future, we can develop a new
> > > > > > > > > > > > > >    ProtobufNativeSchemaValidator and use it; that is why we add a
> > > > > > > > > > > > > >    flag.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > Why do you need to make that configurable? Once you find a third
> > > > > > > > > > > > > party, just switch to it. Who knows, maybe you never will. Introduce
> > > > > > > > > > > > > it when you find it, not now.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > We improve the BACKWARD and FORWARD strategies in
> > > > > > > > > > > > > > ProtobufNativeSchemaCompatibilityCheck. As with the AVRO
> > > > > > > > > > > > > > implementation, the protobuf compatibility check needs to implement
> > > > > > > > > > > > > > the canRead method. *This will check that the writtenSchema can be
> > > > > > > > > > > > > > read by the readSchema.*
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I completely disagree.
> > > > > > > > > > > > > The Avro implementation is confusing for our use case. Don't copy that.
> > > > > > > > > > > > >
> > > > > > > > > > > > > You have
> > > > > > > > > > > > >
> > > > > > > > > > > > > public void checkCompatible(SchemaData from, SchemaData to,
> > > > > > > > > > > > >         SchemaCompatibilityStrategy strategy)
> > > > > > > > > > > > >         throws IncompatibleSchemaException {
> > > > > > > > > > > > >     Descriptor fromDescriptor =
> > > > > > > > > > > > >             ProtobufNativeSchemaUtils.deserialize(from.getData());
> > > > > > > > > > > > >     Descriptor toDescriptor =
> > > > > > > > > > > > >             ProtobufNativeSchemaUtils.deserialize(to.getData());
> > > > > > > > > > > > >     switch (strategy) {
> > > > > > > > > > > > >         case BACKWARD_TRANSITIVE:
> > > > > > > > > > > > >         case BACKWARD:
> > > > > > > > > > > > >         case FORWARD_TRANSITIVE:
> > > > > > > > > > > > >         case FORWARD:
> > > > > > > > > > > > >         case FULL_TRANSITIVE:
> > > > > > > > > > > > >         case FULL:
> > > > > > > > > > > > >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
> > > > > > > > > > > > >             return;
> > > > > > > > > > > > >         case ALWAYS_COMPATIBLE:
> > > > > > > > > > > > >             return;
> > > > > > > > > > > > >         default:
> > > > > > > > > > > > >             throw new IncompatibleSchemaException(
> > > > > > > > > > > > >                     "Unknown SchemaCompatibilityStrategy.");
> > > > > > > > > > > > >     }
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > I would rename:
> > > > > > > > > > > > > from --> currentSchema
> > > > > > > > > > > > > to --> newSchema
> > > > > > > > > > > > >
> > > > > > > > > > > > > Use that switch case and have a method for each, like:
> > > > > > > > > > > > > validateBackwardsCompatibility(currentSchema, newSchema)
> > > > > > > > > > > > >
> > > > > > > > > > > > > I dislike canRead and the usage of writtenSchema, since you have two
> > > > > > > > > > > > > completely different use cases: the producing side and the consuming
> > > > > > > > > > > > > side.
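A minimal sketch of the structure suggested above: keep the switch, name the arguments `currentSchema` and `newSchema`, and give each direction its own validation method. `StrategySketch`, `SchemaSketch` and `PerStrategyCheckerSketch` are hypothetical stand-ins; a schema is reduced to its root message name, and the only rule applied is the existing root-message check, used as a placeholder where per-direction rules would go. The transitive variants are treated like their non-transitive counterparts here, whereas in Pulsar they are checked against every earlier schema version, not just the latest.

```java
import java.util.Objects;

// Hypothetical stand-ins for SchemaCompatibilityStrategy and a deserialized
// descriptor; a schema is reduced to its root message name.
enum StrategySketch {
    BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE,
    FULL, FULL_TRANSITIVE, ALWAYS_COMPATIBLE
}

record SchemaSketch(String rootMessageName) {}

final class PerStrategyCheckerSketch {
    private PerStrategyCheckerSketch() {}

    static void checkCompatible(SchemaSketch currentSchema, SchemaSketch newSchema,
                                StrategySketch strategy) {
        switch (strategy) {
            case BACKWARD, BACKWARD_TRANSITIVE ->
                    validateBackwardsCompatibility(currentSchema, newSchema);
            case FORWARD, FORWARD_TRANSITIVE ->
                    validateForwardCompatibility(currentSchema, newSchema);
            case FULL, FULL_TRANSITIVE -> {
                validateBackwardsCompatibility(currentSchema, newSchema);
                validateForwardCompatibility(currentSchema, newSchema);
            }
            case ALWAYS_COMPATIBLE -> {
                // Nothing to validate.
            }
        }
    }

    // Consumers using newSchema must be able to read data produced with currentSchema.
    static void validateBackwardsCompatibility(SchemaSketch currentSchema, SchemaSketch newSchema) {
        requireSameRootMessage(currentSchema, newSchema);
        // Backward-specific rules would be listed here, one per check.
    }

    // Consumers still using currentSchema must be able to read data produced with newSchema.
    static void validateForwardCompatibility(SchemaSketch currentSchema, SchemaSketch newSchema) {
        requireSameRootMessage(currentSchema, newSchema);
        // Forward-specific rules would be listed here, one per check.
    }

    // The one rule the existing checker enforces, used here as a placeholder.
    private static void requireSameRootMessage(SchemaSketch currentSchema, SchemaSketch newSchema) {
        if (!Objects.equals(currentSchema.rootMessageName(), newSchema.rootMessageName())) {
            throw new IllegalStateException("Root message changed from "
                    + currentSchema.rootMessageName() + " to " + newSchema.rootMessageName());
        }
    }
}
```

Keeping one method per direction means each method's rules can be written and reviewed in terms of current and new schema only, without a separate writer/reader vocabulary.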
> > > > > > > > > > > > >
> > > > > > > > > > > > > > schemaValidatorBuilder
> > > > > > > > > > > > > >
> > > > > > > > > > > > > I dislike this proposal. IMO the Avro implementation is way too
> > > > > > > > > > > > > complicated. Why not have a simple validation function for each
> > > > > > > > > > > > > switch case above? Why do we need a strategy, a builder, and all
> > > > > > > > > > > > > this complexity?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > *Here are the basic compatibility rules we've
> > defined:*
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > IMO it's impossible to read the validation rules as you described them.
> > > > > > > > > > > > > I wrote above, numerous times, how they should be structured.
> > > > > > > > > > > > > I can't validate them.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > IMO, the current design is very hard to read.
> > > > > > > > > > > > > Please try to avoid jumping into code sections.
> > > > > > > > > > > > > Write a high-level design section in which you describe in words
> > > > > > > > > > > > > what you plan to do.
> > > > > > > > > > > > > Write the validation rules in a structure that is easy to understand:
> > > > > > > > > > > > > rules per compatibility check, using proper terms (current schema,
> > > > > > > > > > > > > new schema), since the new schema is used for reading in one case and
> > > > > > > > > > > > > for writing in another.
> > > > > > > > > > > > >
> > > > > > > > > > > > > In its current form, it takes too much time to understand the design,
> > > > > > > > > > > > > and that shouldn't be the case.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Asaf
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi! I updated the explanation of some things in the PIP issue. I also
> > > > > > > > > > > > > > added a new "flag" in the conf to select among different
> > > > > > > > > > > > > > ProtobufNativeSchemaValidator implementations, and made the default
> > > > > > > > > > > > > > ProtobufNativeSchemaValidator only check whether the name of the root
> > > > > > > > > > > > > > message is the same.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Mar 5, 2023 at 20:21:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be
> > > > > > > > > > > > > > > > > validated for backward compatibility using Avro-based
> > > > > > > > > > > > > > > > > compatibility rules? Doesn't it expect the schema to be Avro,
> > > > > > > > > > > > > > > > > when it is actually a Protobuf descriptor?
> > > > > > > > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 1. *You can take a quick look at the previous design, the PROTOBUF
> > > > > > > > > > > > > > > > uses avro struct to store.*
> > > > > > > > > > > > > > > > https://github.com/apache/pulsar/pull/1954
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Ok. So to summarize your code (easier to write it than send links):
> > > > > > > > > > > > > > > * The Pulsar client, when used with the Protobuf schema, actually
> > > > > > > > > > > > > > > converts the Protobuf descriptor into an Avro schema (using code found
> > > > > > > > > > > > > > > inside the Avro library) and saves that Avro schema as the schema.
> > > > > > > > > > > > > > > It's not saving the protobuf descriptor at all. Very confusing, I have
> > > > > > > > > > > > > > > to add; I never expected that.
> > > > > > > > > > > > > > > This explains why ProtobufSchemaCompatibilityCheck just extends the
> > > > > > > > > > > > > > > Avro check without doing any translation.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for that.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Now that I finally understand this, I can say that you *must* explain
> > > > > > > > > > > > > > > it in the motivation part of your PIP.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2. *On the broker side, protobuf and avro both use `SchemaData`
> > > > > > > > > > > > > > > > converted to `org.apache.avro.Schema`.*
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Actually, those links don't really help.
> > > > > > > > > > > > > > > The main link that helps is:
> > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I'm sorry - I don't understand.
> > > > > > > > > > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > > > > > > > > > If you just spell them out here, then, as you say, just translate
> > > > > > > > > > > > > > > > > the Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > > > > > > > > > > > compatibility validation, no?
> > > > > > > > > > > > > > > > > I believe the answer is no, since you may want to verify different
> > > > > > > > > > > > > > > > > things when it comes to Protobuf, which are different from Avro.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 1. *ProtobufSchema is different from ProtobufNativeSchema in that it
> > > > > > > > > > > > > > > > uses avro-protobuf.*
> > > > > > > > > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > > > > > > > > *ProtobufNativeSchema needs a native compatibility check, but there is
> > > > > > > > > > > > > > > > no official or third-party implementation. So this PIP does not use
> > > > > > > > > > > > > > > > avro-protobuf for protobuf compatibility checking.*
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2. *By the way, this is implemented in much the same way that Apache
> > > > > > > > > > > > > > > > Avro does compatibility checking.*
> > > > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > > > > > > > > > > > > `canReadStrategy`, `canBeReadStrategy`, `mutualReadStrategy`
> > > > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > > > > > > > > > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > > > > > > > > > > > > > > > writtenSchema and readSchema. We only need to change the order of the
> > > > > > > > > > > > > > > > arguments we pass to `canRead()`.*
> > > > > > > > > > > > > > > > ```java
> > > > > > > > > > > > > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > > > > > > > > > > > > > > >         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
> > > > > > > > > > > > > > > >     switch (strategy) {
> > > > > > > > > > > > > > > >         // writer = fromDescriptor, reader = toValidate
> > > > > > > > > > > > > > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > > > > > > > > > > > > > > >         // writer = toValidate, reader = fromDescriptor
> > > > > > > > > > > > > > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> > > > > > > > > > > > > > > >         // check both directions
> > > > > > > > > > > > > > > >         case CanBeReadMutualStrategy -> {
> > > > > > > > > > > > > > > >             canRead(toValidate, fromDescriptor);
> > > > > > > > > > > > > > > >             canRead(fromDescriptor, toValidate);
> > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > >     }
> > > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > > > > > > > > > > > > > >         Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> > > > > > > > > > > > > > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
> > > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > > ```
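To illustrate the argument-swapping idea with something runnable, here is a toy model rather than protobuf: a schema is a set of field names plus the subset it requires, and a reader can decode a writer's data if every field the reader requires is declared by the writer. That rule, along with `ToySchema`, `ReadStrategySketch` and `ArgumentOrderSketch`, is an assumption for illustration only, not the PIP's rule set. The point is that one `canRead(writtenSchema, readSchema)` serves all three strategies, with only the argument order changing.

```java
import java.util.Set;

// Toy schema: the field names it declares and the subset a reader requires.
// Illustrative only; this is neither protobuf's nor Avro's real rule set.
record ToySchema(Set<String> fields, Set<String> required) {}

enum ReadStrategySketch { CAN_READ_EXISTING, CAN_BE_READ_BY_EXISTING, MUTUAL }

final class ArgumentOrderSketch {
    private ArgumentOrderSketch() {}

    // Toy rule: a reader can decode a writer's data if every field the reader
    // requires is declared by the writer.
    static boolean canRead(ToySchema writtenSchema, ToySchema readSchema) {
        return writtenSchema.fields().containsAll(readSchema.required());
    }

    // One canRead for all strategies; only the argument order changes.
    static boolean validate(ToySchema newSchema, ToySchema existingSchema,
                            ReadStrategySketch strategy) {
        return switch (strategy) {
            // New readers must read data written with the existing schema.
            case CAN_READ_EXISTING -> canRead(existingSchema, newSchema);
            // Existing readers must read data written with the new schema.
            case CAN_BE_READ_BY_EXISTING -> canRead(newSchema, existingSchema);
            // Both directions must hold.
            case MUTUAL -> canRead(existingSchema, newSchema)
                    && canRead(newSchema, existingSchema);
        };
    }
}
```

With `v1` requiring only `id` and `v2` adding a required `email`, existing `v1` readers can read `v2` data, but `v2` readers cannot read `v1` data, so the mutual check fails. The same schema plays writer in one call and reader in the other.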
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I get that you want to take inspiration from the existing Avro schema
> > > > > > > > > > > > > > > compatibility check for your code design.
> > > > > > > > > > > > > > > I also understand you *won't* use any existing Avro code for that.
> > > > > > > > > > > > > > > I also understand you have to write the validation check on your own,
> > > > > > > > > > > > > > > since there is no third party that does it.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The only thing I can't understand is the actual rules, as I wrote
> > > > > > > > > > > > > > > before, since they are written confusingly.
> > > > > > > > > > > > > > > So, I repeat what I asked before:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > * Backward checks
> > > > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > > > > * Forward
> > > > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Once that's accomplished, I will be able to understand the different
> > > > > > > > > > > > > > > validation rules for each compatibility check.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Mar 1, 2023 at 21:19:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Currently, the avro, json, and protobuf schemas are all implemented based on AVRO.
> > > > > > > > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > > > > > > > > > > > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > > > > > > > > > > > > > > > > > compatibility validation?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, and their
> > > > > > > > > > > > > > > > > > `checkCompatible()` is the same implementation as the one used for `AVRO`.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Can you please explain how a Protobuf schema descriptor can be validated
> > > > > > > > > > > > > > > > > for a backward compatibility check using Avro-based compatibility rules?
> > > > > > > > > > > > > > > > > Doesn't it expect the schema to be Avro, when it is actually a Protobuf
> > > > > > > > > > > > > > > > > descriptor?
> > > > > > > > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The compatibility check strategy is described on the website:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > > > > > > > > > > > > 1. BACKWARD (CanReadExistingStrategy): Consumers using schema V3 can process
> > > > > > > > > > > > > > > > > > data written by producers using the last schema version V2. So V2 is
> > > > > > > > > > > > > > > > > > "writtenSchema" and V3 is "readSchema".
> > > > > > > > > > > > > > > > > > 2. FORWARD (CanBeReadByExistingStrategy): Consumers using the last schema
> > > > > > > > > > > > > > > > > > version V2 can process data written by producers using a new schema V3,
> > > > > > > > > > > > > > > > > > even though they may not be able to use the full capabilities of the new
> > > > > > > > > > > > > > > > > > schema. So V3 is "writtenSchema" and V2 is "readSchema".
> > > > > > > > > > > > > > > > > > 3. FULL (CanBeReadMutualStrategy): Schemas are both backward and forward
> > > > > > > > > > > > > > > > > > compatible. The schema can evolve, and data written with either the old
> > > > > > > > > > > > > > > > > > or the new schema version can be read with the other.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I'm sorry - I don't understand.
> > > > > > > > > > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > > > > > > > > > If you just spell them out here, then, as you say, you could just translate
> > > > > > > > > > > > > > > > > the Protobuf descriptor into an Avro schema and run the Avro
> > > > > > > > > > > > > > > > > compatibility validation, no?
> > > > > > > > > > > > > > > > > I believe the answer is no, since you may want to verify different things
> > > > > > > > > > > > > > > > > when it comes to Protobuf, which are different from Avro.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > In its current state, I can't understand your design at all. Please help
> > > > > > > > > > > > > > > > > clarify that.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As we've
> > > > > > > > > > > > > > > > > > just described, we only need to change the order of the arguments we pass
> > > > > > > > > > > > > > > > > > to `canRead()`.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Feb 27, 2023 at 20:49:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Currently, the avro, json, and protobuf schemas are all implemented based on AVRO.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > > > > > > > > > > > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > > > > > > > > > > > > > > > > > compatibility validation?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > * Backward checks
> > > > > > > > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > > > > > > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > > > > > > > > * Forward checks
> > > > > > > > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > > > > > > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I'm saying this because you used the word "writtenSchema", but it represents
> > > > > > > > > > > > > > > > > > > something completely different depending on whether it's a backward or a
> > > > > > > > > > > > > > > > > > > forward check.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Once you have it structured like that, I personally will be able to read
> > > > > > > > > > > > > > > > > > > and understand it.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > The motivation and problem statement are now good - thanks for improving
> > > > > > > > > > > > > > > > > > > it.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some background
> > > > > > > > > > > > > > > > > > > > and some explanations.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > The compatibility check rules are already written in the Implementation
> > > > > > > > > > > > > > > > > > > > section. ProtoBufNative implements the same canRead method as Apache Avro.
> > > > > > > > > > > > > > > > > > > > It does this by checking whether the schemas used for writing and reading
> > > > > > > > > > > > > > > > > > > > are compatible. I also indicate whether the writtenSchema and readSchema of
> > > > > > > > > > > > > > > > > > > > the Backward, Forward, and Full strategies are the old or the new version
> > > > > > > > > > > > > > > > > > > > of the schema.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Feb 26, 2023 at 23:24:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, which IMO you
> > > > > > > > > > > > > > > > > > > > > need to add for people to understand it. You don't need to explain all of
> > > > > > > > > > > > > > > > > > > > > Pulsar in this PIP, but add at the very least a few paragraphs detailing
> > > > > > > > > > > > > > > > > > > > > all you need to know, to put you in context:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >    - Start by saying Pulsar has a built-in schema registry inside the Pulsar
> > > > > > > > > > > > > > > > > > > > >    broker.
> > > > > > > > > > > > > > > > > > > > >       - Every time the client updates the schema, it uploads it to the
> > > > > > > > > > > > > > > > > > > > >       broker. When that happens, the broker validates whether the new
> > > > > > > > > > > > > > > > > > > > >       schema version is compatible with the previous versions. There are
> > > > > > > > > > > > > > > > > > > > >       4 types of compatibility: Full, ... (complete and explain each one
> > > > > > > > > > > > > > > > > > > > >       briefly)
> > > > > > > > > > > > > > > > > > > > >    - Also explain that the Pulsar schema registry supports various schema
> > > > > > > > > > > > > > > > > > > > >    protocols: Avro, protobuf native, ... (complete the rest). Each protocol
> > > > > > > > > > > > > > > > > > > > >    has a schema which dictates how to serialize and deserialize the message
> > > > > > > > > > > > > > > > > > > > >    content into a typed object.
> > > > > > > > > > > > > > > > > > > > >    - Explain in short what protobuf native is (compared to protobuf
> > > > > > > > > > > > > > > > > > > > >    non-native).
> > > > > > > > > > > > > > > > > > > > >    - Please don't paste code instead of explaining.
> > > > > > > > > > > > > > > > > > > > >       - Explain that the current protobuf native validation check only
> > > > > > > > > > > > > > > > > > > > >       consists of checking that the root message name is the same between
> > > > > > > > > > > > > > > > > > > > >       the current schema version and the new version.
> > > > > > > > > > > > > > > > > > > > >          - Explain briefly what a root message and its name are.
> > > > > > > > > > > > > > > > > > > > >       - Explain the problem (list scenarios) that we have because the
> > > > > > > > > > > > > > > > > > > > >       protobuf native schema only supports FULL compatibility validation.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Regarding the high-level design, as in what you plan to do:
> > > > > > > > > > > > > > > > > > > > > I suggest you add a "High Level Design" section and in it detail how you
> > > > > > > > > > > > > > > > > > > > > plan to validate, per protobuf version, per compatibility check (backward,
> > > > > > > > > > > > > > > > > > > > > forward, full,...).
> > > > > > > > > > > > > > > > > > > > > I tried reading the implementation - for me, it's all over the place. Can
> > > > > > > > > > > > > > > > > > > > > you please list, in order, what I wrote above, and list the validation
> > > > > > > > > > > > > > > > > > > > > rules with a good explanation of why you validate them like that?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Lastly, once you have all the validation rules clearly stated, you can use
> > > > > > > > > > > > > > > > > > > > > them to document this properly, so users know what validation to expect.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Asaf
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Sorry, my mistake. I removed the code and described the design to improve
> > > > > > > > > > > > > > > > > > > > > > the PROTOBUF_NATIVE schema compatibility checks. You can have a look.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mesika@gmail.com> wrote on Wed, Feb 22, 2023 at 21:16:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > I read it, but you're almost directly diving into the code - it will take
> > > > > > > > > > > > > > > > > > > > > > > me hours just to reverse engineer your design.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Can you please include a "High Level Design" section in which you explain
> > > > > > > > > > > > > > > > > > > > > > > how you plan to tackle each issue?
> > > > > > > > > > > > > > > > > > > > > > > If I can read that section and explain to someone else how this will work,
> > > > > > > > > > > > > > > > > > > > > > > it means the section is complete.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19565.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > We can talk about the current design here. Especially for the field type
> > > > > > > > > > > > > > > > > > > > > > > > change check rules, please give your valuable advice.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > > Sinan
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
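The argument-order rule from the quoted thread (BACKWARD: the new schema reads data written with the existing one; FORWARD: the existing schema reads data written with the new one; FULL: both) can be sketched in Java as below. This is a hypothetical illustration, not Pulsar code: `CheckStrategy`, `CanReadOrder` and `callsFor` are made-up names, and schemas are reduced to version labels such as "V2" (existing) and "V3" (new).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Pulsar's SchemaCompatibilityStrategy (only the three values discussed).
enum CheckStrategy { BACKWARD, FORWARD, FULL }

final class CanReadOrder {
    /** One canRead(writtenSchema, readSchema) invocation. */
    record Call(String writtenSchema, String readSchema) {}

    /**
     * Lists the canRead() calls needed to compare the existing (last registered)
     * schema with the new one under the given strategy.
     */
    static List<Call> callsFor(CheckStrategy strategy, String existing, String newSchema) {
        List<Call> calls = new ArrayList<>();
        switch (strategy) {
            // Consumers on the new schema must read data written with the existing schema.
            case BACKWARD -> calls.add(new Call(existing, newSchema));
            // Consumers on the existing schema must read data written with the new schema.
            case FORWARD -> calls.add(new Call(newSchema, existing));
            // Both directions must hold.
            case FULL -> {
                calls.add(new Call(existing, newSchema));
                calls.add(new Call(newSchema, existing));
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        for (CheckStrategy strategy : CheckStrategy.values()) {
            System.out.println(strategy + " -> " + callsFor(strategy, "V2", "V3"));
        }
    }
}
```

Running `main` lists, for each strategy, which version is passed as writtenSchema and which as readSchema; the field-level checks inside `canRead()` are not modeled.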

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by Asaf Mesika <as...@gmail.com>.
On Sun, Mar 19, 2023 at 4:47 PM SiNan Liu <li...@gmail.com> wrote:

> 1.
>
> > message SearchReq { string query = 1; int32 page_number = 2; int32 result_per_page = 3; }
> > Then in the second version I use:
> > message SearchRequest { string query = 1; int32 page_number = 2; int32 result_per_page = 3; }
>
>
> The rules in the PIP improve on the previous implementation, so the previous
> implementation's root message name check needs to be included.
> If the user switches from the previous implementation to the PIP's
> implementation, and the new implementation doesn't check for root message
> name changes, isn't that changing the old behavior? This PIP is meant to make
> compatibility checking more stringent.
>
>
Well, this is exactly the argument we had in this thread. This is why you
create 2 implementation classes (I called them v1 and v2, if you remember).
The user can choose between them. This is exactly why I liked the _v1/_v2
naming, so they will know they are not backward compatible.

Also, you're talking about improving, right? Checking the message name is
outright wrong, so you're actually fixing a big mistake someone made back
then.
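Asaf's suggestion of two selectable implementation classes could look roughly like the Java sketch below. Everything in it is hypothetical: `ProtoSchemaInfo`, `CompatibilityCheck`, `ProtobufNativeCheckV1`/`ProtobufNativeCheckV2`, `CompatibilityCheckSelector` and the "v1"/"v2" keys are illustrative names, not existing Pulsar classes or broker settings. V1 keeps the existing root message name comparison; V2 drops it, and the field-level rules it would actually run are left out.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, reduced view of a PROTOBUF_NATIVE schema: only the root message name.
record ProtoSchemaInfo(String rootMessageName) {}

// Hypothetical interface; not Pulsar's SchemaCompatibilityCheck.
interface CompatibilityCheck {
    boolean isCompatible(ProtoSchemaInfo existing, ProtoSchemaInfo candidate);
}

// "v1": the current behavior, which rejects any change of the root message name.
final class ProtobufNativeCheckV1 implements CompatibilityCheck {
    @Override
    public boolean isCompatible(ProtoSchemaInfo existing, ProtoSchemaInfo candidate) {
        return existing.rootMessageName().equals(candidate.rootMessageName());
    }
}

// "v2": does not compare root message names, since they are not written to the wire.
// The field-level rules under discussion are omitted from this sketch.
final class ProtobufNativeCheckV2 implements CompatibilityCheck {
    @Override
    public boolean isCompatible(ProtoSchemaInfo existing, ProtoSchemaInfo candidate) {
        return true; // placeholder for the field-level rules
    }
}

// The user picks an implementation explicitly, so switching behavior is opt-in.
final class CompatibilityCheckSelector {
    private static final Map<String, Supplier<CompatibilityCheck>> CHECKS = Map.of(
            "v1", ProtobufNativeCheckV1::new,
            "v2", ProtobufNativeCheckV2::new);

    static CompatibilityCheck forVersion(String version) {
        Supplier<CompatibilityCheck> supplier = CHECKS.get(version);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown check version: " + version);
        }
        return supplier.get();
    }
}
```

With these classes, renaming `SearchReq` to `SearchRequest` is rejected by v1 and accepted by v2, so a user has to opt in to the new behavior explicitly.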


>
> 2.
>
> > This is a guideline for a human making the change, not for software
> > validation, right?
> > When you write the code that does the comparison to know if the field
> > number has changed, you have to take into account both the field name AND
> > field type. If both are equal but the field number is different, then fail
> > it.
>
>
> I use the field name here to match: if the name is the same but the number
> is not, the change is incompatible.
>
> https://github.com/apache/pulsar/pull/19566/files#diff-14b840259375cc8fcae564586fc1c2a188cb71822ab5995ed6807f7dab30c210R129-R132
> The PIP description here is the rule, but how to implement it is a matter
> for the PR. I have considered the example you gave here, which is also
> incompatible in this case.
>
> Your changes are also described here:
>
> https://stackoverflow.com/questions/65230623/safeness-of-changing-proto-field-number
>
>
This sentence, "the PIP description here is the rule, but how to implement
it is a matter in PR", is a completely invalid argument.
For compatibility rules, you specify the exact validation.
We are arguing here about the rule.
You can't say in the PIP that you will match by name, and then match by name
and type in the PR. You have to be exact in the PIP.

Regarding "I have considered the example you gave here, which is also
incompatible" - please explain. I find it compatible, meaning you can use
readSchema to read a message written using writeSchema.

Your Stack Overflow link would fail *my* validation too: `int foo = 1`
--> `int foo = 2;` has the same name, the same type, and a different field
number --> fail validation.
So it doesn't prove anything or help this argument.
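Asaf's version of the field-number rule (match fields on name AND type, and fail only when the field number differs) can be written as a small Java sketch. `ProtoField` and `FieldNumberRule` are hypothetical simplifications; a real check would walk protobuf `Descriptors.Descriptor` objects, including nested messages, which this sketch omits.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical simplification of a protobuf field descriptor: name, type, and field number.
record ProtoField(String name, String type, int number) {}

final class FieldNumberRule {
    /**
     * Returns a description of the first field that has the same name AND type in both
     * schemas but a different field number, or Optional.empty() if there is none.
     * Fields that differ in name or type are not examined by this rule.
     */
    static Optional<String> findRenumberedField(List<ProtoField> writtenSchema,
                                                List<ProtoField> readSchema) {
        for (ProtoField written : writtenSchema) {
            for (ProtoField read : readSchema) {
                boolean sameNameAndType = written.name().equals(read.name())
                        && written.type().equals(read.type());
                if (sameNameAndType && written.number() != read.number()) {
                    return Optional.of(written.name() + ": field number "
                            + written.number() + " -> " + read.number());
                }
            }
        }
        return Optional.empty();
    }
}
```

Under this rule, `int32 foo = 1;` changed to `int32 foo = 2;` is reported, while renaming `result_per_page` to `resultPerPage` and keeping field number 3 is not.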





>
> 3.
>
> > If you read sentence 2 of the guide, you'll see they write
> > "This means that any messages serialized by code using your “old” message
> > format can be parsed by your new generated code, as they won’t be missing
> > any required elements."
> > This is for the *other side*: when you add a required field to readSchema
> > compared to writeSchema. This of course is not allowed and won't work.
>
>
> Note that in proto2: **Any new fields that you add should be optional or
> repeated.**
> https://protobuf.dev/programming-guides/proto2/#updating
>
> I also stated in the PIP that this rule about required field changes is not
> checked for proto3.
> This rule applies to proto2. proto3 removed required, so if you use proto3,
> changes to required fields are not checked.
>
> I'm talking about proto2.

Listen.

This is what you wrote in the PIP, ok?


>    - The writtenSchema cannot add required fields, but optional or
>    duplicate fields can be added (The field number must be new).

I answered:

> That's not true.
> You can have a required field in writeSchema, and not have that field in
> readSchema (based on tag number).

I also gave you an example showing what you wrote is not true (see previous
mails), in which I add a required field to writeSchema compared with
readSchema and I can still read it with readSchema.

You cite a guideline saying you can only add optional or repeated fields,
without reading the rest of the paragraph and without understanding how
protocol buffers are encoded and decoded.

I'm not sure how to move forward here.
This validation rule is invalid.
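Asaf's position can be stated as a proto2 check keyed on tag (field) numbers: a field that is `required` in the read schema must exist in the written schema, while a `required` field that only the writer has is kept by the reader as an unknown field. The Java sketch below uses hypothetical `Proto2Label`, `Proto2Field` and `RequiredFieldRule` types and checks only presence by field number; label changes between the two schemas (for example optional in the writer, required in the reader) are outside its scope.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical proto2 field labels and a reduced field model keyed by field (tag) number.
enum Proto2Label { OPTIONAL, REQUIRED, REPEATED }

record Proto2Field(int number, Proto2Label label) {}

final class RequiredFieldRule {
    /**
     * Returns the numbers of fields that are REQUIRED in readSchema but absent from
     * writtenSchema. A reader with such a field would fail to parse messages from
     * this writer. REQUIRED fields present only in writtenSchema are not reported:
     * the reader treats their numbers as unknown fields.
     */
    static List<Integer> missingRequiredFields(List<Proto2Field> writtenSchema,
                                               List<Proto2Field> readSchema) {
        Set<Integer> writtenNumbers = writtenSchema.stream()
                .map(Proto2Field::number)
                .collect(Collectors.toSet());
        List<Integer> missing = new ArrayList<>();
        for (Proto2Field field : readSchema) {
            if (field.label() == Proto2Label.REQUIRED && !writtenNumbers.contains(field.number())) {
                missing.add(field.number());
            }
        }
        return missing;
    }
}
```

Calling it with the writer's and reader's fields in the order (writtenSchema, readSchema) reports nothing for Asaf's example (an extra required field in writeSchema), and reports the field number when the roles are swapped.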




>
> 4.
>
> > My write schema is
> > message SearchRequest { string query = 1; int32 page_number = 2; int32 result_per_page = 3; }
> > my read schema is
> > message SearchRequest { string query = 1; int32 page_number = 2; int32 resultPerPage = 3; }
> > - Non-required fields can be removed, as long as the field number is not
> > used again in your updated message type. You may want to rename the field
> > instead, perhaps adding the prefix “OBSOLETE_”, or make the field number
> > reserved <https://protobuf.dev/programming-guides/proto2/#reserved>, so
> > that future users of your .proto can’t accidentally reuse the number.
> > First, they are talking about removal of fields. They suggest you rename
> > the field to OBSOLETE_myField instead of actually deleting it, to make it
> > easier on compatibility. They don't talk about rules that forbid you to
> > rename a field. It's unrelated.
>
>
> Maybe you are right that changing the field name is compatible.
>
> https://stackoverflow.com/questions/45431685/protocol-buffer-does-changing-field-name-break-the-message/45431953#comment84548234_45431953
>
> Interestingly, I tested this with the `avro-proto` based implementation,
> `ProtobufSchemaCompatibilityCheck`
> (`org.apache.pulsar.broker.service.schema.BaseAvroSchemaCompatibilityTest`):
>
> private static final String schemaJson1 =
>         "{\"type\":\"record\",\"name\":\"DefaultTest\",\"namespace\":\"org.apache.pulsar.broker.service.schema"
>         + ".AvroSchemaCompatibilityCheckTest\",\"fields\":[{\"name\":\"field1\",\"type\":\"string\"}]}";
>
> private static final String schemaJson4 =
>         "{\"type\":\"record\",\"name\":\"DefaultTest\",\"namespace\":\"org.apache.pulsar.broker.service.schema"
>         + ".AvroSchemaCompatibilityCheckTest\",\"fields\":[{\"name\":\"field1_v2\",\"type\":\"string\","
>         + "\"aliases\":[\"field1\"]}]}";
>
> schemaCompatibilityCheck.isCompatible(schemaData1, schemaData4,
>         SchemaCompatibilityStrategy.BACKWARD);
>
> This is compatible. But this is not:
>
> schemaCompatibilityCheck.isCompatible(schemaData1, schemaData4,
>         SchemaCompatibilityStrategy.FORWARD);
>
> So what do you think of the Avro-based `ProtobufSchemaCompatibilityCheck`?
> Are its rules about renaming wrong? It looks that way.
>
>
You're asking why Avro rejects the case where the read schema is schema 1 and
the write schema is schema 4 (FORWARD), when the only difference between them
is a single field rename?
I don't know; I would need to debug it to understand.
I do know that Avro encoding is completely different. It has no field
identifier like protobuf does; it relies entirely on the writer's and the
reader's schemas.
But that is beside the point, since to my knowledge Avro is supposed to allow
field renames.
You can debug it and find out.
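One possible explanation, not verified against Pulsar's code: the Avro specification applies aliases from the reader's schema, so only the reader's field aliases can bridge a rename. Assuming `isCompatible(schemaData1, schemaData4, ...)` treats `schemaData1` as the existing version and `schemaData4` as the new one, BACKWARD makes schema 4 the reader (its alias `field1` matches) and FORWARD makes schema 1 the reader (it has no alias, and its `field1` has no default). A simplified Java model of that resolution rule (hypothetical names, not the Avro library):

```java
import java.util.List;

// Simplified model of Avro record-field resolution (not the Avro library).
final class AvroAliasSketch {
    record Field(String name, List<String> aliases, boolean hasDefault) {}

    // Every reader field must match a writer field by name or by one of the READER
    // field's aliases, or else have a default. Writer-side aliases are not consulted.
    static boolean readerCanResolve(List<Field> readerFields, List<Field> writerFields) {
        for (Field r : readerFields) {
            boolean matched = writerFields.stream().anyMatch(
                    w -> w.name().equals(r.name()) || r.aliases().contains(w.name()));
            if (!matched && !r.hasDefault()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Field> schema1 = List.of(new Field("field1", List.of(), false));
        List<Field> schema4 = List.of(new Field("field1_v2", List.of("field1"), false));
        // BACKWARD: new schema (4) reads data written with existing schema (1).
        System.out.println(readerCanResolve(schema4, schema1)); // prints true
        // FORWARD: existing schema (1) reads data written with new schema (4).
        System.out.println(readerCanResolve(schema1, schema4)); // prints false
    }
}
```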



>
> 5.
>
> > I think my previous explanation in this email reply should be good enough
> > to explain why the name is irrelevant.
> > If not, please ping me on this.
> > Regarding your example, can you quote it? I don't understand the read and
> > write schema here.
> > Also, regarding message names and why they are irrelevant, same thing. Read
> > https://protobuf.dev/programming-guides/encoding/
> > In there you will see the names are not encoded and thus can be changed
> > freely while still successfully decoding binary data.
>
>
> Reader.proto
> ```protobuf
> syntax = "proto2";
> package proto.reader;
> import "ExternalReader.proto";
> option java_package = "org.apache.pulsar.client.schema.proto.reader";
> option java_outer_classname = "Reader";
>
> enum WeekEnum {
> ......
> }
>
> message ProtobufMessage {
> required string protobufFoo = 1;
> required double protobufBar = 2;
> }
>
> message ProtobufSchema {
> ......
> optional ProtobufMessage messageField = 12;
> ......
> optional WeekEnum enumField = 15;
> }
> ```
>
> WriterWithTypeNameChange.proto:
> ```protobuf
> syntax = "proto2";
> package proto.writerWithTypeNameChange;
> import "ExternalReader.proto";
> option java_package =
> "org.apache.pulsar.client.schema.proto.writerWithTypeNameChange";
> option java_outer_classname = "WriterWithTypeNameChange";
>
> enum WeekEnum {
> ......
> }
>
> message ProtobufMessage_V2 {
> required string protobufFoo = 1;
> required double protobufBar = 2;
> }
>
> message ProtobufSchema {
> ......
> optional ProtobufMessage_V2 messageField = 12;
> ......
> optional WeekEnum enumField = 15;
> ......
> }
> ```
>
> *Doesn't that change the field type? Should that be compatible?*
>
>
It doesn't change the field type, since you haven't changed the types of the
fields inside ProtobufMessage: they are still field 1 (string) and field 2
(double).
It is compatible, since the message type name is not encoded in the binary.
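A small Java sketch of why renaming `ProtobufMessage` to `ProtobufMessage_V2` does not change the encoding. The comparison below is a hypothetical helper over a simplified model (not protobuf-java's `Descriptor` API) that walks both definitions by field number and type, recursing into nested messages, and never looks at a name. It checks wire-level equivalence only, which is stricter than compatibility, and the elided `......` fields are left out.

```java
import java.util.List;

// Hypothetical, simplified model of a message definition (not protobuf-java).
final class WireShapeDemo {
    // A field has either a scalar type or a nested message type.
    record Field(String name, int number, String scalarType, Message messageType) {
        static Field scalar(String name, int number, String type) {
            return new Field(name, number, type, null);
        }

        static Field message(String name, int number, Message type) {
            return new Field(name, number, null, type);
        }
    }

    record Message(String name, List<Field> fields) {}

    // True if both definitions have the same field numbers with the same types, recursing
    // into nested messages. Message and field names are never compared.
    static boolean sameWireShape(Message a, Message b) {
        if (a.fields().size() != b.fields().size()) {
            return false;
        }
        for (Field fa : a.fields()) {
            Field fb = b.fields().stream()
                    .filter(f -> f.number() == fa.number())
                    .findFirst()
                    .orElse(null);
            if (fb == null) {
                return false;
            }
            boolean aIsMessage = fa.messageType() != null;
            boolean bIsMessage = fb.messageType() != null;
            if (aIsMessage != bIsMessage) {
                return false;
            }
            if (aIsMessage ? !sameWireShape(fa.messageType(), fb.messageType())
                           : !fa.scalarType().equals(fb.scalarType())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Message reader = new Message("ProtobufSchema", List.of(Field.message("messageField", 12,
                new Message("ProtobufMessage", List.of(
                        Field.scalar("protobufFoo", 1, "string"),
                        Field.scalar("protobufBar", 2, "double"))))));
        Message writer = new Message("ProtobufSchema", List.of(Field.message("messageField", 12,
                new Message("ProtobufMessage_V2", List.of(
                        Field.scalar("protobufFoo", 1, "string"),
                        Field.scalar("protobufBar", 2, "double"))))));
        System.out.println(sameWireShape(reader, writer)); // prints true
    }
}
```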


>
> 6.
>
> > So if my write schema is
> > message SearchRequest { required string query = 1; optional int32
> > page_number = 2; optional int32 result_per_page = 3;}
> > and my read schema is
> > message SearchRequest { optional int32 page_number = 2; optional
> > int32 result_per_page = 3;}
> > You can see I removed a field from writeSchema that does not have a
> > default value.
> > First, I can read it without any problems: field 2 might exist, no
> > problem. Field 3 might exist, no problem. I ignore field 1.
> > So the validation you wrote is not correct at all, regardless of any
> > default value.
>
>
> *It looks like you've added a required field here, which is not allowed (I
> said so in item 3).*
> *It also reuses field number 1 of the deleted field.*
>
I removed the required field "required string query = 1" when going from
writeSchema to readSchema.
Of course, you could equally say I added "required string query = 1" to
writeSchema compared with readSchema.
The main question: can I use readSchema to read a message written with
writeSchema?
Yes, I can.
I wrote why above.
I didn't reuse field number 1 of the deleted field. The read schema uses
numbers 2 and 3.





> What if I do the other way around?
> > My write schema is:
> > message SearchRequest { optional int32 page_number = 2; optional
> > int32 result_per_page = 3;}
> > My read schema is:
> > message SearchRequest { required string query = 1; optional int32
> > page_number = 2; optional int32 result_per_page = 3;}
> > Here I will fail: since field 1 is required, the reader expects a value
> > for it, but alas it will never be there.
>
>
> Yes, the required field has been removed, which is incompatible.
> The required-field check runs first; at that point an incompatibility
> exception is thrown and the remaining compatibility checks are not
> performed.
>

You can't say "the required field has been removed"; it's plain wrong.

Look at your rule:

The writtenSchema cannot remove required fields in the readSchema.


You can say: the writeSchema has removed a required field compared with
readSchema.
That rule is what fails the validation,
*not* the default-value rule we're discussing.
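The proto2 rule that does hold can be sketched in Java as follows, assuming a schema reduced to a map from tag number to label (a hypothetical model, not the PIP's code): only a tag that is `required` in readSchema but not guaranteed by writeSchema is a problem. It is deliberately conservative, so a writer-side `optional` for that tag is also flagged, because such a message may omit it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical model: a proto2 schema reduced to tag number -> field label.
final class RequiredFieldRuleSketch {
    enum Label { OPTIONAL, REQUIRED, REPEATED }

    // Tags that are `required` in readSchema but not `required` in writeSchema.
    // A required field that only the writer declares is simply skipped by the reader.
    static List<Integer> unsatisfiedRequiredTags(Map<Integer, Label> readSchema,
                                                 Map<Integer, Label> writeSchema) {
        List<Integer> problems = new ArrayList<>();
        for (Map.Entry<Integer, Label> field : readSchema.entrySet()) {
            if (field.getValue() == Label.REQUIRED
                    && writeSchema.get(field.getKey()) != Label.REQUIRED) {
                problems.add(field.getKey());
            }
        }
        Collections.sort(problems);
        return problems;
    }

    public static void main(String[] args) {
        Map<Integer, Label> withQuery = Map.of(1, Label.REQUIRED, 2, Label.OPTIONAL, 3, Label.OPTIONAL);
        Map<Integer, Label> withoutQuery = Map.of(2, Label.OPTIONAL, 3, Label.OPTIONAL);
        // Written with query, read without it: readable.
        System.out.println(unsatisfiedRequiredTags(withoutQuery, withQuery)); // prints []
        // Written without query, read with it: field 1 can never be satisfied.
        System.out.println(unsatisfiedRequiredTags(withQuery, withoutQuery)); // prints [1]
    }
}
```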




>
> Now let's focus on the optional field, ok?
> > Write schema
> > message SearchRequest { optional int32 page_number = 2; optional
> > int32 result_per_page = 3;}
> > Read schema:
> > message SearchRequest { optional string query = 1; optional int32
> > page_number = 2; optional int32 result_per_page = 3;}
> > When I read, I won't find field number 1 of course, so I will use the
> > default value dictated by the type, as I quoted above, which for a string
> > is the empty string.
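As a minimal Java sketch (a hypothetical helper, not protobuf-java), the reader's fallback order for a field that is absent from the wire looks like this. The default always comes from the reader's own schema, which is why defaults never need to be sent over the wire.

```java
import java.util.Map;

// Hypothetical helper: how a reader resolves a field value after decoding.
final class ReaderDefaultSketch {
    // If the tag was on the wire, its decoded value wins. Otherwise the reader falls back to
    // the default declared in ITS OWN schema (proto2 [default = ...], null if none), and
    // finally to the type's default ("" for string, 0 for int32, ...). The writer's
    // declared default plays no part, because defaults are never sent on the wire.
    static Object valueSeenByReader(Map<Integer, Object> decoded, int tag,
                                    Object readerDeclaredDefault, Object typeDefault) {
        if (decoded.containsKey(tag)) {
            return decoded.get(tag);
        }
        return readerDeclaredDefault != null ? readerDeclaredDefault : typeDefault;
    }

    public static void main(String[] args) {
        // Written with { page_number = 2; result_per_page = 3; }, so tag 1 (query) is absent.
        Map<Integer, Object> decoded = Map.of(2, 1, 3, 20);
        System.out.println("[" + valueSeenByReader(decoded, 1, null, "") + "]"); // prints []
        System.out.println(valueSeenByReader(decoded, 2, null, 0)); // prints 1
    }
}
```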
>
>
> *It looks like the sixth item in the PIP should be removed.*
> *And rule 7 in the PIP should be removed:*
>
Delete rule 6.
I would keep rule 7. See:
https://protobuf.dev/programming-guides/dos-donts/#dont-change-the-default-value-of-a-field
It sounds like a good best practice. We can omit it if we don't want to be
overly strict.



> > Changing a default value is generally OK, as long as you remember that
> > default values are never sent over the wire.
> > Thus, if a program receives a message in which a particular field isn’t
> > set, the program will see the default value as it was defined in that
> > program’s version of the protocol.
> > It will NOT see the default value that was defined in the sender’s code.
>
> This is item 11 in the official documentation:
> https://protobuf.dev/programming-guides/proto2/#updating
>
>
>
So what do you wish to state with this quote?


> 7.
>
> > Read this and then please explain why rule number 3 should not be added:
> >
> >
> https://protobuf.dev/programming-guides/dos-donts/#dont-go-from-repeated-to-scalar
>
>
> It looks like this rule needs to be added.
>
> Thanks


>
>
> Thanks,
> sinan
>
>
>
> On Sun, Mar 19, 2023 at 19:11, Asaf Mesika <as...@gmail.com> wrote:
>
> > Also:
> >
> > > (2) PROTOBUF_NATIVE was designed so as not to use avro-protobuf for
> > > protobuf schema compatibility checking.
> >
> > > The root message name is the class name we pass in when we create the
> > > producer or consumer. ProtoBuf has many nested messages or dependencies.
> > > The current implementation only checks whether the passed class name is
> > > the same. It does not check whether the fields in the file change in a
> > > way that is compatible with older versions of the schema.
> >
> >
> > You missed the most important thing there: PROTOBUF_NATIVE persists the
> > Protobuf Descriptor as the schema. It does not use an Avro schema
> > definition. As I wrote to you in a previous email:
> >
> > > PROTOBUF_NATIVE was created to fix that shortcoming, by actually
> > > persisting the Protobuf Descriptor and using Protobuf for encoding.
> > >
> >
> >
> >
> >
> >
> > On Wed, Mar 15, 2023 at 5:46 PM SiNan Liu <li...@gmail.com> wrote:
> >
> > > 1.
> > >
> > > > > Why? The root message name is not written over the wire to the best
> > > > > of my knowledge. I haven't found it written in the official doc.
> > >
> > >
> > > The root message name check is the rule in the existing
> > > `ProtobufNativeSchemaCompatibilityCheck`. If the root message has a
> > > different name, there is no need to check its contents: two versions of
> > > the "same" schema must have the same root message name.
> > >
> > >
> > > 2.
> > >
> > > > > The writtenSchema can not change the field number of any field in
> > > > > readSchema (the field name is the same, but the field number is
> > > > > different).
> > > > You have to take into account field type as well when comparing.
> > >
> > >
> > > The first sentence on the website says that field numbers cannot be
> > > changed:
> > >
> > > > Don’t change the field numbers for any existing fields.
> > >
> > >
> > >
> > > 3.
> > >
> > > > > - The writtenSchema cannot add required fields, but optional or
> > > > > duplicate fields can be added (The field number must be new).
> > > > >
> > > > That's not true.
> > > > You can have a required field in writeSchema, and not have that field
> > > > in readSchema (based on tag number).
> > >
> > >
> > > The required field must exist. In your case, where does readSchema go
> > > to read its required fields? They are not in writtenSchema.
> > > The second and third sentences on the official website say that you
> > > should not add or delete required fields!
> > >
> > >
> > > 4.
> > >
> > > > > (4) The writtenSchema can not change the field name of any field in
> > > > > readSchema (the field number is the same, but the field name is
> > > > > different).
> > > > This is incorrect.
> > > > Field names are not encoded into the wire. I don't see this in any
> > > > best practice.
> > >
> > >
> > > The third sentence on the website:
> > >
> > > > You may want to rename the field instead, perhaps adding the prefix
> > > > “OBSOLETE_”, or make the field number reserved, so that future users
> > > > of your .proto can’t accidentally reuse the number.
> > >
> > > If you want to rename a field, add a new field with a new number and
> > > delete the old one!
> > >
> > >
> > > 5.
> > >
> > > > > The writtenSchema does not change the field name and number, but it
> > > > > does change the field type.
> > > > Small correction: for the same field number you are not allowed to
> > > > change types. Name is irrelevant.
> > >
> > >
> > > Why is the name irrelevant?
> > > A change of type is covered by the rule stated in Alternatives: there is
> > > no check there, just a warning to the user.
> > > The other case is when the name of an ENUM or MESSAGE type is changed.
> > > This is the same as the root message name check in item 1: is it still
> > > the same type after the change? This is not allowed to change!
> > >
> > > *Here is an example:*
> > > readSchema (https://github.com/apache/pulsar/pull/19566/files#diff-a7006d73502e6064a80af02822f3a3072be498d8b677c4b838b0dafaea32dea4)
> > > writtenSchema (https://github.com/apache/pulsar/pull/19566/files#diff-e3e7543624edaf1e0a4fd47947a2cad6e4b816b93843f71a367042ba6c3ec53f)
> > >
> > >
> > > 6.
> > >
> > > > > (6) The writtenSchema removes fields that do not have default values
> > > > > in readSchema. Then the schema is incompatible.
> > > > Protobuf gives you its own default if you don't supply one. This is
> > > > incorrect.
> > >
> > >
> > > (1) This rule only applies to proto2 fields without an explicit default
> > > value. Proto3 is not checked, because a default value is always there.
> > >
> > > (2) In the PIP issue:
> > >
> > > > Proto3 canceled the required field, so there is no need to check the
> > > > required field. We get the syntax (proto2 or proto3) of the proto in
> > > > the code, and skip the check of the required field if it is proto3.
> > > > All other checking rules also apply to proto3.
> > >
> > >
> > > *I made a mistake here. This default value check is not needed in
> > > proto3. I will modify the rules later according to your suggestion.*
> > >
> > > > I would remove the proto2/proto3 sections, since they only differ in
> > > > 1 rule, and just mention that distinction inside that rule (less work
> > > > for the reader).
> > >
> > >
> > > (3) And add rules that look like they should be checked:
> > >
> > > > *Rules that you don't have in the doc, but should IMO*
> > > > .......
> > >
> > > > There can't be a field in writerSchema, that exists in readerSchema
> > > > (tag number based), which in writerSchema its type is scalar, but in
> > > > readSchema its type is scalar, it's repeated but with packed=true.
> > >
> > >
> > > But I don't think rule number three needs to be added.
> > >
> > >
> > > Thanks,
> > > sinan
> > >
> > >
> > >
> > > On Tue, Mar 14, 2023 at 22:33, Asaf Mesika <as...@gmail.com> wrote:
> > >
> > > > Hi Sinan,
> > > >
> > > > The doc looks much better!
> > > >
> > > > I have a few additional comments:
> > > >
> > > > Pasting comment from previous emails:
> > > >
> > > > Can you convert the code block, which is actually a quote, at the
> > > > beginning of the PIP to something that doesn't require so much
> > > > horizontal scrolling?
> > > > Use
> > > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > > >
> > > > *Validation Rules*
> > > >
> > > > > (1) If the root message names of writtenSchema and readSchema are
> > > > > different, then incompatible.
> > > >
> > > > Why? The root message name is not written over the wire to the best of
> > > > my knowledge. I haven't found it written in the official doc.
> > > >
> > > >
> > > > >    - The writtenSchema cannot add required fields, but optional or
> > > > >    duplicate fields can be added (The field number must be new).
> > > > >
> > > > That's not true.
> > > > You can have a required field in writeSchema, and not have that field
> > > > in readSchema (based on tag number).
> > > >
> > > > > The writtenSchema can not change the field number of any field in
> > > > > readSchema (the field name is the same, but the field number is
> > > > > different).
> > > >
> > > > You have to take into account field type as well when comparing.
> > > >
> > > > If I have in readSchema
> > > > int32 justAnID = 1
> > > > int32 customerId = 2
> > > >
> > > > and in writeSchema I have
> > > > int32 justAnID = 1
> > > > string customerId = 3
> > > >
> > > > This is valid.
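To illustrate the example above, here is a hypothetical Java sketch comparing the two schemas by tag number versus by field name (the class and record names are made up for illustration). Matching by tag finds no conflict, while a name-based check would wrongly flag `customerId`.

```java
import java.util.List;

// Hypothetical flattened field model for comparing two message definitions.
final class MatchByTagSketch {
    record Field(String name, int number, String type) {}

    // Tag-based: fields are matched by number, because only numbers reach the wire.
    static boolean conflictsByTag(List<Field> read, List<Field> write) {
        for (Field r : read) {
            for (Field w : write) {
                if (r.number() == w.number() && !r.type().equals(w.type())) {
                    return true;
                }
            }
        }
        return false;
    }

    // Name-based, for contrast: a renumbered field with the same name looks like a conflict.
    static boolean conflictsByName(List<Field> read, List<Field> write) {
        for (Field r : read) {
            for (Field w : write) {
                if (r.name().equals(w.name())
                        && (r.number() != w.number() || !r.type().equals(w.type()))) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Field> read = List.of(new Field("justAnID", 1, "int32"), new Field("customerId", 2, "int32"));
        List<Field> write = List.of(new Field("justAnID", 1, "int32"), new Field("customerId", 3, "string"));
        System.out.println(conflictsByTag(read, write));  // prints false
        System.out.println(conflictsByName(read, write)); // prints true
    }
}
```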
> > > >
> > > > > (4) The writtenSchema can not change the field name of any field in
> > > > > readSchema (the field number is the same, but the field name is
> > > > > different).
> > > >
> > > > This is incorrect.
> > > > Field names are not encoded into the wire. I don't see this in any best
> > > > practice.
> > > >
> > > > > (5) The writtenSchema does not change the field name and number, but
> > > > > it does change the field type.
> > > > >
> > > > >    - If the field type is ENUM or MESSAGE, the schema is not compatible
> > > > >    when the type name is changed.
> > > > >    - If the type of the field is another type, the schemas under this
> > > > >    rule are not incompatible, but the user is warned. (There is another
> > > > >    way of testing in PIP issue Alternatives.)
> > > > >
> > > > Small correction: for the same field number you are not allowed to
> > > > change types. Name is irrelevant.
> > > >
> > > > > (6) The writtenSchema removes fields that do not have default values
> > > > > in readSchema. Then the schema is incompatible.
> > > >
> > > > Protobuf gives you its own default if you don't supply one. This is
> > > > incorrect.
> > > >
> > > >
> > > > *Rules that you don't have in the doc, but should IMO*
> > > > * There can not be a field which exists both in readSchema and
> > > > writeSchema, with same tag number, having different default values
> > > > * There can't be a field in writerSchema, that exists in readerSchema
> > > > (tag number based), which in writerSchema is repeated and its type is
> > > > Scalar (https://protobuf.dev/programming-guides/proto/#scalar) but in
> > > > readSchema it is not repeated anymore.
> > > > * There can't be a field in writerSchema, that exists in readerSchema
> > > > (tag number based), which in writerSchema its type is scalar, but in
> > > > readSchema its type is scalar, it's repeated but with packed=true.
> > > >
> > > > *Rules you have, but I would phrase a bit differently*
> > > >
> > > > I would remove the proto2/proto3 sections, since they only differ in 1
> > > > rule, and just mention that distinction inside that rule (less work for
> > > > the reader).
> > > >
> > > > * readSchema has a field which doesn't exist in writerSchema (based on
> > > > tag number).
> > > >    * Proto v2:
> > > >       * That field must be `optional` or `repeated` (must not be
> > > >       `required`)
> > > >    * Proto v3:
> > > >       * No problem.
> > > > * There can not be a field which exists both in readSchema and
> > > > writeSchema, with the same tag number, but having different types.
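The rule set proposed above could be sketched roughly as follows. This is a hypothetical Java model (schemas flattened to `tag number -> field`, not protobuf-java descriptors, and not the PIP's code) covering the type-change, reader-only `required` (proto2 only), repeated-scalar-to-singular, and differing-default rules; the `packed=true` rule is not modeled. Writer-only fields are ignored because the reader skips them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical model, not protobuf-java: a schema flattened to tag number -> field.
final class ProtobufRulesSketch {
    enum Label { OPTIONAL, REQUIRED, REPEATED }

    // defaultValue is the explicitly declared default, or null if none is declared.
    record Field(int number, String type, Label label, String defaultValue) {}

    record Schema(boolean proto3, Map<Integer, Field> fields) {}

    static final Set<String> SCALAR_TYPES = Set.of(
            "double", "float", "int32", "int64", "uint32", "uint64", "sint32", "sint64",
            "fixed32", "fixed64", "sfixed32", "sfixed64", "bool", "string", "bytes");

    // Returns human-readable violations; an empty list means readSchema can read data
    // written with writeSchema under the rules sketched here. Names are never compared.
    static List<String> canRead(Schema readSchema, Schema writeSchema) {
        List<String> violations = new ArrayList<>();
        for (Field r : readSchema.fields().values()) {
            Field w = writeSchema.fields().get(r.number());
            if (w == null) {
                // Reader-only field: only a proto2 `required` label is a problem.
                if (!readSchema.proto3() && r.label() == Label.REQUIRED) {
                    violations.add("tag " + r.number() + ": required in readSchema, absent in writeSchema");
                }
                continue;
            }
            if (!r.type().equals(w.type())) {
                violations.add("tag " + r.number() + ": type " + w.type() + " -> " + r.type());
            }
            if (w.label() == Label.REPEATED && SCALAR_TYPES.contains(w.type())
                    && r.label() != Label.REPEATED) {
                violations.add("tag " + r.number() + ": repeated scalar in writeSchema, not repeated in readSchema");
            }
            if (!Objects.equals(r.defaultValue(), w.defaultValue())) {
                violations.add("tag " + r.number() + ": default value differs");
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        Schema write = new Schema(false, Map.of(
                5, new Field(5, "int32", Label.REPEATED, null)));
        Schema read = new Schema(false, Map.of(
                5, new Field(5, "int32", Label.OPTIONAL, null),
                6, new Field(6, "string", Label.REQUIRED, null)));
        // Two violations: tag 5 (repeated -> singular) and tag 6 (required, absent).
        canRead(read, write).forEach(System.out::println);
    }
}
```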
> > > >
> > > > *Motivation*
> > > >
> > > > Basically in the motivation section you want people to understand the
> > > > following:
> > > >
> > > > Pulsar has built-in support for typed messages. It allows specifying an
> > > > encoding scheme and its matching schema.
> > > > For example, it supports Avro. You specify a schema for a given topic,
> > > > using an Avro Schema Definition (i.e. a JSON document describing the
> > > > schema). Every time you produce a message, you first declare the schema
> > > > definition you wish to use for your messages. The message data should be
> > > > Avro-encoded binary data (which the client in some SDKs helps encode
> > > > from a given data structure/object).
> > > > The same applies when you consume a message. You first specify the
> > > > schema you use to read the messages, and the client in some SDKs helps
> > > > by decoding the message binary data into an object/data structure.
> > > >
> > > > Each time you specify a schema to be used, either by a producer or a
> > > > consumer, the schema is persisted in Pulsar and given an increasing
> > > > version number. If the schema was the same as the previous version, it
> > > > is not saved. When the message is persisted, the version number is
> > > > encoded in the message headers.
> > > >
> > > > Pulsar provides a very useful feature named Schema Evolution
> > > > <https://pulsar.apache.org/docs/2.11.x/schema-understand/#schema-evolution>.
> > > > It allows us to check whether a new schema version is compatible with
> > > > the previous version or versions. When you configure the schema for the
> > > > topic, you decide which strategy to use for the validation check. The
> > > > strategies validate the following:
> > > >
> > > >    - BACKWARD strategy
> > > >       - A consumer with newSchema can read a message written using
> > > >       existingSchema
> > > >    - BACKWARD_TRANSITIVE strategy
> > > >       - A consumer with newSchema can read messages written using all
> > > >       existingSchema
> > > >    - FORWARD
> > > >       - A consumer with existingSchema can read messages written using
> > > >       newSchema
> > > >    - FORWARD_TRANSITIVE
> > > >       - A consumer defined with any of the existingSchema can read
> > > >       messages written using newSchema
> > > >    - FULL
> > > >       - A consumer defined with newSchema can read messages written
> > > >       using existingSchema
> > > >       - A consumer defined with existingSchema can read messages
> > > >       written using newSchema
> > > >    - FULL_TRANSITIVE
> > > >       - A consumer defined with newSchema can read messages written
> > > >       using any of the existingSchema
> > > >       - A consumer defined with any of the existingSchema can read
> > > >       messages written using newSchema
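The strategy definitions above reduce to a list of "can reader X read data written by Y" checks. A hypothetical Java sketch (not Pulsar's API; schemas are plain strings), assuming the non-transitive strategies compare only against the latest existing version:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Pulsar's API: which (reader, writer) pairs each strategy
// requires to be readable.
final class StrategyChecksSketch {
    enum Strategy { BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE, FULL, FULL_TRANSITIVE }

    record ReadCheck(String readerSchema, String writerSchema) {}

    // existing is ordered oldest to newest and must not be empty.
    static List<ReadCheck> requiredChecks(Strategy strategy, List<String> existing, String newSchema) {
        boolean transitive = strategy == Strategy.BACKWARD_TRANSITIVE
                || strategy == Strategy.FORWARD_TRANSITIVE
                || strategy == Strategy.FULL_TRANSITIVE;
        boolean newReadsOld = strategy != Strategy.FORWARD && strategy != Strategy.FORWARD_TRANSITIVE;
        boolean oldReadsNew = strategy != Strategy.BACKWARD && strategy != Strategy.BACKWARD_TRANSITIVE;
        // Non-transitive strategies only look at the latest existing version.
        List<String> against = transitive ? existing : List.of(existing.get(existing.size() - 1));
        List<ReadCheck> checks = new ArrayList<>();
        for (String old : against) {
            if (newReadsOld) {
                checks.add(new ReadCheck(newSchema, old));
            }
            if (oldReadsNew) {
                checks.add(new ReadCheck(old, newSchema));
            }
        }
        return checks;
    }

    public static void main(String[] args) {
        List<String> existing = List.of("v1", "v2");
        System.out.println(requiredChecks(Strategy.BACKWARD, existing, "v3"));
        // prints [ReadCheck[readerSchema=v3, writerSchema=v2]]
        System.out.println(requiredChecks(Strategy.FORWARD_TRANSITIVE, existing, "v3"));
    }
}
```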
> > > >
> > > >
> > > > Aside from Avro, Pulsar also has two additional supported encodings:
> > > > PROTOBUF and PROTOBUF_NATIVE.
> > > >
> > > > PROTOBUF is a bit strange. It encodes the messages using Protobuf
> > > > encoding, but the schema that is persisted to Pulsar is *not* a
> > > > Protobuf Descriptor, as you would have expected. The saved schema is a
> > > > translation of the Protobuf Descriptor to an Avro Schema, so in fact an
> > > > Avro schema definition is saved as the schema.
> > > >
> > > > PROTOBUF_NATIVE was created to fix that shortcoming, by actually
> > > > persisting the Protobuf Descriptor and using Protobuf for encoding.
> > > > The problem is that the authors of PROTOBUF_NATIVE haven't completed it
> > > > fully, and the backward compatibility validation code almost does not
> > > > exist: it only checks whether the root message name is the same between
> > > > versions.
> > > >
> > > > GOALS
> > > > The goal of this PIP is to amend PROTOBUF_NATIVE by adding a fully
> > > > functional validation for any of the defined Schema Compatibility
> > > > Strategies.
> > > > A secondary goal is to allow the user to choose between different
> > > > implementations: The new fully functional validation or the existing
> > > > barebones validation.
> > > >
> > > > -------- END
> > > >
> > > > I'm ok with having links in the Motivation, as *further reading*.
> > > > I'm against stacking up work for the reader to go read 5-6 different
> > > > links just to understand the motivation and the background knowledge
> > > > required to understand the feature.
> > > >
> > > > I'm against putting code in the Motivation, especially if it is supposed
> > > > to replace a plain-English description that makes the design easy to
> > > > understand.
> > > > Leave the code to the motivation.
> > > > Paste code only if you absolutely can't explain with plain old
> > > > descriptions.
> > > >
> > > >
> > > >
> > > > On Sat, Mar 11, 2023 at 11:46 AM SiNan Liu <li...@gmail.com> wrote:
> > > >
> > > > > *I guess that's right, too!*
> > > > >
> > > > > But the name `ProtobufNativeAdvancedSchemaCompatibilityCheck` is
> > > > > better, because we don't know whether there will be a V2 or V3 in the
> > > > > future. An official solution could be called
> > > > > `ProtobufNativeOfficialSchemaCompatibilityCheck`, and a third-party
> > > > > solution something like `ProtobufNativeXXXXXXXXSchemaCompatibilityCheck`.
> > > > >
> > > > > I've updated my design in the PIP issue.
> > > > > 1. Add a new ProtobufNativeSchemaAdvanceCompatibilityCheck, rather than
> > > > > a different validator implementation inside
> > > > > ProtobufNativeSchemaCompatibilityCheck.
> > > > > 2. Remove the 'builder'.
> > > > > 3. Clarify the relationship between newSchema, existingSchema, and
> > > > > writtenSchema in canRead.
> > > > >
> > > > > Please help check whether the description is comprehensive and what
> > > > > changes and improvements need to be made.
> > > > >
> > > > > Thanks,
> > > > > sinan
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Mar 9, 2023 at 17:35, Asaf Mesika <as...@gmail.com> wrote:
> > > > >
> > > > > > I like Bo's suggestion; I hadn't realized each schema type
> > > > > > compatibility check is actually a plugin.
> > > > > > It makes sense for any schema type's checks to evolve, sometimes in a
> > > > > > non-backward-compatible way, hence having plugins like
> > > > > > protobufNativeSchemaCompatabilityCheckV1, then
> > > > > > protobufNativeSchemaCompatabilityCheckV2, then
> > > > > > protobufNativeSchemaCompatabilityCheckV3 makes sense to me.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Mar 9, 2023 at 5:49 AM 丛搏 <bo...@apache.org> wrote:
> > > > > >
> > > > > > > Hi SiNan:
> > > > > > >
> > > > > > > From my point of view, it is just a plug-in. I don't think it is
> > > > > > > necessary to add configuration for the plugin.
> > > > > > > It adds no value, and it will make the plugin harder to use for
> > > > > > > users.
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Mar 8, 2023 at 15:54, SiNan Liu <li...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi, Bo.
> > > > > > > >
> > > > > > > > 1. I understand what you are saying: develop a new
> > > > > > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck` rather than change
> > > > > > > > the existing `ProtobufNativeSchemaCompatibilityCheck`. But I found a
> > > > > > > > few small problems:
> > > > > > > >
> > > > > > > > (1) ProtobufNativeAdvancedSchemaCompatibilityCheck and
> > > > > > > > ProtobufNativeSchemaCompatibilityCheck both have schemaType
> > > > > > > > PROTOBUF_NATIVE. Both checkers are "native" PROTOBUF
> > > > > > > > implementations that don't use AVRO-PROTOBUF, which leads to some
> > > > > > > > problems or "unreasonable" behavior and raises some further
> > > > > > > > questions for me.
> > > > > > > >
> > > > > > > `CompatibilityCheck` is only a plugin.
> > > > > > > `ProtobufNativeSchemaCompatibilityCheck` may sooner or later leave
> > > > > > > the stage; when `ProtobufNativeAdvancedSchemaCompatibilityCheck` is
> > > > > > > stable, we can make it the default checker.
> > > > > > >
> > > > > > > It is just a plug-in; users can change it at will and must ensure
> > > > > > > that it is used correctly.
> > > > > > > > (2) In broker.conf, `schemaRegistryCompatibilityCheckers`: if both
> > > > > > > > ProtobufNativeSchemaCompatibilityCheck and
> > > > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck are set, they will
> > > > > > > > overwrite each other, because this is a map:
> > > > > > > >
> > > > > > > > https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryService.java#L36-L44
> > > > > > > >
> > > > > > > > ```java
> > > > > > > > Map<SchemaType, SchemaCompatibilityCheck> checkers = new HashMap<>();
> > > > > > > > for (String className : checkerClasses) {
> > > > > > > >     SchemaCompatibilityCheck schemaCompatibilityCheck =
> > > > > > > >             Reflections.createInstance(className,
> > > > > > > >                     SchemaCompatibilityCheck.class,
> > > > > > > >                     Thread.currentThread().getContextClassLoader());
> > > > > > > >     // One entry per schema type: a later checker replaces an earlier one.
> > > > > > > >     checkers.put(schemaCompatibilityCheck.getSchemaType(),
> > > > > > > >             schemaCompatibilityCheck);
> > > > > > > > }
> > > > > > > > ```
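The overwrite behavior follows from `HashMap.put` replacing the value for an existing key. A minimal Java sketch with hypothetical stand-ins for `SchemaType` and `SchemaCompatibilityCheck`, assuming the classes are registered in the order they are listed in `schemaRegistryCompatibilityCheckers`; the print statement is added for illustration and is not in the quoted code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for Pulsar's SchemaType and SchemaCompatibilityCheck, reduced to
// what the registration loop quoted above uses.
final class CheckerRegistrySketch {
    enum SchemaType { JSON, AVRO, PROTOBUF_NATIVE }

    record Checker(String className, SchemaType schemaType) {}

    // Like the quoted loop: one map entry per SchemaType, so a later checker for the same
    // type silently replaces an earlier one (put returns the replaced value).
    static Map<SchemaType, Checker> register(List<Checker> configured) {
        Map<SchemaType, Checker> checkers = new HashMap<>();
        for (Checker checker : configured) {
            Checker replaced = checkers.put(checker.schemaType(), checker);
            if (replaced != null) {
                System.out.println(replaced.className() + " replaced by " + checker.className());
            }
        }
        return checkers;
    }

    public static void main(String[] args) {
        Map<SchemaType, Checker> checkers = register(List.of(
                new Checker("ProtobufNativeSchemaCompatibilityCheck", SchemaType.PROTOBUF_NATIVE),
                new Checker("ProtobufNativeAdvancedSchemaCompatibilityCheck", SchemaType.PROTOBUF_NATIVE)));
        System.out.println(checkers.get(SchemaType.PROTOBUF_NATIVE).className());
        // prints ProtobufNativeAdvancedSchemaCompatibilityCheck
    }
}
```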
> > > > > > > >
> > > > > > > > Is this a big problem or a small one? Is it possible, or
> > > > > > > > unnecessary? Maybe we can write in the documentation that only one
> > > > > > > > of the two protobufNative checkers can be chosen? Why are there two
> > > > > > > > checkers for different implementations of the same schemaType? Why
> > > > > > > > not have the checker create different validators, so we do not
> > > > > > > > have to change schemaRegistryCompatibilityCheckers?
> > > > > > >
> > > > > > > Users can use only one, not two; two would bring complexity to
> > > > > > > users.
> > > > > > >
> > > > > > > >
> > > > > > > > (3) And after switching to
> > > > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck: existing topics
> > > > > > > > previously had only the root message name checked, not the
> > > > > > > > protobuf content.
> > > > > > > >
> > > > > > > > What if the user wants both checkers?
> > > > > > > >
> > > > > > > > Does setting ProtobufNativeAdvancedSchemaCompatibilityCheck affect
> > > > > > > > topics with an existing schema?
> > > > > > > >
> > > > > > > > Older topics would still use the old checker, while newer topics or
> > > > > > > > certain older topics use the new advanced checker.
> > > > > > > >
> > > > > > > When `ProtobufNativeAdvancedSchemaCompatibilityCheck` is stable,
> > > > > > > users will not choose `ProtobufNativeSchemaCompatibilityCheck`,
> > > > > > > because it is not a complete checker.
> > > > > > > > (4) So should we have one schemaType per checker, where
> > > > > > > > protobufNativeChecker can have as many different implementation
> > > > > > > > classes as needed? As for the class name configuration in the PIP,
> > > > > > > > let's see whether it can be set at the topic level. In the current
> > > > > > > > PIP design, I just load this parameter into the checker when the
> > > > > > > > broker starts and the checkers map is set up. If I want to support
> > > > > > > > the topic level, can I do it in a separate regular PR, or should I
> > > > > > > > complete it here?
> > > > > > > >
> > > > > > > > Or add a schemaType called PROTOBUF_NATIVE_ADVANCE corresponding to
> > > > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck? (That seems to be
> > > > > > > > more trouble.)
> > > > > > > >
> > > > > > > > Sorry, I can't use my computer or network at work, so I'm replying
> > > > > > > > from my mobile phone and the formatting may be a bit messy. Please
> > > > > > > > bear with me.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > sinan
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Mar 7, 2023 at 11:39 PM, 丛搏 <bo...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > On Tue, Mar 7, 2023 at 13:22, SiNan Liu <li...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Great to see your comment, Bo!
> > > > > > > > > >
> > > > > > > > > > 1. The first way: the protobuf website has a description of
> > > > > > > > > > the rules, but there are no plans to implement them.
> > > > > > > > > > https://protobuf.dev/programming-guides/proto/#updating
> > > > > > > > >
> > > > > > > > > https://groups.google.com/g/protobuf
> > > > > > > > > Maybe ask here.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2. I think this PIP can be divided into two parts.
> > > > > > > > > > (1) Add a flag (`ValidatorClassName`) and load it into
> > > > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` when the broker starts.
> > > > > > > > > > ValidatorClassName is empty by default, and the implementation
> > > > > > > > > > continues as before, with no change for the user.
> > > > > > > > >
> > > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` is a plugin in
> > > > > `broker.conf`
> > > > > > > > > ```
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
> > > > > > > > > ```
> > > > > > > > > I do not recommend directly modifying this plugin and continuing to add
> > > > > > > > > configuration items, which will cause trouble for users. We have a lot of
> > > > > > > > > configs and it's getting very unwieldy.
> > > > > > > > > In my opinion, we should not change
> > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`:
> > > > > > > > > it is a simple implementation, it rarely goes wrong, and most users will
> > > > > > > > > keep using it. We can add another PROTOBUF_NATIVE check named
> > > > > > > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck` (or similar). That way we
> > > > > > > > > don't need the flag, and there is no compatibility to worry about: it is
> > > > > > > > > just a plugin and does not affect the current logic. Users who need it can
> > > > > > > > > simply switch the plugin to the new implementation.
> > > > > > > > >
> > > > > > > > > > ```java
> > > > > > > > > >     ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
> > > > > > > > > >         for (Descriptors.Descriptor fromDescriptor : fromDescriptors) {
> > > > > > > > > >             // The default implementation only checks whether the root message has changed.
> > > > > > > > > >             if (!fromDescriptor.getFullName().equals(toDescriptor.getFullName())) {
> > > > > > > > > >                 throw new ProtoBufCanReadCheckException(
> > > > > > > > > >                         "Protobuf root message isn't allowed to change!");
> > > > > > > > > >             }
> > > > > > > > > >         }
> > > > > > > > > >     };
> > > > > > > > > > ```
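For readers following along, here is a minimal, self-contained sketch of the `DEFAULT` validator above (Java 17). To avoid depending on the protobuf library, it assumes a hypothetical `MessageDef` record standing in for `Descriptors.Descriptor` (only the full name is modelled) and a simplified `ProtobufNativeSchemaValidator` interface; the interface actually proposed in the PIP may differ.

```java
import java.util.List;

public class DefaultValidatorSketch {

    /** Hypothetical stand-in for Descriptors.Descriptor: only the full name is modelled. */
    public record MessageDef(String fullName) {}

    public static class ProtoBufCanReadCheckException extends Exception {
        public ProtoBufCanReadCheckException(String message) {
            super(message);
        }
    }

    @FunctionalInterface
    public interface ProtobufNativeSchemaValidator {
        void validate(Iterable<MessageDef> fromDescriptors, MessageDef toDescriptor)
                throws ProtoBufCanReadCheckException;

        // Mirrors the DEFAULT lambda above: only the root message name is compared.
        ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
            for (MessageDef fromDescriptor : fromDescriptors) {
                if (!fromDescriptor.fullName().equals(toDescriptor.fullName())) {
                    throw new ProtoBufCanReadCheckException(
                            "Protobuf root message isn't allowed to change!");
                }
            }
        };
    }

    /** Returns true when the DEFAULT validator accepts the new root message. */
    public static boolean accepts(List<MessageDef> existing, MessageDef candidate) {
        try {
            ProtobufNativeSchemaValidator.DEFAULT.validate(existing, candidate);
            return true;
        } catch (ProtoBufCanReadCheckException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<MessageDef> existing = List.of(new MessageDef("example.Order"));
        System.out.println(accepts(existing, new MessageDef("example.Order")));   // true
        System.out.println(accepts(existing, new MessageDef("example.Invoice"))); // false
    }
}
```

Any change to fields, types, or field numbers passes this check as long as the root message keeps its full name, which is exactly the gap the PIP is trying to close.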
> > > > > > > > > > The `ValidatorClassName` value can also be set to the implementation
> > > > > > > > > > added by this PIP:
> > > > > > > > > > `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
> > > > > > > > > >
> > > > > > > > > > (2) Rework `ProtobufNativeSchemaCompatibilityCheck` so that it uses the
> > > > > > > > > > flag (`ValidatorClassName`) to build different
> > > > > > > > > > `ProtobufNativeSchemaValidator` implementations.
> > > > > > > > > > Isn't that just a plugin too? Users can develop and choose a different
> > > > > > > > > > `ProtobufNativeSchemaValidator`. I don't think it changes the logic; it
> > > > > > > > > > just makes it extensible.
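One way the flag in step (2) could work is sketched below (Java 17), under stated assumptions: `SchemaValidator`, `RootNameOnlyValidator` and `load` are hypothetical names, not Pulsar APIs, and the validator contract is reduced to comparing root message names. An empty `ValidatorClassName` keeps the current behaviour; otherwise the named class is instantiated reflectively, which is a common way to load class-name-based plugins in Java.

```java
import java.util.Objects;

public class ValidatorLoaderSketch {

    /** Hypothetical, simplified contract: returns null when compatible, otherwise a reason. */
    public interface SchemaValidator {
        String validate(String existingRootName, String newRootName);
    }

    /** Stand-in for the DEFAULT validator: only the root message name is compared. */
    public static class RootNameOnlyValidator implements SchemaValidator {
        @Override
        public String validate(String existingRootName, String newRootName) {
            return Objects.equals(existingRootName, newRootName)
                    ? null
                    : "Protobuf root message isn't allowed to change!";
        }
    }

    /** Empty setting keeps today's behaviour; otherwise the named class is instantiated. */
    public static SchemaValidator load(String validatorClassName) {
        if (validatorClassName == null || validatorClassName.isBlank()) {
            return new RootNameOnlyValidator();
        }
        try {
            Class<?> clazz = Class.forName(validatorClassName);
            // Fail fast with a clear message instead of a ClassCastException later.
            if (!SchemaValidator.class.isAssignableFrom(clazz)) {
                throw new IllegalArgumentException(
                        validatorClassName + " does not implement SchemaValidator");
            }
            return (SchemaValidator) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + validatorClassName, e);
        }
    }

    public static void main(String[] args) {
        SchemaValidator byDefault = load("");
        SchemaValidator byName = load(RootNameOnlyValidator.class.getName());
        System.out.println(byDefault.validate("example.Order", "example.Order"));
        System.out.println(byName.validate("example.Order", "example.Invoice"));
    }
}
```

Rejecting classes that do not implement the interface up front surfaces a misconfigured broker setting at startup rather than on the first schema upload.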
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I think this PIP is an enhancement of and supplement to the existing
> > > > > > > > > > functionality; it is neither unnecessary nor meaningless.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > sinan
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:53:
> > > > > > > > > >
> > > > > > > > > > > I think we have two ways to do that.
> > > > > > > > > > >
> > > > > > > > > > > First way: push the protobuf project to improve its Java support. Ask
> > > > > > > > > > > whether they have plans to do so.
> > > > > > > > > > >
> > > > > > > > > > > Second way: implement the new PROTOBUF_NATIVE `SchemaCompatibilityCheck`
> > > > > > > > > > > as a plugin and don't change any existing plugin logic (it's simple and
> > > > > > > > > > > already in use). I don't recommend adding flags for rollback; it adds
> > > > > > > > > > > configuration and makes little sense.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Bo
> > > > > > > > > > >
> > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Mar 6, 2023 at 23:00:
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Can you convert the code block at the beginning of the PIP, which is
> > > > > > > > > > > > actually a quote, into something that doesn't require so much
> > > > > > > > > > > > horizontal scrolling? Use
> > > > > > > > > > > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > > > > > > > > > > >
> > > > > > > > > > > > Let's improve the clarity of what you wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > "the PROTOBUF uses avro struct to store."
> > > > > > > > > > > > -->
> > > > > > > > > > > > When Schema type PROTOBUF is used, Pulsar Client assumes the object
> > > > > > > > > > > > given to it as message data is an auto-generated POJO containing the
> > > > > > > > > > > > annotations encoding the schema. The client is using a converter, which
> > > > > > > > > > > > converts a Protobuf schema descriptor into an Avro schema and sends that
> > > > > > > > > > > > as the Schema of the producer/consumer.
> > > > > > > > > > > >
> > > > > > > > > > > > "On the broker side, protobuf and avro both use SchemaData converted to
> > > > > > > > > > > > org.apache.avro.Schema."
> > > > > > > > > > > > -->
> > > > > > > > > > > > Since the schema is an Avro schema, the broker-side compatibility check
> > > > > > > > > > > > simply re-uses the compatibility check of the AVRO schema type.
> > > > > > > > > > > >
> > > > > > > > > > > > "ProtobufSchema is different from ProtobufNativeSchema in schema
> > > > > > > > > > > > compatibility check it uses avro-protobuf.
> > > > > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > > > > But the current implementation of ProtobufNative schema compatibility
> > > > > > > > > > > > check only checked if the root message name is changed."
> > > > > > > > > > > >
> > > > > > > > > > > > -->
> > > > > > > > > > > > PROTOBUF_NATIVE schema type is different.
> > > > > > > > > > > > The client actually uses the Protobuf Descriptor as the schema, as
> > > > > > > > > > > > opposed to the Avro schema used by the PROTOBUF schema type. In the
> > > > > > > > > > > > broker, the PROTOBUF_NATIVE compatibility check hasn't implemented any
> > > > > > > > > > > > rule besides one: checking whether the root message name has changed.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >    1. For now, there is no official or third-party solution for ProtoBuf
> > > > > > > > > > > > >    compatibility. If a better official or third-party solution appears
> > > > > > > > > > > > >    in the future, we can develop a new ProtobufNativeSchemaValidator and
> > > > > > > > > > > > >    use it, hence the flag.
> > > > > > > > > > > > >
> > > > > > > > > > > > Why do you need to make that configurable? Once you find a third-party
> > > > > > > > > > > > solution, just switch to it. Who knows, maybe you never will. Introduce
> > > > > > > > > > > > it when you find it, not now.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > We improve the BACKWARD and FORWARD strategies in
> > > > > > > > > > > > > ProtobufNativeSchemaCompatibilityCheck. As with the AVRO
> > > > > > > > > > > > > implementation, the protobuf compatibility check needs to implement the
> > > > > > > > > > > > > canRead method. *This will check that the writtenSchema can be read by
> > > > > > > > > > > > > the readSchema.*
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I completely disagree.
> > > > > > > > > > > > The Avro implementation is confusing for our use case. Don't copy it.
> > > > > > > > > > > >
> > > > > > > > > > > > You have
> > > > > > > > > > > >
> > > > > > > > > > > > public void checkCompatible(SchemaData from, SchemaData to,
> > > > > > > > > > > >         SchemaCompatibilityStrategy strategy)
> > > > > > > > > > > >         throws IncompatibleSchemaException {
> > > > > > > > > > > >     Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
> > > > > > > > > > > >     Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
> > > > > > > > > > > >     switch (strategy) {
> > > > > > > > > > > >         case BACKWARD_TRANSITIVE:
> > > > > > > > > > > >         case BACKWARD:
> > > > > > > > > > > >         case FORWARD_TRANSITIVE:
> > > > > > > > > > > >         case FORWARD:
> > > > > > > > > > > >         case FULL_TRANSITIVE:
> > > > > > > > > > > >         case FULL:
> > > > > > > > > > > >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
> > > > > > > > > > > >             return;
> > > > > > > > > > > >         case ALWAYS_COMPATIBLE:
> > > > > > > > > > > >             return;
> > > > > > > > > > > >         default:
> > > > > > > > > > > >             throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
> > > > > > > > > > > >     }
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > I would rename:
> > > > > > > > > > > > from --> currentSchema
> > > > > > > > > > > > to --> newSchema
> > > > > > > > > > > >
> > > > > > > > > > > > Use that switch statement and have a method for each case, like:
> > > > > > > > > > > > validateBackwardsCompatibility(currentSchema, newSchema)
> > > > > > > > > > > >
> > > > > > > > > > > > I dislike canRead and the use of writtenSchema, since you have two
> > > > > > > > > > > > completely different use cases: the producing side and the consuming
> > > > > > > > > > > > side.
> > > > > > > > > > > >
> > > > > > > > > > > > schemaValidatorBuilder
> > > > > > > > > > > > >
> > > > > > > > > > > > I dislike this proposal. IMO the Avro implementation is way too
> > > > > > > > > > > > complicated. Why not have a simple validation function for each switch
> > > > > > > > > > > > case above? Why do we need a strategy, a builder, and all this
> > > > > > > > > > > > complexity?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > IMO it's impossible to read the validation rules as you described them.
> > > > > > > > > > > > I wrote how they should be structured numerous times above.
> > > > > > > > > > > > I can't validate them.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > IMO, the current design is very hard to read.
> > > > > > > > > > > > Please try to avoid jumping into code sections.
> > > > > > > > > > > > Write a high-level design section in which you describe in words what
> > > > > > > > > > > > you plan to do.
> > > > > > > > > > > > Write the validation rules in a structure that is easy to understand:
> > > > > > > > > > > > rules for each compatibility check, using proper terms (current schema,
> > > > > > > > > > > > new schema), since the new schema is used for reading in one case and
> > > > > > > > > > > > for writing in the other.
> > > > > > > > > > > >
> > > > > > > > > > > > In its current form it takes too much time to understand the design,
> > > > > > > > > > > > and that shouldn't be the case.
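To make the suggested structure concrete, here is a sketch (Java 17) of one validation method per strategy, named in terms of `currentSchema` and `newSchema`. The schema model and the two rules are hypothetical simplifications for illustration, not the PIP's rule set: a schema is a map from field number to `FieldDef`, and a reader can consume a writer's data when every shared field number keeps the same type and every field the reader marks as required (a proto2 concept) is present in the writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PerStrategyCheckSketch {

    /** Hypothetical field model: type name plus a proto2-style "required" flag. */
    public record FieldDef(String type, boolean required) {}

    public enum Strategy { ALWAYS_COMPATIBLE, BACKWARD, FORWARD, FULL }

    /** Returns the list of violations; an empty list means the new schema is accepted. */
    public static List<String> checkCompatible(Map<Integer, FieldDef> currentSchema,
                                               Map<Integer, FieldDef> newSchema,
                                               Strategy strategy) {
        return switch (strategy) {
            case ALWAYS_COMPATIBLE -> List.of();
            case BACKWARD -> validateBackwardsCompatibility(currentSchema, newSchema);
            case FORWARD -> validateForwardsCompatibility(currentSchema, newSchema);
            case FULL -> {
                List<String> violations =
                        new ArrayList<>(validateBackwardsCompatibility(currentSchema, newSchema));
                violations.addAll(validateForwardsCompatibility(currentSchema, newSchema));
                yield violations;
            }
        };
    }

    /** Consumers on newSchema must be able to read data produced with currentSchema. */
    static List<String> validateBackwardsCompatibility(Map<Integer, FieldDef> currentSchema,
                                                       Map<Integer, FieldDef> newSchema) {
        return readerViolations(newSchema, currentSchema);
    }

    /** Consumers still on currentSchema must be able to read data produced with newSchema. */
    static List<String> validateForwardsCompatibility(Map<Integer, FieldDef> currentSchema,
                                                      Map<Integer, FieldDef> newSchema) {
        return readerViolations(currentSchema, newSchema);
    }

    private static List<String> readerViolations(Map<Integer, FieldDef> reader,
                                                 Map<Integer, FieldDef> writer) {
        List<String> violations = new ArrayList<>();
        for (Map.Entry<Integer, FieldDef> entry : reader.entrySet()) {
            int number = entry.getKey();
            FieldDef read = entry.getValue();
            FieldDef written = writer.get(number);
            if (written == null) {
                if (read.required()) {
                    violations.add("field " + number + " is required by the reader but never written");
                }
            } else if (!written.type().equals(read.type())) {
                violations.add("field " + number + " is " + read.type()
                        + " in the reader but " + written.type() + " in the writer");
            }
        }
        return violations;
    }
}
```

With this layout, BACKWARD and FORWARD differ only in which schema plays the reader, and FULL is simply both; the real rule list would replace `readerViolations`.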
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Asaf
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi! I updated the explanation of some things in the PIP issue. I also
> > > > > > > > > > > > > added a new “flag” in the conf that selects a different
> > > > > > > > > > > > > ProtobufNativeSchemaValidator implementation, and set the default
> > > > > > > > > > > > > ProtobufNativeSchemaValidator to only check whether the name of the
> > > > > > > > > > > > > root message is the same.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > sinan
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Mar 5, 2023 at 20:21:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be
> > > > > > > > > > > > > > > > validated for backward compatibility check using Avro based
> > > > > > > > > > > > > > > > compatibility rules?
> > > > > > > > > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a
> > > > > > > > > > > > > > > > Protobuf descriptor?
> > > > > > > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1. *You can take a quick look at the previous design: the PROTOBUF
> > > > > > > > > > > > > > > schema type uses an Avro struct for storage.*
> > > > > > > > > > > > > > > https://github.com/apache/pulsar/pull/1954
> > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > OK. So, to summarize your code (easier to write it than to send links):
> > > > > > > > > > > > > > * Pulsar Client, when used with Protobuf Schema, actually converts the
> > > > > > > > > > > > > > Protobuf descriptor into an Avro Schema (using code found inside the
> > > > > > > > > > > > > > Avro library) and saves that Avro schema as the schema. It's not saving
> > > > > > > > > > > > > > the protobuf descriptor at all. Very confusing, I have to add; I never
> > > > > > > > > > > > > > expected that.
> > > > > > > > > > > > > > This explains why ProtobufSchemaCompatibilityCheck just extends the
> > > > > > > > > > > > > > Avro check without doing any translation.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for that.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Now that I finally understand this, I can say that you *must* explain
> > > > > > > > > > > > > > it in the motivation section of your PIP.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2. *On the broker side, protobuf and avro both use `SchemaData`
> > > > > > > > > > > > > > > converted to `org.apache.avro.Schema`.*
> > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Actually those links don't really help.
> > > > > > > > > > > > > > The main link that helps is:
> > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I'm sorry - I don't understand.
> > > > > > > > > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > > > > > > > > If you just spell them out here, then as you say, just translate
> > > > > > > > > > > > > > > > the Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > > > > > > > > > > compatibility validation, no?
> > > > > > > > > > > > > > > > I believe the answer is no, since you may want to verify different
> > > > > > > > > > > > > > > > things when it comes to Protobuf, which are different than Avro.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1. *ProtobufSchema is different from ProtobufNativeSchema in that it
> > > > > > > > > > > > > > > uses avro-protobuf.*
> > > > > > > > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > > > > > > > *ProtobufNativeSchema needs a native compatibility check, but there is
> > > > > > > > > > > > > > > no official or third party implementation. So this PIP does not use
> > > > > > > > > > > > > > > avro-protobuf for protobuf compatibility checking.*
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2. *By the way, this is implemented in much the same way that Apache
> > > > > > > > > > > > > > > Avro does compatibility checking.*
> > > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > > > > > > > > > > > `canReadStrategy`, `canBeReadStrategy`, `mutualReadStrategy`
> > > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > > > > > > > > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > > > > > > > > > > > > > > writtenSchema and readSchema. We only need to change the order of the
> > > > > > > > > > > > > > > arguments we pass to `canRead()`.*
> > > > > > > > > > > > > > > ```java
> > > > > > > > > > > > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > > > > > > > > > > > > > >         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
> > > > > > > > > > > > > > >     switch (strategy) {
> > > > > > > > > > > > > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > > > > > > > > > > > > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> > > > > > > > > > > > > > >         case CanBeReadMutualStrategy -> {
> > > > > > > > > > > > > > >             canRead(toValidate, fromDescriptor);
> > > > > > > > > > > > > > >             canRead(fromDescriptor, toValidate);
> > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > >     }
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > > > > > > > > > > > > >         Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> > > > > > > > > > > > > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > ```
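The claim that only the argument order changes between strategies can be checked with a toy version of `validateWithStrategy` (Java 17). In this sketch, schemas are plain version labels and `canRead` merely records the (written, read) pair it is asked to check instead of running real protobuf rules; the strategy names are the ones used above, and `pairsChecked` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

public class CanReadOrderSketch {

    public enum Strategy { CanReadExistingStrategy, CanBeReadByExistingStrategy, CanBeReadMutualStrategy }

    /** Returns the (written, read) pairs canRead would be asked to verify. */
    public static List<String> pairsChecked(Strategy strategy, String existing, String toValidate) {
        List<String> calls = new ArrayList<>();
        switch (strategy) {
            // BACKWARD: the new schema must read data written with the existing one.
            case CanReadExistingStrategy -> canRead(existing, toValidate, calls);
            // FORWARD: the existing schema must read data written with the new one.
            case CanBeReadByExistingStrategy -> canRead(toValidate, existing, calls);
            // FULL: both directions, in the same order as the snippet above.
            case CanBeReadMutualStrategy -> {
                canRead(toValidate, existing, calls);
                canRead(existing, toValidate, calls);
            }
        }
        return calls;
    }

    /** Placeholder for the real check: only records which direction was requested. */
    static void canRead(String writtenSchema, String readSchema, List<String> calls) {
        calls.add("written=" + writtenSchema + ", read=" + readSchema);
    }

    public static void main(String[] args) {
        for (Strategy s : Strategy.values()) {
            System.out.println(s + " -> " + pairsChecked(s, "V2", "V3"));
        }
    }
}
```

With V2 as the existing schema and V3 as the new one, the BACKWARD strategy yields `[written=V2, read=V3]`, the FORWARD strategy yields `[written=V3, read=V2]`, and the mutual strategy yields both pairs, matching the writtenSchema/readSchema roles described for each strategy.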
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > I get that you want to take inspiration from the existing Avro Schema
> > > > > > > > > > > > > > compatibility check for your code design.
> > > > > > > > > > > > > > I also understand you *won't* use any existing Avro code for that.
> > > > > > > > > > > > > > I also understand you have to write the validation check on your own,
> > > > > > > > > > > > > > since there is no third-party implementation of it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The only thing I can't understand is the actual rules, as I wrote
> > > > > > > > > > > > > > before, since they are written confusingly.
> > > > > > > > > > > > > > So, I repeat what I asked before:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > * Backward checks
> > > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > > > * Forward checks
> > > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Once that's accomplished, I will be able to understand the different
> > > > > > > > > > > > > > validation rules for each compatibility check.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Mar 1, 2023 at 21:19:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Current avro,json, and protobuf schemas are all implemented based on AVRO.
> > > > > > > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > > > > > > > > > > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > > > > > > > > > > > > > > > > compatibility validation?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, so
> > > > > > > > > > > > > > > > > `checkCompatible()` is the same implementation as for `AVRO`.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be
> > > > > > > > > > > > > > > > validated for backward compatibility check using Avro based
> > > > > > > > > > > > > > > > compatibility rules?
> > > > > > > > > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a
> > > > > > > > > > > > > > > > Protobuf descriptor?
> > > > > > > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The Compatibility check strategy is described on the website
> > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > > > > > > > > > > > 1. BACKWARD(CanReadExistingStrategy): Consumers using schema V3 can
> > > > > > > > > > > > > > > > > process data written by producers using the last schema version V2.
> > > > > > > > > > > > > > > > > So V2 is "writtenSchema" and V3 is "readSchema".
> > > > > > > > > > > > > > > > > 2. FORWARD(CanBeReadByExistingStrategy): Consumers using the last
> > > > > > > > > > > > > > > > > schema version V2 can process data written by producers using a new
> > > > > > > > > > > > > > > > > schema V3, even though they may not be able to use the full
> > > > > > > > > > > > > > > > > capabilities of the new schema. So V3 is "writtenSchema" and V2 is
> > > > > > > > > > > > > > > > > "readSchema".
> > > > > > > > > > > > > > > > > 3. FULL(CanBeReadMutualStrategy): Schemas are both backward and
> > > > > > > > > > > > > > > > > forward compatible.
> > > > > > > > > > > > > > > > > Schema can evolve. The old version schema and the new version schema
> > > > > > > > > > > > > > > > > should be well understood.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I'm sorry - I don't understand.
> > > > > > > > > > > > > > > > I understand the different compatibility
> check
> > > > > > > strategies.
> > > > > > > > > > > > > > > > If you just spell them out here, then as you
> > say,
> > > > > just
> > > > > > > > > translate
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > Protobuf Descriptor into an Avro schema and
> run
> > > the
> > > > > > Avro
> > > > > > > > > > > > > > > > compatibility validation, no?
> > > > > > > > > > > > > > > > I believe the answer is no, since you may
> want
> > to
> > > > > > verify
> > > > > > > > > > > different
> > > > > > > > > > > > > > things
> > > > > > > > > > > > > > > > when it comes to Protobuf, which are
> different
> > > then
> > > > > > Avro.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > At the current state, I can't understand your
> > > > design
> > > > > at
> > > > > > > all.
> > > > > > > > > > > Please
> > > > > > > > > > > > > > help
> > > > > > > > > > > > > > > > clarify that.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > So each strategy should have its own
> section.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The arguments of `canRead()` are
> > writtenSchema
> > > > and
> > > > > > > > > readSchema.
> > > > > > > > > > > As
> > > > > > > > > > > > > > we've
> > > > > > > > > > > > > > > > > just described, we only need to change the
> > > order
> > > > of
> > > > > > > > > arguments
> > > > > > > > > > > we
> > > > > > > > > > > > > pass
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > `canRead()`.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com>
> > > > 于2023年2月27日周一
> > > > > > > 20:49写道:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Current avro, json, and protobuf schemas are all implemented based on AVRO.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the protobuf schema is converted into an Avro Schema, and then you use Avro compatibility validation?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > * Backward checks
> > > > > > > > > > > > > > > > > > ** List down rules, where use newSchema (the schema used by producer or consumer) and existingSchema (last schema used)
> > > > > > > > > > > > > > > > > > * Forward
> > > > > > > > > > > > > > > > > > ** List down rules, where use newSchema (the schema used by producer or consumer) and existingSchema (last schema used)
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I'm saying this since you used the word "writtenSchema", but it represents something completely different if it's a backward or a forward check.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Once you'll have that structure like that, I personally will be able to read and understand it.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The motivation and problem statement are now good - thanks for improving it.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some background and some explanations.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > The compatibility check rules are already written in the Implementation. ProtoBufNative implements the same canRead method as Apache Avro. It does this by checking whether the schema for writing and reading is compatible. I also indicate whether the writtenSchema and readSchema of the Backward, Forward, and Full strategies are the old or the new version of the schema.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Feb 26, 2023 at 23:24:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, so you need to add IMO for people to understand it. You don't need to explain the entire pulsar in this PIP, but at the very least a few paragraphs detailing all you need to know, to put you in context:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >    - Start by saying Pulsar has a built-in schema registry inside Pulsar broker.
> > > > > > > > > > > > > > > > > > > >       - Every time the client updates the schema, it uploads it to the broker. When that happens, it has a feature which validates if the new schema version is compatible with the previous versions. There are 4 types of compatibility: Full, ... (complete and explain each one briefly)
> > > > > > > > > > > > > > > > > > > >    - Also explain Pulsar Schema registry supports various schema protocols: Avro, protobuf native, ... (complete the rest), each protocol has a schema which dictates how to serialize and deserialize the message content into typed object.
> > > > > > > > > > > > > > > > > > > >    - Explain in short what is protobuf native (compare protobuf non-native)
> > > > > > > > > > > > > > > > > > > >    - Please don't paste code instead of explaining.
> > > > > > > > > > > > > > > > > > > >       - Explain that protobuf native current validation check is only composed of checking the root message name is the same between the current schema version and the new version.
> > > > > > > > > > > > > > > > > > > >          - Explain briefly what is a root message and its name.
> > > > > > > > > > > > > > > > > > > >       - Explain the problem (list scenarios) that we have because protobuf native schema only supports FULL compatibility validation.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Regarding high level design - as in what you plan to do.
> > > > > > > > > > > > > > > > > > > > I suggest you add "High Level Design" and in it detail how you plan to validate, per protobuf version, per compatibility check (backward, forward, full,...).
> > > > > > > > > > > > > > > > > > > > I tried reading the implementation - for me, it's all over the place. Can you please list in order what I wrote above, and list the validation rules with a good explanation why you validate it like that?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Lastly, once you have all the validation rules clearly stated, you can use it to document it properly so users can know what validation to expect.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Asaf
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Sorry, my mistake. I removed the code and described the design to improve the PROTOBUF_NATIVE schema compatibility checks. You can have a look.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Asaf Mesika <asaf.mesika@gmail.com> wrote on Wed, Feb 22, 2023 at 21:16:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I read it but you're almost directly diving into the code - it will take me hours just to reverse engineer your design.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Can you please include a "High Level Design" section in which you explain how you plan to tackle any issue?
> > > > > > > > > > > > > > > > > > > > > > If I can read that section and explain to someone else how this will work, it means the section is complete.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19565.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > We can talk about the current design here. Especially for the field type change check rules, please give your valuable advice.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > Sinan
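To make the argument order in the quoted BACKWARD/FORWARD/FULL description concrete, here is a small Java sketch of how the three strategies could map onto a single `canRead(writtenSchema, readSchema)` check, using the V2 (existing) and V3 (new) naming from the thread. `StrategyDispatch`, `isCompatible` and the `BiPredicate` stand-in for `canRead()` are hypothetical names for illustration, not Pulsar's classes; Pulsar also defines further strategies (such as the transitive variants) that are left out here.

```java
import java.util.function.BiPredicate;

final class StrategyDispatch {
    private StrategyDispatch() {}

    // Only the three strategies discussed in the thread.
    enum Strategy { BACKWARD, FORWARD, FULL }

    /**
     * existing: the last stored schema version (V2 in the thread).
     * updated:  the schema version being registered (V3 in the thread).
     * canRead:  hypothetical check, invoked as canRead.test(writtenSchema, readSchema).
     */
    static <S> boolean isCompatible(S existing, S updated, Strategy strategy,
                                    BiPredicate<S, S> canRead) {
        switch (strategy) {
            case BACKWARD:
                // Consumers on V3 read data written with V2: V2 is written, V3 is read.
                return canRead.test(existing, updated);
            case FORWARD:
                // Consumers on V2 read data written with V3: V3 is written, V2 is read.
                return canRead.test(updated, existing);
            case FULL:
                // Both directions must hold.
                return canRead.test(existing, updated) && canRead.test(updated, existing);
            default:
                throw new IllegalArgumentException("Unsupported strategy: " + strategy);
        }
    }
}
```

With a toy `canRead` in which a newer version can always read an older one, BACKWARD passes for V2 to V3 while FORWARD and FULL fail, which is the asymmetry the strategies are meant to capture.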

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by SiNan Liu <li...@gmail.com>.
1.

> message SearchReq { string query = 1; int32 page_number = 2; int32
> result_per_page = 3;}
> Then second version I use:
> message SearchRequest { string query = 1; int32 page_number = 2;
> int32 result_per_page = 3;}


The rules in the PIP improve on the previous implementation, so the previous implementation's check needs to be kept.
If a user upgrades from the previous implementation to the one in this PIP, and the new implementation doesn't check for root message name changes, wouldn't that change the old behavior? This PIP is meant to make compatibility checking more stringent.
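For reference, the pre-PIP rule being kept here amounts to a plain name comparison. The sketch below assumes a hypothetical `RootNameCheck` helper that receives the fully qualified root message names of the existing and the new schema; it illustrates the rule only and is not the code of `ProtobufNativeSchemaCompatibilityCheck`.

```java
import java.util.Objects;

final class RootNameCheck {
    private RootNameCheck() {}

    // Hypothetical helper: the old PROTOBUF_NATIVE behaviour only compared the
    // fully qualified root message name of the two schema versions.
    static boolean sameRootMessage(String existingRootName, String newRootName) {
        Objects.requireNonNull(existingRootName, "existingRootName");
        Objects.requireNonNull(newRootName, "newRootName");
        return existingRootName.equals(newRootName);
    }
}
```

Under this rule the `SearchReq` to `SearchRequest` rename quoted above is rejected, even though the fields themselves are unchanged.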


2.

> This is a guideline for a human making the change, not for software
> validation right?
> When you write down the code for doing the comparison to know if the field
> number has changed, you have to take into account both the field name AND
> field type. If both are equal but the field number is different then fail
> it.


I use the field name to match fields here: if the name is the same but the number is different, the change is incompatible.
https://github.com/apache/pulsar/pull/19566/files#diff-14b840259375cc8fcae564586fc1c2a188cb71822ab5995ed6807f7dab30c210R129-R132
The PIP description states the rule; how to implement it is a matter for the PR. I have considered the example you gave here, and it is also treated as incompatible.

The change you describe is also discussed here:
https://stackoverflow.com/questions/65230623/safeness-of-changing-proto-field-number
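A minimal sketch of the field-number rule: fields are matched by name, as described above, and additionally must have the same type, as suggested in the quoted question, before a changed number is flagged. The `Field` record and `FieldNumberCheck.changedNumbers` are hypothetical simplifications of a protobuf field descriptor, not the code in the linked PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class FieldNumberCheck {
    private FieldNumberCheck() {}

    // Hypothetical, simplified view of a protobuf field: name, tag number, type name.
    record Field(String name, int number, String type) {}

    // Lists every field that kept its name and type but moved to a
    // different field number between the existing and the updated version.
    static List<String> changedNumbers(List<Field> existing, List<Field> updated) {
        Map<String, Field> existingByName = existing.stream()
                .collect(Collectors.toMap(Field::name, Function.identity()));
        List<String> problems = new ArrayList<>();
        for (Field f : updated) {
            Field old = existingByName.get(f.name());
            if (old != null && old.type().equals(f.type()) && old.number() != f.number()) {
                problems.add(f.name() + ": " + old.number() + " -> " + f.number());
            }
        }
        return problems;
    }
}
```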


3.

> If you'll read sentence 2 from the guide, you'll see they write
> "This means that any messages serialized by code using your “old” message
> format can be parsed by your new generated code, as they won’t be missing
> any required elements."
> This is for the *other side*: when you add a required field to readSchema
> compared to write schema. This of course is not allowed and won't work.


Note that in proto2: **Any new fields that you add should be optional or
repeated.**
https://protobuf.dev/programming-guides/proto2/#updating

I also stated in the PIP that this rule about required field changes is not checked for proto3.
The rule applies to proto2 only: proto3 removed `required`, so if you use proto3, changes to required fields are not checked.
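The proto2-only rule can be sketched as follows: in proto2, a field that is `required` in the new version but whose number does not exist in the existing version is flagged, and the check is skipped entirely for proto3. The `RequiredFieldCheck` class, its `Label` enum and the `syntax` string argument are hypothetical; a real check would read this information from the protobuf descriptors.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class RequiredFieldCheck {
    private RequiredFieldCheck() {}

    enum Label { OPTIONAL, REQUIRED, REPEATED }

    // Hypothetical simplified field model: tag number plus label.
    record Field(int number, Label label) {}

    // proto2 guidance: new fields should be optional or repeated.
    // proto3 has no 'required' label, so the rule is skipped there.
    static List<Integer> newRequiredFields(String syntax, List<Field> existing, List<Field> updated) {
        if ("proto3".equals(syntax)) {
            return List.of();
        }
        Set<Integer> existingNumbers = existing.stream()
                .map(Field::number)
                .collect(Collectors.toSet());
        return updated.stream()
                .filter(f -> f.label() == Label.REQUIRED)
                .map(Field::number)
                .filter(n -> !existingNumbers.contains(n))
                .collect(Collectors.toList());
    }
}
```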


4.

> My write schema is
> message SearchRequest { string query = 1; int32 page_number = 2;
> int32 result_per_page = 3;}
> my read schema is
> message SearchRequest { string query = 1; int32 page_number = 2;
> int32 resultPerPage = 3;}
> - Non-required fields can be removed, as long as the field number is not
> used again in your updated message type. You may want to rename the field
> instead, perhaps adding the prefix “OBSOLETE_”, or make the field number
> reserved <https://protobuf.dev/programming-guides/proto2/#reserved>, so
> that future users of your .proto can’t accidentally reuse the number.
> First, they are talking about removal of fields. They suggest you rename
> the field to OBSOLETE_myField instead of actually deleting it, to make it
> easier on compatibility. They don't talk about rules that forbid you to
> rename a field. It's unrelated.


Maybe you are right that changing the field name is compatible.
https://stackoverflow.com/questions/45431685/protocol-buffer-does-changing-field-name-break-the-message/45431953#comment84548234_45431953

Interestingly, I tested the compatibility check of `ProtobufSchemaCompatibilityCheck`, the implementation based on `avro-protobuf`
(`org.apache.pulsar.broker.service.schema.BaseAvroSchemaCompatibilityTest`):

private static final String schemaJson1 =
        "{\"type\":\"record\",\"name\":\"DefaultTest\",\"namespace\":\"org.apache.pulsar.broker.service.schema"
        + ".AvroSchemaCompatibilityCheckTest\",\"fields\":[{\"name\":\"field1\",\"type\":\"string\"}]}";

private static final String schemaJson4 =
        "{\"type\":\"record\",\"name\":\"DefaultTest\",\"namespace\":\"org.apache.pulsar.broker.service.schema"
        + ".AvroSchemaCompatibilityCheckTest\",\"fields\":[{\"name\":\"field1_v2\",\"type\":\"string\","
        + "\"aliases\":[\"field1\"]}]}";

schemaCompatibilityCheck.isCompatible(schemaData1, schemaData4, SchemaCompatibilityStrategy.BACKWARD);
This is compatible.

But this is not compatible:
schemaCompatibilityCheck.isCompatible(schemaData1, schemaData4, SchemaCompatibilityStrategy.FORWARD);

So what do you think? Are the renaming rules in the Avro-based `ProtobufSchemaCompatibilityCheck` wrong? That is what it looks like.
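One way to read this Avro result: Avro aliases are resolved on the reader's side. A reader field may match a writer field by its own name or by one of its own aliases, while aliases declared only in the writer's schema play no part. The toy model below (hypothetical `AvroAliasToy`, which ignores field types and everything else Avro checks) reproduces the BACKWARD/FORWARD asymmetry under that assumption; it is not Avro's `SchemaCompatibility` implementation.

```java
import java.util.List;
import java.util.Set;

final class AvroAliasToy {
    private AvroAliasToy() {}

    // Hypothetical toy model of an Avro record field: name, aliases, and
    // whether the field declares a default value.
    record Field(String name, Set<String> aliases, boolean hasDefault) {}

    // The reader can decode the writer's data if every reader field is found in
    // the writer by name or by one of the *reader's* aliases, or has a default.
    static boolean canRead(List<Field> writer, List<Field> reader) {
        for (Field r : reader) {
            boolean found = writer.stream().anyMatch(w ->
                    w.name().equals(r.name()) || r.aliases().contains(w.name()));
            if (!found && !r.hasDefault()) {
                return false;
            }
        }
        return true;
    }
}
```

With `schema1 = [field1]` and `schema4 = [field1_v2, aliases: field1]`, `canRead(schema1, schema4)` (new reader, old data, i.e. BACKWARD) succeeds, and `canRead(schema4, schema1)` (old reader, new data, i.e. FORWARD) fails, because `schema1` has no alias pointing at `field1_v2`.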


5.

> I think my previous explanation in this email reply should be good enough
> to explain why the name is irrelevant.
> If not, please ping me on this.
> Regarding your example, can you quote it? I don't understand the read and
> write schema here.
> Also, regarding message names and why they are irrelevant. Same thing: Read
> https://protobuf.dev/programming-guides/encoding/
> In there you will see the names are not encoded thus can be changed freely
> and still successfully decode binary data.


Reader.proto:
```protobuf
syntax = "proto2";
package proto.reader;
import "ExternalReader.proto";
option java_package = "org.apache.pulsar.client.schema.proto.reader";
option java_outer_classname = "Reader";

enum WeekEnum {
......
}

message ProtobufMessage {
required string protobufFoo = 1;
required double protobufBar = 2;
}

message ProtobufSchema {
......
optional ProtobufMessage messageField = 12;
......
optional WeekEnum enumField = 15;
}
```

WriterWithTypeNameChange.proto:
```protobuf
syntax = "proto2";
package proto.writerWithTypeNameChange;
import "ExternalReader.proto";
option java_package = "org.apache.pulsar.client.schema.proto.writerWithTypeNameChange";
option java_outer_classname = "WriterWithTypeNameChange";

enum WeekEnum {
......
}

message ProtobufMessage_V2 {
required string protobufFoo = 1;
required double protobufBar = 2;
}

message ProtobufSchema {
......
optional ProtobufMessage_V2 messageField = 12;
......
optional WeekEnum enumField = 15;
......
}
```

*Doesn't that change the field type? Should that be compatible?*
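The quoted point that names never reach the wire can be checked against the encoding guide: each field is written as a varint key `(field_number << 3) | wire_type` followed by its payload. The hypothetical `WireTagDemo` below encodes a single varint field by hand; nothing in the output depends on the field name or on the enclosing message's type name. Whether the broker *should* accept such renames is a separate policy question that this sketch does not answer.

```java
import java.io.ByteArrayOutputStream;

final class WireTagDemo {
    private WireTagDemo() {}

    private static final int WIRE_TYPE_VARINT = 0;

    // Writes an unsigned base-128 varint, as used by the protobuf wire format.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Encodes one varint-typed field. Only the field number and wire type go
    // into the key; the field name and message type name are never written.
    static byte[] encodeVarintField(int fieldNumber, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, ((long) fieldNumber << 3) | WIRE_TYPE_VARINT);
        writeVarint(out, value);
        return out.toByteArray();
    }
}
```

For example, `encodeVarintField(3, 150)` yields the bytes `0x18 0x96 0x01` whether field 3 is called `result_per_page` or `resultPerPage`. Field 12 in the example above is length-delimited (wire type 2) rather than a varint, but its key likewise carries only the number 12, not the `ProtobufMessage` or `ProtobufMessage_V2` type name.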


6.

> So if my write schema is
> message SearchRequest { required string query = 1; optional int32
> page_number = 2; optional int32 result_per_page = 3;}
> and my read schema is
> message SearchRequest { optional int32 page_number = 2; optional
> int32 result_per_page = 3;}
> You can see I removed a field from writeSchema that does not have a default
> value.
> First, I can read it without any problems: Field 2 might exist, no
> problem. Field 3 might exist, no problem. I ignore field 1.
> So the validation you wrote is not correct at all, without any regard to
> optional value.


*It looks like you've added a required field here, which is not allowed (I said so in point 3).*
*It also uses number 1, the number of the deleted field.*

> What if I do the other way around?
> My write schema is:
> message SearchRequest { optional int32 page_number = 2; optional
> int32 result_per_page = 3;}
> My read schema is:
> message SearchRequest { required string query = 1; optional int32
> page_number = 2; optional int32 result_per_page = 3;}
> Here I will fail, since it is required I will provide value for field 1 but
> alas it will never be there.


Yes, the required field has been removed, which is incompatible.
The check for required field changes runs first; at that point an incompatibility exception is thrown and the remaining compatibility checks are not performed.

> Now let's focus on the optional field, ok?
> Write schema
> message SearchRequest { optional int32 page_number = 2; optional
> int32 result_per_page = 3;}
> Read schema:
> message SearchRequest { optional string query = 1; optional int32
> page_number = 2; optional int32 result_per_page = 3;}
> When I read, I won't find field number 1 of course, so I will use the
> default value dictated by the type, as I quoted above, which is for string
> is the empty byte string.


*It looks like the sixth item in the PIP should be removed.*
*And rule 7 in the PIP should be removed:*

> Changing a default value is generally OK, as long as you remember that
> default values are never sent over the wire.
> Thus, if a program receives a message in which a particular field isn’t
> set, the program will see the default value as it was defined in that
> program’s version of the protocol.
> It will NOT see the default value that was defined in the sender’s code.

This is point 11 in the official documentation:
https://protobuf.dev/programming-guides/proto2/#updating
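The quoted proto2 guidance (a field missing from the message is seen with the *reader's* default) can be illustrated with a small model. `OptionalFieldRead.read` is hypothetical: it takes the field numbers and values found on the wire plus the reader's own defaults keyed by field number, and builds the reader's view. Fields that are on the wire but unknown to the reader are not exposed in this view.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OptionalFieldRead {
    private OptionalFieldRead() {}

    // readerDefaults: every field the reader's schema declares, keyed by field
    // number and mapped to the default from the *reader's* .proto (or the type
    // default, e.g. "" for string, 0 for int32). Missing fields fall back to it.
    static Map<Integer, Object> read(Map<Integer, Object> onWire, Map<Integer, Object> readerDefaults) {
        Map<Integer, Object> view = new LinkedHashMap<>();
        for (Map.Entry<Integer, Object> e : readerDefaults.entrySet()) {
            view.put(e.getKey(), onWire.getOrDefault(e.getKey(), e.getValue()));
        }
        return view;
    }
}
```

Reading data from the quoted write schema (fields 2 and 3 only) with the read schema that adds `optional string query = 1` gives `""` for field 1, the string type default, which is consistent with dropping the sixth item as discussed above.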


7.

> Read this and then please explain why rule number 3 should not be added:
>
> https://protobuf.dev/programming-guides/dos-donts/#dont-go-from-repeated-to-scalar


It looks like this rule needs to be added.
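A sketch of the repeated-to-scalar rule from the linked dos and don'ts page, matching fields by number. `RepeatedToScalarCheck` and its `Field` record are hypothetical simplifications; according to that page, going from repeated to scalar loses data (how much depends on the encoding), so the sketch flags the change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class RepeatedToScalarCheck {
    private RepeatedToScalarCheck() {}

    // Hypothetical simplified field model: tag number and whether it is repeated.
    record Field(int number, boolean repeated) {}

    // Flags fields (matched by number) that were repeated in the existing
    // version but are singular in the updated one.
    static List<Integer> repeatedToScalar(List<Field> existing, List<Field> updated) {
        Map<Integer, Field> byNumber = existing.stream()
                .collect(Collectors.toMap(Field::number, Function.identity()));
        List<Integer> flagged = new ArrayList<>();
        for (Field f : updated) {
            Field old = byNumber.get(f.number());
            if (old != null && old.repeated() && !f.repeated()) {
                flagged.add(f.number());
            }
        }
        return flagged;
    }
}
```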



Thanks,
sinan



Asaf Mesika <as...@gmail.com> wrote on Sun, Mar 19, 2023 at 19:11:

> Also:
>
> > (2) PROTOBUF_NATIVE was designed so as not to use avro-protobuf for protobuf schema compatibility checking.
>
> > The root message name is the class name we pass in when we create the producer or consumer. ProtoBuf has many nested messages or dependencies. The current implementation only checks if the passed class name is the same. It does not check if the fields in the file change in a way that is compatible with older versions of the schema.
>
>
> You missed the most important thing there: PROTOBUF_NATIVE uses Protobuf
> Descriptor when persisting the schema. It's not using Avro Schema
> definition. As I wrote you in previous email:
>
> > PROTOBUF_NATIVE was created to fix that shortcoming, by actually persisting the Protobuf Descriptor and using Protobuf for encoding.
> >
>
>
>
>
>
> On Wed, Mar 15, 2023 at 5:46 PM SiNan Liu <li...@gmail.com> wrote:
>
> > 1.
> >
> > > > Why? The root message name is not written over the wire to the best of my knowledge. I haven't found it written in the official doc.
> >
> >
> > The root message name check is one of the rules in the previous `ProtobufNativeSchemaCompatibilityCheck`. If the root message has a different name, there is no need to check its contents: for the "same" schema, the names must be the same.
> >
> >
> > 2.
> >
> > > > The writtenSchema can not change the field number of any field in readSchema (the field name is the same, but the field number is different).
> > > > You have to take into account field type as well when comparing.
> >
> >
> > The first rule on the website says that field numbers cannot be changed:
> >
> > > Don’t change the field numbers for any existing fields.
> >
> >
> >
> > 3.
> >
> > > > - The writtenSchema cannot add required fields, but optional or duplicate fields can be added (The field number must be new).
> > > >
> > > > That's not true.
> > > You can have a required field in writeSchema, and not have that field in readSchema (based on tag number).
> >
> >
> > A required field must exist. In your case, where does readSchema read its required fields from? They are not in writtenSchema.
> > The second and third rules on the website say not to add or delete required fields!
> >
> >
> > 4.
> >
> > > > (4) The writtenSchema can not change the field name of any field in readSchema (the field number is the same, but the field name is different).
> > > This is incorrect.
> > > Field names are encoded into the wire. I don't see this in any best practice.
> >
> >
> > The third rule on the website:
> >
> > > You may want to rename the field instead, perhaps adding the prefix “OBSOLETE_”, or make the field number reserved, so that future users of your .proto can’t accidentally reuse the number.
> >
> > So if you want to rename a field, treat it as adding a new field with a new number and deleting the old one!
> >
> >
> > 5.
> >
> > > The writtenSchema does not change the field name and number, but it does change the field type.
> > > > Small correction: for the same field number you are not allowed to change types. Name is irrelevant.
> >
> >
> > Why is the name irrelevant?
> > The type change here is the rule stated in Alternatives. There is no check for it, just a warning to the user.
> > Another change is renaming an enum or a message type, which is the same situation as the root message name check in 1. Is it still the same type after such a change? This should not be allowed!
> >
> > *Here is an example:*
> > readSchema:
> > https://github.com/apache/pulsar/pull/19566/files#diff-a7006d73502e6064a80af02822f3a3072be498d8b677c4b838b0dafaea32dea4
> > writtenSchema:
> > https://github.com/apache/pulsar/pull/19566/files#diff-e3e7543624edaf1e0a4fd47947a2cad6e4b816b93843f71a367042ba6c3ec53f
> >
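Rule (5) as discussed above can be sketched as follows. This is a minimal illustration, not the PIP's implementation: `Field` is a hypothetical, simplified stand-in for a protobuf field descriptor (not the protobuf-java `Descriptors` API), fields are paired by tag number as Asaf suggests, and, following the current draft, an ENUM or MESSAGE type-name change is an error while a scalar type change only produces a warning.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class FieldTypeChangeCheck {

    /**
     * Hypothetical, simplified field model (not protobuf-java's FieldDescriptor).
     * kind is "ENUM", "MESSAGE" or a scalar such as "int32"; typeName is the full
     * type name for ENUM/MESSAGE fields and null for scalars.
     */
    record Field(int number, String name, String kind, String typeName) {}

    /** Errors make the schemas incompatible; warnings are only reported to the user. */
    record Result(List<String> errors, List<String> warnings) {}

    /** Indexes fields by tag number, which is what identifies a field on the wire. */
    static Map<Integer, Field> schema(Field... fields) {
        Map<Integer, Field> byNumber = new LinkedHashMap<>();
        for (Field f : fields) {
            byNumber.put(f.number(), f);
        }
        return byNumber;
    }

    private static boolean isNamedType(String kind) {
        return kind.equals("ENUM") || kind.equals("MESSAGE");
    }

    static Result check(Map<Integer, Field> readSchema, Map<Integer, Field> writtenSchema) {
        List<String> errors = new ArrayList<>();
        List<String> warnings = new ArrayList<>();
        for (Field written : writtenSchema.values()) {
            Field read = readSchema.get(written.number());
            if (read == null) {
                continue; // tag unknown to the reader: it is skipped when parsing
            }
            String where = "tag " + written.number();
            if (isNamedType(read.kind()) || isNamedType(written.kind())) {
                // ENUM/MESSAGE: the kind and the full type name must both stay the same
                if (!read.kind().equals(written.kind())
                        || !Objects.equals(read.typeName(), written.typeName())) {
                    errors.add(where + ": " + written.typeName() + " -> " + read.typeName());
                }
            } else if (!read.kind().equals(written.kind())) {
                // scalar type change: only a warning in the current draft
                warnings.add(where + ": " + written.kind() + " -> " + read.kind());
            }
            // Field names are never compared: they are not part of the wire format.
        }
        return new Result(errors, warnings);
    }
}
```

Renaming `shop.Status` to `shop.State` on the same tag yields one error, while renaming only the field (same tag, same type) yields neither an error nor a warning.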
> > 6.
> >
> > > > (6) The writtenSchema removes fields that do not have default values in
> > > > readSchema. Then the schema is incompatible.
> > > Protobuf gives you its own default if you don't supply one. This is
> > > incorrect.
> >
> >
> > (1) This rule only applies to proto2 fields that do not declare a default
> > value. Proto3 does not need this check, because a default value is always
> > there.
> >
> > (2) In the PIP issue:
> >
> > > Proto3 removed required fields, so there is no need to check required
> > > fields. We get the syntax (proto2 or proto3) of the proto in the code,
> > > and skip the required field check if it is proto3. All other checking
> > > rules also apply to proto3.
> >
> >
> > *I made a mistake here. This default value check is not needed in proto3.
> > I will modify the rules later according to your suggestion.*
> >
> > > I would remove the proto2/proto3 sections, since they only differ in 1
> > > rule, and just mention that distinction inside that rule (less work for
> > > the reader).
> >
> >
> > (3) About the rules you suggest adding:
> >
> > > *Rules that you don't have in the doc, but should IMO*
> > > .......
> >
> > > There can't be a field in writerSchema, that exists in readerSchema (tag
> > > number based), which in writerSchema its type is scalar, but in
> > > readSchema its type is scalar, it's repeated but with packed=true.
> >
> >
> > But I don't think rule number three needs to be added.
> >
> >
> > Thanks,
> > sinan
> >
> >
> >
> > On Tue, Mar 14, 2023 at 10:33 PM Asaf Mesika <as...@gmail.com> wrote:
> >
> > > Hi Sinan,
> > >
> > > The doc looks much better!
> > >
> > > I have a few additional comments:
> > >
> > > Pasting comment from previous emails:
> > >
> > > Can you convert the code block which is actually a quote in the
> > > beginning of the PIP to something which doesn't require scrolling
> > > horizontally so much?
> > > Use
> > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > >
> > > *Validation Rules*
> > >
> > > > (1) If the root message names of writtenSchema and readSchema are
> > > > different, then incompatible.
> > >
> > > Why? The root message name is not written over the wire to the best of my
> > > knowledge. I haven't found it written in the official doc.
> > >
> > >
> > > >    - The writtenSchema cannot add required fields, but optional or
> > > >    duplicate fields can be added (The field number must be new).
> > >
> > > That's not true.
> > > You can have a required field in writeSchema, and not have that field in
> > > readSchema (based on tag number).
> > >
> > > > The writtenSchema can not change the field number of any field in
> > > > readSchema (the field name is the same, but the field number is
> > > > different).
> > >
> > > You have to take into account field type as well when comparing.
> > >
> > > If I have in readSchema
> > > int32 justAnID = 1
> > > int32 customerId = 2
> > >
> > > and in writeSchema I have
> > > int32 justAnID = 1
> > > string customerId = 3
> > >
> > > This is valid.
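Asaf's example can be checked mechanically. The sketch below is a simplified illustration with a hypothetical `Field` record (not protobuf-java's descriptor API) and plain type-name equality standing in for the real wire-type rules. It pairs fields once by tag number and once by name: only the name-based pairing reports a conflict for `customerId`, while the tag-number pairing finds nothing, because tag 2 is simply absent from the written data and tag 3 is unknown to the reader.

```java
import java.util.ArrayList;
import java.util.List;

final class TagNumberMatching {

    /** Hypothetical field model: tag number, field name and declared type. */
    record Field(int number, String name, String type) {}

    /** Pairs fields by tag number, the only field identity that travels on the wire. */
    static List<String> conflictsByNumber(List<Field> read, List<Field> written) {
        List<String> conflicts = new ArrayList<>();
        for (Field r : read) {
            for (Field w : written) {
                if (r.number() == w.number() && !r.type().equals(w.type())) {
                    conflicts.add("tag " + r.number() + ": " + w.type() + " -> " + r.type());
                }
            }
        }
        return conflicts;
    }

    /** Pairs fields by name, as the draft rule does, although names never reach the wire. */
    static List<String> conflictsByName(List<Field> read, List<Field> written) {
        List<String> conflicts = new ArrayList<>();
        for (Field r : read) {
            for (Field w : written) {
                if (r.name().equals(w.name())
                        && (r.number() != w.number() || !r.type().equals(w.type()))) {
                    conflicts.add("field " + r.name() + ": number or type changed");
                }
            }
        }
        return conflicts;
    }
}
```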
> > >
> > > > (4) The writtenSchema can not change the field name of any field in
> > > > readSchema (the field number is the same, but the field name is
> > > > different).
> > >
> > > This is incorrect.
> > > Field names are not encoded into the wire. I don't see this in any best
> > > practice.
> > >
> > > > (5) The writtenSchema does not change the field name and number, but it
> > > > does change the field type.
> > > >
> > > >    - If the field type is ENUM or MESSAGE, the schema is not compatible
> > > >    when the type name is changed.
> > > >    - If the field has another type, the schemas under this rule are not
> > > >    incompatible, but the user is warned. (There is another way of
> > > >    testing in the PIP issue Alternatives.)
> > >
> > > Small correction: for the same field number you are not allowed to change
> > > types. Name is irrelevant.
> > >
> > > > (6) The writtenSchema removes fields that do not have default values in
> > > > readSchema. Then the schema is incompatible.
> > >
> > > Protobuf gives you its own default if you don't supply one. This is
> > > incorrect.
> > >
> > >
> > > *Rules that you don't have in the doc, but should IMO*
> > > * There can not be a field which exists both in readSchema and
> > > writeSchema, with same tag number, having different default values
> > > * There can't be a field in writerSchema, that exists in readerSchema (tag
> > > number based), which in writerSchema is repeated and its type is Scalar
> > > (https://protobuf.dev/programming-guides/proto/#scalar) but in readSchema
> > > it is not repeated anymore.
> > > * There can't be a field in writerSchema, that exists in readerSchema (tag
> > > number based), which in writerSchema its type is scalar, but in
> > > readSchema its type is scalar, it's repeated but with packed=true.
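The first two proposed rules lend themselves to a direct check; the third is left out here because its wording above is ambiguous. In this hedged sketch `Field` and `Label` are hypothetical simplifications (not protobuf-java types), `defaultValue` is the explicitly declared proto2 default or null, and the scalar list follows the scalar value types in the protobuf language guide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

final class ProposedExtraRules {

    enum Label { OPTIONAL, REQUIRED, REPEATED }

    /** Hypothetical field model; defaultValue is the declared (proto2) default, or null. */
    record Field(int number, String type, Label label, String defaultValue) {}

    /** Scalar value types from the protobuf language guide. */
    static final Set<String> SCALARS = Set.of(
            "double", "float", "int32", "int64", "uint32", "uint64", "sint32", "sint64",
            "fixed32", "fixed64", "sfixed32", "sfixed64", "bool", "string", "bytes");

    static List<String> check(Map<Integer, Field> readSchema, Map<Integer, Field> writtenSchema) {
        List<String> violations = new ArrayList<>();
        for (Field w : writtenSchema.values()) {
            Field r = readSchema.get(w.number()); // pair fields by tag number
            if (r == null) {
                continue; // writer-only tags are skipped by the reader
            }
            // Rule: the same tag number must not declare different default values.
            if (!Objects.equals(r.defaultValue(), w.defaultValue())) {
                violations.add("tag " + w.number() + ": default value changed");
            }
            // Rule: a repeated scalar in the writer must not become singular in the reader.
            if (w.label() == Label.REPEATED && SCALARS.contains(w.type())
                    && r.label() != Label.REPEATED) {
                violations.add("tag " + w.number() + ": repeated scalar read as singular");
            }
        }
        return violations;
    }
}
```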
> > >
> > > *Rules you have, but I would phrase a bit differently*
> > >
> > > I would remove the proto2/proto3 sections, since they only differ in 1
> > > rule, and just mention that distinction inside that rule (less work for
> > > the reader).
> > >
> > > * readSchema has a field which doesn't exist in writerSchema (based on
> > > tag number).
> > >    * Proto v2:
> > >       * That field must be `optional` or `repeated` (must not be
> > >       `required`)
> > >    * Proto v3:
> > >       * No problem.
> > > * There can not be a field which exists both in readSchema and
> > > writeSchema, with the same tag number, but having different types.
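Asaf's rephrased rules translate almost one-to-one into code. The sketch below is an illustration under simplifying assumptions: `Syntax`, `Label` and `Field` are hypothetical types rather than protobuf-java's descriptors, and type equality is plain string comparison (wire-compatible type pairs are not modeled). It also reflects the earlier point that a `required` field present only in the written schema is harmless, since the reader skips unknown tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ReaderWriterRules {

    enum Syntax { PROTO2, PROTO3 }

    enum Label { OPTIONAL, REQUIRED, REPEATED }

    /** Hypothetical field model; the maps below are keyed by tag number. */
    record Field(int number, String type, Label label) {}

    static List<String> check(Syntax syntax, Map<Integer, Field> readSchema,
                              Map<Integer, Field> writerSchema) {
        List<String> violations = new ArrayList<>();
        for (Field read : readSchema.values()) {
            Field written = writerSchema.get(read.number());
            if (written == null) {
                // The reader expects a tag the writer never sends. proto3 has no
                // `required` label, so this can only fail for proto2 schemas.
                if (syntax == Syntax.PROTO2 && read.label() == Label.REQUIRED) {
                    violations.add("tag " + read.number() + ": required in reader, missing in writer");
                }
            } else if (!read.type().equals(written.type())) {
                violations.add("tag " + read.number() + ": type " + written.type() + " -> " + read.type());
            }
        }
        // Fields that exist only in the writer are not checked: the reader skips unknown tags.
        return violations;
    }
}
```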
> > >
> > > *Motivation*
> > >
> > > Basically in the motivation section you want people to understand the
> > > following:
> > >
> > > Pulsar has built-in support for typed messages. It allows specifying an
> > > encoding scheme and its matching schema.
> > > For example, it supports Avro. You specify a schema for a given topic,
> > > using Avro Schema Definition (i.e. a JSON describing the schema). Every
> > > time you produce a message, you first declare the schema definition you
> > > wish to use for your messages. The message data should be Avro-encoded
> > > binary data (which the client in some SDKs helps encode from a given
> > > data-structure/object).
> > > The same applies when you consume a message. You first specify the schema
> > > you use to read the messages, and the client in some SDKs helps by
> > > decoding the message binary data into an object/data-structure.
> > >
> > > Each time you specify a schema to be used, either by a producer or a
> > > consumer, the schema is persisted in Pulsar and given an increasing
> > > version number. If the schema was the same as the previous version, it is
> > > not saved. When the message is persisted, the version number is encoded
> > > in the message headers.
> > >
> > > Pulsar provides a very useful feature named Schema Evolution
> > > <https://pulsar.apache.org/docs/2.11.x/schema-understand/#schema-evolution>.
> > > It allows us to check if a new schema version is compatible with the
> > > previous version or versions. When you configure the schema for the topic
> > > you decide the strategy to use for doing the validation check. The
> > > strategies validate the following:
> > >
> > >    - BACKWARD strategy
> > >       - A consumer with newSchema can read a message written using
> > >       existingSchema
> > >    - BACKWARD_TRANSITIVE strategy
> > >       - A consumer with newSchema can read messages written using all
> > >       existingSchema
> > >    - FORWARD
> > >       - A consumer with existingSchema can read messages written using
> > >       newSchema
> > >    - FORWARD_TRANSITIVE
> > >       - A consumer defined with any of the existingSchema can read
> > >       messages written using newSchema
> > >    - FULL
> > >       - A consumer defined with newSchema can read messages written using
> > >       existingSchema
> > >       - A consumer defined with existingSchema can read messages written
> > >       using newSchema
> > >    - FULL_TRANSITIVE
> > >       - A consumer defined with newSchema can read messages written using
> > >       any of the existingSchema
> > >       - A consumer defined with any of the existingSchema can read
> > >       messages written using newSchema
> > >
> > >
> > > Aside from Avro, Pulsar also has two additional supported encodings:
> > > PROTOBUF and PROTOBUF_NATIVE.
> > >
> > > PROTOBUF is a bit strange. It encodes the messages using Protobuf
> > > encoding, but the schema that is persisted to Pulsar is *not* a Protobuf
> > > Descriptor as you would have expected. The saved schema is a translation
> > > of the Protobuf Descriptor to an Avro Schema, so in fact an Avro schema
> > > definition is saved as the schema.
> > >
> > > PROTOBUF_NATIVE was created to fix that shortcoming, by actually
> > > persisting the Protobuf Descriptor and using Protobuf for encoding.
> > > The problem is that the authors of PROTOBUF_NATIVE haven't completed it
> > > fully, and the backward compatibility validation code almost does not
> > > exist: It only checks if the root message name is the same between
> > > versions.
> > >
> > > GOALS
> > > The goal of this PIP is to amend PROTOBUF_NATIVE by adding a fully
> > > functional validation for any of the defined Schema Compatibility
> > > Strategies.
> > > A secondary goal is to allow the user to choose between different
> > > implementations: The new fully functional validation or the existing
> > > barebones validation.
> > >
> > > -------- END
> > >
> > > I'm ok with having links in the Motivation, as *further reading*.
> > > I'm against stacking up work for the reader to go read 5-6 different
> > > links just to understand the motivation and background knowledge
> > > required to understand the feature.
> > >
> > > I'm against putting code in the Motivation. Especially if it is supposed
> > > to replace description in plain English making it easy to understand the
> > > design.
> > > Leave the code out of the Motivation.
> > > Paste code only if you absolutely can't use plain old descriptions to
> > > explain.
> > >
> > >
> > >
> > > On Sat, Mar 11, 2023 at 11:46 AM SiNan Liu <li...@gmail.com> wrote:
> > >
> > > > *I guess that's right, too! *
> > > >
> > > > But the name `ProtobufNativeAdvancedSchemaCompatibilityCheck` is better,
> > > > because we don't know whether there will be a V2 or V3 in the future.
> > > > An official solution could be called
> > > > `ProtobufNativeOfficialSchemaCompatibilityCheck`, and a good third-party
> > > > solution `ProtobufNativeXXXXXXXXSchemaCompatibilityCheck`.
> > > >
> > > > I've updated my design in the PIP issue.
> > > > 1. A new ProtobufNativeSchemaAdvanceCompatibilityCheck, rather than a
> > > > different validator implementation inside
> > > > ProtobufNativeSchemaCompatibilityCheck.
> > > > 2. Remove the 'builder'.
> > > > 3. Clarify the relationship between newSchema, existingSchema, and
> > > > writtenSchema in canRead.
> > > >
> > > > Please help check whether the description is comprehensive and what
> > > > changes and improvements need to be made.
> > > >
> > > > Thanks,
> > > > sinan
> > > >
> > > >
> > > >
> > > > On Thu, Mar 9, 2023 at 5:35 PM Asaf Mesika <as...@gmail.com> wrote:
> > > >
> > > > > I like Bo's suggestion - I haven't realized each schema type
> > > > > compatibility check is actually a plugin.
> > > > > It makes sense for any schema type checks to evolve, sometimes in a
> > > > > non-backward compatible way hence having two plugins like
> > > > > protobufNativeSchemaCompatabilityCheckV1 and then
> > > > > protobufNativeSchemaCompatabilityCheckV2 and then
> > > > > protobufNativeSchemaCompatabilityCheckV3 makes sense to me.
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Mar 9, 2023 at 5:49 AM 丛搏 <bo...@apache.org> wrote:
> > > > >
> > > > > > Hi siNan:
> > > > > >
> > > > > > From my point of view, it is just a plug-in. I don't think it is
> > > > > > necessary to add configuration for the plugin.
> > > > > > This is meaningless, and it will increase the difficulty of use for
> > > > > > users.
> > > > > >
> > > > > >
> > > > > > On Wed, Mar 8, 2023 at 3:54 PM SiNan Liu <li...@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi, bo.
> > > > > > >
> > > > > > > 1. I understand what you say: develop a new
> > > > > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck`, rather than changing
> > > > > > > the existing `ProtobufNativeSchemaCompatibilityCheck`. But I found a few
> > > > > > > small problems:
> > > > > > >
> > > > > > > (1) ProtobufNativeAdvancedSchemaCompatibilityCheck and
> > > > > > > ProtobufNativeSchemaCompatibilityCheck both have schemaType
> > > > > > > PROTOBUF_NATIVE. It looks like both checkers are PROTOBUF not using
> > > > > > > AVRO-PROTOBUF's "native" implementation, which leads to some problems or
> > > > > > > "unreasonable" aspects and gives me some extended thinking and questions.
> > > > > > >
> > > > > > `CompatibilityCheck` is only a plugin.
> > > > > > `ProtobufNativeSchemaCompatibilityCheck` may sooner or later leave the
> > > > > > stage; when `ProtobufNativeAdvancedSchemaCompatibilityCheck` is
> > > > > > stable, we can make it the default checker.
> > > > > >
> > > > > > It is just a plug-in; users can change it at will and ensure that it
> > > > > > is used correctly.
> > > > > > > (2) In broker.conf, `schemaRegistryCompatibilityCheckers`: if both
> > > > > > > ProtobufNativeSchemaCompatibilityCheck and
> > > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck are set, they will
> > > > > > > overwrite each other, because this is a map:
> > > > > > >
> > > > > > > https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryService.java#L36-L44
> > > > > > >
> > > > > > > ```java
> > > > > > > Map<SchemaType, SchemaCompatibilityCheck> checkers = new HashMap<>();
> > > > > > > for (String className : checkerClasses) {
> > > > > > >     SchemaCompatibilityCheck schemaCompatibilityCheck = Reflections.createInstance(className,
> > > > > > >             SchemaCompatibilityCheck.class, Thread.currentThread().getContextClassLoader());
> > > > > > >     // keyed by schema type: a second checker for the same type replaces the first
> > > > > > >     checkers.put(schemaCompatibilityCheck.getSchemaType(), schemaCompatibilityCheck);
> > > > > > > }
> > > > > > > ```
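The overwrite described above is plain `HashMap.put` behavior: a second value for the same key replaces the first. The sketch below reproduces only that behavior with hypothetical stand-ins (`SchemaType`, `Checker` and `NamedChecker` are not Pulsar's classes) and a `List` for the configured names. If the broker iterates the configured class names through a `Set`, which checker wins may not follow the order written in `broker.conf`; that detail is an assumption worth verifying in `SchemaRegistryService`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CheckerRegistryDemo {

    enum SchemaType { AVRO, JSON, PROTOBUF_NATIVE }

    /** Simplified stand-in for a compatibility-check plugin. */
    interface Checker {
        SchemaType getSchemaType();
    }

    record NamedChecker(String name, SchemaType type) implements Checker {
        @Override
        public SchemaType getSchemaType() {
            return type;
        }
    }

    /** Same shape as the quoted loop: one map entry per schema type. */
    static Map<SchemaType, Checker> register(List<Checker> configured) {
        Map<SchemaType, Checker> checkers = new HashMap<>();
        for (Checker checker : configured) {
            // put() replaces any checker already registered for this schema type
            checkers.put(checker.getSchemaType(), checker);
        }
        return checkers;
    }
}
```

Registering a "basic" and then an "advanced" PROTOBUF_NATIVE checker leaves only the "advanced" one in the map.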
> > > > > > >
> > > > > > > Is this a big problem or a small one? Is it possible or unnecessary?
> > > > > > > Maybe we can write in the documentation that only one of the two
> > > > > > > protobufNative checkers can be chosen? Why are there two checkers for
> > > > > > > different implementations of the same schemaType? Why not have the
> > > > > > > checker create different validators, so we do not have to change
> > > > > > > schemaRegistryCompatibilityCheckers?
> > > > > >
> > > > > > Users can only use one, not two; using both would bring complexity to
> > > > > > users.
> > > > > >
> > > > > > >
> > > > > > > (3) After switching to ProtobufNativeAdvancedSchemaCompatibilityCheck:
> > > > > > > for existing topics, only the name of the root message was checked
> > > > > > > before, not the content of the protobuf schema.
> > > > > > >
> > > > > > > What if the user wants both checkers?
> > > > > > >
> > > > > > > Does setting ProtobufNativeAdvancedSchemaCompatibilityCheck affect
> > > > > > > topics that already have a schema?
> > > > > > >
> > > > > > > Older topics still use the old checker, and newer topics or certain
> > > > > > > older topics use the new advanced checker.
> > > > > > >
> > > > > > When `ProtobufNativeAdvancedSchemaCompatibilityCheck` is stable,
> > > > > > users will not choose `ProtobufNativeSchemaCompatibilityCheck`,
> > > > > > because it is not a complete checker.
> > > > > > > (4) So should we have one checker per schemaType, with the
> > > > > > > protobufNative checker able to have as many implementation classes as
> > > > > > > needed? As for the classname configuration in the PIP, let's see if it
> > > > > > > can be set at the topic level. In the current PIP design I just load
> > > > > > > this parameter into the checker when the broker is started and the
> > > > > > > checkers map is set up. If I want to support the topic level, can I do
> > > > > > > it in a separate, regular PR? Or should it be completed here?
> > > > > > >
> > > > > > > Or add a schemaType called PROTOBUF_NATIVE_ADVANCE corresponding to
> > > > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck? (That seems to be more
> > > > > > > trouble.)
> > > > > > >
> > > > > > > Sorry, I cannot use the computer and network at my company, so I am
> > > > > > > replying from my mobile phone and the format may be a bit messy.
> > > > > > > Please understand.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > sinan
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 7, 2023 at 11:39 PM 丛搏 <bo...@apache.org> wrote:
> > > > > > >
> > > > > > > > On Tue, Mar 7, 2023 at 1:22 PM SiNan Liu <li...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Great to see your comment, bo!
> > > > > > > > >
> > > > > > > > > 1. The first way. The protobuf website has a description of the
> > > > > > > > > rules, but no plans to implement them.
> > > > > > > > > https://protobuf.dev/programming-guides/proto/#updating
> > > > > > > >
> > > > > > > > https://groups.google.com/g/protobuf
> > > > > > > > Maybe ask here.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > 2. I think this PIP can be divided into two parts.
> > > > > > > > > (1) Add a flag (`ValidatorClassName`), loaded into
> > > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` when the broker starts.
> > > > > > > > > ValidatorClassName is empty by default, and the implementation
> > > > > > > > > continues as before, with no change for the user.
> > > > > > > >
> > > > > > > > `ProtobufNativeSchemaCompatibilityCheck` is a plugin in `broker.conf`
> > > > > > > > ```
> > > > > > > > schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
> > > > > > > > ```
> > > > > > > > I do not recommend that we directly modify this plugin and continue
> > > > > > > > to add configuration items, which will cause trouble for users.
> > > > > > > > We have a lot of configs and it's getting very unwieldy.
> > > > > > > > In my opinion, we should not change
> > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`:
> > > > > > > > it is a simple implementation, it doesn't go wrong very often, and
> > > > > > > > most users will use it. We can add another ProtobufNative check
> > > > > > > > named `ProtobufNativeAdvancedSchemaCompatibilityCheck` or similar.
> > > > > > > > This way, we don't need to add this flag. There is no need to
> > > > > > > > consider compatibility; it is just a plug-in and will not affect the
> > > > > > > > current logic. If the user needs it, they just switch the plugin to
> > > > > > > > the new implementation.
> > > > > > > >
> > > > > > > > > ```java
> > > > > > > > >     ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
> > > > > > > > >         for (Descriptors.Descriptor fromDescriptor : fromDescriptors) {
> > > > > > > > >             // The default implementation only checks if the root message has changed.
> > > > > > > > >             if (!fromDescriptor.getFullName().equals(toDescriptor.getFullName())) {
> > > > > > > > >                 throw new ProtoBufCanReadCheckException("Protobuf root message isn't allowed to change!");
> > > > > > > > >             }
> > > > > > > > >         }
> > > > > > > > >     };
> > > > > > > > > ```
> > > > > > > > > The `ValidatorClassName` value can also be set to the implementation
> > > > > > > > > added by this PIP,
> > > > > > > > > `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
> > > > > > > > >
> > > > > > > > > (2) Recode `ProtobufNativeSchemaCompatibilityCheck` to build
> > > > > > > > > different `ProtobufNativeSchemaValidator`s through the flag
> > > > > > > > > (`ValidatorClassName`). Isn't it just a plug-in? The user can
> > > > > > > > > develop and choose a different `ProtobufNativeSchemaValidator`. I
> > > > > > > > > think it doesn't change the logic; it just allows users to extend it.
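The flag-based variant described in (1) and (2) can be sketched as follows. Assumptions: root-message full names stand in for protobuf `Descriptors.Descriptor` objects, a name-to-factory map replaces reflective class loading, and `IllegalStateException` replaces `ProtoBufCanReadCheckException`, whose definition is not shown here. The `DEFAULT` validator mirrors the quoted one: it only compares root message names.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

final class ValidatorSelection {

    /** Hypothetical, simplified validator contract over root message full names. */
    @FunctionalInterface
    interface ProtobufNativeSchemaValidator {
        void canRead(List<String> fromRootNames, String toRootName);
    }

    /** Mirrors the quoted DEFAULT: only the root message full name is compared. */
    static final ProtobufNativeSchemaValidator DEFAULT = (fromRootNames, toRootName) -> {
        for (String from : fromRootNames) {
            if (!from.equals(toRootName)) {
                throw new IllegalStateException("Protobuf root message isn't allowed to change!");
            }
        }
    };

    /** An empty ValidatorClassName keeps today's behavior; otherwise look the name up. */
    static ProtobufNativeSchemaValidator select(
            String validatorClassName, Map<String, Supplier<ProtobufNativeSchemaValidator>> registry) {
        if (validatorClassName == null || validatorClassName.isBlank()) {
            return DEFAULT;
        }
        Supplier<ProtobufNativeSchemaValidator> factory = registry.get(validatorClassName);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown validator: " + validatorClassName);
        }
        return factory.get();
    }
}
```

Bo's alternative needs none of this selection logic: a separate checker class is swapped in through `schemaRegistryCompatibilityCheckers` instead.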
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I think this PIP is an enhancement and supplement to the existing
> > > > > > > > > functionality; it is not unnecessary or meaningless.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > sinan
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Mar 7, 2023 at 11:53 AM 丛搏 <bo...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > I think we have two ways to do that.
> > > > > > > > > >
> > > > > > > > > > First way: we need to push the protobuf project to improve this in
> > > > > > > > > > its Java library. Ask if they have plans to improve it.
> > > > > > > > > >
> > > > > > > > > > Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should
> > > > > > > > > > be implemented as a plugin, without changing any existing plugin
> > > > > > > > > > logic (it's simple and already used). I don't recommend adding flags
> > > > > > > > > > for rollback; it adds configuration and makes little sense.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Bo
> > > > > > > > > >
> > > > > > > > > > On Mon, Mar 6, 2023 at 11:00 PM Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Can you convert the code block which is actually a quote in the
> > > > > > > > > > > beginning of the PIP to something which doesn't require scrolling
> > > > > > > > > > > horizontally so much?
> > > > > > > > > > > Use
> > > > > > > > > > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > > > > > > > > > >
> > > > > > > > > > > Let's improve the clarity of what you wrote:
> > > > > > > > > > >
> > > > > > > > > > > "the PROTOBUF uses avro struct to store."
> > > > > > > > > > > -->
> > > > > > > > > > > When Schema type PROTOBUF is used, Pulsar Client assumes the object
> > > > > > > > > > > given to it as message data is an auto-generated POJO containing the
> > > > > > > > > > > annotations encoding the schema. The client is using a converter,
> > > > > > > > > > > which converts a Protobuf schema descriptor into an Avro schema and
> > > > > > > > > > > sends that as the Schema of the producer/consumer.
> > > > > > > > > > >
> > > > > > > > > > > "On the broker side, protobuf and avro both use SchemaData
> > > > > > > > > > > converted to org.apache.avro.Schema."
> > > > > > > > > > > -->
> > > > > > > > > > > Since the schema is an Avro schema, the implementation of the
> > > > > > > > > > > compatibility check on the broker side is to simply re-use the
> > > > > > > > > > > compatibility check of the AVRO schema type.
> > > > > > > > > > >
> > > > > > > > > > > "ProtobufSchema is different from ProtobufNativeSchema in schema
> > > > > > > > > > > compatibility check it uses avro-protobuf.
> > > > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > > > But the current implementation of ProtobufNative schema
> > > > > > > > > > > compatibility check only checked if the root message name is
> > > > > > > > > > > changed."
> > > > > > > > > > >
> > > > > > > > > > > -->
> > > > > > > > > > > PROTOBUF_NATIVE schema type is different.
> > > > > > > > > > > The client is actually using Protobuf Descriptor as the schema, as
> > > > > > > > > > > opposed to Avro schema of PROTOBUF schema type. In the broker, the
> > > > > > > > > > > PROTOBUF_NATIVE compatibility check actually hasn't implemented any
> > > > > > > > > > > rule, besides one: checking if the root message name has changed.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >    1. For now, there is no official or third-party solution for
> > > > > > > > > > > >    ProtoBuf compatibility. If better third-party or official
> > > > > > > > > > > >    solutions appear in the future, we develop a new
> > > > > > > > > > > >    ProtobufNativeSchemaValidator and use it, so we add a flag.
> > > > > > > > > > > >
> > > > > > > > > > > Why do you need to make that configurable? Once you find a third
> > > > > > > > > > > party, just switch to it? Who knows, maybe you never will. Introduce
> > > > > > > > > > > it when you find it, not now.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > We improve the BACKWARD and FORWARD strategies in
> > > > > > > > > > > > ProtobufNativeSchemaCompatibilityCheck. As with the AVRO
> > > > > > > > > > > > implementation, protobuf compatibility checking needs to implement
> > > > > > > > > > > > the canRead method. *This will check that the writtenschema can be
> > > > > > > > > > > > read by readSchema.*
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I completely disagree.
> > > > > > > > > > > Avro implementation is confusing for our use case. Don't copy that.
> > > > > > > > > > >
> > > > > > > > > > > You have
> > > > > > > > > > >
> > > > > > > > > > > public void checkCompatible(SchemaData from, SchemaData to, SchemaCompatibilityStrategy strategy)
> > > > > > > > > > >         throws IncompatibleSchemaException {
> > > > > > > > > > >     Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
> > > > > > > > > > >     Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
> > > > > > > > > > >     switch (strategy) {
> > > > > > > > > > >         case BACKWARD_TRANSITIVE:
> > > > > > > > > > >         case BACKWARD:
> > > > > > > > > > >         case FORWARD_TRANSITIVE:
> > > > > > > > > > >         case FORWARD:
> > > > > > > > > > >         case FULL_TRANSITIVE:
> > > > > > > > > > >         case FULL:
> > > > > > > > > > >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
> > > > > > > > > > >             return;
> > > > > > > > > > >         case ALWAYS_COMPATIBLE:
> > > > > > > > > > >             return;
> > > > > > > > > > >         default:
> > > > > > > > > > >             throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
> > > > > > > > > > >     }
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > I would rename:
> > > > > > > > > > > from --> currentSchema
> > > > > > > > > > > to --> newSchema
> > > > > > > > > > >
> > > > > > > > > > > Use that switch case and have a method for each like:
> > > > > > > > > > > validateBackwardsCompatibility(currentSchema, newSchema)
> > > > > > > > > > >
> > > > > > > > > > > I dislike canRead and usage of writtenSchema, since you have two
> > > > > > > > > > > completely different use cases: from the producing side and the
> > > > > > > > > > > consumer side.
> > > > > > > > > > >
> > > > > > > > > > > > schemaValidatorBuilder
> > > > > > > > > > > >
> > > > > > > > > > > I dislike this proposal. IMO Avro implementation is way too
> > > > > > > > > > > complicated.
> > > > > > > > > > > Why not have a simple function for validation for each switch case
> > > > > > > > > > > above?
> > > > > > > > > > > Why do we need strategy and builder, and all this complexity?
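Asaf's suggested structure, one switch with a plain validation method per direction and `currentSchema`/`newSchema` naming, might look like the sketch below. It is generic over a schema type `S` and takes the reader/writer rule as a hypothetical `CanRead` function, so it shows only the dispatch, not the protobuf rules themselves. It assumes the non-transitive strategies compare against the latest existing version only (as the Pulsar docs describe BACKWARD and FORWARD), and it treats an empty schema history as compatible, which is also an assumption.

```java
import java.util.List;

final class PerStrategyDispatch {

    enum Strategy {
        ALWAYS_COMPATIBLE, BACKWARD, BACKWARD_TRANSITIVE,
        FORWARD, FORWARD_TRANSITIVE, FULL, FULL_TRANSITIVE
    }

    /** Hypothetical rule check: can a consumer using `reader` decode data written with `writer`? */
    @FunctionalInterface
    interface CanRead<S> {
        boolean test(S reader, S writer);
    }

    /** currentSchemas is ordered oldest first; its last element is the latest version. */
    static <S> void checkCompatible(List<S> currentSchemas, S newSchema, Strategy strategy,
                                    CanRead<S> canRead) {
        if (strategy == Strategy.ALWAYS_COMPATIBLE || currentSchemas.isEmpty()) {
            return;
        }
        List<S> latest = List.of(currentSchemas.get(currentSchemas.size() - 1));
        switch (strategy) {
            case BACKWARD -> validateBackwardsCompatibility(latest, newSchema, canRead);
            case BACKWARD_TRANSITIVE -> validateBackwardsCompatibility(currentSchemas, newSchema, canRead);
            case FORWARD -> validateForwardsCompatibility(latest, newSchema, canRead);
            case FORWARD_TRANSITIVE -> validateForwardsCompatibility(currentSchemas, newSchema, canRead);
            case FULL -> {
                validateBackwardsCompatibility(latest, newSchema, canRead);
                validateForwardsCompatibility(latest, newSchema, canRead);
            }
            case FULL_TRANSITIVE -> {
                validateBackwardsCompatibility(currentSchemas, newSchema, canRead);
                validateForwardsCompatibility(currentSchemas, newSchema, canRead);
            }
            default -> throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }

    /** A consumer using newSchema must read data written with each current schema. */
    static <S> void validateBackwardsCompatibility(List<S> currentSchemas, S newSchema, CanRead<S> canRead) {
        for (S current : currentSchemas) {
            if (!canRead.test(newSchema, current)) {
                throw new IllegalStateException("new schema cannot read data written with " + current);
            }
        }
    }

    /** A consumer using each current schema must read data written with newSchema. */
    static <S> void validateForwardsCompatibility(List<S> currentSchemas, S newSchema, CanRead<S> canRead) {
        for (S current : currentSchemas) {
            if (!canRead.test(current, newSchema)) {
                throw new IllegalStateException(current + " cannot read data written with the new schema");
            }
        }
    }
}
```

Keeping the direction explicit in the method names avoids the ambiguity of a single `canRead` whose "written" schema is sometimes the new one and sometimes an existing one.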
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > IMO it's impossible to read the validation rules as you described
> > > > > > > > > > > them.
> > > > > > > > > > > I wrote how they should be structured numerous times above.
> > > > > > > > > > > I can't validate them.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > IMO, the current design is very hard to read.
> > > > > > > > > > > Please try to avoid jumping into code sections.
> > > > > > > > > > > Write a high level design section, in which you describe in words
> > > > > > > > > > > what you plan to do.
> > > > > > > > > > > Write the validation rules in a structure that is easy to
> > > > > > > > > > > understand: rules per each compatibility check, and use proper
> > > > > > > > > > > words (current schema, new schema), since new schema can be once
> > > > > > > > > > > used for read and once used for write.
> > > > > > > > > > >
> > > > > > > > > > > In its current form it takes too much time to understand the
> > > > > > > > > > > design, and it shouldn't be the case.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Asaf
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi! I updated the explanation of some things in the PIP issue. I
> > > > > > > > > > > > also added a new “flag” in the conf to choose between different
> > > > > > > > > > > > ProtobufNativeSchemaValidator implementations, and set the default
> > > > > > > > > > > > ProtobufNativeSchemaValidator to only check whether the name of the
> > > > > > > > > > > > root message is the same.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > sinan
> > > > > > > > > > > >
> > > > > > > > > > > >
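A minimal sketch of the pluggable setup described above, assuming a hypothetical config value that names the validator implementation. Only the name `ProtobufNativeSchemaValidator` comes from the thread; the simplified `MessageSchema` record, the `"rootMessageNameOnly"` key, `RootMessageNameValidator` and `ValidatorFactory` are illustrative assumptions, not Pulsar's actual API. The default validator compares only the root message names, mirroring the current PROTOBUF_NATIVE behavior.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for a parsed PROTOBUF_NATIVE schema:
// only the fully qualified name of the root message is modeled.
record MessageSchema(String rootMessageName) {}

// Hypothetical interface; only the name follows the PIP discussion.
interface ProtobufNativeSchemaValidator {
    /** Returns true if newSchema may be registered after existingSchema. */
    boolean canEvolve(MessageSchema existingSchema, MessageSchema newSchema);
}

// Default behavior: the root message name must stay the same, nothing else is checked.
final class RootMessageNameValidator implements ProtobufNativeSchemaValidator {
    @Override
    public boolean canEvolve(MessageSchema existingSchema, MessageSchema newSchema) {
        return existingSchema.rootMessageName().equals(newSchema.rootMessageName());
    }
}

final class ValidatorFactory {
    // Hypothetical registry keyed by the value of the config flag.
    private static final Map<String, Supplier<ProtobufNativeSchemaValidator>> REGISTRY =
            Map.of("rootMessageNameOnly", RootMessageNameValidator::new);

    static ProtobufNativeSchemaValidator fromConfig(String flagValue) {
        // An unset or blank flag falls back to the default root-name check.
        if (flagValue == null || flagValue.isBlank()) {
            return new RootMessageNameValidator();
        }
        Supplier<ProtobufNativeSchemaValidator> supplier = REGISTRY.get(flagValue);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown validator: " + flagValue);
        }
        return supplier.get();
    }
}
```

Keeping the root-name check as the default preserves today's behavior, so any stricter validator is opt-in through the flag.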
> > > > > > > > > > > > On Sun, Mar 5, 2023 at 8:21 PM Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Can you please explain how a Protobuf schema descriptor can be validated for a backward compatibility check using Avro-based compatibility rules?
> > > > > > > > > > > > > > > Doesn't that check expect the schema to be Avro, while it is actually a Protobuf descriptor?
> > > > > > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. *Take a quick look at the previous design: PROTOBUF stores its schema as an Avro struct.*
> > > > > > > > > > > > > > https://github.com/apache/pulsar/pull/1954
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > OK. To summarize your code (easier to write it than to send links):
> > > > > > > > > > > > > * The Pulsar client, when used with the Protobuf schema, actually converts the Protobuf descriptor into an Avro schema (using code from the Avro library) and saves that Avro schema as the schema. It does not save the Protobuf descriptor at all. Very confusing, I have to add; I never expected that.
> > > > > > > > > > > > > This explains why ProtobufSchemaCompatibilityCheck just extends the Avro check without doing any translation.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for that.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Now that I finally understand this, I can say that you *must* explain it in the motivation section of your PIP.
> > > > > > > > > > > > >
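To make the summary above concrete, here is a simplified sketch of the idea that the PROTOBUF schema type stores an Avro record schema derived from the Protobuf message rather than the Protobuf descriptor itself. It does not call avro-protobuf: the `ProtoField` record, `ProtoToAvroSketch`, and the small scalar-type mapping are illustrative assumptions, and the real avro-protobuf output (namespaces, defaults, nested and repeated types) differs.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, flattened view of a Protobuf message field: name plus scalar type.
record ProtoField(String name, String protoType) {}

final class ProtoToAvroSketch {
    // Illustrative mapping of a few Protobuf scalar types to Avro primitive types.
    private static final Map<String, String> SCALARS = Map.of(
            "int32", "int",
            "int64", "long",
            "float", "float",
            "double", "double",
            "bool", "boolean",
            "string", "string",
            "bytes", "bytes");

    /** Builds Avro record schema JSON from a message name and its scalar fields. */
    static String toAvroRecordJson(String messageName, List<ProtoField> fields) {
        String avroFields = fields.stream()
                .map(f -> {
                    String avroType = SCALARS.get(f.protoType());
                    if (avroType == null) {
                        throw new IllegalArgumentException("Unsupported type: " + f.protoType());
                    }
                    return "{\"name\":\"" + f.name() + "\",\"type\":\"" + avroType + "\"}";
                })
                .collect(Collectors.joining(","));
        return "{\"type\":\"record\",\"name\":\"" + messageName + "\",\"fields\":[" + avroFields + "]}";
    }
}
```

Once only Avro JSON like this is stored, the broker can hand it straight to the existing Avro-based check; note that Protobuf field numbers do not appear anywhere in this simplified output.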
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2. *On the broker side, both Protobuf and Avro schemas are handled as `SchemaData` converted to `org.apache.avro.Schema`.*
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Actually those links don't really help.
> > > > > > > > > > > > > The main link that helps is:
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'm sorry, I don't understand.
> > > > > > > > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > > > > > > > If you just spell them out here, then, as you say, just translate the Protobuf descriptor into an Avro schema and run the Avro compatibility validation, no?
> > > > > > > > > > > > > > > I believe the answer is no, since you may want to verify different things for Protobuf than for Avro.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. *ProtobufSchema differs from ProtobufNativeSchema in that it uses avro-protobuf.*
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > > > > > > *ProtobufNativeSchema needs a native compatibility check, but there is no official or third-party implementation. So this PIP does not use avro-protobuf for Protobuf compatibility checking.*
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2. *By the way, this is implemented in much the same way that Apache Avro does compatibility checking.*
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > > > > > > > > > > `canReadStrategy`, `canBeReadStrategy`, `mutualReadStrategy`
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > > > > > > > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are writtenSchema and readSchema. We only need to change the order of the arguments we pass to `canRead()`.*
> > > > > > > > > > > > > > ```java
> > > > > > > > > > > > > > import com.google.protobuf.Descriptors;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > // `strategy` is the validator's configured compatibility strategy.
> > > > > > > > > > > > > > // canRead(written, read) checks that a reader using `read` can decode
> > > > > > > > > > > > > > // data written with `written`; each strategy only picks the roles.
> > > > > > > > > > > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > > > > > > > > > > > > >         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
> > > > > > > > > > > > > >     switch (strategy) {
> > > > > > > > > > > > > >         // BACKWARD: the new schema reads data written with the existing schema
> > > > > > > > > > > > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > > > > > > > > > > > > >         // FORWARD: the existing schema reads data written with the new schema
> > > > > > > > > > > > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> > > > > > > > > > > > > >         // FULL: both directions must pass
> > > > > > > > > > > > > >         case CanBeReadMutualStrategy -> {
> > > > > > > > > > > > > >             canRead(toValidate, fromDescriptor);
> > > > > > > > > > > > > >             canRead(fromDescriptor, toValidate);
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >     }
> > > > > > > > > > > > > > }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > > > > > > > > > > > >         Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> > > > > > > > > > > > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
> > > > > > > > > > > > > > }
> > > > > > > > > > > > > > ```
> > > > > > > > > > > > > >
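The strategy dispatch in the quoted snippet above can be exercised end to end with a JDK-only sketch. It swaps `Descriptors.Descriptor` for a hypothetical `Msg` record and swaps `ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility` for three illustrative rules: the root message name must match; a field number present in both schemas must keep its type; a `required` field in the reader schema must exist in the writer schema. These rules are assumptions chosen for illustration, not the rule set the PIP defines.

```java
import java.util.Map;

final class ProtoCompatSketch {
    // Hypothetical field model: type name plus a proto2-style "required" flag.
    record Field(String type, boolean required) {}

    // Hypothetical message model: root message name and fields keyed by field number.
    record Msg(String name, Map<Integer, Field> fields) {}

    enum Strategy { CanReadExistingStrategy, CanBeReadByExistingStrategy, CanBeReadMutualStrategy }

    static final class CanReadCheckException extends Exception {
        CanReadCheckException(String message) { super(message); }
    }

    /** Mirrors validateWithStrategy: only decides which schema is writer and which is reader. */
    static void validateWithStrategy(Strategy strategy, Msg toValidate, Msg existing)
            throws CanReadCheckException {
        switch (strategy) {
            case CanReadExistingStrategy -> canRead(existing, toValidate);
            case CanBeReadByExistingStrategy -> canRead(toValidate, existing);
            case CanBeReadMutualStrategy -> {
                canRead(toValidate, existing);
                canRead(existing, toValidate);
            }
        }
    }

    /** Throws if a reader using readSchema cannot decode data written with writtenSchema. */
    static void canRead(Msg writtenSchema, Msg readSchema) throws CanReadCheckException {
        if (!writtenSchema.name().equals(readSchema.name())) {
            throw new CanReadCheckException("root message name changed");
        }
        // Fields only in the writer are skipped by the reader, so only reader fields are checked.
        for (Map.Entry<Integer, Field> e : readSchema.fields().entrySet()) {
            Field written = writtenSchema.fields().get(e.getKey());
            if (written == null) {
                if (e.getValue().required()) {
                    throw new CanReadCheckException("required field " + e.getKey() + " missing in writer");
                }
            } else if (!written.type().equals(e.getValue().type())) {
                throw new CanReadCheckException("field " + e.getKey() + " changed type");
            }
        }
    }
}
```

Because each strategy only swaps the writer and reader arguments, new rules only ever need to go into `canRead`, and FULL simply reuses both directions.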
> > > > > > > > > > > > > >
> > > > > > > > > > > > > I get that you want to take inspiration from the existing Avro schema compatibility check for your code design.
> > > > > > > > > > > > > I also understand you *won't* use any existing Avro code for that.
> > > > > > > > > > > > > I also understand you have to write the validation check on your own, since there is no third-party implementation to rely on.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The only thing I can't understand is the actual rules, as I wrote before, since they are written confusingly.
> > > > > > > > > > > > > So I'll repeat what I asked before:
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > > >
> > > > > > > > > > > > > * Backward checks
> > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > > * Forward checks
> > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Once that's accomplished, I will be able to understand the different validation rules for each compatibility check.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Mar 1, 2023 at 9:19 PM Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The current Avro, JSON, and Protobuf schemas are all implemented based on Avro.
> > > > > > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the protobuf schema is converted into an Avro Schema, and then you use Avro compatibility validation?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, so `checkCompatible()` uses the same implementation as `AVRO`.
> > > > > > > > > > > > > > > >
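The inheritance described above can be pictured with a simplified model: one abstract base class implements `checkCompatible()` once, and the Avro, JSON, and Protobuf subclasses differ only in the schema type they report. The class names echo the Pulsar classes listed, but the signatures, the `BiPredicate` standing in for Avro's reader/writer validation, and the single backward-style direction are hypothetical simplifications, not Pulsar's code.

```java
import java.util.function.BiPredicate;

final class CompatHierarchySketch {
    enum SchemaType { AVRO, JSON, PROTOBUF }

    /** Shared logic: both stored schemas are treated as Avro schema text, whatever their type. */
    abstract static class AvroSchemaBasedCompatibilityCheck {
        // Placeholder for Avro's reader/writer validation (hypothetical).
        private final BiPredicate<String, String> avroCanRead;

        AvroSchemaBasedCompatibilityCheck(BiPredicate<String, String> avroCanRead) {
            this.avroCanRead = avroCanRead;
        }

        abstract SchemaType getSchemaType();

        /** Implemented once and inherited unchanged by every subclass. */
        final boolean checkCompatible(String existingAvroJson, String newAvroJson) {
            // Backward-style direction only: the new schema reads data written with the existing one.
            return avroCanRead.test(existingAvroJson, newAvroJson);
        }
    }

    static final class AvroSchemaCompatibilityCheck extends AvroSchemaBasedCompatibilityCheck {
        AvroSchemaCompatibilityCheck(BiPredicate<String, String> avroCanRead) { super(avroCanRead); }
        @Override SchemaType getSchemaType() { return SchemaType.AVRO; }
    }

    static final class JsonSchemaCompatibilityCheck extends AvroSchemaBasedCompatibilityCheck {
        JsonSchemaCompatibilityCheck(BiPredicate<String, String> avroCanRead) { super(avroCanRead); }
        @Override SchemaType getSchemaType() { return SchemaType.JSON; }
    }

    static final class ProtobufSchemaCompatibilityCheck extends AvroSchemaBasedCompatibilityCheck {
        ProtobufSchemaCompatibilityCheck(BiPredicate<String, String> avroCanRead) { super(avroCanRead); }
        @Override SchemaType getSchemaType() { return SchemaType.PROTOBUF; }
    }
}
```

Since the Protobuf subclass adds nothing but its type tag, it can only ever apply Avro rules, which is why PROTOBUF_NATIVE needs a separate check.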
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Can you please explain how a Protobuf schema descriptor can be validated for a backward compatibility check using Avro-based compatibility rules?
> > > > > > > > > > > > > > > Doesn't that check expect the schema to be Avro, while it is actually a Protobuf descriptor?
> > > > > > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The compatibility check strategy is described on the website:
> > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > > > > > > > > > > 1. BACKWARD (CanReadExistingStrategy): consumers using schema V3 can process data written by producers using the last schema version, V2. So V2 is the "writtenSchema" and V3 is the "readSchema".
> > > > > > > > > > > > > > > > 2. FORWARD (CanBeReadByExistingStrategy): consumers using the last schema version, V2, can process data written by producers using a new schema, V3, even though they may not be able to use the full capabilities of the new schema. So V3 is the "writtenSchema" and V2 is the "readSchema".
> > > > > > > > > > > > > > > > 3. FULL (CanBeReadMutualStrategy): schemas are both backward and forward compatible.
> > > > > > > > > > > > > > > > The schema can evolve, and data written with either the old or the new schema version can be read with the other.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
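Phrased in the terms requested elsewhere in the thread (newSchema and existingSchema, the last schema used), the three definitions above reduce to a fixed assignment of the "writtenSchema" and "readSchema" roles. A small sketch of that assignment; the `StrategyRoles` class, the `Roles` record and `rolesFor` are hypothetical names for illustration.

```java
import java.util.List;

final class StrategyRoles {
    enum Strategy { BACKWARD, FORWARD, FULL }

    // One writer/reader pairing that must pass the canRead check.
    record Roles<T>(T writtenSchema, T readSchema) {}

    /** Lists every (writtenSchema, readSchema) pair a strategy must validate. */
    static <T> List<Roles<T>> rolesFor(Strategy strategy, T existingSchema, T newSchema) {
        return switch (strategy) {
            // Consumers on the new schema read data written with the existing schema.
            case BACKWARD -> List.of(new Roles<T>(existingSchema, newSchema));
            // Consumers on the existing schema read data written with the new schema.
            case FORWARD -> List.of(new Roles<T>(newSchema, existingSchema));
            // Both directions must hold.
            case FULL -> List.of(new Roles<T>(existingSchema, newSchema),
                                 new Roles<T>(newSchema, existingSchema));
        };
    }
}
```

With V2 as the existing schema and V3 as the new one, BACKWARD yields (written V2, read V3) and FORWARD yields (written V3, read V2), matching the definitions above.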
> > > > > > > > > > > > > > > I'm sorry, I don't understand.
> > > > > > > > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > > > > > > > If you just spell them out here, then, as you say, just translate the Protobuf descriptor into an Avro schema and run the Avro compatibility validation, no?
> > > > > > > > > > > > > > > I believe the answer is no, since you may want to verify different things for Protobuf than for Avro.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > In its current state, I can't understand your design at all. Please help clarify it.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As we've just described, we only need to change the order of the arguments we pass to `canRead()`.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 8:49 PM Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The current Avro, JSON, and Protobuf schemas are all implemented based on Avro.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the protobuf schema is converted into an Avro Schema, and then you use Avro compatibility validation?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > * Backward checks
> > > > > > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > > > > > > * Forward checks
> > > > > > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I'm saying this because you used the word "writtenSchema", but it represents something completely different in a backward check than in a forward check.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Once you have it structured like that, I will personally be able to read and understand it.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The motivation and problem statement are now good; thanks for improving them.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some background and some explanations.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The compatibility check rules are already written in the Implementation section.
> > > > > > > > > > > > > > > > > > ProtoBufNative implements the same canRead method as Apache Avro: it checks whether the schema used for writing and the schema used for reading are compatible. For the Backward, Forward, and Full strategies, I also indicate whether the writtenSchema and the readSchema are the old or the new version of the schema.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Sun, Feb 26, 2023 at 11:24 PM Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, which IMO you need to add for people to understand it.
> > > > > > > > > > > > > > > > > > > You don't need to explain all of Pulsar in this PIP, but add at the very least a few paragraphs detailing everything the reader needs to know to have context:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >    - Start by saying Pulsar has a built-in schema registry inside the Pulsar broker.
> > > > > > > > > > > > > > > > > > >       - Every time the client updates the schema, it uploads it to the broker. When that happens, the broker validates whether the new schema version is compatible with the previous versions. There are 4 types of compatibility: Full, ... (complete and explain each one briefly)
> > > > > > > > > > > > > > > > > > >    - Also explain that the Pulsar schema registry supports various schema protocols: Avro, protobuf native, ... (complete the rest). Each protocol has a schema which dictates how to serialize and deserialize the message content into a typed object.
> > > > > > > > > > > > > > > > > > >    - Explain in short what protobuf native is (compared to protobuf non-native).
> > > > > > > > > > > > > > > > > > >    - Please don't paste code instead of explaining.
> > > > > > > > > > > > > > > > > > >       - Explain that the current protobuf native validation check only verifies that the root message name is the same between the current schema version and the new version.
> > > > > > > > > > > > > > > > > > >          - Explain briefly what a root message is and what its name is.
> > > > > > > > > > > > > > > > > > >       - Explain the problem (list scenarios) that we have because the protobuf native schema only supports FULL compatibility validation.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Regarding the high-level design, as in what you plan to do:
> > > > > > > > > > > > > > > > > > > I suggest you add a "High Level Design" section and in it detail how you plan to validate, per protobuf version and per compatibility check (backward, forward, full, ...).
> > > > > > > > > > > > > > > > > > > I tried reading the implementation; for me, it's all over the place. Can you please list, in order, what I wrote above, and list the validation rules with a good explanation of why you validate them like that?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Lastly, once you have all the validation rules clearly stated, you can use them to document the feature properly, so users know what validation to expect.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Asaf
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Sorry, my mistake. I removed the code and described the design for improving the PROTOBUF_NATIVE schema compatibility checks. You can have a look.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com>
> > > > > > 于2023年2月22日周三
> > > > > > > > > > 21:16写道:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I read it but you're almost
> directly
> > > > diving
> > > > > > into
> > > > > > > > the
> > > > > > > > > > > > code -
> > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > > take
> > > > > > > > > > > > > > > > > > > > me
> > > > > > > > > > > > > > > > > > > > > hours just to reverse engineer your
> > > > design.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Can you please include a "High
> Level
> > > > > Design"
> > > > > > > > section
> > > > > > > > > > in
> > > > > > > > > > > > > which
> > > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > > > > explain
> > > > > > > > > > > > > > > > > > > > > how you plan to tackle any issue?
> > > > > > > > > > > > > > > > > > > > > If I can read that section and
> > explain
> > > to
> > > > > > someone
> > > > > > > > > > else
> > > > > > > > > > > > how
> > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > > > work,
> > > > > > > > > > > > > > > > > > > > > it means the section is complete.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM
> SiNan
> > > > Liu <
> > > > > > > > > > > > > > > > liusinan1998@gmail.com>
> > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > I made a PIP to discuss:
> > > > > > > > > > > > > > > > > > >
> > > > https://github.com/apache/pulsar/issues/19565
> > > > > > > > > > > > > > > > > > > > .
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > We can talk about the current
> > design
> > > > > here.
> > > > > > > > > > Especially
> > > > > > > > > > > > for
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > field
> > > > > > > > > > > > > > > > > > > > type
> > > > > > > > > > > > > > > > > > > > > > change check rules, please give
> > your
> > > > > > valuable
> > > > > > > > > > advice.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > Sinan
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by Asaf Mesika <as...@gmail.com>.
Also:

> (2) PROTOBUF_NATIVE was designed so as not to use avro-protobuf for protobuf
> schema compatibility checking.
>
> The root message name is the class name we pass in when we create the
> producer or consumer. ProtoBuf has many nested messages or dependencies.
> The current implementation only checks if the passed class name is the
> same. It does not check if the fields in the file change in a way that is
> compatible with older versions of the schema.

You missed the most important thing there: PROTOBUF_NATIVE uses the Protobuf
Descriptor when persisting the schema. It's not using an Avro schema
definition. As I wrote to you in a previous email:

> PROTOBUF_NATIVE was created to fix that shortcoming, by actually persisting
> the Protobuf Descriptor and using Protobuf for encoding.

On Wed, Mar 15, 2023 at 5:46 PM SiNan Liu <li...@gmail.com> wrote:

> 1.
>
> > Why? The root message name is not written over the wire to the best of my
> > knowledge. I haven't found it written in the official doc.
>
> The root message name check is one of the rules in the previous
> `ProtobufNativeSchemaCompatibilityCheck`. If the root message has a
> different name, there is no need to check its contents: for two versions to
> be the "same" schema, their names must be the same.
>
> 2.
>
> > > The writtenSchema can not change the field number of any field in
> > > readSchema (the field name is the same, but the field number is
> > > different).
> > You have to take into account field type as well when comparing.
>
> The first rule on the Protobuf website says that field numbers cannot be
> changed:
>
> > Don’t change the field numbers for any existing fields.
>
>
>
> 3.
>
> > >    - The writtenSchema cannot add required fields, but optional or
> > >    duplicate fields can be added (The field number must be new).
> >
> > That's not true.
> > You can have a required field in writeSchema, and not have that field in
> > readSchema (based on tag number).
>
> A required field must exist. In your case, where does readSchema read its
> required fields from? They are not in writtenSchema.
> The second and third rules on the official website say not to add or delete
> required fields!
>
>
> 4.
>
> > > (4) The writtenSchema can not change the field name of any field in
> > > readSchema (the field number is the same, but the field name is
> > > different).
> > This is incorrect.
> > Field names are not encoded into the wire. I don't see this in any best
> > practice.
>
> The third rule on the website says:
>
> > You may want to rename the field instead, perhaps adding the prefix
> > “OBSOLETE_”, or make the field number reserved, so that future users of
> > your .proto can’t accidentally reuse the number.
>
> So if you want to rename a field or add a new field, delete the old one and
> use a new number!
>
>
> 5.
>
> > > (5) The writtenSchema does not change the field name and number, but it
> > > does change the field type.
> > Small correction: for the same field number you are not allowed to change
> > types. Name is irrelevant.
>
> Why is the name irrelevant?
> A plain type change is covered by the rule stated in the Alternatives
> section: there is no check there, just a warning to the user.
> The other case is when the name of an ENUM or MESSAGE type is changed. That
> is the same as the root message name check in 1: after the name changes, is
> it still the same type? This change is not allowed!
>
> *Here is an example:*
> readSchema:
> https://github.com/apache/pulsar/pull/19566/files#diff-a7006d73502e6064a80af02822f3a3072be498d8b677c4b838b0dafaea32dea4
> writtenSchema:
> https://github.com/apache/pulsar/pull/19566/files#diff-e3e7543624edaf1e0a4fd47947a2cad6e4b816b93843f71a367042ba6c3ec53f
>
>
> 6.
>
> > > (6) The writtenSchema removes fields that do not have default values in
> > > readSchema. Then the schema is incompatible.
> > Protobuf gives you its own default if you don't supply one. This is
> > incorrect.
>
> (1) This rule only applies to proto2 fields that do not set a default
> value. Proto3 does not need this check, because the default value will
> always be there.
>
> (2) In the PIP issue:
>
> > Proto3 removed required fields, so there is no need to check them. We get
> > the syntax (proto2 or proto3) of the proto in the code, and skip the
> > required-field check if it is proto3. All other checking rules also apply
> > to proto3.
>
> *I made a mistake here. This default value check is not needed in proto3. I
> will modify the rules later according to your suggestion.*
>
> > I would remove the proto2/proto3 sections, since they only differ in 1
> > rule, and just mention that distinction inside that rule (less work for the
> > reader).
>
> (3) About the rules you think should be added:
>
> > *Rules that you don't have in the doc, but should IMO*
> > .......
>
> > There can't be a field in writerSchema that exists in readerSchema (tag
> > number based), whose type is scalar in writerSchema, but which in
> > readSchema is a repeated scalar with packed=true.
>
> But I don't think this third rule needs to be added.
>
>
> Thanks,
> sinan
>
>
>
> Asaf Mesika <as...@gmail.com> wrote on Tue, Mar 14, 2023 at 22:33:
>
> > Hi Sinan,
> >
> > The doc looks much better!
> >
> > I have a few additional comments:
> >
> > Pasting comment from previous emails:
> >
> > Can you convert the code block at the beginning of the PIP, which is
> > actually a quote, into something that doesn't require so much horizontal
> > scrolling?
> > Use
> > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> >
> > *Validation Rules*
> >
> > > (1) If the root message names of writtenSchema and readSchema are
> > > different, then the schemas are incompatible.
> >
> > Why? The root message name is not written over the wire to the best of my
> > knowledge. I haven't found it written in the official doc.
> >
> >
> > >    - The writtenSchema cannot add required fields, but optional or
> > >    duplicate fields can be added (The field number must be new).
> >
> > That's not true.
> > You can have a required field in writeSchema, and not have that field in
> > readSchema (based on tag number).
> >
> > > The writtenSchema can not change the field number of any field in
> > > readSchema (the field name is the same, but the field number is
> > > different).
> >
> > You have to take into account field type as well when comparing.
> >
> > If I have in readSchema
> >     int32 justAnID = 1;
> >     int32 customerId = 2;
> >
> > and in writeSchema I have
> >     int32 justAnID = 1;
> >     string customerId = 3;
> >
> > This is valid.
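Asaf's example above can be checked mechanically. The sketch below is a minimal illustration, not Pulsar code: it uses a hypothetical `Field` record (plain Java, not protobuf-java's `Descriptors` API) and compares the two schemas once by field name and once by tag number. Matching by name flags `customerId` because its number moved, while matching by tag number finds no conflict, since tag 2 exists only in readSchema and tag 3 only in writeSchema.

```java
import java.util.ArrayList;
import java.util.List;

final class TagMatchingDemo {
    // Hypothetical, simplified field model: name, scalar type and tag number only.
    record Field(String name, String type, int number) {}

    // Name-based rule: flags fields whose name is unchanged but whose number moved.
    static List<String> nameBasedConflicts(List<Field> reader, List<Field> writer) {
        List<String> conflicts = new ArrayList<>();
        for (Field r : reader) {
            for (Field w : writer) {
                if (r.name().equals(w.name()) && r.number() != w.number()) {
                    conflicts.add(r.name());
                }
            }
        }
        return conflicts;
    }

    // Tag-based rule: only fields sharing a tag number are compared, and only by type.
    static List<Integer> tagBasedConflicts(List<Field> reader, List<Field> writer) {
        List<Integer> conflicts = new ArrayList<>();
        for (Field r : reader) {
            for (Field w : writer) {
                if (r.number() == w.number() && !r.type().equals(w.type())) {
                    conflicts.add(r.number());
                }
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        List<Field> read = List.of(new Field("justAnID", "int32", 1), new Field("customerId", "int32", 2));
        List<Field> written = List.of(new Field("justAnID", "int32", 1), new Field("customerId", "string", 3));
        System.out.println(nameBasedConflicts(read, written)); // [customerId]
        System.out.println(tagBasedConflicts(read, written));  // []
    }
}
```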
> >
> > > (4) The writtenSchema can not change the field name of any field in
> > > readSchema (the field number is the same, but the field name is
> > > different).
> >
> > This is incorrect.
> > Field names are not encoded into the wire. I don't see this in any best
> > practice.
> >
> > > (5) The writtenSchema does not change the field name and number, but it
> > > does change the field type.
> > >
> > >    - If the field type is ENUM or MESSAGE, the schema is not compatible
> > >    when the type name is changed.
> > >    - If the field is of any other type, the schemas are not considered
> > >    incompatible under this rule, but the user is warned. (There is
> > >    another way of testing in the PIP issue's Alternatives section.)
> >
> > Small correction: for the same field number you are not allowed to change
> > types. Name is irrelevant.
> >
> > > (6) The writtenSchema removes fields that do not have default values in
> > > readSchema. Then the schema is incompatible.
> >
> > Protobuf gives you its own default if you don't supply one. This is
> > incorrect.
> >
> >
> > *Rules that you don't have in the doc, but should IMO*
> > * There cannot be a field that exists in both readSchema and writeSchema,
> > with the same tag number, but with different default values.
> > * There can't be a field in writerSchema that exists in readerSchema (tag
> > number based) which is repeated and of a scalar type
> > (https://protobuf.dev/programming-guides/proto/#scalar) in writerSchema,
> > but is no longer repeated in readSchema.
> > * There can't be a field in writerSchema that exists in readerSchema (tag
> > number based) whose type is scalar in writerSchema, but which in
> > readSchema is a repeated scalar with packed=true.
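The three proposed rules can be expressed as a per-tag comparison of fields present in both schemas. This is a minimal sketch under assumptions: it uses a hypothetical `Field` record rather than protobuf-java's `FieldDescriptor`, represents "no declared default" as `null`, and encodes the rules exactly as worded in the list above, without claiming they match Protobuf's official compatibility guidance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

final class ExtraFieldRules {
    // Hypothetical field model; defaultValue is null when no default is declared.
    record Field(int number, String type, boolean repeated, boolean packed, String defaultValue) {}

    // Scalar value types from the Protobuf language guide.
    static final Set<String> SCALARS = Set.of("double", "float", "int32", "int64", "uint32", "uint64",
            "sint32", "sint64", "fixed32", "fixed64", "sfixed32", "sfixed64", "bool", "string", "bytes");

    // Applies the three proposed rules to fields present in both schemas, matched by tag number.
    static List<String> violations(List<Field> reader, List<Field> writer) {
        Map<Integer, Field> writerByTag = new HashMap<>();
        for (Field w : writer) {
            writerByTag.put(w.number(), w);
        }
        List<String> out = new ArrayList<>();
        for (Field r : reader) {
            Field w = writerByTag.get(r.number());
            if (w == null) {
                continue; // only in readSchema: not covered by these three rules
            }
            if (!Objects.equals(r.defaultValue(), w.defaultValue())) {
                out.add("tag " + r.number() + ": different default values");
            }
            boolean bothScalar = SCALARS.contains(r.type()) && SCALARS.contains(w.type());
            if (bothScalar && w.repeated() && !r.repeated()) {
                out.add("tag " + r.number() + ": repeated scalar read as non-repeated");
            }
            if (bothScalar && !w.repeated() && r.repeated() && r.packed()) {
                out.add("tag " + r.number() + ": scalar read as packed repeated");
            }
        }
        return out;
    }
}
```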
> >
> > *Rules you have, but I would phrase a bit differently*
> >
> > I would remove the proto2/proto3 sections, since they only differ in 1
> > rule, and just mention that distinction inside that rule (less work for the
> > reader).
> >
> > * readSchema has a field which doesn't exist in writerSchema (based on tag
> > number).
> >    * Proto v2:
> >       * That field must be `optional` or `repeated` (must not be `required`)
> >    * Proto v3:
> >       * No problem.
> > * There can not be a field which exists both in readSchema and writeSchema,
> > with the same tag number, but having different types.
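The two rephrased rules map naturally onto a `canRead(reader, writer)` check keyed by tag number. A minimal sketch, assuming a hypothetical `Field` record and an explicit `Syntax` flag in place of protobuf-java descriptors; it covers only these two rules and folds the proto2/proto3 distinction into the first one, as suggested.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CanReadSketch {
    enum Syntax { PROTO2, PROTO3 }
    enum Label { OPTIONAL, REPEATED, REQUIRED }

    // Hypothetical field model (not protobuf-java's FieldDescriptor).
    record Field(int number, String type, Label label) {}

    // Returns why `reader` cannot read data written with `writer`; an empty list means compatible.
    static List<String> canRead(Syntax syntax, List<Field> reader, List<Field> writer) {
        Map<Integer, Field> writerByTag = new HashMap<>();
        for (Field w : writer) {
            writerByTag.put(w.number(), w);
        }
        List<String> problems = new ArrayList<>();
        for (Field r : reader) {
            Field w = writerByTag.get(r.number());
            if (w == null) {
                // Reader-only field: only a proto2 `required` field is a problem (proto3 has no required).
                if (syntax == Syntax.PROTO2 && r.label() == Label.REQUIRED) {
                    problems.add("tag " + r.number() + ": required field missing from writer");
                }
            } else if (!r.type().equals(w.type())) {
                // Same tag number on both sides, but a different type.
                problems.add("tag " + r.number() + ": type changed from " + w.type() + " to " + r.type());
            }
        }
        return problems;
    }
}
```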
> >
> > *Motivation*
> >
> > Basically, in the motivation section you want people to understand the
> > following:
> >
> > Pulsar has built-in support for typed messages. It allows specifying an
> > encoding scheme and its matching schema.
> > For example, it supports Avro. You specify a schema for a given topic,
> > using Avro Schema Definition (i.e. a JSON describing the schema). Every time
> > you produce a message, you first declare the schema definition you wish to
> > use for your messages. The message data should be Avro-encoded binary data
> > (which the client in some SDKs helps encode from a given
> > data-structure/object).
> > The same applies when you consume a message. You first specify the schema
> > you use to read the messages, and the client in some SDKs helps by decoding
> > the message binary data into an object/data-structure.
> >
> > Each time you specify a schema to be used, either by a producer or a
> > consumer, the schema is persisted in Pulsar and given an increasing version
> > number. If the schema is the same as the previous version, it is not
> > saved. When the message is persisted, the version number is encoded in the
> > message headers.
> >
> > Pulsar provides a very useful feature named Schema Evolution
> > <https://pulsar.apache.org/docs/2.11.x/schema-understand/#schema-evolution>.
> > It allows us to check whether a new schema version is compatible with the
> > previous version or versions. When you configure the schema for the topic,
> > you decide which strategy to use for the validation check. The strategies
> > validate the following:
> >
> >    - BACKWARD strategy
> >       - A consumer with newSchema can read a message written using
> >       existingSchema
> >    - BACKWARD_TRANSITIVE strategy
> >       - A consumer with newSchema can read messages written using all
> >       existingSchema
> >    - FORWARD
> >       - A consumer with existingSchema can read messages written using
> >       newSchema
> >    - FORWARD_TRANSITIVE
> >       - A consumer defined with any of the existingSchema can read messages
> >       written using newSchema
> >    - FULL
> >       - A consumer defined with newSchema can read messages written using
> >       existingSchema
> >       - A consumer defined with existingSchema can read messages written
> >       using newSchema
> >    - FULL_TRANSITIVE
> >       - A consumer defined with newSchema can read messages written using
> >       any of the existingSchema
> >       - A consumer defined with any of the existingSchema can read messages
> >       written using newSchema
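The strategy list reduces to two questions: which direction(s) to check, and whether to check against only the latest existing version or against all of them. A minimal sketch of that mapping, with schemas represented as plain version labels and `Check` a hypothetical record standing for one `canRead(reader, writer)` call; strategies not in the list above are left out.

```java
import java.util.ArrayList;
import java.util.List;

final class StrategyChecks {
    enum Strategy { BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE, FULL, FULL_TRANSITIVE }

    // One directional check: can a consumer using `reader` read data written with `writer`?
    record Check(String reader, String writer) {}

    // `existing` is ordered from oldest to newest and must not be empty.
    static List<Check> checksFor(Strategy strategy, String newSchema, List<String> existing) {
        boolean transitive = strategy == Strategy.BACKWARD_TRANSITIVE
                || strategy == Strategy.FORWARD_TRANSITIVE
                || strategy == Strategy.FULL_TRANSITIVE;
        boolean backward = strategy != Strategy.FORWARD && strategy != Strategy.FORWARD_TRANSITIVE;
        boolean forward = strategy != Strategy.BACKWARD && strategy != Strategy.BACKWARD_TRANSITIVE;
        // Non-transitive strategies only compare against the latest existing version.
        List<String> against = transitive ? existing : List.of(existing.get(existing.size() - 1));
        List<Check> checks = new ArrayList<>();
        for (String old : against) {
            if (backward) {
                checks.add(new Check(newSchema, old)); // new consumer reads old data
            }
            if (forward) {
                checks.add(new Check(old, newSchema)); // old consumer reads new data
            }
        }
        return checks;
    }
}
```

For example, `checksFor(Strategy.BACKWARD, "v3", List.of("v1", "v2"))` yields a single check where v3 reads data written with v2, while the transitive variants repeat the check for every existing version.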
> >
> >
> > Aside from Avro, Pulsar also has two additional supported encodings:
> > PROTOBUF and PROTOBUF_NATIVE.
> >
> > PROTOBUF is a bit strange. It encodes the messages using Protobuf encoding,
> > but the schema that is persisted to Pulsar is *not* a Protobuf Descriptor as
> > you would have expected. The saved schema is a translation of the Protobuf
> > Descriptor to an Avro Schema, so in fact an Avro schema definition is saved
> > as the schema.
> >
> > PROTOBUF_NATIVE was created to fix that shortcoming, by actually persisting
> > the Protobuf Descriptor and using Protobuf for encoding.
> > The problem is that the authors of PROTOBUF_NATIVE haven't completed it
> > fully, and the backward compatibility validation code barely exists: it
> > only checks whether the root message name is the same between versions.
> >
> > GOALS
> > The goal of this PIP is to amend PROTOBUF_NATIVE by adding fully functional
> > validation for every defined Schema Compatibility Strategy.
> > A secondary goal is to allow the user to choose between different
> > implementations: the new fully functional validation or the existing
> > barebones validation.
> >
> > -------- END
> >
> > I'm OK with having links in the Motivation, as *further reading*.
> > I'm against stacking up work for the reader, making them read 5-6 different
> > links just to understand the motivation and the background knowledge
> > required to understand the feature.
> >
> > I'm against putting code in the Motivation, especially if it is supposed to
> > replace a plain-English description that makes the design easy to
> > understand.
> > Leave the code to the PRs.
> > Paste code only if you absolutely can't use plain old descriptions to
> > explain.
> >
> >
> >
> > On Sat, Mar 11, 2023 at 11:46 AM SiNan Liu <li...@gmail.com> wrote:
> >
> > > *I guess that's right, too!*
> > >
> > > But I think the name `ProtobufNativeAdvancedSchemaCompatibilityCheck` is
> > > better, because we don't know whether there will be a V2 or V3 in the
> > > future. An official solution could be called
> > > `ProtobufNativeOfficialSchemaCompatibilityCheck`, and a good third-party
> > > solution `ProtobufNativeXXXXXXXXSchemaCompatibilityCheck`.
> > >
> > > I've updated my design in the PIP issue:
> > > 1. Add a new ProtobufNativeSchemaAdvanceCompatibilityCheck, rather than a
> > > different validator implementation inside
> > > ProtobufNativeSchemaCompatibilityCheck.
> > > 2. Remove the 'builder'.
> > > 3. Clarify the relationship between newSchema, existingSchema, and
> > > writtenSchema in canRead.
> > >
> > > Please help check whether the description is comprehensive and what
> > > changes and improvements are needed.
> > >
> > > Thanks,
> > > sinan
> > >
> > >
> > >
> > > Asaf Mesika <as...@gmail.com> wrote on Thu, Mar 9, 2023 at 17:35:
> > >
> > > > I like Bo's suggestion: I hadn't realized each schema type
> > > > compatibility check is actually a plugin.
> > > > It makes sense for any schema type's checks to evolve, sometimes in a
> > > > non-backward-compatible way, hence having two plugins like
> > > > protobufNativeSchemaCompatabilityCheckV1 and then
> > > > protobufNativeSchemaCompatabilityCheckV2 and then
> > > > protobufNativeSchemaCompatabilityCheckV3 makes sense to me.
> > > >
> > > >
> > > >
> > > > On Thu, Mar 9, 2023 at 5:49 AM 丛搏 <bo...@apache.org> wrote:
> > > >
> > > > > Hi SiNan,
> > > > >
> > > > > From my point of view, it is just a plug-in. I don't think it is
> > > > > necessary to add configuration for the plugin.
> > > > > It is meaningless, and it will make the plugin harder to use.
> > > > >
> > > > >
> > > > > SiNan Liu <li...@gmail.com> wrote on Wed, Mar 8, 2023 at 15:54:
> > > > > >
> > > > > > Hi, Bo.
> > > > > >
> > > > > > 1. I understand what you are saying: develop a new `ProtobufNativeAdvancedSchemaCompatibilityCheck`, rather than changing the existing `ProtobufNativeSchemaCompatibilityCheck`. But I found a few small problems:
> > > > > >
> > > > > > (1) The schemaType of both ProtobufNativeAdvancedSchemaCompatibilityCheck and ProtobufNativeSchemaCompatibilityCheck is PROTOBUF_NATIVE. Both checkers are "native" PROTOBUF checkers that do not use AVRO-PROTOBUF, which leads to some problems or "unreasonable" situations, and raises some further thoughts and questions for me.
> > > > > >
> > > > > `CompatibilityCheck` is only a plugin.
> > > > > `ProtobufNativeSchemaCompatibilityCheck` may sooner or later leave the stage; when `ProtobufNativeAdvancedSchemaCompatibilityCheck` is stable, we can make it the default checker.
> > > > >
> > > > > It is just a plug-in: users can change it at will, as long as they make sure it is used correctly.
> > > > > > (2) In broker.conf, if both ProtobufNativeSchemaCompatibilityCheck and ProtobufNativeAdvancedSchemaCompatibilityCheck are set in `schemaRegistryCompatibilityCheckers`, they will overwrite each other, because the checkers are stored in a map:
> > > > > >
> > > > > > https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryService.java#L36-L44
> > > > > >
> > > > > > ```java
> > > > > > // Checkers are keyed by schema type, so a second checker for the same type replaces the first.
> > > > > > Map<SchemaType, SchemaCompatibilityCheck> checkers = new HashMap<>();
> > > > > > for (String className : checkerClasses) {
> > > > > >     SchemaCompatibilityCheck schemaCompatibilityCheck = Reflections.createInstance(className,
> > > > > >             SchemaCompatibilityCheck.class, Thread.currentThread().getContextClassLoader());
> > > > > >     checkers.put(schemaCompatibilityCheck.getSchemaType(), schemaCompatibilityCheck);
> > > > > > }
> > > > > > ```
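The overwrite described here follows directly from `HashMap.put` semantics. A self-contained sketch of the same loop, using a hypothetical `Checker` interface and `NamedChecker` record in place of Pulsar's `SchemaCompatibilityCheck` and its reflection-based loading, shows that only the last configured PROTOBUF_NATIVE checker survives:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CheckerMapDemo {
    enum SchemaType { AVRO, JSON, PROTOBUF_NATIVE }

    // Hypothetical stand-in for a compatibility-check plugin.
    interface Checker {
        SchemaType schemaType();
    }

    record NamedChecker(String name, SchemaType schemaType) implements Checker {}

    // Same pattern as the broker loop: one map entry per schema type.
    static Map<SchemaType, Checker> register(List<? extends Checker> configured) {
        Map<SchemaType, Checker> checkers = new HashMap<>();
        for (Checker c : configured) {
            // A later checker for the same schema type silently replaces the earlier one.
            checkers.put(c.schemaType(), c);
        }
        return checkers;
    }

    public static void main(String[] args) {
        Map<SchemaType, Checker> checkers = register(List.of(
                new NamedChecker("ProtobufNativeSchemaCompatibilityCheck", SchemaType.PROTOBUF_NATIVE),
                new NamedChecker("ProtobufNativeAdvancedSchemaCompatibilityCheck", SchemaType.PROTOBUF_NATIVE)));
        System.out.println(checkers.size()); // 1
        System.out.println(((NamedChecker) checkers.get(SchemaType.PROTOBUF_NATIVE)).name()); // ProtobufNativeAdvancedSchemaCompatibilityCheck
    }
}
```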
> > > > > >
> > > > > > Is this a big problem or a small one? Is supporting both possible, or unnecessary? Maybe we can state in the documentation that only one of the two PROTOBUF_NATIVE checkers can be chosen? Why have two checkers for different implementations of the same schemaType? Why not let the checker create different validators, so we don't have to change schemaRegistryCompatibilityCheckers?
> > > > >
> > > > > Users can only use one of them, not both; using both would bring complexity to users.
> > > > >
> > > > > >
> > > > > > (3) After switching to ProtobufNativeAdvancedSchemaCompatibilityCheck: for existing topics, only the name of the root message was checked before, not the content of the protobuf schema.
> > > > > >
> > > > > > What if the user wants both checkers?
> > > > > >
> > > > > > Will setting ProtobufNativeAdvancedSchemaCompatibilityCheck affect topics with an existing schema?
> > > > > >
> > > > > > Older topics could still use the old checker, while newer topics or certain older topics use the new advanced checker.
> > > > > >
> > > > > Once `ProtobufNativeAdvancedSchemaCompatibilityCheck` is stable, users will not choose `ProtobufNativeSchemaCompatibilityCheck`, because it is not a complete checker.
> > > > > > (4) So should there be one checker per schemaType, with the PROTOBUF_NATIVE checker able to use as many different implementation classes as needed? As for the class name configuration in the PIP, let's see whether it can be set at the topic level. In the current PIP design, I just load this parameter into the checker when the broker starts and the checkers map is set up. If I want to support topic-level configuration, can I do it in a separate, regular PR? Or should I complete it here?
> > > > > >
> > > > > > Or should we add a schemaType called PROTOBUF_NATIVE_ADVANCE corresponding to ProtobufNativeAdvancedSchemaCompatibilityCheck? (That seems to be more trouble.)
> > > > > >
> > > > > > Sorry, I cannot use a computer or the network at work, so I am replying from my mobile phone and the format may be a bit messy. Please bear with me.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > sinan
> > > > > >
> > > > > > 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:39 PM:
> > > > > >
> > > > > > > SiNan Liu <li...@gmail.com> wrote on Tue, Mar 7, 2023 at 13:22:
> > > > > > > >
> > > > > > > > Great to see your comment, Bo!
> > > > > > > >
> > > > > > > > 1. The first way: the protobuf website has a description of the rules, but there are no plans to implement them.
> > > > > > > > https://protobuf.dev/programming-guides/proto/#updating
> > > > > > >
> > > > > > > https://groups.google.com/g/protobuf
> > > > > > > Maybe ask here.
> > > > > > >
> > > > > > > >
> > > > > > > > 2. I think this PIP can be divided into two parts.
> > > > > > > > (1) Add a flag (`ValidatorClassName`) and load it into `ProtobufNativeSchemaCompatibilityCheck` when the broker starts. ValidatorClassName is empty by default, and the implementation then behaves as before, with no change for the user.
> > > > > > >
> > > > > > > `ProtobufNativeSchemaCompatibilityCheck` is a plugin in `broker.conf`:
> > > > > > > ```
> > > > > > > schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
> > > > > > > ```
> > > > > > > I do not recommend that we directly modify this plugin and keep adding configuration items, which will cause trouble for users.
> > > > > > > We have a lot of configs and it's getting very unwieldy.
> > > > > > > In my opinion, we should not change `org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`: it is a simple implementation, it rarely goes wrong, and most users will use it. We can add another PROTOBUF_NATIVE checker named `ProtobufNativeAdvancedSchemaCompatibilityCheck` or something else. This way, we don't need to add this flag. There is no need to consider compatibility: it is just a plug-in and will not affect the current logic.
> > > > > > > If users need it, they just switch the plugin to the new implementation.
> > > > > > >
> > > > > > > > ```java
> > > > > > > >     ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
> > > > > > > >         for (Descriptors.Descriptor fromDescriptor : fromDescriptors) {
> > > > > > > >             // The default implementation only checks if the root message has changed.
> > > > > > > >             if (!fromDescriptor.getFullName().equals(toDescriptor.getFullName())) {
> > > > > > > >                 throw new ProtoBufCanReadCheckException("Protobuf root message isn't allowed to change!");
> > > > > > > >             }
> > > > > > > >         }
> > > > > > > >     };
> > > > > > > > ```
> > > > > > > > `ValidatorClassName` can also be set to the implementation added by this PIP: `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
> > > > > > > >
> > > > > > > > (2) Rework `ProtobufNativeSchemaCompatibilityCheck` so that it builds a different `ProtobufNativeSchemaValidator` depending on the flag (`ValidatorClassName`).
> > > > > > > > Isn't that just a plug-in, too? Users can develop and choose a different `ProtobufNativeSchemaValidator`. I think this doesn't change the logic; it just allows users to extend it.
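SiNan's option (2), choosing a validator through `ValidatorClassName`, amounts to reflective loading with a fallback. A minimal sketch under stated assumptions: `SchemaValidator` and `RootNameOnly` are hypothetical stand-ins that compare root message full names as plain strings instead of protobuf-java `Descriptor`s, and an empty class name keeps the existing root-name-only behaviour of the DEFAULT validator above.

```java
import java.util.List;

final class ValidatorLoader {
    // Hypothetical validator contract; root messages are identified by their full names.
    interface SchemaValidator {
        void validate(List<String> fromRootNames, String toRootName);
    }

    // Stand-in for the DEFAULT validator: only rejects a changed root message name.
    static final class RootNameOnly implements SchemaValidator {
        @Override
        public void validate(List<String> fromRootNames, String toRootName) {
            for (String from : fromRootNames) {
                if (!from.equals(toRootName)) {
                    throw new IllegalArgumentException("Protobuf root message isn't allowed to change!");
                }
            }
        }
    }

    // Empty or null class name keeps the existing behaviour; otherwise instantiate by reflection.
    static SchemaValidator load(String validatorClassName) {
        if (validatorClassName == null || validatorClassName.isEmpty()) {
            return new RootNameOnly();
        }
        try {
            Class<?> clazz = Class.forName(validatorClassName);
            return (SchemaValidator) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load validator " + validatorClassName, e);
        }
    }
}
```

Bo's counter-proposal avoids this extra configuration entirely by shipping the richer rules as a separate checker plugin; the sketch only illustrates what the flag-based variant would involve.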
> > > > > > > >
> > > > > > > >
> > > > > > > > I think this PIP is an enhancement and supplement to the existing functionality; there is nothing unnecessary or meaningless about it.
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > sinan
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:53:
> > > > > > > >
> > > > > > > > > I think we have two ways to do that.
> > > > > > > > >
> > > > > > > > > First way: push the protobuf project to make this improvement in its Java library, and ask whether they have plans for it.
> > > > > > > > >
> > > > > > > > > Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should be implemented as a plugin, without changing any existing plugin logic (it's simple and already in use). I don't recommend adding flags for rollback; it adds configuration and makes little sense.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Bo
> > > > > > > > >
> > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Mar 6, 2023 at 23:00:
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Can you convert the code block at the beginning of the PIP, which is actually a quote, into something that doesn't require so much horizontal scrolling?
> > > > > > > > > > Use
> > > > > > > > > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > > > > > > > > >
> > > > > > > > > > Let's improve the clarity of what you wrote:
> > > > > > > > > >
> > > > > > > > > > "the PROTOBUF uses avro struct to store."
> > > > > > > > > > -->
> > > > > > > > > > When schema type PROTOBUF is used, the Pulsar client assumes the object given to it as message data is an auto-generated POJO containing the annotations encoding the schema. The client uses a converter, which converts a Protobuf schema descriptor into an Avro schema, and sends that as the schema of the producer/consumer.
> > > > > > > > > >
> > > > > > > > > > "On the broker side, protobuf and avro both use SchemaData converted to org.apache.avro.Schema."
> > > > > > > > > > -->
> > > > > > > > > > Since the schema is an Avro schema, the broker-side compatibility check simply re-uses the compatibility check of the AVRO schema type.
> > > > > > > > > >
> > > > > > > > > > "ProtobufSchema is different from ProtobufNativeSchema in schema compatibility check it uses avro-protobuf.
> > > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > > But the current implementation of ProtobufNative schema compatibility check only checked if the root message name is changed."
> > > > > > > > > >
> > > > > > > > > > -->
> > > > > > > > > > The PROTOBUF_NATIVE schema type is different. The client actually uses the Protobuf Descriptor as the schema, as opposed to the Avro schema used by the PROTOBUF schema type. In the broker, the PROTOBUF_NATIVE compatibility check hasn't implemented any rule besides one: checking whether the root message name has changed.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >    1. For now, there is no official or third-party solution for Protobuf
> > > > > > > > > > >    compatibility checks. If a better official or third-party solution
> > > > > > > > > > >    appears in the future, we can develop a new
> > > > > > > > > > >    ProtobufNativeSchemaValidator and use it; hence the flag.
> > > > > > > > > > >
> > > > > > > > > > Why do you need to make that configurable? Once you find a third party,
> > > > > > > > > > just switch to it. Who knows, maybe you never will. Introduce it when
> > > > > > > > > > you find it, not now.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > We improve the BACKWARD and FORWARD strategies in
> > > > > > > > > > > ProtobufNativeSchemaCompatibilityCheck. As with the AVRO
> > > > > > > > > > > implementation, Protobuf compatibility checking needs to implement
> > > > > > > > > > > the canRead method. *This will check that the writtenSchema can be
> > > > > > > > > > > read by readSchema.*
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I completely disagree.
> > > > > > > > > > The Avro implementation is confusing for our use case. Don't copy that.
> > > > > > > > > >
> > > > > > > > > > You have
> > > > > > > > > >
> > > > > > > > > > public void checkCompatible(SchemaData from, SchemaData to,
> > > > > > > > > >         SchemaCompatibilityStrategy strategy)
> > > > > > > > > >         throws IncompatibleSchemaException {
> > > > > > > > > >     Descriptor fromDescriptor =
> > > > > > > > > >             ProtobufNativeSchemaUtils.deserialize(from.getData());
> > > > > > > > > >     Descriptor toDescriptor =
> > > > > > > > > >             ProtobufNativeSchemaUtils.deserialize(to.getData());
> > > > > > > > > >     switch (strategy) {
> > > > > > > > > >         case BACKWARD_TRANSITIVE:
> > > > > > > > > >         case BACKWARD:
> > > > > > > > > >         case FORWARD_TRANSITIVE:
> > > > > > > > > >         case FORWARD:
> > > > > > > > > >         case FULL_TRANSITIVE:
> > > > > > > > > >         case FULL:
> > > > > > > > > >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
> > > > > > > > > >             return;
> > > > > > > > > >         case ALWAYS_COMPATIBLE:
> > > > > > > > > >             return;
> > > > > > > > > >         default:
> > > > > > > > > >             throw new IncompatibleSchemaException(
> > > > > > > > > >                     "Unknown SchemaCompatibilityStrategy.");
> > > > > > > > > >     }
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > I would rename:
> > > > > > > > > > from --> currentSchema
> > > > > > > > > > to --> newSchema
> > > > > > > > > >
> > > > > > > > > > Use that switch case and have a method for each like:
> > > > > > > > > > validateBackwardsCompatibility(currentSchema, newSchema)
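The restructuring suggested above could look roughly like the following Java sketch. It is hypothetical and not Pulsar code: `Strategy` is a local stand-in for `SchemaCompatibilityStrategy` that lists only the non-transitive values, `IncompatibleSchemaException` is a local unchecked class rather than Pulsar's exception, and the actual read rules are passed in as a `BiPredicate` where `canRead.test(writer, reader)` answers whether data written with `writer` can be read with `reader`. The point is that the writer/reader roles are fixed once per strategy, expressed in terms of currentSchema and newSchema.

```java
import java.util.Set;
import java.util.function.BiPredicate;

public class PerStrategyCheck {

    // Local stand-in for Pulsar's SchemaCompatibilityStrategy (non-transitive values only).
    enum Strategy { BACKWARD, FORWARD, FULL, ALWAYS_COMPATIBLE }

    // Local, unchecked stand-in for Pulsar's IncompatibleSchemaException.
    static class IncompatibleSchemaException extends RuntimeException {
        IncompatibleSchemaException(String message) { super(message); }
    }

    static <S> void checkCompatible(S currentSchema, S newSchema, Strategy strategy,
                                    BiPredicate<S, S> canRead) {
        switch (strategy) {
            case BACKWARD -> validateBackwardsCompatibility(currentSchema, newSchema, canRead);
            case FORWARD -> validateForwardsCompatibility(currentSchema, newSchema, canRead);
            case FULL -> {
                validateBackwardsCompatibility(currentSchema, newSchema, canRead);
                validateForwardsCompatibility(currentSchema, newSchema, canRead);
            }
            case ALWAYS_COMPATIBLE -> { /* nothing to check */ }
        }
    }

    // BACKWARD: consumers upgraded to newSchema must read data produced with currentSchema.
    static <S> void validateBackwardsCompatibility(S currentSchema, S newSchema,
                                                   BiPredicate<S, S> canRead) {
        if (!canRead.test(currentSchema, newSchema)) {
            throw new IncompatibleSchemaException(
                    "newSchema cannot read data written with currentSchema");
        }
    }

    // FORWARD: consumers still on currentSchema must read data produced with newSchema.
    static <S> void validateForwardsCompatibility(S currentSchema, S newSchema,
                                                  BiPredicate<S, S> canRead) {
        if (!canRead.test(newSchema, currentSchema)) {
            throw new IncompatibleSchemaException(
                    "currentSchema cannot read data written with newSchema");
        }
    }

    public static void main(String[] args) {
        // Toy rule for illustration only: a reader needs every field number it declares
        // to be present in the writer's schema.
        BiPredicate<Set<Integer>, Set<Integer>> toyCanRead =
                (writer, reader) -> writer.containsAll(reader);
        Set<Integer> currentSchema = Set.of(1, 2);
        Set<Integer> newSchema = Set.of(1); // field 2 removed

        checkCompatible(currentSchema, newSchema, Strategy.BACKWARD, toyCanRead); // passes
        try {
            checkCompatible(currentSchema, newSchema, Strategy.FORWARD, toyCanRead);
        } catch (IncompatibleSchemaException e) {
            System.out.println("FORWARD rejected: " + e.getMessage());
        }
    }
}
```

Transitive variants would run the same per-strategy method against every earlier schema version instead of only the latest one; that is left out of this sketch.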
> > > > > > > > > >
> > > > > > > > > > I dislike canRead and the usage of writtenSchema, since you have two
> > > > > > > > > > completely different use cases: the producing side and the consuming
> > > > > > > > > > side.
> > > > > > > > > >
> > > > > > > > > > > schemaValidatorBuilder
> > > > > > > > > > >
> > > > > > > > > > I dislike this proposal. IMO the Avro implementation is way too
> > > > > > > > > > complicated. Why not have a simple validation function for each switch
> > > > > > > > > > case above? Why do we need a strategy and a builder, and all this
> > > > > > > > > > complexity?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > IMO it's impossible to read the validation rules as you described them.
> > > > > > > > > > I wrote how they should be structured numerous times above.
> > > > > > > > > > I can't validate them.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > IMO, the current design is very hard to read.
> > > > > > > > > > Please try to avoid jumping into code sections.
> > > > > > > > > > Write a high-level design section, in which you describe in words what
> > > > > > > > > > you plan to do.
> > > > > > > > > > Write the validation rules in a structure that is easy to understand:
> > > > > > > > > > rules for each compatibility check, using proper terms (current schema,
> > > > > > > > > > new schema), since the new schema is sometimes used for reading and
> > > > > > > > > > sometimes used for writing.
> > > > > > > > > >
> > > > > > > > > > In its current form it takes too much time to understand the design,
> > > > > > > > > > and that shouldn't be the case.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Asaf
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi! I updated the explanation of some things in the PIP issue. I also
> > > > > > > > > > > added a new “flag” in the conf to choose between different
> > > > > > > > > > > ProtobufNativeSchemaValidator implementations, and set the default
> > > > > > > > > > > ProtobufNativeSchemaValidator to only check whether the name of the
> > > > > > > > > > > root message is the same.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > sinan
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Mar 5, 2023 at 20:21:
> > > > > > > > > > >
> > > > > > > > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be
> > > > > > > > > > > > > > validated for backward compatibility check using Avro based
> > > > > > > > > > > > > > compatibility rules?
> > > > > > > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a
> > > > > > > > > > > > > > Protobuf descriptor?
> > > > > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. *You can take a quick look at the previous design: PROTOBUF
> > > > > > > > > > > > > uses an Avro struct for storage.*
> > > > > > > > > > > > > https://github.com/apache/pulsar/pull/1954
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Ok. So to summarize your code (easier to write it than send links):
> > > > > > > > > > > > * Pulsar Client, when used with Protobuf Schema, actually converts
> > > > > > > > > > > > the Protobuf descriptor into an Avro Schema (using code found inside
> > > > > > > > > > > > the Avro library) and saves that Avro schema as the schema. It's not
> > > > > > > > > > > > saving the protobuf descriptor at all. Very confusing, I have to add;
> > > > > > > > > > > > I never expected that.
> > > > > > > > > > > > This explains why ProtobufSchemaCompatibilityCheck just extends the
> > > > > > > > > > > > Avro check without doing any translation.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for that.
> > > > > > > > > > > >
> > > > > > > > > > > > Now that I finally understand this, I can say that you *must* explain
> > > > > > > > > > > > it in the motivation part of your PIP.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. *On the broker side, protobuf and avro both use `SchemaData`
> > > > > > > > > > > > > converted to `org.apache.avro.Schema`.*
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Actually those links don't really help.
> > > > > > > > > > > > The main link that helps is:
> > > > > > > > > > > >
> > > > > > > > > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I'm sorry, I don't understand.
> > > > > > > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > > > > > > If you just spell them out here, then as you say, just translate
> > > > > > > > > > > > > > the Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > > > > > > > > compatibility validation, no?
> > > > > > > > > > > > > > I believe the answer is no, since you may want to verify different
> > > > > > > > > > > > > > things when it comes to Protobuf than for Avro.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. *ProtobufSchema is different from ProtobufNativeSchema in that it
> > > > > > > > > > > > > uses avro-protobuf.*
> > > > > > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > > > > > *ProtobufNativeSchema needs a native compatibility check, but there
> > > > > > > > > > > > > is no official or third-party implementation. So this PIP does not
> > > > > > > > > > > > > use avro-protobuf for protobuf compatibility checking.*
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2. *By the way, this is implemented in much the same way that Apache
> > > > > > > > > > > > > Avro does compatibility checking.*
> > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > > > > > > > > > `canReadStrategy`, `canBeReadStrategy`, `mutualReadStrategy`
> > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > > > > > > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > > > > > > > > > > > > writtenSchema and readSchema. We only need to change the order of
> > > > > > > > > > > > > the arguments we pass to `canRead()`.*
> > > > > > > > > > > > > ```java
> > > > > > > > > > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > > > > > > > > > > > >         Descriptors.Descriptor fromDescriptor)
> > > > > > > > > > > > >         throws ProtoBufCanReadCheckException {
> > > > > > > > > > > > >     switch (strategy) {
> > > > > > > > > > > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > > > > > > > > > > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> > > > > > > > > > > > >         case CanBeReadMutualStrategy -> {
> > > > > > > > > > > > >             canRead(toValidate, fromDescriptor);
> > > > > > > > > > > > >             canRead(fromDescriptor, toValidate);
> > > > > > > > > > > > >         }
> > > > > > > > > > > > >     }
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > > > > > > > > > > >         Descriptors.Descriptor readSchema)
> > > > > > > > > > > > >         throws ProtoBufCanReadCheckException {
> > > > > > > > > > > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(
> > > > > > > > > > > > >             writtenSchema, readSchema);
> > > > > > > > > > > > > }
> > > > > > > > > > > > > ```
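One way to fill in `canRead(writtenSchema, readSchema)` is sketched below in Java. It is illustrative only: it uses a hypothetical, simplified schema model (a root name plus numbered fields, each with a type name and a `required` flag that stands in for proto2 `required` fields) rather than `Descriptors.Descriptor`, and the three rules are assumptions chosen for the example, not the rule set defined in PIP-246. The usage in `main` shows why the argument order matters: adding a required field fails in one direction and passes in the other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CanReadSketch {

    // Hypothetical, simplified field and message models (not Protobuf's Descriptor API).
    record Field(String type, boolean required) {}
    record MessageSchema(String rootName, Map<Integer, Field> fields) {}

    // Returns the problems found; an empty list means data written with writtenSchema
    // can be read with readSchema under these illustrative rules.
    static List<String> canRead(MessageSchema writtenSchema, MessageSchema readSchema) {
        List<String> problems = new ArrayList<>();
        // Rule 1: the root message name must not change (the existing check).
        if (!writtenSchema.rootName().equals(readSchema.rootName())) {
            problems.add("root message name changed");
        }
        for (Map.Entry<Integer, Field> entry : readSchema.fields().entrySet()) {
            Field written = writtenSchema.fields().get(entry.getKey());
            Field read = entry.getValue();
            if (written == null) {
                // Rule 2: a field the reader requires must be written.
                // Optional fields missing from the writer simply keep their default value.
                if (read.required()) {
                    problems.add("required field " + entry.getKey() + " is never written");
                }
            } else if (!written.type().equals(read.type())) {
                // Rule 3: a field number present on both sides must keep its type
                // (simplified: real Protobuf tolerates some changes, e.g. int32 <-> int64).
                problems.add("field " + entry.getKey() + " changed type");
            }
        }
        // Fields that exist only in writtenSchema are skipped by the reader as unknown fields.
        return problems;
    }

    public static void main(String[] args) {
        MessageSchema v2 = new MessageSchema("Order", Map.of(
                1, new Field("string", true),
                2, new Field("int64", false)));
        // V3 adds a required field 3.
        MessageSchema v3 = new MessageSchema("Order", Map.of(
                1, new Field("string", true),
                2, new Field("int64", false),
                3, new Field("string", true)));

        // BACKWARD: V2 is writtenSchema, V3 is readSchema.
        System.out.println("BACKWARD: " + canRead(v2, v3));
        // FORWARD: V3 is writtenSchema, V2 is readSchema.
        System.out.println("FORWARD: " + canRead(v3, v2));
    }
}
```

With these rules, the BACKWARD call reports that required field 3 is never written, while the FORWARD call reports no problems, so only the argument order distinguishes the two strategies.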
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > I get that you want to take inspiration from the existing Avro Schema
> > > > > > > > > > > > compatibility check for your code design.
> > > > > > > > > > > > I also understand you *won't* use any existing Avro code for that.
> > > > > > > > > > > > I also understand that you have to write the validation check on your
> > > > > > > > > > > > own, since there is no third-party implementation to rely on.
> > > > > > > > > > > >
> > > > > > > > > > > > The only thing I can't understand is the actual rules, as I wrote
> > > > > > > > > > > > before, since they are written confusingly.
> > > > > > > > > > > > So, I repeat what I asked before:
> > > > > > > > > > > >
> > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > >
> > > > > > > > > > > > * Backward checks
> > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer
> > > > > > > > > > > > or consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > * Forward
> > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer
> > > > > > > > > > > > or consumer) and existingSchema (the last schema used)
> > > > > > > > > > > >
> > > > > > > > > > > > Once that's accomplished, I will be able to understand the different
> > > > > > > > > > > > validation rules for each compatibility check.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > sinan
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Mar 1, 2023 at 21:19:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I read it and they look identical. What's the difference between
> > > > > > > > > > > > > > > > them?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Current avro, json, and protobuf schemas are all implemented based
> > > > > > > > > > > > > > > on AVRO.
> > > > > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean
> > > > > > > > > > > > > > > > the protobuf schema is converted into an Avro Schema, and then you
> > > > > > > > > > > > > > > > use Avro compatibility validation?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > > > > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, so
> > > > > > > > > > > > > > > `checkCompatible()` has the same implementation as for `AVRO`.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be
> > > > > > > > > > > > > > validated for backward compatibility check using Avro based
> > > > > > > > > > > > > > compatibility rules?
> > > > > > > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a
> > > > > > > > > > > > > > Protobuf descriptor?
> > > > > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The compatibility check strategies are described on the website:
> > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > > > > > > > > > 1. BACKWARD (CanReadExistingStrategy): Consumers using schema V3
> > > > > > > > > > > > > > > can process data written by producers using the last schema
> > > > > > > > > > > > > > > version V2. So V2 is "writtenSchema" and V3 is "readSchema".
> > > > > > > > > > > > > > > 2. FORWARD (CanBeReadByExistingStrategy): Consumers using the last
> > > > > > > > > > > > > > > schema version V2 can process data written by producers using a
> > > > > > > > > > > > > > > new schema V3, even though they may not be able to use the full
> > > > > > > > > > > > > > > capabilities of the new schema. So V3 is "writtenSchema" and V2 is
> > > > > > > > > > > > > > > "readSchema".
> > > > > > > > > > > > > > > 3. FULL (CanBeReadMutualStrategy): Schemas are both backward and
> > > > > > > > > > > > > > > forward compatible. The schema can evolve, and the old and new
> > > > > > > > > > > > > > > schema versions must each be able to read data written with the
> > > > > > > > > > > > > > > other.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > I'm sorry, I don't understand.
> > > > > > > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > > > > > > If you just spell them out here, then as you say, just translate
> > > > > > > > > > > > > > the Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > > > > > > > > compatibility validation, no?
> > > > > > > > > > > > > > I believe the answer is no, since you may want to verify different
> > > > > > > > > > > > > > things when it comes to Protobuf than for Avro.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In its current state, I can't understand your design at all.
> > > > > > > > > > > > > > Please help clarify it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As
> > > > > > > > > > > > > > > we've just described, we only need to change the order of the
> > > > > > > > > > > > > > > arguments we pass to `canRead()`.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Feb 27, 2023 at 20:49:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I read it and they look identical. What's the difference between
> > > > > > > > > > > > > > > > them?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Current avro, json, and protobuf schemas are all implemented based
> > > > > > > > > > > > > > > > on AVRO.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean
> > > > > > > > > > > > > > > > the protobuf schema is converted into an Avro Schema, and then you
> > > > > > > > > > > > > > > > use Avro compatibility validation?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > * Backward checks
> > > > > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the
> > > > > > > > > > > > > > > > producer or consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > > > > > * Forward
> > > > > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the
> > > > > > > > > > > > > > > > producer or consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I'm saying this since you used the word "writtenSchema", but it
> > > > > > > > > > > > > > > > represents something completely different depending on whether
> > > > > > > > > > > > > > > > it's a backward or a forward check.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Once you have it structured like that, I personally will be able
> > > > > > > > > > > > > > > > to read and understand it.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The motivation and problem statement are now good; thanks for
> > > > > > > > > > > > > > > > improving it.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some
> > > > > > > > > > > > > > > > > background and some explanations.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The compatibility check rules are already written in the
> > > > > > > > > > > > > > > > > Implementation section. ProtoBufNative implements the same canRead
> > > > > > > > > > > > > > > > > method as Apache Avro. It does this by checking whether the
> > > > > > > > > > > > > > > > > schemas used for writing and reading are compatible. I also
> > > > > > > > > > > > > > > > > indicate whether the writtenSchema and readSchema of the Backward,
> > > > > > > > > > > > > > > > > Forward, and Full strategies are the old or the new version of the
> > > > > > > > > > > > > > > > > schema.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Feb 26, 2023 at 23:24:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge,
> > > > > > > > > > > > > > > > > > which IMO you need to add for people to understand it. You don't
> > > > > > > > > > > > > > > > > > need to explain all of Pulsar in this PIP, but add at the very
> > > > > > > > > > > > > > > > > > least a few paragraphs detailing everything you need to know to
> > > > > > > > > > > > > > > > > > put you in context:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >    - Start by saying Pulsar has a built-in schema registry inside
> > > > > > > > > > > > > > > > > >      the Pulsar broker.
> > > > > > > > > > > > > > > > > >       - Every time the client updates the schema, it uploads it to
> > > > > > > > > > > > > > > > > >         the broker. When that happens, it has a feature which
> > > > > > > > > > > > > > > > > >         validates whether the new schema version is compatible with
> > > > > > > > > > > > > > > > > >         the previous versions. There are 4 types of compatibility:
> > > > > > > > > > > > > > > > > >         Full, ... (complete and explain each one briefly)
> > > > > > > > > > > > > > > > > >    - Also explain that the Pulsar schema registry supports various
> > > > > > > > > > > > > > > > > >      schema protocols: Avro, protobuf native, ... (complete the
> > > > > > > > > > > > > > > > > >      rest); each protocol has a schema which dictates how to
> > > > > > > > > > > > > > > > > >      serialize and deserialize the message content into a typed
> > > > > > > > > > > > > > > > > >      object.
> > > > > > > > > > > > > > > > > >    - Explain in short what protobuf native is (compared to
> > > > > > > > > > > > > > > > > >      protobuf non-native).
> > > > > > > > > > > > > > > > > >    - Please don't paste code instead of explaining.
> > > > > > > > > > > > > > > > > >       - Explain that the current protobuf native validation check
> > > > > > > > > > > > > > > > > >         only consists of checking that the root message name is
> > > > > > > > > > > > > > > > > >         the same between the current schema version and the new
> > > > > > > > > > > > > > > > > >         version.
> > > > > > > > > > > > > > > > > >          - Explain briefly what a root message is and what its
> > > > > > > > > > > > > > > > > >            name is.
> > > > > > > > > > > > > > > > > >       - Explain the problem (list scenarios) that we have because
> > > > > > > > > > > > > > > > > >         the protobuf native schema only supports FULL
> > > > > > > > > > > > > > > > > >         compatibility validation.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Regarding high level design - as in what
> > you
> > > > > plan to
> > > > > > > do.
> > > > > > > > > > > > > > > > > > I suggest you add "High Level Design" and
> > in
> > > it
> > > > > > > detail
> > > > > > > > > how
> > > > > > > > > > > you
> > > > > > > > > > > > > plan
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > validate, per protobuf version, per
> > > > compatibility
> > > > > > > check
> > > > > > > > > > > > > (backward,
> > > > > > > > > > > > > > > > > forward,
> > > > > > > > > > > > > > > > > > full,...).
> > > > > > > > > > > > > > > > > > I tried reading the implementation - for
> > me ,
> > > > > it's
> > > > > > > all
> > > > > > > > > over
> > > > > > > > > > > the
> > > > > > > > > > > > > > > place.
> > > > > > > > > > > > > > > > > Can
> > > > > > > > > > > > > > > > > > you please list in order what I wrote
> > above,
> > > > and
> > > > > > > list the
> > > > > > > > > > > > > validation
> > > > > > > > > > > > > > > > > rules
> > > > > > > > > > > > > > > > > > with a good explanation why you validate
> it
> > > > like
> > > > > > > that?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Lastly, one you have all the validation
> > rules
> > > > > clearly
> > > > > > > > > stated,
> > > > > > > > > > > > you
> > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > use
> > > > > > > > > > > > > > > > > > it to document it properly so users can
> > know
> > > > what
> > > > > > > > > validation
> > > > > > > > > > > to
> > > > > > > > > > > > > > > expect.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Asaf
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan
> Liu <
> > > > > > > > > > > > > liusinan1998@gmail.com>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Sorry, my mistake. I removed the code
> and
> > > > > > > described the
> > > > > > > > > > > > design
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > improve
> > > > > > > > > > > > > > > > > > > the PROTOBUF_NATIVE schema
> compatibility
> > > > > checks.
> > > > > > > You
> > > > > > > > > can
> > > > > > > > > > > > have a
> > > > > > > > > > > > > > > look.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com>
> > > > > 于2023年2月22日周三
> > > > > > > > > 21:16写道:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I read it but you're almost directly
> > > diving
> > > > > into
> > > > > > > the
> > > > > > > > > > > code -
> > > > > > > > > > > > > it
> > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > take
> > > > > > > > > > > > > > > > > > > me
> > > > > > > > > > > > > > > > > > > > hours just to reverse engineer your
> > > design.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Can you please include a "High Level
> > > > Design"
> > > > > > > section
> > > > > > > > > in
> > > > > > > > > > > > which
> > > > > > > > > > > > > you
> > > > > > > > > > > > > > > > > > explain
> > > > > > > > > > > > > > > > > > > > how you plan to tackle any issue?
> > > > > > > > > > > > > > > > > > > > If I can read that section and
> explain
> > to
> > > > > someone
> > > > > > > > > else
> > > > > > > > > > > how
> > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > > work,
> > > > > > > > > > > > > > > > > > > > it means the section is complete.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan
> > > Liu <
> > > > > > > > > > > > > > > liusinan1998@gmail.com>
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I made a PIP to discuss:
> > > > > > > > > > > > > > > > > >
> > > https://github.com/apache/pulsar/issues/19565
> > > > > > > > > > > > > > > > > > > .
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > We can talk about the current
> design
> > > > here.
> > > > > > > > > Especially
> > > > > > > > > > > for
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > field
> > > > > > > > > > > > > > > > > > > type
> > > > > > > > > > > > > > > > > > > > > change check rules, please give
> your
> > > > > valuable
> > > > > > > > > advice.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > Sinan
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by SiNan Liu <li...@gmail.com>.
1.

> Why? The root message name is not written over the wire to the best of my
> knowledge. I haven't found it written in the official doc.


The root message name check is one of the rules in the existing
`ProtobufNativeSchemaCompatibilityCheck`. If the root message has a
different name, there is no need to check its contents: two versions of
the "same" schema must have the same root message name.


2.

> > The writtenSchema can not change the field number of any field in
> > readSchema (the field name is the same, but the field number is
> > different).
>
> You have to take into account field type as well when comparing.


The first rule on the Protobuf website says that field numbers must not be
changed:

> Don’t change the field numbers for any existing fields.
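The point being argued here is that a field's identity is its number, and Asaf adds that the type must be compared as well. A minimal sketch of that comparison, assuming a simplified, hypothetical `ProtoField` model instead of real protobuf-java descriptors: fields are paired by number only (names are ignored), and a pair passes when both sides keep the same type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a protobuf field (not a protobuf-java or Pulsar type).
record ProtoField(String name, int number, String type) {}

final class TagMatching {
    // Pairs fields that share a field number. Names are ignored, because the
    // binary wire format identifies a field by its number.
    static List<ProtoField[]> pairByNumber(List<ProtoField> readFields, List<ProtoField> writtenFields) {
        Map<Integer, ProtoField> writtenByNumber = new HashMap<>();
        for (ProtoField f : writtenFields) {
            writtenByNumber.put(f.number(), f);
        }
        List<ProtoField[]> pairs = new ArrayList<>();
        for (ProtoField r : readFields) {
            ProtoField w = writtenByNumber.get(r.number());
            if (w != null) {
                pairs.add(new ProtoField[] {r, w});
            }
        }
        return pairs;
    }

    // True when every field number present in both schemas keeps the same type.
    static boolean sameTypeForSameNumber(List<ProtoField> readFields, List<ProtoField> writtenFields) {
        for (ProtoField[] pair : pairByNumber(readFields, writtenFields)) {
            if (!pair[0].type().equals(pair[1].type())) {
                return false;
            }
        }
        return true;
    }
}
```

With Asaf's example (readSchema `int32 justAnID = 1; int32 customerId = 2`, writtenSchema `int32 justAnID = 1; string customerId = 3`), only number 1 forms a pair and it stays `int32`, so the check passes; the two `customerId` fields are simply different fields.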



3.

> > - The writtenSchema cannot add required fields, but optional or
> > repeated fields can be added (the field number must be new).
>
> That's not true.
> You can have a required field in writeSchema, and not have that field in
> readSchema (based on tag number).


A required field must always be present. In your case, where would
readSchema read its required fields from? They are not in writtenSchema.
The second and third rules on the official website say that required
fields must not be added or deleted!
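The direction both messages agree on (and which Asaf later rephrases as "readSchema has a field which doesn't exist in writerSchema: in proto2 it must not be `required`; in proto3 no problem") can be sketched as below. The `Syntax`, `Label` and `ReaderField` types are hypothetical simplifications, not protobuf-java classes. Only the reader's fields are inspected, so a required field that exists only in writtenSchema (Asaf's scenario) is not flagged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

enum Syntax { PROTO2, PROTO3 }

enum Label { OPTIONAL, REQUIRED, REPEATED }

// Hypothetical, simplified model of a readSchema field.
record ReaderField(String name, int number, Label label) {}

final class RequiredFieldRule {
    // Returns the numbers of readSchema fields that are required but have no
    // field with the same number in writtenSchema. Only proto2 is checked,
    // because proto3 has no required fields.
    static List<Integer> missingRequired(Syntax syntax, List<ReaderField> readFields, Set<Integer> writtenNumbers) {
        List<Integer> missing = new ArrayList<>();
        if (syntax == Syntax.PROTO3) {
            return missing;
        }
        for (ReaderField f : readFields) {
            if (f.label() == Label.REQUIRED && !writtenNumbers.contains(f.number())) {
                missing.add(f.number());
            }
        }
        return missing;
    }
}
```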


4.

> > (4) The writtenSchema can not change the field name of any field in
> > readSchema (the field number is the same, but the field name is
> > different).
>
> This is incorrect.
> Field names are not encoded into the wire. I don't see this in any best
> practice.


The third rule on the website:

> You may want to rename the field instead, perhaps adding the prefix
> “OBSOLETE_”, or make the field number reserved, so that future users of
> your .proto can’t accidentally reuse the number.

If you want to rename a field, add a new field with a new number instead,
and delete the old one!


5.

> > (5) The writtenSchema does not change the field name and number, but it
> > does change the field type.
>
> Small correction: for the same field number you are not allowed to change
> types. Name is irrelevant.


Why is the name irrelevant?
A change of type is the rule described under Alternatives in the PIP. There
is no check for it, just a warning to the user.
The other case is a change in the name of an ENUM or MESSAGE type. This is
the same as the root message name check in point 1: if the name changes, is
it still the same type? This change is not allowed!

*Here is an example:*
readSchema(
https://github.com/apache/pulsar/pull/19566/files#diff-a7006d73502e6064a80af02822f3a3072be498d8b677c4b838b0dafaea32dea4
)
writtenSchema(
https://github.com/apache/pulsar/pull/19566/files#diff-e3e7543624edaf1e0a4fd47947a2cad6e4b816b93843f71a367042ba6c3ec53f
)
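Rule (5) as SiNan states it (a renamed ENUM or MESSAGE type is incompatible, any other type change only produces a warning) can be sketched as a three-way verdict on two fields already paired by number. This follows SiNan's version, not Asaf's stricter "no type change for the same number"; the `Kind`, `Verdict` and `FieldType` types are hypothetical.

```java
enum Kind { SCALAR, ENUM, MESSAGE }

enum Verdict { COMPATIBLE, WARN, INCOMPATIBLE }

// Hypothetical model of a field's type: its kind plus its type name,
// e.g. (SCALAR, "int32") or (MESSAGE, "pkg.Address").
record FieldType(Kind kind, String typeName) {}

final class TypeChangeRule {
    // Compares the types of two fields that share the same field number.
    static Verdict compare(FieldType read, FieldType written) {
        if (read.kind() == written.kind() && read.typeName().equals(written.typeName())) {
            return Verdict.COMPATIBLE;
        }
        // ENUM and MESSAGE types are identified by name, like the root message in point 1.
        if (read.kind() != Kind.SCALAR || written.kind() != Kind.SCALAR) {
            return Verdict.INCOMPATIBLE;
        }
        // Other (scalar) type changes are only reported to the user.
        return Verdict.WARN;
    }
}
```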


6.

> > (6) The writtenSchema removes fields that do not have default values in
> > readSchema. Then the schema is incompatible.
>
> Protobuf gives you its own default if you don't supply one. This is
> incorrect.


(1) This rule only applies to proto2 fields that do not set a default value.
Proto3 does not need this check, because a default value is always there.

(2) From the PIP issue:

> Proto3 removed required fields, so there is no need to check them. We get
> the syntax (proto2 or proto3) of the proto in the code, and skip the
> required field check if it is proto3. All other checking rules also apply
> to proto3.


*I made a mistake here. This default value check is not needed in proto3. I
will modify the rules later according to your suggestion.*

> I would remove the proto2/proto3 sections, since they only differ in 1
> rule, and just mention that distinction inside that rule (less work for the
> reader).


(3) About the rules you suggested adding:

> *Rules that you don't have in the doc, but should IMO*
> .......
> There can't be a field in writerSchema, that exists in readerSchema (tag
> number based), which in writerSchema its type is scalar, but in readSchema
> its type is scalar, it's repeated but with packed=true.


But I don't think rule number three needs to be added.


Thanks,
sinan



Asaf Mesika <as...@gmail.com> wrote on Tue, Mar 14, 2023 at 22:33:

> Hi Sinan,
>
> The doc looks much better!
>
> I have a few additional comments:
>
> Pasting comment from previous emails:
>
> Can you convert the code block which is actually a quote in the
> beginning of the PIP to something which doesn't require scrolling
> horizontally so much?
> Use
>
> https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
>
> *Validation Rules*
>
> > (1) If the root message names of writtenSchema and readSchema are
> > different, then incompatible.
>
> Why? The root message name is not written over the wire to the best of my
> knowledge. I haven't found it written in the official doc.
>
>
> >    - The writtenSchema cannot add required fields, but optional or
> >    repeated fields can be added (the field number must be new).
>
> That's not true.
> You can have a required field in writeSchema, and not have that field in
> readSchema (based on tag number).
>
> > The writtenSchema can not change the field number of any field in
> > readSchema (the field name is the same, but the field number is
> > different).
>
> You have to take into account field type as well when comparing.
>
> If I have in readSchema
> int32 justAnID = 1
> int32 customerId = 2
>
> and in writeSchema I have
> int32 justAnID = 1
> string customerId = 3
>
> This is valid.
>
> > (4) The writtenSchema can not change the field name of any field in
> > readSchema (the field number is the same, but the field name is
> > different).
>
> This is incorrect.
> Field names are not encoded into the wire. I don't see this in any best
> practice.
>
> > (5) The writtenSchema does not change the field name and number, but it
> > does change the field type.
> >
> >    - If the field type is ENUM or MESSAGE, the schema is not compatible
> >    when the type name is changed.
> >    - If the field is of any other type, the schemas are not considered
> >    incompatible under this rule, but the user is warned. (There is another
> >    way of testing in the PIP issue Alternatives.)
>
> Small correction: for the same field number you are not allowed to change
> types. Name is irrelevant.
>
> > (6) The writtenSchema removes fields that do not have default values in
> > readSchema. Then the schema is incompatible.
>
> Protobuf gives you its own default if you don't supply one. This is
> incorrect.
>
>
> *Rules that you don't have in the doc, but should IMO*
> * There can not be a field which exists both in readSchema and writeSchema,
> with the same tag number, having different default values.
> * There can't be a field in writerSchema, that exists in readerSchema (tag
> number based), which in writerSchema is repeated and its type is Scalar
> (https://protobuf.dev/programming-guides/proto/#scalar) but in readSchema
> is not repeated anymore.
> * There can't be a field in writerSchema, that exists in readerSchema (tag
> number based), which in writerSchema its type is scalar, but in readSchema
> its type is scalar, it's repeated but with packed=true.
>
> *Rules you have , but I would phrase a bit differently*
>
> I would remove the proto2/proto3 sections, since they only differ in 1
> rule, and just mention that distinction inside that rule (less work for the
> reader).
>
> * readSchema has a field which doesn't exist in writerSchema (based on tag
> number).
>    * Proto v2:
>       * That field must be `optional` or `repeated` (must not be
> `required`)
>    * Proto v3:
>       * No problem.
> * There can not be a field which exists both in readSchema and writeSchema,
> with the same tag number, but having different types.
>
> *Motivation*
>
> Basically in the motivation section you want people to understand the
> following:
>
> Pulsar has built-in support for typed messages. It allows specifying an
> encoding scheme and its matching schema.
> For example, it supports Avro. You specify a schema for a given topic,
> using Avro Schema Definition (i.e. a JSON describing the schema). Every time
> you produce a message, you first declare the schema definition you wish to
> use for your messages. The message data should be avro-encoded binary
> data (which the client in some SDKs helps encode from a given
> data-structure/object).
> The same applies when you consume a message. You first specify the schema
> you use to read the messages, and the client in some SDKs helps by decoding
> the message binary data into an object/data-structure.
>
> Each time you specify a schema to be used, either by a producer or a
> consumer, the schema is persisted in Pulsar and given an increasing version
> number. If the schema was the same as the previous version, it is not
> saved. When the message is persisted, the version number is encoded in the
> message headers.
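The versioning behavior described in this paragraph can be sketched as an append-only list that skips an unchanged schema. This follows the paragraph literally (it compares only against the latest version); `SchemaVersions` is a hypothetical in-memory stand-in, not the broker's storage API, and the real broker may deduplicate differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the schema storage of one topic.
final class SchemaVersions {
    private final List<String> versions = new ArrayList<>();

    // Returns the version for the given schema definition: if it equals the
    // latest stored version, nothing is saved and that version is reused;
    // otherwise it is appended under the next version number (starting at 0).
    long putIfChanged(String schemaDefinition) {
        int latest = versions.size() - 1;
        if (latest >= 0 && versions.get(latest).equals(schemaDefinition)) {
            return latest;
        }
        versions.add(schemaDefinition);
        return versions.size() - 1;
    }
}
```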
>
> Pulsar provides a very useful feature named Schema Evolution
> <https://pulsar.apache.org/docs/2.11.x/schema-understand/#schema-evolution>.
> It allows us to check if a new schema version is compatible with the
> previous version or versions. When you configure the schema for the topic,
> you decide the strategy to use for doing the validation check. The
> strategies validate the following:
>
>    - BACKWARD strategy
>       - A consumer with newSchema can read a message written using
>       existingSchema
>    - BACKWARD_TRANSITIVE strategy
>       - A consumer with newSchema can read messages written using all
>       existingSchema
>    - FORWARD
>       - A consumer with existingSchema can read messages written using
>       newSchema
>    - FORWARD_TRANSITIVE
>       - A consumer defined with any of the existingSchema can read messages
>       written using newSchema
>    - FULL
>       - A consumer defined with newSchema can read messages written using
>       existingSchema
>       - A consumer defined with existingSchema can read messages written
>       using newSchema
>    - FULL_TRANSITIVE
>       - A consumer defined with newSchema can read messages written using
>       any of the existingSchema
>       - A consumer defined with any of the existingSchema can read messages
>       written using newSchema
>
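The six strategies listed above reduce to two directions ("new reads old" and "old reads new"), applied either to the latest existing version or to all of them. A minimal sketch of that mapping, assuming a hypothetical `Strategy` enum that mirrors only the six names above and a `ReadCheck` record standing for one canRead-style check; it plans which checks to run but does not perform them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror of the six strategies listed above.
enum Strategy { BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE, FULL, FULL_TRANSITIVE }

// One check: a consumer using `reader` must be able to read messages written with `writer`.
record ReadCheck(String reader, String writer) {}

final class StrategyPlan {
    // `existing` is ordered from oldest to newest and must not be empty.
    static List<ReadCheck> checks(Strategy strategy, List<String> existing, String newSchema) {
        String name = strategy.name();
        boolean transitive = name.endsWith("_TRANSITIVE");
        boolean backward = name.startsWith("BACKWARD") || name.startsWith("FULL");
        boolean forward = name.startsWith("FORWARD") || name.startsWith("FULL");
        // Non-transitive strategies only look at the latest existing version.
        List<String> targets = transitive ? existing : List.of(existing.get(existing.size() - 1));
        List<ReadCheck> result = new ArrayList<>();
        for (String old : targets) {
            if (backward) {
                result.add(new ReadCheck(newSchema, old)); // new consumer reads old data
            }
            if (forward) {
                result.add(new ReadCheck(old, newSchema)); // old consumer reads new data
            }
        }
        return result;
    }
}
```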
>
> Aside from Avro, Pulsar also has two additional supported encodings:
> PROTOBUF and PROTOBUF_NATIVE.
>
> PROTOBUF is a bit strange. It encodes the messages using Protobuf encoding,
> but the schema that is persisted to Pulsar is *not* Protobuf Descriptor as
> you would have expected. The saved schema is a translation of the Protobuf
> Descriptor to an Avro Schema, so in fact an Avro schema definition is saved
> as the schema.
>
> PROTOBUF_NATIVE was created to fix that shortcoming, by actually persisting
> the Protobuf Descriptor and using Protobuf for encoding.
> The problem is that the authors of PROTOBUF_NATIVE haven't completed it
> fully, and the backward compatibility validation code almost does not
> exist: It only checks if the root message name is the same between
> versions.
>
> GOALS
> The goal of this PIP is to amend PROTOBUF_NATIVE by adding a fully
> functional validation for any of the defined Schema Compatibility
> Strategies.
> A secondary goal is to allow the user to choose between different
> implementations: The new fully functional validation or the existing
> barebones validation.
>
> -------- END
>
> I'm ok with having links in the Motivation, as *further reading*.
> I'm against stacking up work for the reader to go read 5-6 different links
> just to understand the motivation and background knowledge required to
> understand the feature.
>
> I'm against putting code in the Motivation, especially if it is supposed
> to replace a description in plain English that makes the design easy to
> understand.
> Leave the code to the PRs.
> Paste code only if you absolutely can't use plain old descriptions to
> explain.
>
>
>
> On Sat, Mar 11, 2023 at 11:46 AM SiNan Liu <li...@gmail.com> wrote:
>
> > *I guess that's right, too!*
> >
> > But I think the name `ProtobufNativeAdvancedSchemaCompatibilityCheck` is
> > better, because we don't know whether there will be a V2 or V3 in the
> > future. An official solution could be called
> > `ProtobufNativeOfficialSchemaCompatibilityCheck`, and a good third-party
> > solution could be called `ProtobufNativeXXXXXXXXSchemaCompatibilityCheck`.
> >
> > I've updated my design in the PIP issue:
> > 1. A new ProtobufNativeAdvancedSchemaCompatibilityCheck, rather than a
> > different validator implementation inside
> > ProtobufNativeSchemaCompatibilityCheck.
> > 2. Remove the 'builder'.
> > 3. Clarify the relationship between newSchema, existingSchema, and
> > writtenSchema in canRead.
> >
> > Please take a look and let me know whether the description is
> > comprehensive and what changes and improvements are needed.
> >
> > Thanks,
> > sinan
> >
> >
> >
> > Asaf Mesika <as...@gmail.com> wrote on Thu, Mar 9, 2023 at 17:35:
> >
> > > I like Bo's suggestion - I hadn't realized each schema type
> > > compatibility check is actually a plugin.
> > > It makes sense for any schema type checks to evolve, sometimes in a
> > > non-backward compatible way, hence having two plugins like
> > > protobufNativeSchemaCompatabilityCheckV1 and then
> > > protobufNativeSchemaCompatabilityCheckV2 and then
> > > protobufNativeSchemaCompatabilityCheckV3 makes sense to me.
> > >
> > >
> > >
> > > On Thu, Mar 9, 2023 at 5:49 AM 丛搏 <bo...@apache.org> wrote:
> > >
> > > > Hi SiNan,
> > > >
> > > > From my point of view, it is just a plug-in. I don't think it is
> > > > necessary to add configuration for the plugin.
> > > > It adds little value, and it will make the feature harder to use for
> > > > users.
> > > >
> > > >
> > > > SiNan Liu <li...@gmail.com> wrote on Wed, Mar 8, 2023 at 15:54:
> > > > >
> > > > > Hi, Bo.
> > > > >
> > > > > 1. I understand what you mean: develop a new
> > > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck`, rather than changing
> > > > > the existing `ProtobufNativeSchemaCompatibilityCheck`. But I found a
> > > > > few small problems:
> > > > >
> > > > > (1) ProtobufNativeAdvancedSchemaCompatibilityCheck and
> > > > > ProtobufNativeSchemaCompatibilityCheck both have schemaType
> > > > > PROTOBUF_NATIVE. Both checkers are for the "native" PROTOBUF type that
> > > > > does not use AVRO-PROTOBUF, which leads to some problems or
> > > > > "unreasonable" situations, and raises some further thoughts and
> > > > > questions for me.
> > > > >
> > > > `CompatibilityCheck` is only a plugin.
> > > > `ProtobufNativeSchemaCompatibilityCheck` may sooner or later leave the
> > > > stage; when `ProtobufNativeAdvancedSchemaCompatibilityCheck` is
> > > > stable, we can make it the default checker.
> > > >
> > > > It is just a plug-in; users can change it at will, as long as they
> > > > make sure it is used correctly.
> > > > > (2) In broker.conf, `schemaRegistryCompatibilityCheckers`: if both
> > > > > ProtobufNativeSchemaCompatibilityCheck and
> > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck are set, they will
> > > > > overwrite each other, because this is a map:
> > > > >
> > > > > https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryService.java#L36-L44
> > > > >
> > > > > ```java
> > > > > Map<SchemaType, SchemaCompatibilityCheck> checkers = new HashMap<>();
> > > > > for (String className : checkerClasses) {
> > > > >     SchemaCompatibilityCheck schemaCompatibilityCheck =
> > > > >             Reflections.createInstance(className,
> > > > >                     SchemaCompatibilityCheck.class,
> > > > >                     Thread.currentThread().getContextClassLoader());
> > > > >     // Keyed by schema type: a later checker for the same type replaces an earlier one.
> > > > >     checkers.put(schemaCompatibilityCheck.getSchemaType(), schemaCompatibilityCheck);
> > > > > }
> > > > > ```
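To make the overwrite concrete, here is a self-contained sketch of the same registration loop with hypothetical checker classes. Instances are passed in directly instead of being created from class names via reflection, and `SchemaKind`/`Checker` only stand in for Pulsar's `SchemaType`/`SchemaCompatibilityCheck`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

enum SchemaKind { AVRO, JSON, PROTOBUF_NATIVE }

// Hypothetical stand-in for SchemaCompatibilityCheck: only the schema type matters here.
interface Checker {
    SchemaKind getSchemaType();
}

final class AvroChecker implements Checker {
    public SchemaKind getSchemaType() { return SchemaKind.AVRO; }
}

final class BasicProtobufNativeChecker implements Checker {
    public SchemaKind getSchemaType() { return SchemaKind.PROTOBUF_NATIVE; }
}

final class AdvancedProtobufNativeChecker implements Checker {
    public SchemaKind getSchemaType() { return SchemaKind.PROTOBUF_NATIVE; }
}

final class CheckerRegistry {
    // Same shape as the loop above: one entry per schema type, so when two
    // configured checkers report the same type, the one listed last wins.
    static Map<SchemaKind, Checker> register(List<Checker> configured) {
        Map<SchemaKind, Checker> checkers = new HashMap<>();
        for (Checker checker : configured) {
            checkers.put(checker.getSchemaType(), checker);
        }
        return checkers;
    }
}
```

Registering Avro, basic and advanced checkers yields only two entries, and PROTOBUF_NATIVE maps to whichever protobuf checker was configured last, with no error reported.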
> > > > >
> > > > > Is this a big problem or a small one? Is it possible, or is it
> > > > > unnecessary? Maybe we can write in the documentation that only one of
> > > > > the two protobufNative checkers can be chosen? Why are there two
> > > > > checkers for different implementations of the same schemaType? Why
> > > > > not let the checker create different validators, so we do not have to
> > > > > change schemaRegistryCompatibilityCheckers?
> > > >
> > > > Users can only use one of them, not both; having two would bring
> > > > complexity to users.
> > > >
> > > > >
> > > > > (3) And after switching to
> > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck: for existing topics,
> > > > > only the name of the root message was checked previously, not the
> > > > > content of the protobuf schema.
> > > > >
> > > > > What if the user wants both checkers?
> > > > >
> > > > > Will setting ProtobufNativeAdvancedSchemaCompatibilityCheck affect
> > > > > topics that already have a schema?
> > > > >
> > > > > Should older topics still use the old checker, while newer topics or
> > > > > certain older topics use the new advanced checker?
> > > > >
> > > > When `ProtobufNativeAdvancedSchemaCompatibilityCheck` is stable,
> > > > users will not choose `ProtobufNativeSchemaCompatibilityCheck`,
> > > > because it is not a complete checker.
> > > > > (4) So should we have one checker per schemaType? protobufNativeChecker
> > > > > could have as many different implementation classes as needed. As
> > > > > for the classname configuration in the PIP, let's see whether it can
> > > > > be set at the topic level. In the current PIP design I just load this
> > > > > parameter into the checker when the broker is started and the
> > > > > checkers map is set up. If I want to support the topic level, can I
> > > > > do it later in a new, separate PR? Or should I perfect it here?
> > > > >
> > > > > Or should we add a PROTOBUF_NATIVE_ADVANCE schemaType corresponding to
> > > > > ProtobufNativeAdvancedSchemaCompatibilityCheck? (That seems to be more
> > > > > trouble.)
> > > > >
> > > > > Sorry, I cannot use the computer and network at my company, so I am
> > > > > replying from my mobile phone and the format may be a bit messy.
> > > > > Please understand.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > sinan
> > > > >
> > > > >
> > > > > 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:39 PM:
> > > > >
> > > > > > SiNan Liu <li...@gmail.com> wrote on Tue, Mar 7, 2023 at 13:22:
> > > > > > >
> > > > > > > Great to see your comment, Bo!
> > > > > > >
> > > > > > > 1. About the first way: the protobuf website describes the rules,
> > > > > > > but there are no plans to implement them.
> > > > > > > https://protobuf.dev/programming-guides/proto/#updating
> > > > > >
> > > > > > https://groups.google.com/g/protobuf
> > > > > > Maybe ask here.
> > > > > >
> > > > > > >
> > > > > > > 2. I think this PIP can be divided into two parts.
> > > > > > > (1) Add a flag (`ValidatorClassName`) and load it into
> > > > > > > `ProtobufNativeSchemaCompatibilityCheck` when the broker starts.
> > > > > > > `ValidatorClassName` is empty by default, and the implementation
> > > > > > > continues as before, with no change for the user.
> > > > > >
> > > > > > `ProtobufNativeSchemaCompatibilityCheck` is a plugin in `broker.conf`:
> > > > > > ```
> > > > > > schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
> > > > > > ```
> > > > > > I do not recommend that we directly modify this plugin and keep
> > > > > > adding configuration items, which will cause trouble for users.
> > > > > > We have a lot of configs and it's getting very unwieldy.
> > > > > > In my opinion, we should not change
> > > > > > `org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`:
> > > > > > it is a simple implementation, it doesn't go wrong very often, and
> > > > > > most users will use it. We can add another ProtobufNative checker
> > > > > > named `ProtobufNativeAdvancedSchemaCompatibilityCheck` or similar.
> > > > > > This way, we don't need to add this flag. There is no need to
> > > > > > consider compatibility: it is just a plug-in and will not affect the
> > > > > > current logic. If users need it, they just switch the plugin to the
> > > > > > new implementation.
> > > > > >
> > > > > > > ```java
> > > > > > >     ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
> > > > > > >         for (Descriptors.Descriptor fromDescriptor : fromDescriptors) {
> > > > > > >             // The default implementation only checks whether the root message has changed.
> > > > > > >             if (!fromDescriptor.getFullName().equals(toDescriptor.getFullName())) {
> > > > > > >                 throw new ProtoBufCanReadCheckException("Protobuf root message isn't allowed to change!");
> > > > > > >             }
> > > > > > >         }
> > > > > > >     };
> > > > > > > ```
> > > > > > > The `ValidatorClassName` value can also be set to the implementation
> > > > > > > added by this PIP,
> > > > > > > `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
> > > > > > >
> > > > > > > (2) Rework `ProtobufNativeSchemaCompatibilityCheck` to build different
> > > > > > > `ProtobufNativeSchemaValidator` implementations based on the flag
> > > > > > > (`ValidatorClassName`). Isn't that also just a plug-in? Users can
> > > > > > > develop and choose a different `ProtobufNativeSchemaValidator`. I
> > > > > > > don't think it changes the logic; it just lets users extend it.
> > > > > > >
> > > > > > >
> > > > > > > I think this PIP is an enhancement and supplement to the existing
> > > > > > > functionality; it is not unnecessary or meaningless.
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > sinan
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:53:
> > > > > > >
> > > > > > > > > I think we have two ways to do that.
> > > > > > > > >
> > > > > > > > > First way: we need to push the protobuf project to improve its
> > > > > > > > > Java support. Ask if they have plans to improve it.
> > > > > > > > >
> > > > > > > > > Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck`
> > > > > > > > > should be implemented as a plugin, without changing any existing
> > > > > > > > > plugin logic (it's simple and already used). I don't recommend
> > > > > > > > > adding flags for rollback; it adds configuration and makes little
> > > > > > > > > sense.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Bo
> > > > > > > >
> > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Mar 6, 2023 at 23:00:
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > Can you convert the code block which is actually a quote in the
> > > > > > > > > > beginning of the PIP to something which doesn't require scrolling
> > > > > > > > > > horizontally so much?
> > > > > > > > > > Use
> > > > > > > > > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > > > > > > > >
> > > > > > > > > Let's improve the clarity of what you wrote:
> > > > > > > > >
> > > > > > > > > "the PROTOBUF uses avro struct to store."
> > > > > > > > > -->
> > > > > > > > > When Schema type PROTOBUF is used, Pulsar Client assumes
> the
> > > > object
> > > > > > given
> > > > > > > > > to it as message data is an auto-generated POJO containing
> > the
> > > > > > > > annotations
> > > > > > > > > encoding the schema. The client is using a converter, which
> > > > converts
> > > > > > a
> > > > > > > > > Protobuf schema descriptor into an Avro schema and sends
> that
> > > as
> > > > the
> > > > > > > > Schema
> > > > > > > > > of the producer/consumer.
> > > > > > > > >
> > > > > > > > > "On the broker side, protobuf and avro both use SchemaData
> > > > converted
> > > > > > to
> > > > > > > > > org.apache.avro.Schema."
> > > > > > > > > -->
> > > > > > > > > Since the schema is an Avro schema, the implementation of
> > > > > > compatibility
> > > > > > > > > check on the broker side is to simply re-use the
> > compatibility
> > > > check
> > > > > > of
> > > > > > > > the
> > > > > > > > > AVRO schema type.
> > > > > > > > >
> > > > > > > > > "ProtobufSchema is different from ProtobufNativeSchema in
> > > schema
> > > > > > > > > compatibility check it uses avro-protobuf.
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > >
> >
> https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > But the current implementation of ProtobufNative schema
> > > > compatibility
> > > > > > > > > check only
> > > > > > > > > checked if the root message name is changed."
> > > > > > > > >
> > > > > > > > > -->
> > > > > > > > > PROTOBUF_NATIVE schema type is different.
> > > > > > > > > The client actually uses the Protobuf Descriptor as the schema, as opposed
> > > > > > > > > to the Avro schema of the PROTOBUF schema type. In the broker, the
> > > > > > > > > PROTOBUF_NATIVE compatibility check hasn't implemented any rule besides
> > > > > > > > > one: checking if the root message name has changed.
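To make that gap concrete, here is a minimal sketch of the single rule described above. It assumes a hypothetical, simplified schema model (`MessageSchema`, holding the root message's full name and a map from field number to type name) instead of the real `com.google.protobuf.Descriptors.Descriptor`; `checkRootMessageChange`, `isCompatible` and the local `IncompatibleSchemaException` are illustrative stand-ins, not Pulsar's code.

```java
import java.util.Map;

/**
 * Sketch of a root-message-name-only compatibility check.
 * MessageSchema is a hypothetical stand-in for a Protobuf Descriptor.
 */
public final class RootMessageNameCheck {

    /** Hypothetical schema model: root message full name plus field number -> type name. */
    public record MessageSchema(String fullName, Map<Integer, String> fieldTypesByNumber) {}

    /** Hypothetical checked exception standing in for Pulsar's IncompatibleSchemaException. */
    public static final class IncompatibleSchemaException extends Exception {
        public IncompatibleSchemaException(String message) {
            super(message);
        }
    }

    private RootMessageNameCheck() {}

    /** The single rule: the root message name must not change. Fields are never inspected. */
    public static void checkRootMessageChange(MessageSchema currentSchema, MessageSchema newSchema)
            throws IncompatibleSchemaException {
        if (!currentSchema.fullName().equals(newSchema.fullName())) {
            throw new IncompatibleSchemaException("Root message changed from "
                    + currentSchema.fullName() + " to " + newSchema.fullName());
        }
    }

    /** Boolean wrapper around checkRootMessageChange. */
    public static boolean isCompatible(MessageSchema currentSchema, MessageSchema newSchema) {
        try {
            checkRootMessageChange(currentSchema, newSchema);
            return true;
        } catch (IncompatibleSchemaException e) {
            return false;
        }
    }
}
```

Under this rule, changing field 1 of `demo.Order` from `int32` to `string` is accepted, while renaming the root message to `demo.Invoice` is rejected, which illustrates the gap the PIP wants to close.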
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > >    1. For now, there is no official or third-party solution for ProtoBuf
> > > > > > > > > > >    compatibility. If a better official or third-party solution becomes
> > > > > > > > > > >    available in the future, we can develop a new
> > > > > > > > > > >    ProtobufNativeSchemaValidator and use it, so we add a flag.
> > > > > > > > > > >
> > > > > > > > > Why do you need to make that configurable? Once you find a third-party
> > > > > > > > > solution, just switch to it. Who knows, maybe you never will. Introduce it
> > > > > > > > > when you find it, not now.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > > We improve the BACKWARD and FORWARD strategies in
> > > > > > > > > > > ProtobufNativeSchemaCompatibilityCheck. As with the AVRO implementation,
> > > > > > > > > > > protobuf compatibility checking needs to implement the canRead method. *This
> > > > > > > > > > > will check that the writtenSchema can be read by readSchema.*
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I completely disagree.
> > > > > > > > > The Avro implementation is confusing for our use case. Don't copy that.
> > > > > > > > >
> > > > > > > > > You have
> > > > > > > > >
> > > > > > > > > public void checkCompatible(SchemaData from, SchemaData to,
> > > > > > > > >         SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {
> > > > > > > > >     Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
> > > > > > > > >     Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
> > > > > > > > >     switch (strategy) {
> > > > > > > > >         case BACKWARD_TRANSITIVE:
> > > > > > > > >         case BACKWARD:
> > > > > > > > >         case FORWARD_TRANSITIVE:
> > > > > > > > >         case FORWARD:
> > > > > > > > >         case FULL_TRANSITIVE:
> > > > > > > > >         case FULL:
> > > > > > > > >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
> > > > > > > > >             return;
> > > > > > > > >         case ALWAYS_COMPATIBLE:
> > > > > > > > >             return;
> > > > > > > > >         default:
> > > > > > > > >             throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
> > > > > > > > >     }
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > I would rename:
> > > > > > > > > from --> currentSchema
> > > > > > > > > to --> newSchema
> > > > > > > > >
> > > > > > > > > Use that switch case and have a method for each, like:
> > > > > > > > > validateBackwardsCompatibility(currentSchema, newSchema)
> > > > > > > > >
> > > > > > > > > I dislike canRead and the usage of writtenSchema, since you have two
> > > > > > > > > completely different use cases: the producing side and the consuming side.
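As a rough sketch of the structure suggested above (one method per compatibility check, named in terms of currentSchema and newSchema), the following Java uses a hypothetical simplified schema model rather than Protobuf Descriptors. The two field rules inside `requireReadable` (a shared field number must keep its type; a proto2 `required` field in the consuming schema must exist in the producing schema) are illustrative assumptions, not the rules defined in the PIP, and all class and method names are hypothetical.

```java
import java.util.Map;

/** Sketch: one validation method per compatibility check, using currentSchema/newSchema naming. */
public final class PerStrategyCheck {

    /** Hypothetical field model: type name plus a proto2-style "required" flag. */
    public record Field(String type, boolean required) {}

    /** Hypothetical schema model: root message full name plus fields keyed by field number. */
    public record MessageSchema(String fullName, Map<Integer, Field> fields) {}

    /** Subset of strategy names mirroring Pulsar's SchemaCompatibilityStrategy. */
    public enum Strategy { BACKWARD, FORWARD, FULL, ALWAYS_COMPATIBLE }

    private PerStrategyCheck() {}

    public static void checkCompatible(MessageSchema currentSchema, MessageSchema newSchema,
                                       Strategy strategy) {
        switch (strategy) {
            case BACKWARD -> validateBackwardCompatibility(currentSchema, newSchema);
            case FORWARD -> validateForwardCompatibility(currentSchema, newSchema);
            case FULL -> {
                validateBackwardCompatibility(currentSchema, newSchema);
                validateForwardCompatibility(currentSchema, newSchema);
            }
            case ALWAYS_COMPATIBLE -> { }
        }
    }

    /** BACKWARD: consumers using newSchema must read data produced with currentSchema. */
    static void validateBackwardCompatibility(MessageSchema currentSchema, MessageSchema newSchema) {
        requireReadable(currentSchema, newSchema);
    }

    /** FORWARD: consumers still using currentSchema must read data produced with newSchema. */
    static void validateForwardCompatibility(MessageSchema currentSchema, MessageSchema newSchema) {
        requireReadable(newSchema, currentSchema);
    }

    /** Illustrative rules only; throws IllegalArgumentException on the first violation. */
    private static void requireReadable(MessageSchema producerSchema, MessageSchema consumerSchema) {
        if (!producerSchema.fullName().equals(consumerSchema.fullName())) {
            throw new IllegalArgumentException("root message name changed");
        }
        consumerSchema.fields().forEach((number, consumerField) -> {
            Field producerField = producerSchema.fields().get(number);
            if (producerField == null) {
                if (consumerField.required()) {
                    throw new IllegalArgumentException("required field " + number + " is never produced");
                }
            } else if (!producerField.type().equals(consumerField.type())) {
                throw new IllegalArgumentException("field " + number + " changed type");
            }
        });
    }

    /** Boolean wrapper for quick checks. */
    public static boolean isCompatible(MessageSchema currentSchema, MessageSchema newSchema,
                                       Strategy strategy) {
        try {
            checkCompatible(currentSchema, newSchema, strategy);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

For example, if newSchema adds a required field 2, BACKWARD fails (consumers on newSchema would find field 2 missing in old data) while FORWARD passes. The direction of each check is carried by the method name instead of by the argument order of a shared `canRead`.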
> > > > > > > > >
> > > > > > > > > > > schemaValidatorBuilder
> > > > > > > > > > >
> > > > > > > > > I dislike this proposal. IMO the Avro implementation is way too complicated.
> > > > > > > > > Why not have a simple validation function for each switch case above?
> > > > > > > > > Why do we need a strategy, a builder, and all this complexity?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > IMO it's impossible to read the validation rules as you described them.
> > > > > > > > > I wrote how they should be structured numerous times above.
> > > > > > > > > I can't validate them.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > IMO, the current design is very hard to read.
> > > > > > > > > Please try to avoid jumping into code sections.
> > > > > > > > > Write a high-level design section in which you describe in words what you
> > > > > > > > > plan to do.
> > > > > > > > > Write the validation rules in a structure that is easy to understand:
> > > > > > > > > rules for each compatibility check, using proper terms (current schema,
> > > > > > > > > new schema), since the new schema is sometimes used for reading and
> > > > > > > > > sometimes for writing.
> > > > > > > > >
> > > > > > > > > In its current form it takes too much time to understand the design, and
> > > > > > > > > that shouldn't be the case.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Asaf
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi! I updated the explanation of some things in the PIP issue. I also
> > > > > > > > > > added a new “flag” in the conf to select between different
> > > > > > > > > > ProtobufNativeSchemaValidator implementations, and set the default
> > > > > > > > > > ProtobufNativeSchemaValidator to only check whether the name of the root
> > > > > > > > > > message is the same.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > sinan
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Mar 5, 2023 at 20:21:
> > > > > > > > > >
> > > > > > > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be validated
> > > > > > > > > > > > > for backward compatibility check using Avro based compatibility rules?
> > > > > > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > > > > > > > > > > > descriptor?
> > > > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > 1. *You can take a quick look at the previous design: the PROTOBUF schema
> > > > > > > > > > > > uses an Avro struct for storage.*
> > > > > > > > > > > > https://github.com/apache/pulsar/pull/1954
> > > > > > > > > > > >
> > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > > > > > > > >
> > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Ok. So to summarize your code (easier to write it than send links):
> > > > > > > > > > > * Pulsar Client, when used with the Protobuf schema, actually converts the
> > > > > > > > > > > Protobuf descriptor into an Avro schema (using code found inside the Avro
> > > > > > > > > > > library) and saves that Avro schema as the schema. It's not saving the
> > > > > > > > > > > protobuf descriptor at all. Very confusing, I have to add; I never
> > > > > > > > > > > expected that.
> > > > > > > > > > > This explains why ProtobufSchemaCompatibilityCheck just extends the Avro
> > > > > > > > > > > check without doing any translation.
> > > > > > > > > > >
> > > > > > > > > > > Thanks for that.
> > > > > > > > > > >
> > > > > > > > > > > Now that I finally understand this, I can say that you *must* explain
> > > > > > > > > > > it in the motivation part of your PIP.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > 2. *On the broker side, protobuf and avro both use `SchemaData` converted
> > > > > > > > > > > > to `org.apache.avro.Schema`.*
> > > > > > > > > > > >
> > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > > > > > > > >
> > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > > > > > > > >
> > > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Actually, those links don't really help.
> > > > > > > > > > > The main link that helps is:
> > > > > > > > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > I'm sorry - I don't understand.
> > > > > > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > > > > > > > > Protobuf Descriptor into an Avro schema and run the Avro compatibility
> > > > > > > > > > > > > validation, no?
> > > > > > > > > > > > > I believe the answer is no, since you may want to verify different things
> > > > > > > > > > > > > when it comes to Protobuf, which are different than in Avro.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > 1. *ProtobufSchema is different from ProtobufNativeSchema in that it uses
> > > > > > > > > > > > avro-protobuf.*
> > > > > > > > > > > >
> > > > > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > > > > *ProtobufNativeSchema needs a native compatibility check, but there is no
> > > > > > > > > > > > official or third-party implementation. So this PIP does not use
> > > > > > > > > > > > avro-protobuf for protobuf compatibility checking.*
> > > > > > > > > > > >
> > > > > > > > > > > > 2. *By the way, this is implemented in much the same way that Apache Avro
> > > > > > > > > > > > does compatibility checking.*
> > > > > > > > > > > >
> > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > > > > > > > > `canReadStrategy`, `canBeReadStrategy`, `mutualReadStrategy`
> > > > > > > > > > > >
> > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > > > > > > > >
> > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > > > > > > > >
> > > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > > > > > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > > > > > > > > > > > writtenSchema and readSchema. We only need to change the order of the
> > > > > > > > > > > > arguments we pass to `canRead()`.*
> > > > > > > > > > > > ```java
> > > > > > > > > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > > > > > > > > > > >         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
> > > > > > > > > > > >     switch (strategy) {
> > > > > > > > > > > >         // BACKWARD: the new schema (toValidate) reads data written with the existing one
> > > > > > > > > > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > > > > > > > > > > >         // FORWARD: the existing schema reads data written with the new one
> > > > > > > > > > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> > > > > > > > > > > >         // FULL: both directions must hold
> > > > > > > > > > > >         case CanBeReadMutualStrategy -> {
> > > > > > > > > > > >             canRead(toValidate, fromDescriptor);
> > > > > > > > > > > >             canRead(fromDescriptor, toValidate);
> > > > > > > > > > > >         }
> > > > > > > > > > > >     }
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > > > > > > > > > >         Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> > > > > > > > > > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
> > > > > > > > > > > > }
> > > > > > > > > > > > ```
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > I get that you want to take inspiration from the existing Avro schema
> > > > > > > > > > > compatibility check for your code design.
> > > > > > > > > > > I also understand you *won't* use any existing Avro code for that.
> > > > > > > > > > > I also understand you have to write the validation check on your own,
> > > > > > > > > > > since there is no third-party implementation to rely on.
> > > > > > > > > > >
> > > > > > > > > > > What I can't understand are the actual rules, as I wrote before, since
> > > > > > > > > > > they are written confusingly.
> > > > > > > > > > > So, I repeat what I asked before:
> > > > > > > > > > >
> > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > >
> > > > > > > > > > > * Backward checks
> > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > > > > * Forward checks
> > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > > > >
> > > > > > > > > > > Once that's accomplished, I will be able to understand the different
> > > > > > > > > > > validation rules for each compatibility check.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > sinan
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Mar 1, 2023 at 21:19:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Current avro, json, and protobuf schemas are all implemented based on AVRO.
> > > > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > > > > > > > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > > > > > > > > > > > > > compatibility validation?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > > > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, and their
> > > > > > > > > > > > > > `checkCompatible()` is the same implementation as for `AVRO`.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be validated
> > > > > > > > > > > > > for backward compatibility check using Avro based compatibility rules?
> > > > > > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > > > > > > > > > > > descriptor?
> > > > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The compatibility check strategies are described on the website:
> > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > > > > > > > > 1. BACKWARD (CanReadExistingStrategy): Consumers using schema V3 can process
> > > > > > > > > > > > > > data written by producers using the last schema version V2. So V2 is
> > > > > > > > > > > > > > "writtenSchema" and V3 is "readSchema".
> > > > > > > > > > > > > > 2. FORWARD (CanBeReadByExistingStrategy): Consumers using the last schema
> > > > > > > > > > > > > > version V2 can process data written by producers using a new schema V3,
> > > > > > > > > > > > > > even though they may not be able to use the full capabilities of the new
> > > > > > > > > > > > > > schema. So V3 is "writtenSchema" and V2 is "readSchema".
> > > > > > > > > > > > > > 3. FULL (CanBeReadMutualStrategy): Schemas are both backward and forward
> > > > > > > > > > > > > > compatible. The schema can evolve, and data written with either the old or
> > > > > > > > > > > > > > the new schema version must be readable with the other.
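The writtenSchema/readSchema assignment in the three definitions above can be written down as a small lookup. This is only a sketch: `ReadCheck`, `Strategy` and `requiredChecks` are hypothetical names, the schema type is left generic, and `existingSchema`/`newSchema` stand for V2/V3.

```java
import java.util.List;

/** Sketch: which canRead(writtenSchema, readSchema) calls each strategy implies. */
public final class ReadWriteRoles {

    /** One directional check: data written with writtenSchema must be readable with readSchema. */
    public record ReadCheck<S>(S writtenSchema, S readSchema) {}

    /** Hypothetical subset of strategies discussed above. */
    public enum Strategy { BACKWARD, FORWARD, FULL }

    private ReadWriteRoles() {}

    /** Returns the (writtenSchema, readSchema) pairs that a strategy needs checked. */
    public static <S> List<ReadCheck<S>> requiredChecks(Strategy strategy, S existingSchema, S newSchema) {
        return switch (strategy) {
            // BACKWARD: consumers on V3 (new) read data written with V2 (existing).
            case BACKWARD -> List.of(new ReadCheck<S>(existingSchema, newSchema));
            // FORWARD: consumers on V2 (existing) read data written with V3 (new).
            case FORWARD -> List.of(new ReadCheck<S>(newSchema, existingSchema));
            // FULL: both directions must hold.
            case FULL -> List.of(new ReadCheck<S>(existingSchema, newSchema),
                                 new ReadCheck<S>(newSchema, existingSchema));
        };
    }
}
```

Spelled out this way, "writtenSchema" is the existing schema for BACKWARD but the new schema for FORWARD, which is exactly why the same word reads differently depending on the check.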
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > I'm sorry - I don't understand.
> > > > > > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > > > > > > > > Protobuf Descriptor into an Avro schema and run the Avro compatibility
> > > > > > > > > > > > > validation, no?
> > > > > > > > > > > > > I believe the answer is no, since you may want to verify different things
> > > > > > > > > > > > > when it comes to Protobuf, which are different than in Avro.
> > > > > > > > > > > > >
> > > > > > > > > > > > > In its current state, I can't understand your design at all. Please help
> > > > > > > > > > > > > clarify that.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As we've
> > > > > > > > > > > > > > just described, we only need to change the order of the arguments we pass
> > > > > > > > > > > > > > to `canRead()`.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Feb 27, 2023 at 20:49:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Current avro, json, and protobuf schemas are all implemented based on AVRO.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > > > > > > > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > > > > > > > > > > > > > compatibility validation?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > * Backward checks
> > > > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > > > > * Forward checks
> > > > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'm saying this since you used the word "writtenSchema", but it represents
> > > > > > > > > > > > > > > something completely different in a backward check than in a forward
> > > > > > > > > > > > > > > check.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Once you have it structured like that, I personally will be able to read
> > > > > > > > > > > > > > > and understand it.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The motivation and problem statement are now good - thanks for improving
> > > > > > > > > > > > > > > it.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some background
> > > > > > > > > > > > > > > > and some explanations.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The compatibility check rules are already written in the Implementation
> > > > > > > > > > > > > > > > section. ProtoBufNative implements the same canRead method as Apache Avro.
> > > > > > > > > > > > > > > > It does this by checking whether the schemas for writing and reading are
> > > > > > > > > > > > > > > > compatible. I also indicate whether the writtenSchema and readSchema of the
> > > > > > > > > > > > > > > > Backward, Forward, and Full strategies are the old or the new version of
> > > > > > > > > > > > > > > > the schema.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Feb 26, 2023 at 23:24:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, which IMO you
> > > > > > > > > > > > > > > > > need to add for people to understand it. You don't need to explain all of
> > > > > > > > > > > > > > > > > Pulsar in this PIP, but at the very least add a few paragraphs detailing
> > > > > > > > > > > > > > > > > all you need to know to put you in context:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >    - Start by saying Pulsar has a built-in schema registry inside the
> > > > > > > > > > > > > > > > >      Pulsar broker.
> > > > > > > > > > > > > > > > >       - Every time the client updates the schema, it uploads it to the
> > > > > > > > > > > > > > > > >         broker. When that happens, the broker validates whether the new
> > > > > > > > > > > > > > > > >         schema version is compatible with the previous versions. There
> > > > > > > > > > > > > > > > >         are 4 types of compatibility: Full, ... (complete and explain
> > > > > > > > > > > > > > > > >         each one briefly)
> > > > > > > > > > > > > > > > >    - Also explain that the Pulsar schema registry supports various schema
> > > > > > > > > > > > > > > > >      protocols: Avro, protobuf native, ... (complete the rest). Each
> > > > > > > > > > > > > > > > >      protocol has a schema which dictates how to serialize and
> > > > > > > > > > > > > > > > >      deserialize the message content into a typed object.
> > > > > > > > > > > > > > > > >    - Explain in short what protobuf native is (compare with protobuf
> > > > > > > > > > > > > > > > >      non-native).
> > > > > > > > > > > > > > > > >    - Please don't paste code instead of explaining.
> > > > > > > > > > > > > > > > >       - Explain that the current protobuf native validation check only
> > > > > > > > > > > > > > > > >         checks that the root message name is the same between the
> > > > > > > > > > > > > > > > >         current schema version and the new version.
> > > > > > > > > > > > > > > > >          - Explain briefly what a root message and its name are.
> > > > > > > > > > > > > > > > >       - Explain the problem (list scenarios) that we have because the
> > > > > > > > > > > > > > > > >         protobuf native schema only supports FULL compatibility
> > > > > > > > > > > > > > > > >         validation.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Regarding the high-level design, as in what you plan to do:
> > > > > > > > > > > > > > > > > I suggest you add a "High Level Design" section and in it detail how you
> > > > > > > > > > > > > > > > > plan to validate, per protobuf version, per compatibility check (backward,
> > > > > > > > > > > > > > > > > forward, full, ...).
> > > > > > > > > > > > > > > > > I tried reading the implementation; for me, it's all over the place. Can
> > > > > > > > > > > > > > > > > you please list in order what I wrote above, and list the validation rules
> > > > > > > > > > > > > > > > > with a good explanation of why you validate them like that?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Lastly, once you have all the validation rules clearly stated, you can use
> > > > > > > > > > > > > > > > > them to document this properly so users know what validation to expect.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Asaf
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <
> > > > > > > > > > > > liusinan1998@gmail.com>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Sorry, my mistake. I removed the code and
> > > > > > described the
> > > > > > > > > > > design
> > > > > > > > > > > > to
> > > > > > > > > > > > > > > > improve
> > > > > > > > > > > > > > > > > > the PROTOBUF_NATIVE schema compatibility
> > > > checks.
> > > > > > You
> > > > > > > > can
> > > > > > > > > > > have a
> > > > > > > > > > > > > > look.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com>
> > > > 于2023年2月22日周三
> > > > > > > > 21:16写道:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I read it but you're almost directly
> > diving
> > > > into
> > > > > > the
> > > > > > > > > > code -
> > > > > > > > > > > > it
> > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > take
> > > > > > > > > > > > > > > > > > me
> > > > > > > > > > > > > > > > > > > hours just to reverse engineer your
> > design.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Can you please include a "High Level
> > > Design"
> > > > > > section
> > > > > > > > in
> > > > > > > > > > > which
> > > > > > > > > > > > you
> > > > > > > > > > > > > > > > > explain
> > > > > > > > > > > > > > > > > > > how you plan to tackle any issue?
> > > > > > > > > > > > > > > > > > > If I can read that section and explain
> to
> > > > someone
> > > > > > > > else
> > > > > > > > > > how
> > > > > > > > > > > > this
> > > > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > > > work,
> > > > > > > > > > > > > > > > > > > it means the section is complete.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan
> > Liu <
> > > > > > > > > > > > > > liusinan1998@gmail.com>
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I made a PIP to discuss:
> > > > > > > > > > > > > > > > >
> > https://github.com/apache/pulsar/issues/19565
> > > > > > > > > > > > > > > > > > .
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > We can talk about the current design
> > > here.
> > > > > > > > Especially
> > > > > > > > > > for
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > field
> > > > > > > > > > > > > > > > > > type
> > > > > > > > > > > > > > > > > > > > change check rules, please give your
> > > > valuable
> > > > > > > > advice.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > Sinan
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by Asaf Mesika <as...@gmail.com>.
Hi Sinan,

The doc looks much better!

I have a few additional comments:

Pasting a comment from previous emails:

Can you convert the code block which is actually a quote in the
beginning of the PIP to something which doesn't require scrolling
horizontally so much?
Use
https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text

*Validation Rules*

> (1) If the root message names of writtenSchema and readSchema are
> different, then incompatible.

Why? The root message name is not written over the wire to the best of my
knowledge. I haven't found it written in the official doc.


>    - The writtenSchema cannot add required fields, but optional or
>    duplicate fields can be added (The field number must be new).

That's not true.
You can have a required field in writeSchema, and not have that field in
readSchema (based on tag number).

> The writtenSchema can not change the field number of any field in
> readSchema (the field name is the same, but the field number is different).

You have to take into account field type as well when comparing.

If I have in readSchema
int32 justAnID = 1
int32 customerId = 2

and in writeSchema I have
int32 justAnID = 1
string customerId = 3

This is valid.
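A minimal Java sketch of this tag-number-based comparison, assuming a simplified, hypothetical field model (the `Field` record and `TagNumberMatching.typeConflicts` are illustrative names, not Pulsar or protobuf-java APIs). Fields are matched by tag number only, so the example above yields no conflict:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical, simplified view of a protobuf field: only what this rule needs.
record Field(int number, String name, String type) {}

final class TagNumberMatching {
    // Returns the tag numbers present in both schemas but declared with different types.
    // Field names are ignored: on the wire a field is identified by its tag number.
    static List<Integer> typeConflicts(List<Field> readSchema, List<Field> writeSchema) {
        Map<Integer, Field> readByNumber = readSchema.stream()
                .collect(Collectors.toMap(Field::number, Function.identity()));
        List<Integer> conflicts = new ArrayList<>();
        for (Field written : writeSchema) {
            Field read = readByNumber.get(written.number());
            if (read != null && !read.type().equals(written.type())) {
                conflicts.add(written.number());
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        List<Field> readSchema = List.of(
                new Field(1, "justAnID", "int32"),
                new Field(2, "customerId", "int32"));
        List<Field> writeSchema = List.of(
                new Field(1, "justAnID", "int32"),
                new Field(3, "customerId", "string"));
        // Tag 2 exists only in readSchema and tag 3 only in writeSchema: nothing to compare.
        System.out.println(typeConflicts(readSchema, writeSchema)); // prints []
    }
}
```

Renaming a field, or moving `customerId` to a new tag, does not show up as a conflict; only reusing a tag number with a different type does.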

> (4) The writtenSchema can not change the field name of any field in
> readSchema (the field number is the same, but the field name is
> different).

This is incorrect.
Field names are not encoded into the wire. I don't see this rule in any best
practice.

> (5) The writtenSchema does not change the field name and number, but it does
> change the field type.
>
>    - If the field type is ENUM or MESSAGE, the schema is not compatible
>    when the type name is changed
>    - If the type of the field is another type. The schemas under this
>    rule are not incompatible, but warn the user.(There is another way of
>    testing in PIP issue Alternatives)

Small correction: for the same field number you are not allowed to change
types. Name is irrelevant.

> (6) The writtenSchema removes fields that do not have default values in
> readSchema. Then the schema is incompatible.

Protobuf gives you its own default if you don't supply one. This is
incorrect.


*Rules that you don't have in the doc, but should IMO*
* There can not be a field which exists both in readSchema and writeSchema,
with the same tag number, having different default values.
* There can't be a field in writerSchema that exists in readerSchema (tag
number based) whose type is scalar
(https://protobuf.dev/programming-guides/proto/#scalar) and which is
repeated in writerSchema, but is not repeated in readSchema.
* There can't be a field in writerSchema that exists in readerSchema (tag
number based) whose type is scalar in writerSchema, while in readSchema its
type is also scalar but it is repeated with packed=true.
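The first two rules could be sketched like this, again with a hypothetical field model (`ProtoField`, `ExtraRules` and the violation strings are illustrative, not existing APIs; `defaultValue` is `null` when no explicit default is declared). The packed-encoding rule is left out, since it depends on how packing would be represented in such a model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical field model; defaultValue is null when no explicit default is declared.
record ProtoField(int number, String type, boolean repeated, String defaultValue) {}

final class ExtraRules {
    // Protobuf scalar value types, as listed in the protobuf language guide.
    private static final Set<String> SCALAR_TYPES = Set.of(
            "double", "float", "int32", "int64", "uint32", "uint64", "sint32", "sint64",
            "fixed32", "fixed64", "sfixed32", "sfixed64", "bool", "string", "bytes");

    static boolean isScalar(String type) {
        return SCALAR_TYPES.contains(type);
    }

    // Describes each violation; fields are matched by tag number.
    static List<String> violations(List<ProtoField> readSchema, List<ProtoField> writerSchema) {
        Map<Integer, ProtoField> readByNumber = readSchema.stream()
                .collect(Collectors.toMap(ProtoField::number, Function.identity()));
        List<String> result = new ArrayList<>();
        for (ProtoField writer : writerSchema) {
            ProtoField reader = readByNumber.get(writer.number());
            if (reader == null) {
                continue; // Only in writerSchema: not covered by these two rules.
            }
            if (!Objects.equals(reader.defaultValue(), writer.defaultValue())) {
                result.add("tag " + writer.number() + ": different default values");
            }
            if (writer.repeated() && isScalar(writer.type()) && !reader.repeated()) {
                result.add("tag " + writer.number() + ": repeated scalar read as non-repeated");
            }
        }
        return result;
    }
}
```

For example, a reader field `new ProtoField(1, "int32", false, null)` paired with a writer field `new ProtoField(1, "int32", true, null)` is reported under the repeated-scalar rule for tag 1.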

*Rules you have, but I would phrase a bit differently*

I would remove the proto2/proto3 sections, since they only differ in 1
rule, and just mention that distinction inside that rule (less work for the
reader).

* readSchema has a field which doesn't exist in writerSchema (based on tag
number).
   * Proto v2:
      * That field must be `optional` or `repeated` (must not be `required`)
   * Proto v3:
      * No problem.
* There can not be a field which exists both in readSchema and writeSchema,
with the same tag number, but having different types.
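A sketch of these two rules, assuming a hypothetical model where each field carries its tag number, type and label, and the caller passes the proto syntax (`Syntax`, `Label`, `SchemaField` and `ReaderWriterRules` are illustrative names only):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

enum Syntax { PROTO2, PROTO3 }

enum Label { OPTIONAL, REPEATED, REQUIRED }

// Hypothetical field model used only for this illustration.
record SchemaField(int number, String type, Label label) {}

final class ReaderWriterRules {
    // Fields are matched by tag number; returns a description of each violation.
    static List<String> violations(Syntax syntax, List<SchemaField> readSchema,
                                   List<SchemaField> writerSchema) {
        Map<Integer, SchemaField> writerByNumber = writerSchema.stream()
                .collect(Collectors.toMap(SchemaField::number, Function.identity()));
        List<String> result = new ArrayList<>();
        for (SchemaField reader : readSchema) {
            SchemaField writer = writerByNumber.get(reader.number());
            if (writer == null) {
                // readSchema-only field: in proto2 it must be optional or repeated.
                if (syntax == Syntax.PROTO2 && reader.label() == Label.REQUIRED) {
                    result.add("tag " + reader.number() + ": required field missing in writerSchema");
                }
            } else if (!reader.type().equals(writer.type())) {
                result.add("tag " + reader.number() + ": type changed");
            }
        }
        return result;
    }
}
```

Keeping the proto2/proto3 distinction as a single condition inside the one rule that needs it mirrors the suggestion above.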

*Motivation*

Basically in the motivation section you want people to understand the
following:

Pulsar has built-in support for typed messages. It allows specifying an
encoding scheme and its matching schema.
For example, it supports Avro. You specify a schema for a given topic,
using an Avro Schema Definition (i.e. a JSON describing the schema). Every
time you produce a message, you first declare the schema definition you wish
to use for your messages. The message data should be Avro-encoded binary
data (in some SDKs, the client helps encode a given data-structure/object).
The same applies when you consume a message. You first specify the schema
you use to read the messages, and the client in some SDKs helps by decoding
the message binary data into an object/data-structure.

Each time you specify a schema to be used, either by a producer or a
consumer, the schema is persisted in Pulsar and given an increasing version
number. If the schema was the same as the previous version, it is not
saved. When the message is persisted, the version number is encoded in the
message headers.
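As a toy illustration of the versioning behaviour described above, assuming schemas can be compared as plain strings and versions are numbered from 0 (`ToySchemaRegistry` is hypothetical and is not how Pulsar stores schemas):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory registry for a single topic; not Pulsar's implementation.
final class ToySchemaRegistry {
    private final List<String> versions = new ArrayList<>();

    // Returns the version number of the given schema definition (numbered from 0 here).
    // A definition equal to the latest stored version is not saved again.
    int register(String schemaDefinition) {
        int latest = versions.size() - 1;
        if (latest >= 0 && versions.get(latest).equals(schemaDefinition)) {
            return latest;
        }
        versions.add(schemaDefinition);
        return versions.size() - 1;
    }

    int versionCount() {
        return versions.size();
    }
}
```

The returned number plays the role of the version that is then attached to each persisted message.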

Pulsar provides a very useful feature named Schema Evolution
<https://pulsar.apache.org/docs/2.11.x/schema-understand/#schema-evolution>.
It allows us to check if a new schema version is compatible with the previous
version or versions. When you configure the schema for the topic, you
decide the strategy to use for doing the validation check. The strategies
validate the following:

   - BACKWARD strategy
      - A consumer with newSchema can read a message written using
      existingSchema
   - BACKWARD_TRANSITIVE strategy
      - A consumer with newSchema can read messages written using all
      existingSchema
   - FORWARD
      - A consumer with existingSchema can read messages written using
      newSchema
   - FORWARD_TRANSITIVE
      - A consumer defined with any of the existingSchema can read messages
      written using newSchema
   - FULL
      - A consumer defined with newSchema can read messages written using
      existingSchema
      - A consumer defined with existingSchema can read messages written
      using newSchema
   - FULL_TRANSITIVE
      - A consumer defined with newSchema can read messages written using
      any of the existingSchema
      - A consumer defined with any of the existingSchema can read messages
      written using newSchema
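Each strategy above boils down to a list of "can a consumer with schema R read data written with schema W" checks. A hypothetical Java sketch that enumerates them (`Strategy`, `ReadCheck` and `checksFor` are illustrative names; `existing` is assumed to be ordered oldest to newest, the non-transitive strategies are assumed to compare only against the latest version, and ALWAYS_COMPATIBLE and other strategies are left out):

```java
import java.util.ArrayList;
import java.util.List;

// Names mirror the strategies listed above; ALWAYS_COMPATIBLE and others are omitted.
enum Strategy {
    BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE, FULL, FULL_TRANSITIVE
}

// "Can a consumer using readerSchema decode data written with writerSchema?"
record ReadCheck(String readerSchema, String writerSchema) {}

final class StrategyChecks {
    // existing is ordered oldest to newest; non-transitive strategies only use the newest.
    static List<ReadCheck> checksFor(Strategy strategy, List<String> existing, String newSchema) {
        List<String> targets = switch (strategy) {
            case BACKWARD, FORWARD, FULL -> List.of(existing.get(existing.size() - 1));
            case BACKWARD_TRANSITIVE, FORWARD_TRANSITIVE, FULL_TRANSITIVE -> existing;
        };
        boolean backward = strategy != Strategy.FORWARD && strategy != Strategy.FORWARD_TRANSITIVE;
        boolean forward = strategy != Strategy.BACKWARD && strategy != Strategy.BACKWARD_TRANSITIVE;
        List<ReadCheck> checks = new ArrayList<>();
        for (String existingSchema : targets) {
            if (backward) {
                checks.add(new ReadCheck(newSchema, existingSchema)); // new consumer, old data
            }
            if (forward) {
                checks.add(new ReadCheck(existingSchema, newSchema)); // old consumer, new data
            }
        }
        return checks;
    }
}
```

Framing every strategy as reader/writer pairs means a single "reader can decode writer" rule set covers all six strategies.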


Aside from Avro, Pulsar also has two additional supported encodings:
PROTOBUF and PROTOBUF_NATIVE.

PROTOBUF is a bit strange. It encodes the messages using Protobuf encoding,
but the schema that is persisted to Pulsar is *not* Protobuf Descriptor as
you would have expected. The saved schema is a translation of the Protobuf
Descriptor to an Avro Schema, so in fact an Avro schema definition is saved
as the schema.

PROTOBUF_NATIVE was created to fix that shortcoming, by actually persisting
the Protobuf Descriptor and using Protobuf for encoding.
The problem is that the authors of PROTOBUF_NATIVE haven't completed it
fully, and the backward compatibility validation code almost does not
exist: It only checks if the root message name is the same between versions.

GOALS
The goal of this PIP is to amend PROTOBUF_NATIVE by adding a fully
functional validation for any of the defined Schema Compatibility
Strategies.
A secondary goal is to allow the user to choose between different
implementations: The new fully functional validation or the existing
barebones validation.

-------- END

I'm OK with having links in the Motivation, as *further reading*.
I'm against stacking up work for the reader, who would have to read 5-6
different links just to understand the motivation and the background
knowledge required to understand the feature.

I'm against putting code in the Motivation, especially if it is supposed to
replace a plain-English description that makes the design easy to
understand.
Leave the code to the PRs.
Paste code only if you absolutely can't use plain old descriptions to
explain.



On Sat, Mar 11, 2023 at 11:46 AM SiNan Liu <li...@gmail.com> wrote:

> *I guess that's right, too!*
>
> But the name `ProtobufNativeAdvancedSchemaCompatibilityCheck` is better,
> because we don't know whether there will be a V2 or V3 in the future. An
> official solution could be called
> `ProtobufNativeOfficialSchemaCompatibilityCheck`, and a good third-party
> solution `ProtobufNativeXXXXXXXXSchemaCompatibilityCheck`.
>
> I've updated my design in the PIP issue:
> 1. A new ProtobufNativeSchemaAdvanceCompatibilityCheck, rather than a
> different validator implementation inside ProtobufNativeSchemaCompatibilityCheck.
> 2. Removed the 'builder'.
> 3. Clarified the relationship between newSchema, existingSchema, and
> writtenSchema in canRead.
>
> Please help check whether the description is comprehensive and what
> changes and improvements need to be made.
>
> Thanks,
> sinan
>
>
>
> Asaf Mesika <as...@gmail.com> wrote on Thu, Mar 9, 2023 at 17:35:
>
> > I like Bo's suggestion - I hadn't realized each schema type
> > compatibility check is actually a plugin.
> > It makes sense for any schema type's checks to evolve, sometimes in a
> > non-backward-compatible way, hence having several plugins like
> > protobufNativeSchemaCompatabilityCheckV1 and then
> > protobufNativeSchemaCompatabilityCheckV2 and then
> > protobufNativeSchemaCompatabilityCheckV3 makes sense to me.
> >
> >
> >
> > On Thu, Mar 9, 2023 at 5:49 AM 丛搏 <bo...@apache.org> wrote:
> >
> > >  Hi siNan:
> > >
> > > From my point of view, it is just a plug-in. I don't think it is
> > > necessary to add configuration for the plugin.
> > > This is meaningless, and it will increase the difficulty of use for
> > > users.
> > >
> > >
> > > SiNan Liu <li...@gmail.com> wrote on Wed, Mar 8, 2023 at 15:54:
> > > >
> > > > Hi, bo.
> > > >
> > > > 1. I understand what you said: develop a new
> > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck`, rather than changing
> > > > the existing `ProtobufNativeSchemaCompatibilityCheck`. But I found a
> > > > few small problems:
> > > >
> > > > (1) The schemaType of both ProtobufNativeAdvancedSchemaCompatibilityCheck
> > > > and ProtobufNativeSchemaCompatibilityCheck is PROTOBUF_NATIVE. It looks
> > > > like both checkers are the "native" PROTOBUF implementation, not using
> > > > AVRO-PROTOBUF, which leads to some problems or "unreasonable" behavior
> > > > and gives me some further thoughts and questions.
> > > >
> > > `CompatibilityCheck` is only a plugin.
> > > `ProtobufNativeSchemaCompatibilityCheck` may sooner or later leave the
> > > stage; once `ProtobufNativeAdvancedSchemaCompatibilityCheck` is
> > > stable, we can make it the default checker.
> > >
> > > It is just a plug-in; users can change it at will and make sure that it
> > > is used correctly.
> > > > (2) In broker.conf, `schemaRegistryCompatibilityCheckers`: if both
> > > > ProtobufNativeSchemaCompatibilityCheck and
> > > > ProtobufNativeAdvancedSchemaCompatibilityCheck are set, they will
> > > > overwrite each other, because the checkers are stored in a map:
> > > >
> > > > https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryService.java#L36-L44
> > > >
> > > > ```java
> > > > Map<SchemaType, SchemaCompatibilityCheck> checkers = new HashMap<>();
> > > > for (String className : checkerClasses) {
> > > >     SchemaCompatibilityCheck schemaCompatibilityCheck = Reflections.createInstance(className,
> > > >             SchemaCompatibilityCheck.class, Thread.currentThread().getContextClassLoader());
> > > >     // Keyed by schema type: a later checker for the same type replaces an earlier one.
> > > >     checkers.put(schemaCompatibilityCheck.getSchemaType(), schemaCompatibilityCheck);
> > > > }
> > > > ```
> > > >
> > > > Is this a big problem or a small one? Is it possible, or unnecessary?
> > > > Maybe we can write in the documentation that only one of the two
> > > > protobufNative checkers can be chosen? Why are there two checkers for
> > > > different implementations of the same schemaType? Why not let the
> > > > checker create different validators, so we don't have to change
> > > > schemaRegistryCompatibilityCheckers?
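The overwrite in (2) follows from keying the map by schema type. A self-contained sketch with stand-in types (`CheckerRegistration`, its `SchemaType` enum and `Checker` record are hypothetical, not Pulsar classes) shows that the last configured checker for a type wins:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CheckerRegistration {
    enum SchemaType { AVRO, JSON, PROTOBUF_NATIVE }

    // Stand-in for SchemaCompatibilityCheck: a class name plus the schema type it handles.
    record Checker(String className, SchemaType schemaType) {}

    // Mirrors the loop quoted above: one map entry per schema type.
    static Map<SchemaType, Checker> load(List<Checker> configured) {
        Map<SchemaType, Checker> checkers = new HashMap<>();
        for (Checker checker : configured) {
            // put() replaces any checker previously registered for the same schema type.
            checkers.put(checker.schemaType(), checker);
        }
        return checkers;
    }

    public static void main(String[] args) {
        Map<SchemaType, Checker> checkers = load(List.of(
                new Checker("ProtobufNativeSchemaCompatibilityCheck", SchemaType.PROTOBUF_NATIVE),
                new Checker("ProtobufNativeAdvancedSchemaCompatibilityCheck", SchemaType.PROTOBUF_NATIVE)));
        System.out.println(checkers.size()); // prints 1
        System.out.println(checkers.get(SchemaType.PROTOBUF_NATIVE).className());
        // prints ProtobufNativeAdvancedSchemaCompatibilityCheck
    }
}
```

So configuring both checkers does not fail; the one listed last silently takes over for PROTOBUF_NATIVE.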
> > >
> > > Users can only use one of them, not both; using two would bring
> > > complexity to users.
> > >
> > > >
> > > > (3) And after switching to ProtobufNativeAdvancedSchemaCompatibilityCheck:
> > > > for existing topics, only the name of the root message was checked
> > > > previously, not the content of the protobuf schema.
> > > >
> > > > What if the user wants both checkers?
> > > >
> > > > Will setting ProtobufNativeAdvancedSchemaCompatibilityCheck affect
> > > > topics that already have a schema?
> > > >
> > > > Older topics would still use the old checker, and newer topics or
> > > > certain older topics would use the new advanced checker.
> > > >
> > > Once `ProtobufNativeAdvancedSchemaCompatibilityCheck` is stable,
> > > users will not choose `ProtobufNativeSchemaCompatibilityCheck`,
> > > because it is not a complete checker.
> > > > (4) So should we have one checker per schemaType? A
> > > > protobufNativeChecker could then have multiple different
> > > > implementation classes. As for the class name configuration in the PIP,
> > > > let's see whether it can be set at the topic level. In the current PIP
> > > > design I just load this parameter into the checker when the broker
> > > > starts and the checkers map is set up. If I want to support the topic
> > > > level, can I do it in a new, regular PR, or should I perfect it here?
> > > >
> > > > Or add a PROTOBUF_NATIVE_ADVANCE schemaType corresponding to
> > > > ProtobufNativeAdvancedSchemaCompatibilityCheck? (That seems to be more
> > > > trouble.)
> > > >
> > > > Sorry, I can't use a computer or network at my company, so I'm replying
> > > > to this email from my mobile phone; the format may be a bit messy.
> > > > Please understand.
> > > >
> > > > Thanks,
> > > >
> > > > sinan
> > > >
> > > >
> > > > 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:39 PM:
> > > >
> > > > > SiNan Liu <li...@gmail.com> wrote on Tue, Mar 7, 2023 at 13:22:
> > > > > >
> > > > > > Great to see your comment, bo!
> > > > > >
> > > > > > 1. The first way. The protobuf website has a description of the
> > > > > > rules, but no plans to implement them.
> > > > > > https://protobuf.dev/programming-guides/proto/#updating
> > > > >
> > > > > https://groups.google.com/g/protobuf
> > > > > maybe ask here
> > > > >
> > > > > >
> > > > > > 2. I think this PIP can be divided into two parts.
> > > > > > (1) Add a flag (`ValidatorClassName`) and load it into
> > > > > > `ProtobufNativeSchemaCompatibilityCheck` when the broker starts.
> > > > > > ValidatorClassName is empty by default, and the implementation
> > > > > > continues as before, with no change for the user.
> > > > >
> > > > > `ProtobufNativeSchemaCompatibilityCheck` is a plugin in `broker.conf`:
> > > > > ```
> > > > > schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
> > > > > ```
> > > > > I do not recommend that we directly modify this plugin and keep
> > > > > adding configuration items, which will cause trouble for users.
> > > > > We have a lot of configs and it's getting very unwieldy.
> > > > > In my opinion, we shouldn't change
> > > > > `org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`:
> > > > > it is a simple implementation, it doesn't go wrong very often, and
> > > > > most users will use it. We can add another ProtobufNative checker
> > > > > named `ProtobufNativeAdvancedSchemaCompatibilityCheck` or similar.
> > > > > This way, we don't need to add this flag. There is no need to
> > > > > consider compatibility: it is just a plug-in and will not affect the
> > > > > current logic. If users need it, they just switch the plugin to the
> > > > > new implementation.
> > > > >
> > > > > > ```java
> > > > > > ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
> > > > > >     for (Descriptors.Descriptor fromDescriptor : fromDescriptors) {
> > > > > >         // The default implementation only checks if the root message has changed.
> > > > > >         if (!fromDescriptor.getFullName().equals(toDescriptor.getFullName())) {
> > > > > >             throw new ProtoBufCanReadCheckException("Protobuf root message isn't allow change!");
> > > > > >         }
> > > > > >     }
> > > > > > };
> > > > > > ```
> > > > > > The `ValidatorClassName` value can also be set to the implementation
> > > > > > this PIP adds:
> > > > > > `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
> > > > > >
> > > > > > (2) Refactor `ProtobufNativeSchemaCompatibilityCheck` so that it
> > > > > > builds a different `ProtobufNativeSchemaValidator` depending on the
> > > > > > flag (`ValidatorClassName`).
> > > > > > Isn't that just a plug-in too? Users can develop and choose a
> > > > > > different `ProtobufNativeSchemaValidator`. I think it doesn't change
> > > > > > the logic; it just allows users to extend it.
> > > > > >
> > > > > >
> > > > > > I think this PIP should be an enhancement and supplement to the
> > > > > > functionality, and it is neither unnecessary nor meaningless.
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > sinan
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:53:
> > > > > >
> > > > > > > I think we have two ways to do that.
> > > > > > >
> > > > > > > First way: we need to push for improvements to the Java
> > > > > > > implementation of protobuf. Ask if they have plans to improve it.
> > > > > > >
> > > > > > > Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck`
> > > > > > > should be implemented as a plugin, without changing any existing
> > > > > > > plugin logic (it's simple and already in use). I don't recommend
> > > > > > > adding flags for rollback; it adds configuration and makes little
> > > > > > > sense.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Bo
> > > > > > >
> > > > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Mar 6, 2023 at 23:00:
> > > > > > >
> > > > > > > >
> > > > > > > > Can you convert the code block which is actually a quote in the
> > > > > > > > beginning of the PIP to something which doesn't require scrolling
> > > > > > > > horizontally so much?
> > > > > > > > Use
> > > > > > > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > > > > > > >
> > > > > > > > Let's improve the clarity of what you wrote:
> > > > > > > >
> > > > > > > > "the PROTOBUF uses avro struct to store."
> > > > > > > > -->
> > > > > > > > When Schema type PROTOBUF is used, Pulsar Client assumes the
> > > > > > > > object given to it as message data is an auto-generated POJO
> > > > > > > > containing the annotations encoding the schema. The client is
> > > > > > > > using a converter, which converts a Protobuf schema descriptor
> > > > > > > > into an Avro schema and sends that as the Schema of the
> > > > > > > > producer/consumer.
> > > > > > > >
> > > > > > > > "On the broker side, protobuf and avro both use SchemaData
> > > > > > > > converted to org.apache.avro.Schema."
> > > > > > > > -->
> > > > > > > > Since the schema is an Avro schema, the broker-side compatibility
> > > > > > > > check simply re-uses the compatibility check of the AVRO schema
> > > > > > > > type.
> > > > > > > >
> > > > > > > > "ProtobufSchema is different from ProtobufNativeSchema in schema
> > > > > > > > compatibility check it uses avro-protobuf.
> > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > But the current implementation of ProtobufNative schema
> > > > > > > > compatibility check only checked if the root message name is
> > > > > > > > changed."
> > > > > > > >
> > > > > > > > -->
> > > > > > > > PROTOBUF_NATIVE schema type is different.
> > > > > > > > The client is actually using Protobuf Descriptor as the schema,
> > > > > > > > as opposed to Avro schema of PROTOBUF schema type. In the broker,
> > > > > > > > the PROTOBUF_NATIVE compatibility check actually hasn't
> > > > > > > > implemented any rule, besides one: checking if the root message
> > > > > > > > name has changed.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > >    1. For now, there is no official or third-party solution for
> > > > > > > > >    ProtoBuf compatibility. If in the future have better solutions
> > > > > > > > >    of a third party or the official, we develop new
> > > > > > > > >    ProtobufNativeSchemaValidator and use, so add a flag.
> > > > > > > > >
> > > > > > > > Why do you need to make that configurable? Once you find a
> > > > > > > > third-party solution, just switch to it? Who knows, maybe you
> > > > > > > > never will. Introduce it when you find it, not now.
> > > > > > > >
> > > > > > > >
> > > > > > > > > We improve in ProtobufNativeSchemaCompatibilityCheck BACKWARD,
> > > > > > > > > FORWARD these strategies. As with the AVRO implementation,
> > > > > > > > > protobuf compatibility checking need implementing the canRead
> > > > > > > > > method. *This will check that the writtenschema can be read by
> > > > > > > > > readSchema.*
> > > > > > > >
> > > > > > > >
> > > > > > > > I completely disagree.
> > > > > > > > Avro implementation is confusing for our use case. Don't copy
> > > that.
> > > > > > > >
> > > > > > > > You have
> > > > > > > >
> > > > > > > > public void checkCompatible(SchemaData from, SchemaData to,
> > > > > > > >         SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {
> > > > > > > >     Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
> > > > > > > >     Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
> > > > > > > >     switch (strategy) {
> > > > > > > >         case BACKWARD_TRANSITIVE:
> > > > > > > >         case BACKWARD:
> > > > > > > >         case FORWARD_TRANSITIVE:
> > > > > > > >         case FORWARD:
> > > > > > > >         case FULL_TRANSITIVE:
> > > > > > > >         case FULL:
> > > > > > > >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
> > > > > > > >             return;
> > > > > > > >         case ALWAYS_COMPATIBLE:
> > > > > > > >             return;
> > > > > > > >         default:
> > > > > > > >             throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
> > > > > > > >     }
> > > > > > > > }
> > > > > > > >
> > > > > > > > I would rename :
> > > > > > > > from --> currentSchema
> > > > > > > > to --> newSchema
> > > > > > > >
> > > > > > > > Use that switch case and have a method for each like:
> > > > > > > > validateBackwardsCompatibility(currentSchema, newSchema)
> > > > > > > >
> > > > > > > > I dislike canRead and the usage of writtenSchema, since you have
> > > > > > > > two completely different use cases: from the producing side and
> > > > > > > > the consumer side.
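A rough Java sketch of the structure suggested here: one switch over the strategy and one validation method per direction, named with currentSchema/newSchema. Everything in it is hypothetical (`Schema`, the nested exception type and the placeholder rule, which only compares root message names, stand in for the real descriptors and rules):

```java
import java.util.List;

final class PerStrategyValidation {
    enum Strategy {
        BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE,
        FULL, FULL_TRANSITIVE, ALWAYS_COMPATIBLE
    }

    // Hypothetical stand-in for a deserialized Protobuf descriptor.
    record Schema(String rootMessageName) {}

    static final class IncompatibleSchemaException extends Exception {
        IncompatibleSchemaException(String message) {
            super(message);
        }
    }

    // currentSchemas is ordered oldest to newest.
    static void checkCompatible(List<Schema> currentSchemas, Schema newSchema, Strategy strategy)
            throws IncompatibleSchemaException {
        Schema latest = currentSchemas.get(currentSchemas.size() - 1);
        switch (strategy) {
            case BACKWARD -> validateBackwardsCompatibility(latest, newSchema);
            case FORWARD -> validateForwardsCompatibility(latest, newSchema);
            case FULL -> {
                validateBackwardsCompatibility(latest, newSchema);
                validateForwardsCompatibility(latest, newSchema);
            }
            case BACKWARD_TRANSITIVE -> {
                for (Schema current : currentSchemas) {
                    validateBackwardsCompatibility(current, newSchema);
                }
            }
            case FORWARD_TRANSITIVE -> {
                for (Schema current : currentSchemas) {
                    validateForwardsCompatibility(current, newSchema);
                }
            }
            case FULL_TRANSITIVE -> {
                for (Schema current : currentSchemas) {
                    validateBackwardsCompatibility(current, newSchema);
                    validateForwardsCompatibility(current, newSchema);
                }
            }
            case ALWAYS_COMPATIBLE -> {
                // Nothing to validate.
            }
        }
    }

    // Consumers using newSchema must be able to read messages produced with currentSchema.
    static void validateBackwardsCompatibility(Schema currentSchema, Schema newSchema)
            throws IncompatibleSchemaException {
        requireSameRootMessage(currentSchema, newSchema); // placeholder for the real rules
    }

    // Consumers using currentSchema must be able to read messages produced with newSchema.
    static void validateForwardsCompatibility(Schema currentSchema, Schema newSchema)
            throws IncompatibleSchemaException {
        requireSameRootMessage(currentSchema, newSchema); // placeholder for the real rules
    }

    private static void requireSameRootMessage(Schema currentSchema, Schema newSchema)
            throws IncompatibleSchemaException {
        if (!currentSchema.rootMessageName().equals(newSchema.rootMessageName())) {
            throw new IncompatibleSchemaException("Root message changed");
        }
    }
}
```

The direction of each check is carried by the method name rather than by swapping reader and writer arguments, which is the readability point made above.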
> > > > > > > >
> > > > > > > > > schemaValidatorBuilder
> > > > > > > > >
> > > > > > > > I dislike this proposal. IMO the Avro implementation is way too
> > > > > > > > complicated.
> > > > > > > > Why not have a simple validation function for each switch case
> > > > > > > > above?
> > > > > > > > Why do we need strategy and builder, and all this complexity?
> > > > > > > >
> > > > > > > >
> > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > >
> > > > > > > >
> > > > > > > > IMO it's impossible to read the validation rules as you described
> > > > > > > > them.
> > > > > > > > I wrote how they should be structured numerous times above.
> > > > > > > > I can't validate them.
> > > > > > > >
> > > > > > > >
> > > > > > > > IMO, the current design is very hard to read.
> > > > > > > > Please try to avoid jumping into code sections.
> > > > > > > > Write a high level design section, in which you describe in words
> > > > > > > > what you plan to do.
> > > > > > > > Write the validation rules in a structure that is easy to
> > > > > > > > understand: rules per compatibility check, and use proper words
> > > > > > > > (current schema, new schema), since the new schema can be used
> > > > > > > > once for reading and once for writing.
> > > > > > > >
> > > > > > > > In its current form it takes too much time to understand the
> > > > > > > > design, and it shouldn't be the case.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Asaf
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi! I updated the explanation of some things in the PIP issue. I
> > > > > > > > > also added a new "flag" in the conf to select a different
> > > > > > > > > ProtobufNativeSchemaValidator implementation, and made the
> > > > > > > > > default ProtobufNativeSchemaValidator only check whether the
> > > > > > > > > name of the root message is the same.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > sinan
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Mar 5, 2023 at 20:21:
> > > > > > > > >
> > > > > > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be
> > > > > > > > > > > > validated for backward compatibility check using Avro based
> > > > > > > > > > > > compatibility rules?
> > > > > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a
> > > > > > > > > > > > Protobuf descriptor?
> > > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > 1. *You can take a quick look at the previous design, the
> > > > > > > > > > > PROTOBUF uses avro struct to store.*
> > > > > > > > > > > https://github.com/apache/pulsar/pull/1954
> > > > > > > > > > >
> > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > > > > > > >
> > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Ok. So to summarize your code (easier to write it than send links):
> > > > > > > > > > * Pulsar Client, when used with Protobuf Schema, actually converts the
> > > > > > > > > > Protobuf descriptor into an Avro Schema (using code found inside the
> > > > > > > > > > Avro library) and saves that Avro schema as the schema. It's not
> > > > > > > > > > saving the protobuf descriptor at all. Very confusing, I have to add;
> > > > > > > > > > I never expected that.
> > > > > > > > > > This explains why, in the ProtobufSchemaCompatibilityCheck, they just
> > > > > > > > > > extend the Avro check without doing any translation.
> > > > > > > > > >
> > > > > > > > > > Thanks for that.
> > > > > > > > > >
> > > > > > > > > > Now that I finally understand this, I can say that you *must* explain
> > > > > > > > > > it in the motivation part of your PIP.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > 2. *On the broker side, both protobuf and avro use `SchemaData`
> > > > > > > > > > > converted to `org.apache.avro.Schema`.*
> > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Actually those links don't really help.
> > > > > > > > > > The main link that helps is:
> > > > > > > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I'm sorry, I don't understand.
> > > > > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > > > > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > > > > > > compatibility validation, no?
> > > > > > > > > > > > I believe the answer is no, since you may want to verify different
> > > > > > > > > > > > things when it comes to Protobuf, which are different from Avro.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > 1. *ProtobufSchema is different from ProtobufNativeSchema in that it
> > > > > > > > > > > uses avro-protobuf.*
> > > > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > > > *ProtobufNativeSchema needs a native compatibility check, but there is
> > > > > > > > > > > no official or third-party implementation. So this PIP does not use
> > > > > > > > > > > avro-protobuf for protobuf compatibility checking.*
> > > > > > > > > > >
> > > > > > > > > > > 2. *By the way, this is implemented in much the same way that Apache
> > > > > > > > > > > Avro does compatibility checking.*
> > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > > > > > > > `canReadStrategy`, `canBeReadStrategy`, `mutualReadStrategy`
> > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > > > > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > > > > > > > > > > writtenSchema and readSchema. We only need to change the order of the
> > > > > > > > > > > arguments we pass to `canRead()`.*
> > > > > > > > > > > ```java
> > > > > > > > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > > > > > > > > > >                                   Descriptors.Descriptor fromDescriptor)
> > > > > > > > > > >         throws ProtoBufCanReadCheckException {
> > > > > > > > > > >     switch (strategy) {
> > > > > > > > > > >         // BACKWARD: toValidate (new) must read data written with fromDescriptor (existing).
> > > > > > > > > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > > > > > > > > > >         // FORWARD: fromDescriptor (existing) must read data written with toValidate (new).
> > > > > > > > > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> > > > > > > > > > >         // FULL: both directions must hold.
> > > > > > > > > > >         case CanBeReadMutualStrategy -> {
> > > > > > > > > > >             canRead(toValidate, fromDescriptor);
> > > > > > > > > > >             canRead(fromDescriptor, toValidate);
> > > > > > > > > > >         }
> > > > > > > > > > >     }
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > // Delegates to the PIP's rule checks: writer schema first, reader schema second.
> > > > > > > > > > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > > > > > > > > >                      Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> > > > > > > > > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
> > > > > > > > > > > }
> > > > > > > > > > > ```
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > I get that you want to take inspiration from the existing Avro Schema
> > > > > > > > > > compatibility check for your code design.
> > > > > > > > > > I also understand you *won't* use any existing avro code for that.
> > > > > > > > > > I also understand that you have to write the validation check on your
> > > > > > > > > > own, since there is no third-party implementation to rely on.
> > > > > > > > > >
> > > > > > > > > > The only thing I can't understand is the actual rules, as I wrote
> > > > > > > > > > before, since they are written confusingly.
> > > > > > > > > > So, I repeat what I asked before:
> > > > > > > > > >
> > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > >
> > > > > > > > > > * Backward checks
> > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > > > * Forward checks
> > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > > >
> > > > > > > > > > Once that's done, I will be able to understand the different
> > > > > > > > > > validation rules for each compatibility check.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > sinan
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Mar 1, 2023 at 21:19, Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I read it and they look identical. What's the difference between
> > > > > > > > > > > > > > > them?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Currently, the avro, json, and protobuf schemas are all implemented
> > > > > > > > > > > > > based on AVRO.
> > > > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean
> > > > > > > > > > > > > > > the protobuf schema is converted into an Avro Schema, and then you
> > > > > > > > > > > > > > > use Avro compatibility validation?
> > > > > > > > > > > > >
> > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, so
> > > > > > > > > > > > > `checkCompatible()` is the same implementation as for `AVRO`.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be
> > > > > > > > > > > > validated for backward compatibility check using Avro based
> > > > > > > > > > > > compatibility rules? Doesn't it expect the schema to be Avro, but
> > > > > > > > > > > > it is actually a Protobuf descriptor?
> > > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > The compatibility check strategies are described on the website:
> > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > > > > > > > 1. BACKWARD (CanReadExistingStrategy): Consumers using schema V3 can
> > > > > > > > > > > > > process data written by producers using the last schema version V2.
> > > > > > > > > > > > > So V2 is the "writtenSchema" and V3 is the "readSchema".
> > > > > > > > > > > > > 2. FORWARD (CanBeReadByExistingStrategy): Consumers using the last
> > > > > > > > > > > > > schema version V2 can process data written by producers using a new
> > > > > > > > > > > > > schema V3, even though they may not be able to use the full
> > > > > > > > > > > > > capabilities of the new schema. So V3 is the "writtenSchema" and V2
> > > > > > > > > > > > > is the "readSchema".
> > > > > > > > > > > > > 3. FULL (CanBeReadMutualStrategy): Schemas are both backward and
> > > > > > > > > > > > > forward compatible. The schema can evolve, and the old and the new
> > > > > > > > > > > > > schema versions must each be able to read data written with the
> > > > > > > > > > > > > other.
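The three strategies above differ only in which schema plays the writer and which the reader. The following self-contained sketch, in the spirit of the `validateWithStrategy` snippet quoted above, wires a single `canRead(writtenSchema, readSchema)` check to each strategy. The rule inside `canRead` is a deliberately tiny, hypothetical one (a field the reader marks as required must exist in the writer's schema, and shared field numbers must keep the same type); `ToyField` and `ToyStrategy` are illustrative names, not Pulsar or Avro types, and this is not the rule set proposed in the PIP.

```java
import java.util.Map;

// Hypothetical field model: a type name plus a proto2-style "required" flag.
final class ToyField {
    final String type;
    final boolean required;

    ToyField(String type, boolean required) {
        this.type = type;
        this.required = required;
    }
}

// Illustrative names mirroring Avro's can-read / can-be-read / mutual-read strategies.
enum ToyStrategy { CAN_READ_EXISTING, CAN_BE_READ_BY_EXISTING, CAN_BE_READ_MUTUAL }

final class StrategyDirectionSketch {
    // Toy rule: can a reader using readSchema decode data written with writtenSchema?
    static boolean canRead(Map<Integer, ToyField> writtenSchema, Map<Integer, ToyField> readSchema) {
        for (Map.Entry<Integer, ToyField> entry : readSchema.entrySet()) {
            ToyField written = writtenSchema.get(entry.getKey());
            if (written == null) {
                // The writer never sends this field; only acceptable if the reader doesn't require it.
                if (entry.getValue().required) {
                    return false;
                }
            } else if (!written.type.equals(entry.getValue().type)) {
                return false; // same field number, different type
            }
        }
        return true; // fields only the writer knows about are never inspected here
    }

    // newSchema: the schema being registered; existingSchema: the last registered version.
    static boolean isCompatible(ToyStrategy strategy,
                                Map<Integer, ToyField> newSchema,
                                Map<Integer, ToyField> existingSchema) {
        switch (strategy) {
            case CAN_READ_EXISTING:       // BACKWARD: existing is the writer, new is the reader
                return canRead(existingSchema, newSchema);
            case CAN_BE_READ_BY_EXISTING: // FORWARD: new is the writer, existing is the reader
                return canRead(newSchema, existingSchema);
            default:                      // FULL: both directions must hold
                return canRead(existingSchema, newSchema) && canRead(newSchema, existingSchema);
        }
    }

    public static void main(String[] args) {
        Map<Integer, ToyField> v2 = Map.of(1, new ToyField("string", true));
        // V3 adds a required field 2.
        Map<Integer, ToyField> v3 = Map.of(1, new ToyField("string", true), 2, new ToyField("int32", true));
        System.out.println(isCompatible(ToyStrategy.CAN_READ_EXISTING, v3, v2));       // false
        System.out.println(isCompatible(ToyStrategy.CAN_BE_READ_BY_EXISTING, v3, v2)); // true
        System.out.println(isCompatible(ToyStrategy.CAN_BE_READ_MUTUAL, v3, v2));      // false
    }
}
```

In this toy model, adding a required field in V3 breaks BACKWARD (V2 data lacks it) but not FORWARD (a V2 reader never looks at field 2), so FULL fails as well.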
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > I'm sorry, I don't understand.
> > > > > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > > > > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > > > > > > compatibility validation, no?
> > > > > > > > > > > > I believe the answer is no, since you may want to verify different
> > > > > > > > > > > > things when it comes to Protobuf, which are different from Avro.
> > > > > > > > > > > >
> > > > > > > > > > > > In its current state, I can't understand your design at all. Please
> > > > > > > > > > > > help clarify it.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As
> > > > > > > > > > > > > we've just described, we only need to change the order of the
> > > > > > > > > > > > > arguments we pass to `canRead()`.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > sinan
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Feb 27, 2023 at 20:49, Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I read it and they look identical. What's the difference between
> > > > > > > > > > > > > > > them?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Currently, the avro, json, and protobuf schemas are all implemented
> > > > > > > > > > > > > > > based on AVRO.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean
> > > > > > > > > > > > > > > the protobuf schema is converted into an Avro Schema, and then you
> > > > > > > > > > > > > > > use Avro compatibility validation?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > * Backward checks
> > > > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer
> > > > > > > > > > > > > > > or consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > > > > * Forward checks
> > > > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer
> > > > > > > > > > > > > > > or consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'm saying this because you used the word "writtenSchema", but it
> > > > > > > > > > > > > > > represents something completely different depending on whether it's
> > > > > > > > > > > > > > > a backward or a forward check.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Once you have it structured like that, I will personally be able to
> > > > > > > > > > > > > > > read and understand it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The motivation and problem statement are now good; thanks for
> > > > > > > > > > > > > > > improving it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some
> > > > > > > > > > > > > > > > background and some explanations.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The compatibility check rules are already written in the
> > > > > > > > > > > > > > > > Implementation section. ProtoBufNative implements the same canRead
> > > > > > > > > > > > > > > > method as Apache Avro. It does this by checking whether the schemas
> > > > > > > > > > > > > > > > for writing and reading are compatible. I also indicate whether the
> > > > > > > > > > > > > > > > writtenSchema and readSchema of the Backward, Forward, and Full
> > > > > > > > > > > > > > > > strategies are the old or the new version of the schema.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Sun, Feb 26, 2023 at 23:24, Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, which
> > > > > > > > > > > > > > > > > IMO you need to add for people to understand it. You don't need to
> > > > > > > > > > > > > > > > > explain all of Pulsar in this PIP, but at the very least add a few
> > > > > > > > > > > > > > > > > paragraphs detailing everything one needs to know to be put in
> > > > > > > > > > > > > > > > > context:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >    - Start by saying that Pulsar has a built-in schema registry inside
> > > > > > > > > > > > > > > > >      the Pulsar broker.
> > > > > > > > > > > > > > > > >       - Every time the client updates the schema, it uploads it to the
> > > > > > > > > > > > > > > > >         broker. When that happens, a feature validates whether the new
> > > > > > > > > > > > > > > > >         schema version is compatible with the previous versions. There
> > > > > > > > > > > > > > > > >         are 4 types of compatibility: Full, ... (complete and explain
> > > > > > > > > > > > > > > > >         each one briefly)
> > > > > > > > > > > > > > > > >    - Also explain that the Pulsar schema registry supports various
> > > > > > > > > > > > > > > > >      schema protocols: Avro, protobuf native, ... (complete the rest).
> > > > > > > > > > > > > > > > >      Each protocol has a schema which dictates how to serialize and
> > > > > > > > > > > > > > > > >      deserialize the message content into a typed object.
> > > > > > > > > > > > > > > > >    - Explain in short what protobuf native is (compared with non-native
> > > > > > > > > > > > > > > > >      protobuf).
> > > > > > > > > > > > > > > > >    - Please don't paste code instead of explaining.
> > > > > > > > > > > > > > > > >       - Explain that the current protobuf native validation check only
> > > > > > > > > > > > > > > > >         checks that the root message name is the same between the
> > > > > > > > > > > > > > > > >         current schema version and the new version.
> > > > > > > > > > > > > > > > >          - Explain briefly what a root message and its name are.
> > > > > > > > > > > > > > > > >       - Explain the problem (list scenarios) that we have because the
> > > > > > > > > > > > > > > > >         protobuf native schema only supports FULL compatibility
> > > > > > > > > > > > > > > > >         validation.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Regarding the high-level design, as in what you plan to do:
> > > > > > > > > > > > > > > > > I suggest you add a "High Level Design" section and in it detail how
> > > > > > > > > > > > > > > > > you plan to validate, per protobuf version and per compatibility
> > > > > > > > > > > > > > > > > check (backward, forward, full, ...).
> > > > > > > > > > > > > > > > > I tried reading the implementation; for me, it's all over the place.
> > > > > > > > > > > > > > > > > Can you please list in order what I wrote above, and list the
> > > > > > > > > > > > > > > > > validation rules with a good explanation of why you validate them
> > > > > > > > > > > > > > > > > like that?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Lastly, once you have all the validation rules clearly stated, you
> > > > > > > > > > > > > > > > > can use them to document this properly so users know what
> > > > > > > > > > > > > > > > > validation to expect.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Asaf
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Sorry, my mistake. I removed the code and described the design to
> > > > > > > > > > > > > > > > > > improve the PROTOBUF_NATIVE schema compatibility checks. You can
> > > > > > > > > > > > > > > > > > have a look.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 21:16, Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I read it, but you're diving almost directly into the code; it will
> > > > > > > > > > > > > > > > > > > take me hours just to reverse-engineer your design.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Can you please include a "High Level Design" section in which you
> > > > > > > > > > > > > > > > > > > explain how you plan to tackle each issue?
> > > > > > > > > > > > > > > > > > > If I can read that section and explain to someone else how this
> > > > > > > > > > > > > > > > > > > will work, it means the section is complete.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I made a PIP to discuss:
> > > > > > > > > > > > > > > > > > > > https://github.com/apache/pulsar/issues/19565
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > We can talk about the current design here. Especially for the
> > > > > > > > > > > > > > > > > > > > field type change check rules, please give your valuable advice.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > Sinan
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> >
>

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by SiNan Liu <li...@gmail.com>.
*I guess that's right, too!*

But the name `ProtobufNativeAdvancedSchemaCompatibilityCheck` is better,
because we don't know whether there will be a V2 or V3 in the future. An
official solution could be called `ProtobufNativeOfficialSchemaCompatibilityCheck`,
and a good third-party solution `ProtobufNativeXXXXXXXXSchemaCompatibilityCheck`.

I've updated my design in the PIP issue:
1. A new ProtobufNativeSchemaAdvanceCompatibilityCheck, rather than a
different validator implementation inside ProtobufNativeSchemaCompatibilityCheck.
2. Removed the 'builder'.
3. Clarified the relationship between newSchema, existingSchema, and
writtenSchema in canRead.

Please help check whether the description is comprehensive and what changes
and improvements need to be made.

Thanks,
sinan



On Thu, Mar 9, 2023 at 17:35, Asaf Mesika <as...@gmail.com> wrote:

> I like Bo's suggestion. I hadn't realized that each schema type's
> compatibility check is actually a plugin.
> It makes sense for any schema type's checks to evolve, sometimes in a
> non-backward-compatible way, hence having separate plugins like
> protobufNativeSchemaCompatabilityCheckV1, then
> protobufNativeSchemaCompatabilityCheckV2, then
> protobufNativeSchemaCompatabilityCheckV3 makes sense to me.
>
>
>
> On Thu, Mar 9, 2023 at 5:49 AM 丛搏 <bo...@apache.org> wrote:
>
> > Hi SiNan,
> >
> > From my point of view, it is just a plug-in. I don't think it is
> > necessary to add configuration for the plugin.
> > That is meaningless, and it will make the plugin harder for users to use.
> >
> >
> > On Wed, Mar 8, 2023 at 15:54, SiNan Liu <li...@gmail.com> wrote:
> > >
> > > Hi, Bo.
> > >
> > > 1. I understand your point: develop a new
> > > `ProtobufNativeAdvancedSchemaCompatibilityCheck` rather than change the
> > > existing `ProtobufNativeSchemaCompatibilityCheck`. But I found a few small
> > > problems:
> > >
> > > (1) ProtobufNativeAdvancedSchemaCompatibilityCheck and
> > > ProtobufNativeSchemaCompatibilityCheck both have the schemaType
> > > PROTOBUF_NATIVE. Both checkers are for the "native" PROTOBUF that does not
> > > use AVRO-PROTOBUF, which leads to some problems, or things that seem
> > > "unreasonable", and raises some further thoughts and questions for me.
> > >
> > `CompatibilityCheck` is only a plugin.
> > `ProtobufNativeSchemaCompatibilityCheck` may sooner or later leave the
> > stage; once `ProtobufNativeAdvancedSchemaCompatibilityCheck` is stable, we
> > can make it the default checker.
> >
> > It is just a plug-in; users can change it at will and ensure that it is
> > used correctly.
> > > (2) In broker.conf, `schemaRegistryCompatibilityCheckers`: if both
> > > ProtobufNativeSchemaCompatibilityCheck and
> > > ProtobufNativeAdvancedSchemaCompatibilityCheck are set, they will
> > > overwrite each other, because the checkers are stored in a map:
> > >
> > > https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryService.java#L36-L44
> > >
> > > ```java
> > > Map<SchemaType, SchemaCompatibilityCheck> checkers = new HashMap<>();
> > > for (String className : checkerClasses) {
> > >     SchemaCompatibilityCheck schemaCompatibilityCheck = Reflections.createInstance(className,
> > >             SchemaCompatibilityCheck.class, Thread.currentThread().getContextClassLoader());
> > >     // Keyed by schema type: a later checker for the same type replaces an earlier one.
> > >     checkers.put(schemaCompatibilityCheck.getSchemaType(), schemaCompatibilityCheck);
> > > }
> > > ```
> > >
> > > Is this a big problem or a small one? Is it possible, or unnecessary?
> > > Maybe we can state in the documentation that only one of the two
> > > protobufNative checkers can be chosen? Why have two checkers for
> > > different implementations of the same schemaType? Why not let the checker
> > > create different validators, so we do not have to change
> > > schemaRegistryCompatibilityCheckers?
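The overwrite described in (2) follows from keying the map by schema type. The sketch below reproduces that loop with plain constructors instead of `Reflections.createInstance`; `SchemaType` and `SchemaCompatibilityCheck` here are simplified stand-ins that only share names with Pulsar's types, and `ProtobufNativeAdvancedSchemaCompatibilityCheck` is the class proposed in this thread, stubbed out. Whichever PROTOBUF_NATIVE checker is listed last is the one that remains.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins that only share names with Pulsar's SchemaType / SchemaCompatibilityCheck.
enum SchemaType { AVRO, JSON, PROTOBUF_NATIVE }

interface SchemaCompatibilityCheck {
    SchemaType getSchemaType();
}

// Stub of the existing checker.
final class ProtobufNativeSchemaCompatibilityCheck implements SchemaCompatibilityCheck {
    @Override
    public SchemaType getSchemaType() { return SchemaType.PROTOBUF_NATIVE; }
}

// Stub of the checker proposed in this thread; it reports the same schema type.
final class ProtobufNativeAdvancedSchemaCompatibilityCheck implements SchemaCompatibilityCheck {
    @Override
    public SchemaType getSchemaType() { return SchemaType.PROTOBUF_NATIVE; }
}

final class CheckerMapSketch {
    // Mirrors the quoted loop: one map entry per schema type.
    static Map<SchemaType, SchemaCompatibilityCheck> buildCheckers(List<SchemaCompatibilityCheck> configured) {
        Map<SchemaType, SchemaCompatibilityCheck> checkers = new HashMap<>();
        for (SchemaCompatibilityCheck check : configured) {
            // put() silently replaces a checker registered earlier for the same type.
            checkers.put(check.getSchemaType(), check);
        }
        return checkers;
    }

    public static void main(String[] args) {
        Map<SchemaType, SchemaCompatibilityCheck> checkers = buildCheckers(List.<SchemaCompatibilityCheck>of(
                new ProtobufNativeSchemaCompatibilityCheck(),
                new ProtobufNativeAdvancedSchemaCompatibilityCheck()));
        System.out.println(checkers.size()); // 1
        System.out.println(checkers.get(SchemaType.PROTOBUF_NATIVE).getClass().getSimpleName());
    }
}
```

Under this map-based loading, only one checker per schema type can be active at a time, and the configuration order silently decides which one.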
> >
> > Users can only use one, not both; using both would bring complexity to users.
> >
> > >
> > > (3) And after switching to ProtobufNativeAdvancedSchemaCompatibilityCheck:
> > > for existing topics, previously only the name of the root message was
> > > checked, not the content of the protobuf schema.
> > >
> > > What if the user wants both checkers?
> > >
> > > Will setting ProtobufNativeAdvancedSchemaCompatibilityCheck affect topics
> > > that already have a schema?
> > >
> > > Older topics would still use the old checker, while newer topics or certain
> > > older topics use the new advanced checker.
> > >
> > Once `ProtobufNativeAdvancedSchemaCompatibilityCheck` is stable,
> > users will not choose `ProtobufNativeSchemaCompatibilityCheck`,
> > because it is not a complete checker.
> > > (4) So should we have one checker per schema type? The PROTOBUF_NATIVE
> > > checker could then have as many implementation classes as needed. As for
> > > the class-name configuration in the PIP, let's see whether it can be set
> > > at the topic level. In the current PIP design, I only load this parameter
> > > into the checker when the broker starts and the checkers map is set up.
> > > If I want to support topic-level configuration, can I do that in a
> > > separate, regular PR? Or should I complete it here?
> > >
> > > Or should we add a schema type called PROTOBUF_NATIVE_ADVANCE,
> > > corresponding to ProtobufNativeAdvancedSchemaCompatibilityCheck? (That
> > > seems to be more trouble.)
> > >
> > > Sorry, I cannot use my company's computer and network, so I am replying
> > > from my mobile phone and the formatting may be a bit messy. Please bear
> > > with me.
> > >
> > > Thanks,
> > >
> > > sinan
> > >
> > >
> > > 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:39 PM:
> > >
> > > > SiNan Liu <li...@gmail.com> wrote on Tue, Mar 7, 2023 at 13:22:
> > > > >
> > > > > Great to see your comment, bo!
> > > > >
> > > > > 1. The first way: the protobuf website describes the rules, but
> > > > > there are no plans to implement them.
> > > > > https://protobuf.dev/programming-guides/proto/#updating
> > > >
> > > > https://groups.google.com/g/protobuf
> > > > Maybe ask there.
> > > >
> > > > >
> > > > > 2. I think this PIP can be divided into two parts.
> > > > > (1) Add a flag (`ValidatorClassName`) and load it into
> > > > > `ProtobufNativeSchemaCompatibilityCheck` when the broker starts.
> > > > > `ValidatorClassName` is empty by default, so the implementation works
> > > > > as before, with no change for the user.
> > > >
> > > > `ProtobufNativeSchemaCompatibilityCheck` is a plugin in `broker.conf`:
> > > > ```
> > > > schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
> > > > ```
> > > > I do not recommend directly modifying this plugin and adding more
> > > > configuration items, which will cause trouble for users.
> > > > We have a lot of configs and it's getting very unwieldy.
> > > > In my opinion, we should not change
> > > > `org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`:
> > > > it is a simple implementation, it rarely goes wrong, and most users
> > > > will use it. We can add another PROTOBUF_NATIVE checker named
> > > > `ProtobufNativeAdvancedSchemaCompatibilityCheck` or similar. This way,
> > > > we don't need to add this flag. There is no need to consider
> > > > compatibility: it is just a plugin and will not affect the current logic.
> > > > If users need it, they just switch the plugin to the new implementation.
> > > >
> > > > > ```java
> > > > > ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
> > > > >     for (Descriptors.Descriptor fromDescriptor : fromDescriptors) {
> > > > >         // The default implementation only checks if the root message has changed.
> > > > >         if (!fromDescriptor.getFullName().equals(toDescriptor.getFullName())) {
> > > > >             throw new ProtoBufCanReadCheckException(
> > > > >                     "Protobuf root message isn't allowed to change!");
> > > > >         }
> > > > >     }
> > > > > };
> > > > > ```
> > > > > The `ValidatorClassName` value can also be set to the implementation
> > > > > this PIP adds:
> > > > > `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
> > > > >
> > > > > (2) Refactor `ProtobufNativeSchemaCompatibilityCheck` so that it builds
> > > > > a different `ProtobufNativeSchemaValidator` based on the flag
> > > > > (`ValidatorClassName`).
> > > > > Isn't that just a plugin, too? Users can develop and choose a different
> > > > > `ProtobufNativeSchemaValidator`. I don't think it changes the existing
> > > > > logic; it only allows users to extend it.
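One way to read this two-part, flag-based plan, as a minimal sketch: an empty `ValidatorClassName` keeps the old root-name-only behaviour, and a non-empty value is instantiated reflectively. `SchemaValidator`, `load`, and `AlwaysCompatibleValidator` are hypothetical names, and schemas are reduced to root message names so the example runs without protobuf on the classpath.

```java
import java.util.List;

public class ValidatorLoader {
    // Hypothetical stand-in for ProtobufNativeSchemaValidator. The real interface
    // works on protobuf Descriptors; here a schema is reduced to its root message name.
    public interface SchemaValidator {
        void validate(List<String> existingRootNames, String newRootName);
    }

    // Mirrors the DEFAULT validator above: only the root message name is compared.
    public static final SchemaValidator DEFAULT = (existingRootNames, newRootName) -> {
        for (String existing : existingRootNames) {
            if (!existing.equals(newRootName)) {
                throw new IllegalStateException("Protobuf root message isn't allowed to change!");
            }
        }
    };

    // An empty class name keeps the current behaviour; otherwise the configured class
    // is loaded reflectively and must have a public no-arg constructor.
    public static SchemaValidator load(String validatorClassName) {
        if (validatorClassName == null || validatorClassName.isBlank()) {
            return DEFAULT;
        }
        try {
            Class<? extends SchemaValidator> clazz = Class
                    .forName(validatorClassName, true, ValidatorLoader.class.getClassLoader())
                    .asSubclass(SchemaValidator.class);
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load validator " + validatorClassName, e);
        }
    }

    // Hypothetical user-supplied validator that accepts every change.
    public static class AlwaysCompatibleValidator implements SchemaValidator {
        @Override
        public void validate(List<String> existingRootNames, String newRootName) {
            // Intentionally performs no checks.
        }
    }
}
```

This is the flag-based variant; Bo's reply in this thread argues for shipping the new logic as a separate checker plugin instead, which would make the flag unnecessary.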
> > > > >
> > > > >
> > > > > I think this PIP is an enhancement and supplement to the existing
> > > > > functionality, not something unnecessary or meaningless.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > sinan
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:53:
> > > > >
> > > > > > I think we have two ways to do that.
> > > > > >
> > > > > > First way: push for this improvement in protobuf's Java library.
> > > > > > Ask the protobuf maintainers whether they have plans to add it.
> > > > > >
> > > > > > Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should
> > > > > > be implemented as a separate plugin, without changing any existing
> > > > > > plugin logic (it's simple and already in use). I don't recommend adding
> > > > > > flags for rollback; it adds configuration and makes little sense.
> > > > > >
> > > > > > Thanks,
> > > > > > Bo
> > > > > >
> > > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Mar 6, 2023 at 23:00:
> > > > > >
> > > > > > >
> > > > > > > Can you convert the code block at the beginning of the PIP, which is
> > > > > > > actually a quote, into something that doesn't require so much
> > > > > > > horizontal scrolling?
> > > > > > > Use
> > > > > > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > > > > > >
> > > > > > > Let's improve the clarity of what you wrote:
> > > > > > >
> > > > > > > "the PROTOBUF uses avro struct to store."
> > > > > > > -->
> > > > > > > When Schema type PROTOBUF is used, Pulsar Client assumes the object
> > > > > > > given to it as message data is an auto-generated POJO containing the
> > > > > > > annotations encoding the schema. The client uses a converter, which
> > > > > > > converts a Protobuf schema descriptor into an Avro schema, and sends
> > > > > > > that as the schema of the producer/consumer.
> > > > > > >
> > > > > > > "On the broker side, protobuf and avro both use SchemaData converted to
> > > > > > > org.apache.avro.Schema."
> > > > > > > -->
> > > > > > > Since the schema is an Avro schema, the broker-side compatibility check
> > > > > > > simply re-uses the compatibility check of the AVRO schema type.
> > > > > > >
> > > > > > > "ProtobufSchema is different from ProtobufNativeSchema in schema
> > > > > > > compatibility check it uses avro-protobuf.
> > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > But the current implementation of ProtobufNative schema compatibility
> > > > > > > check only checked if the root message name is changed."
> > > > > > >
> > > > > > > -->
> > > > > > > PROTOBUF_NATIVE schema type is different.
> > > > > > > The client actually uses the Protobuf Descriptor as the schema, as
> > > > > > > opposed to the Avro schema of the PROTOBUF schema type. In the broker,
> > > > > > > the PROTOBUF_NATIVE compatibility check actually hasn't implemented any
> > > > > > > rule besides one: checking if the root message name has changed.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >    1. For now, there is no official or third-party solution for
> > > > > > > >    ProtoBuf compatibility. If a better official or third-party
> > > > > > > >    solution appears in the future, we will develop a new
> > > > > > > >    ProtobufNativeSchemaValidator and use it; hence the flag.
> > > > > > > >
> > > > > > > Why do you need to make that configurable? Once you find a third-party
> > > > > > > solution, can't you just switch to it? Who knows, maybe you never will.
> > > > > > > Introduce it when you find it, not now.
> > > > > > >
> > > > > > >
> > > > > > > > We improve the BACKWARD and FORWARD strategies in
> > > > > > > > ProtobufNativeSchemaCompatibilityCheck. As with the AVRO
> > > > > > > > implementation, protobuf compatibility checking needs to implement
> > > > > > > > the canRead method. *This will check that the writtenSchema can be
> > > > > > > > read by the readSchema.*
> > > > > > >
> > > > > > >
> > > > > > > I completely disagree.
> > > > > > > The Avro implementation is confusing for our use case. Don't copy that.
> > > > > > >
> > > > > > > You have
> > > > > > >
> > > > > > > public void checkCompatible(SchemaData from, SchemaData to,
> > > > > > >         SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {
> > > > > > >     Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
> > > > > > >     Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
> > > > > > >     switch (strategy) {
> > > > > > >         case BACKWARD_TRANSITIVE:
> > > > > > >         case BACKWARD:
> > > > > > >         case FORWARD_TRANSITIVE:
> > > > > > >         case FORWARD:
> > > > > > >         case FULL_TRANSITIVE:
> > > > > > >         case FULL:
> > > > > > >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
> > > > > > >             return;
> > > > > > >         case ALWAYS_COMPATIBLE:
> > > > > > >             return;
> > > > > > >         default:
> > > > > > >             throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
> > > > > > >     }
> > > > > > > }
> > > > > > >
> > > > > > > I would rename:
> > > > > > > from --> currentSchema
> > > > > > > to --> newSchema
> > > > > > >
> > > > > > > Use that switch case and have a method for each strategy, like:
> > > > > > > validateBackwardsCompatibility(currentSchema, newSchema)
> > > > > > >
> > > > > > > I dislike canRead and the usage of writtenSchema, since you have two
> > > > > > > completely different use cases: the producing side and the consuming
> > > > > > > side.
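A minimal sketch of the structure suggested above: one switch over the strategy, one method per check, and only `currentSchema`/`newSchema` in the public signatures. The `Schema`/`Field` model and the rules themselves (root name and per-field types must not change; a field marked required by the reading side must exist on the writing side) are illustrative assumptions, not the PIP's rule set, and the transitive strategies are omitted.

```java
import java.util.Map;

public class ProtobufNativeCompatibilitySketch {
    enum Strategy { ALWAYS_COMPATIBLE, BACKWARD, FORWARD, FULL }

    // Hypothetical, heavily simplified schema model: field number -> field definition.
    record Field(String type, boolean required) {}

    record Schema(String rootMessageName, Map<Integer, Field> fields) {}

    // Unchecked here only to keep the sketch short.
    static class IncompatibleSchemaException extends RuntimeException {
        IncompatibleSchemaException(String message) {
            super(message);
        }
    }

    static void checkCompatible(Schema currentSchema, Schema newSchema, Strategy strategy) {
        switch (strategy) {
            case ALWAYS_COMPATIBLE -> { }
            case BACKWARD -> validateBackwardCompatibility(currentSchema, newSchema);
            case FORWARD -> validateForwardCompatibility(currentSchema, newSchema);
            case FULL -> {
                validateBackwardCompatibility(currentSchema, newSchema);
                validateForwardCompatibility(currentSchema, newSchema);
            }
        }
    }

    // BACKWARD: consumers on newSchema must read data produced with currentSchema.
    static void validateBackwardCompatibility(Schema currentSchema, Schema newSchema) {
        validateSharedFields(currentSchema, newSchema);
        requireFields(newSchema, currentSchema, "new schema");
    }

    // FORWARD: consumers still on currentSchema must read data produced with newSchema.
    static void validateForwardCompatibility(Schema currentSchema, Schema newSchema) {
        validateSharedFields(currentSchema, newSchema);
        requireFields(currentSchema, newSchema, "current schema");
    }

    // Rules for both directions: same root message, same type for each shared field number.
    private static void validateSharedFields(Schema currentSchema, Schema newSchema) {
        if (!currentSchema.rootMessageName().equals(newSchema.rootMessageName())) {
            throw new IncompatibleSchemaException("Root message name changed");
        }
        for (Map.Entry<Integer, Field> entry : currentSchema.fields().entrySet()) {
            Field updated = newSchema.fields().get(entry.getKey());
            if (updated != null && !updated.type().equals(entry.getValue().type())) {
                throw new IncompatibleSchemaException("Type of field " + entry.getKey() + " changed");
            }
        }
    }

    // Every field the reading side marks as required must exist on the writing side.
    private static void requireFields(Schema reading, Schema writing, String readingLabel) {
        for (Map.Entry<Integer, Field> entry : reading.fields().entrySet()) {
            if (entry.getValue().required() && !writing.fields().containsKey(entry.getKey())) {
                throw new IncompatibleSchemaException(readingLabel + " requires field "
                        + entry.getKey() + ", which the other schema lacks");
            }
        }
    }
}
```

Under these assumed rules, adding a new required field passes FORWARD but fails BACKWARD, because data already written with the current schema lacks that field.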
> > > > > > >
> > > > > > > > schemaValidatorBuilder
> > > > > > > >
> > > > > > > I dislike this proposal. IMO the Avro implementation is way too
> > > > > > > complicated. Why not have a simple validation function for each switch
> > > > > > > case above? Why do we need strategy and builder, and all this complexity?
> > > > > > >
> > > > > > >
> > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > >
> > > > > > >
> > > > > > > IMO it's impossible to read the validation rules as you described them.
> > > > > > > I wrote how they should be structured numerous times above.
> > > > > > > I can't validate them.
> > > > > > >
> > > > > > >
> > > > > > > IMO, the current design is very hard to read.
> > > > > > > Please try to avoid jumping into code sections.
> > > > > > > Write a high-level design section, in which you describe in words what
> > > > > > > you plan to do.
> > > > > > > Write the validation rules in a structure that is easy to understand:
> > > > > > > rules per compatibility check, using proper terms (current schema, new
> > > > > > > schema), since the new schema is used for reading in one case and for
> > > > > > > writing in the other.
> > > > > > >
> > > > > > > In its current form it takes too much time to understand the design, and
> > > > > > > that shouldn't be the case.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Asaf
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi! I updated the explanation of some things in the PIP issue. I also
> > > > > > > > added a new "flag" in the conf for choosing between different
> > > > > > > > ProtobufNativeSchemaValidator implementations, and made the default
> > > > > > > > ProtobufNativeSchemaValidator only check whether the name of the root
> > > > > > > > message is the same.
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > sinan
> > > > > > > >
> > > > > > > >
> > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Mar 5, 2023 at 20:21:
> > > > > > > >
> > > > > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be
> > > > > > > > > > > validated for backward compatibility check using Avro based
> > > > > > > > > > > compatibility rules?
> > > > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a
> > > > > > > > > > > Protobuf descriptor?
> > > > > > > > > > > Is there some translation happening?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 1. *You can take a quick look at the previous design, the PROTOBUF uses
> > > > > > > > > > avro struct to store.*
> > > > > > > > > > https://github.com/apache/pulsar/pull/1954
> > > > > > > > > >
> > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > > > > > >
> > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Ok. So to summarize your code (easier to write it than send links):
> > > > > > > > > * Pulsar Client, when used with Protobuf Schema, actually converts the
> > > > > > > > > Protobuf descriptor into an Avro Schema (using code found inside the
> > > > > > > > > Avro library) and saves that Avro schema as the schema. It's not saving
> > > > > > > > > the protobuf descriptor at all. Very confusing, I have to add; I never
> > > > > > > > > expected that.
> > > > > > > > > This explains why ProtobufSchemaCompatibilityCheck just extends the
> > > > > > > > > Avro check without doing any translation.
> > > > > > > > >
> > > > > > > > > Thanks for that.
> > > > > > > > >
> > > > > > > > > Now that I finally understand this, I can say that you *must* explain
> > > > > > > > > it in the motivation part of your PIP.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 2. *On the broker side, protobuf and avro both use `SchemaData`
> > > > > > > > > > converted to `org.apache.avro.Schema`.*
> > > > > > > > > >
> > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > > > > > >
> > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > > > > > >
> > > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Actually those links don't really help.
> > > > > > > > > The main link that helps is:
> > > > > > > > >
> > > > > > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > I'm sorry, I don't understand.
> > > > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > > > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > > > > > compatibility validation, no?
> > > > > > > > > > > I believe the answer is no, since you may want to verify different
> > > > > > > > > > > things when it comes to Protobuf, which are different than Avro.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 1. *ProtobufSchema is different from ProtobufNativeSchema in that it
> > > > > > > > > > uses avro-protobuf.*
> > > > > > > > > >
> > > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > > *ProtobufNativeSchema needs a native compatibility check, but there is
> > > > > > > > > > no official or third-party implementation. So this PIP does not use
> > > > > > > > > > avro-protobuf for protobuf compatibility checking.*
> > > > > > > > > >
> > > > > > > > > > 2. *By the way, this is implemented in much the same way that Apache
> > > > > > > > > > Avro does compatibility checking.*
> > > > > > > > > >
> > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > > > > > > `canReadStrategy`, `canBeReadStrategy`, `mutualReadStrategy`
> > > > > > > > > >
> > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > > > > > >
> > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > > > > > >
> > > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > > > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > > > > > > > > > writtenSchema and readSchema. We only need to change the order of the
> > > > > > > > > > arguments we pass to `canRead()`.*
> > > > > > > > > > ```java
> > > > > > > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > > > > > > > > >         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
> > > > > > > > > >     switch (strategy) {
> > > > > > > > > >         // BACKWARD: the existing schema wrote the data, the new schema reads it.
> > > > > > > > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > > > > > > > > >         // FORWARD: the new schema wrote the data, the existing schema reads it.
> > > > > > > > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> > > > > > > > > >         // FULL: check both directions.
> > > > > > > > > >         case CanBeReadMutualStrategy -> {
> > > > > > > > > >             canRead(toValidate, fromDescriptor);
> > > > > > > > > >             canRead(fromDescriptor, toValidate);
> > > > > > > > > >         }
> > > > > > > > > >     }
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > > > > > > > >         Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> > > > > > > > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
> > > > > > > > > > }
> > > > > > > > > > ```
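To make the argument-order point concrete, here is a runnable sketch that only records which `canRead(writtenSchema, readSchema)` checks each strategy would run, for a new schema `V3` and an existing schema `V2`. The strategy names come from the code above; `plannedChecks` and the string representation of schemas are hypothetical.

```java
import java.util.List;

public class CanReadOrderSketch {
    enum Strategy { CanReadExistingStrategy, CanBeReadByExistingStrategy, CanBeReadMutualStrategy }

    // Returns each planned canRead call as "writtenSchema->readSchema".
    static List<String> plannedChecks(Strategy strategy, String toValidate, String existing) {
        return switch (strategy) {
            // BACKWARD: data written with the existing schema, read with the new one.
            case CanReadExistingStrategy -> List.of(existing + "->" + toValidate);
            // FORWARD: data written with the new schema, read with the existing one.
            case CanBeReadByExistingStrategy -> List.of(toValidate + "->" + existing);
            // FULL: both directions.
            case CanBeReadMutualStrategy -> List.of(
                    toValidate + "->" + existing, existing + "->" + toValidate);
        };
    }

    public static void main(String[] args) {
        for (Strategy strategy : Strategy.values()) {
            System.out.println(strategy + " " + plannedChecks(strategy, "V3", "V2"));
        }
    }
}
```

With `V3` as the new schema, BACKWARD yields `V2->V3` and FORWARD yields `V3->V2`, matching the V2/V3 roles in the strategy list quoted later in the thread. It also shows why "writtenSchema" refers to a different version depending on the strategy.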
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > I get that you want to take inspiration from the existing Avro Schema
> > > > > > > > > compatibility check for your code design.
> > > > > > > > > I also understand you *won't* use any existing Avro code for that.
> > > > > > > > > I also understand you have to write the validation check on your own,
> > > > > > > > > since there is no third-party implementation to rely on.
> > > > > > > > >
> > > > > > > > > The only thing I can't understand is the actual rules, as I wrote
> > > > > > > > > before, since they are written confusingly.
> > > > > > > > > So, I repeat what I asked before:
> > > > > > > > >
> > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > >
> > > > > > > > > * Backward checks
> > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > > * Forward
> > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > >
> > > > > > > > > Once that's done, I will be able to understand the different
> > > > > > > > > validation rules for each compatibility check.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > sinan
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Mar 1, 2023 at 21:19:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I read it and they look identical. What's the difference between
> > > > > > > > > > > > > them?
> > > > > > > > > > > >
> > > > > > > > > > > > Current Avro, JSON, and Protobuf schemas are all implemented based on
> > > > > > > > > > > > AVRO.
> > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean
> > > > > > > > > > > > > the protobuf schema is converted into an Avro Schema, and then you
> > > > > > > > > > > > > use Avro compatibility validation?
> > > > > > > > > > > >
> > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, so their
> > > > > > > > > > > > `checkCompatible()` is the same implementation as for `AVRO`.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be
> > > > > > > > > > > validated for backward compatibility check using Avro based
> > > > > > > > > > > compatibility rules?
> > > > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a
> > > > > > > > > > > Protobuf descriptor?
> > > > > > > > > > > Is there some translation happening?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > The compatibility check strategies are described on the website:
> > > > > > > > > > > >
> > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > > > > > > 1. BACKWARD (CanReadExistingStrategy): Consumers using schema V3 can
> > > > > > > > > > > > process data written by producers using the last schema version V2.
> > > > > > > > > > > > So V2 is "writtenSchema" and V3 is "readSchema".
> > > > > > > > > > > > 2. FORWARD (CanBeReadByExistingStrategy): Consumers using the last
> > > > > > > > > > > > schema version V2 can process data written by producers using a new
> > > > > > > > > > > > schema V3, even though they may not be able to use the full
> > > > > > > > > > > > capabilities of the new schema. So V3 is "writtenSchema" and V2 is
> > > > > > > > > > > > "readSchema".
> > > > > > > > > > > > 3. FULL (CanBeReadMutualStrategy): Schemas are both backward and
> > > > > > > > > > > > forward compatible.
> > > > > > > > > > > > Schemas can evolve: data written with either the old or the new schema
> > > > > > > > > > > > version must be readable with the other.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > I'm sorry, I don't understand.
> > > > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > > > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > > > > > compatibility validation, no?
> > > > > > > > > > > I believe the answer is no, since you may want to verify different
> > > > > > > > > > > things when it comes to Protobuf, which are different than Avro.
> > > > > > > > > > >
> > > > > > > > > > > In its current state, I can't understand your design at all. Please
> > > > > > > > > > > help clarify it.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As
> > > > > > > > > > > > we've just described, we only need to change the order of the
> > > > > > > > > > > > arguments we pass to `canRead()`.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > sinan
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Feb 27, 2023 at 20:49:
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > > > > > > >
> > > > > > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Current Avro, JSON, and Protobuf schemas are all implemented based on
> > > > > > > > > > > > > AVRO.
> > > > > > > > > > > > >
> > > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > > > > > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > > > > > > > > > > > compatibility validation?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > > >
> > > > > > > > > > > > > * Backward checks
> > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > > * Forward
> > > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > >
> > > > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm saying this because you used the word "writtenSchema", but it
> > > > > > > > > > > > > represents something completely different in a backward check than in
> > > > > > > > > > > > > a forward check.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Once you have it structured like that, I personally will be able to
> > > > > > > > > > > > > read and understand it.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > The motivation and problem statement are now good; thanks for
> > > > > > > > > > > > > improving them.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some
> > > > > > > > > > > > > > background and some explanations.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The compatibility check rules are already written in the
> > > > > > > > > > > > > > Implementation section. ProtoBufNative implements the same canRead
> > > > > > > > > > > > > > method as Apache Avro: it checks whether the schemas used for writing
> > > > > > > > > > > > > > and for reading are compatible. I also indicate whether the
> > > > > > > > > > > > > > writtenSchema and readSchema of the Backward, Forward, and Full
> > > > > > > > > > > > > > strategies are the old or the new version of the schema.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > sinan
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Feb 26, 2023 at 23:24:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, which IMO you need to add for people to understand it. You don't need to explain all of Pulsar in this PIP, but at the very least add a few paragraphs detailing all you need to know, to put you in context:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >    - Start by saying that Pulsar has a built-in schema registry inside the Pulsar broker.
> > > > > > > > > > > > > > >       - Every time the client updates the schema, it uploads it to the broker. When that happens, the broker validates whether the new schema version is compatible with the previous versions. There are 4 types of compatibility: Full, ... (complete and explain each one briefly)
> > > > > > > > > > > > > > >    - Also explain that the Pulsar schema registry supports various schema protocols: Avro, protobuf native, ... (complete the rest). Each protocol has a schema which dictates how to serialize and deserialize the message content into a typed object.
> > > > > > > > > > > > > > >    - Explain in short what protobuf native is (compared to protobuf non-native).
> > > > > > > > > > > > > > >    - Please don't paste code instead of explaining.
> > > > > > > > > > > > > > >       - Explain that the current protobuf native validation check only verifies that the root message name is the same between the current schema version and the new version.
> > > > > > > > > > > > > > >          - Explain briefly what a root message and its name are.
> > > > > > > > > > > > > > >       - Explain the problem (list scenarios) that we have because the protobuf native schema only supports FULL compatibility validation.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Regarding the high level design - as in what you plan to do:
> > > > > > > > > > > > > > > I suggest you add a "High Level Design" section and in it detail how you plan to validate, per protobuf version, per compatibility check (backward, forward, full, ...).
> > > > > > > > > > > > > > > I tried reading the implementation - for me, it's all over the place. Can you please list, in order, what I wrote above, and list the validation rules with a good explanation of why you validate them like that?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Lastly, once you have all the validation rules clearly stated, you can use them to document the behavior properly, so users know what validation to expect.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Asaf
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Sorry, my mistake. I removed the code and described the design to improve the PROTOBUF_NATIVE schema compatibility checks. You can have a look.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Feb 22, 2023 at 21:16:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I read it, but you're almost directly diving into the code - it will take me hours just to reverse engineer your design.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Can you please include a "High Level Design" section in which you explain how you plan to tackle each issue?
> > > > > > > > > > > > > > > > > If I can read that section and explain to someone else how this will work, it means the section is complete.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19565.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > We can talk about the current design here. Especially for the field type change check rules, please give your valuable advice.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > Sinan
> > > > > > > > > > > > > > > > > >

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by Asaf Mesika <as...@gmail.com>.
I like Bo's suggestion - I hadn't realized that each schema type's compatibility check is actually a plugin.
It makes sense for a schema type's checks to evolve, sometimes in a non-backward-compatible way, hence having separate plugins like protobufNativeSchemaCompatibilityCheckV1, then protobufNativeSchemaCompatibilityCheckV2, then protobufNativeSchemaCompatibilityCheckV3 makes sense to me.



On Thu, Mar 9, 2023 at 5:49 AM 丛搏 <bo...@apache.org> wrote:

> Hi SiNan,
>
> From my point of view, it is just a plug-in. I don't think it is necessary to add configuration for the plugin.
> It adds nothing meaningful, and it makes the plugin harder for users to use.
>
>
> SiNan Liu <li...@gmail.com> wrote on Wed, Mar 8, 2023 at 15:54:
> >
> > Hi Bo,
> >
> > 1. I understand what you are saying: develop a new `ProtobufNativeAdvancedSchemaCompatibilityCheck` rather than changing the existing `ProtobufNativeSchemaCompatibilityCheck`. But I found a few small problems:
> >
> > (1) ProtobufNativeAdvancedSchemaCompatibilityCheck and ProtobufNativeSchemaCompatibilityCheck both have the schemaType PROTOBUF_NATIVE. Both checkers are native PROTOBUF implementations (neither uses AVRO-PROTOBUF), which leads to some problems, or at least some awkwardness, and raises a few further questions for me.
> >
> `CompatibilityCheck` is only a plugin.
> `ProtobufNativeSchemaCompatibilityCheck` may sooner or later leave the stage; when `ProtobufNativeAdvancedSchemaCompatibilityCheck` is stable, we can make it the default checker.
>
> It is just a plug-in; users can change it at will, as long as they make sure it is used correctly.
> > (2) In broker.conf, `schemaRegistryCompatibilityCheckers`: if ProtobufNativeSchemaCompatibilityCheck and ProtobufNativeAdvancedSchemaCompatibilityCheck are both set, they will overwrite each other, because the checkers are stored in a map:
> >
> > https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryService.java#L36-L44
> >
> > ```java
> > Map<SchemaType, SchemaCompatibilityCheck> checkers = new HashMap<>();
> > for (String className : checkerClasses) {
> >     SchemaCompatibilityCheck schemaCompatibilityCheck = Reflections.createInstance(className,
> >             SchemaCompatibilityCheck.class, Thread.currentThread().getContextClassLoader());
> >     // Keyed by schema type: a second checker for the same type replaces the first.
> >     checkers.put(schemaCompatibilityCheck.getSchemaType(), schemaCompatibilityCheck);
> > }
> > ```
> >
> > Is this a big problem or a small one? Is it worth handling, or unnecessary? Maybe we can state in the documentation that only one of the two protobufNative checkers can be chosen? Why are there two checkers with different implementations for the same schemaType? Why not have the checker create different validators, so we do not have to change schemaRegistryCompatibilityCheckers?
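A minimal, self-contained sketch of the overwrite behavior described in point (2). It does not use Pulsar classes: `CheckerSchemaType`, `CheckerPlugin` and `CheckerRegistry` are hypothetical stand-ins for `SchemaType`, `SchemaCompatibilityCheck` and the registry loop quoted above, and `buildStrict` is only one possible alternative, not something Pulsar does today. Because the registry is a `HashMap` keyed by schema type, whichever plugin the loop visits last for a given type is the one that ends up registered.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for Pulsar's SchemaType and SchemaCompatibilityCheck.
enum CheckerSchemaType { AVRO, JSON, PROTOBUF_NATIVE }

record CheckerPlugin(String className, CheckerSchemaType schemaType) {}

final class CheckerRegistry {

    // Same shape as the quoted loop: a later plugin for the same schema type
    // silently replaces an earlier one (HashMap.put overwrites the value).
    static Map<CheckerSchemaType, CheckerPlugin> buildOverwriting(List<CheckerPlugin> configured) {
        Map<CheckerSchemaType, CheckerPlugin> checkers = new HashMap<>();
        for (CheckerPlugin plugin : configured) {
            checkers.put(plugin.schemaType(), plugin);
        }
        return checkers;
    }

    // One possible alternative (assumption, not Pulsar behavior): reject a
    // configuration that lists two plugins for the same schema type.
    static Map<CheckerSchemaType, CheckerPlugin> buildStrict(List<CheckerPlugin> configured) {
        Map<CheckerSchemaType, CheckerPlugin> checkers = new HashMap<>();
        for (CheckerPlugin plugin : configured) {
            CheckerPlugin previous = checkers.putIfAbsent(plugin.schemaType(), plugin);
            if (previous != null) {
                throw new IllegalArgumentException("Both " + previous.className() + " and "
                        + plugin.className() + " are configured for " + plugin.schemaType());
            }
        }
        return checkers;
    }
}
```

Failing fast at broker startup would make the "choose only one of the two" rule something the broker enforces, rather than something users have to find in the documentation.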
>
> Users can only use one, not two; allowing two would bring complexity to users.
>
> >
> > (3) And after switching to ProtobufNativeAdvancedSchemaCompatibilityCheck: existing topics were previously checked only for the name of the root message, not for the content of the protobuf schema.
> >
> > What if the user wants both Checkers?
> >
> > If it is set to ProtobufNativeAdvancedSchemaCompatibilityCheck, will that affect topics that already have a schema?
> >
> > Or should older topics keep using the old checker, while newer topics or certain older topics use the new advanced checker?
> >
> When `ProtobufNativeAdvancedSchemaCompatibilityCheck` is stable, users will not choose `ProtobufNativeSchemaCompatibilityCheck`, because it is not a complete checker.
> > (4) So should we have one checker per schemaType, with protobufNativeChecker allowed to have multiple implementation classes? As for the classname configuration in this PIP, let's see whether it can be set at the topic level. In the current PIP design, I just load this parameter into the checker when the broker starts and the checkers map is set up. If I want to support topic-level configuration, can I do that in a separate follow-up PR, or should I complete it here?
> >
> > Or add a PROTOBUF_NATIVE_ADVANCE schemaType corresponding to ProtobufNativeAdvancedSchemaCompatibilityCheck? (That seems like more trouble.)
> >
> > Sorry, I cannot use a computer or the network at my company, so I am replying from my mobile phone; the format may be a bit messy. Please bear with me.
> >
> > Thanks,
> >
> > sinan
> >
> >
> > 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:39 PM:
> >
> > > SiNan Liu <li...@gmail.com> wrote on Tue, Mar 7, 2023 at 13:22:
> > > >
> > > > Great to see your comment, Bo!
> > > >
> > > > 1. The first way: the protobuf website describes the rules, but there are no plans to implement them.
> > > > https://protobuf.dev/programming-guides/proto/#updating
> > >
> > > https://groups.google.com/g/protobuf
> > > maybe ask here
> > >
> > > >
> > > > 2. I think this PIP can be divided into two parts.
> > > > (1) Add a flag (`ValidatorClassName`) and load it into `ProtobufNativeSchemaCompatibilityCheck` when the broker starts. `ValidatorClassName` is empty by default, in which case the implementation continues as before, with no change for the user.
> > >
> > > `ProtobufNativeSchemaCompatibilityCheck` is a plugin in `broker.conf`:
> > > ```
> > > schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
> > > ```
> > > I do not recommend that we directly modify this plugin and keep adding configuration items, which will cause trouble for users.
> > > We have a lot of configs and it's getting very unwieldy.
> > > In my opinion, we should not change `org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`: it is a simple implementation, it rarely goes wrong, and most users will use it. We can add another protobuf native checker named `ProtobufNativeAdvancedSchemaCompatibilityCheck` (or something else). This way, we don't need to add this flag. There is no need to consider compatibility: it is just a plug-in and will not affect the current logic.
> > > If users need it, they just switch the plugin to the new implementation.
> > >
> > > > ```java
> > > >     ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
> > > >         for (Descriptors.Descriptor fromDescriptor : fromDescriptors) {
> > > >             // The default implementation only checks if the root message has changed.
> > > >             if (!fromDescriptor.getFullName().equals(toDescriptor.getFullName())) {
> > > >                 throw new ProtoBufCanReadCheckException("Protobuf root message isn't allowed to change!");
> > > >             }
> > > >         }
> > > >     };
> > > > ```
> > > > `ValidatorClassName` can also be set to the implementation this PIP adds, `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
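For readers less familiar with the current behavior, here is a standalone sketch of what the quoted `DEFAULT` validator does. It replaces the protobuf `Descriptor` with a hypothetical `RootDescriptor` record that holds only the full name; `RootNameValidator` and `RootMessageChangedException` are illustrative names (standing in for `ProtobufNativeSchemaValidator` and `ProtoBufCanReadCheckException`), so it runs without Pulsar or protobuf on the classpath.

```java
import java.util.List;

// Hypothetical stand-in for com.google.protobuf.Descriptors.Descriptor:
// the default validator only looks at the root message's full name.
record RootDescriptor(String fullName) {}

// Hypothetical stand-in for the PIP's ProtoBufCanReadCheckException.
class RootMessageChangedException extends Exception {
    RootMessageChangedException(String message) {
        super(message);
    }
}

@FunctionalInterface
interface RootNameValidator {

    void validate(List<RootDescriptor> fromDescriptors, RootDescriptor toDescriptor)
            throws RootMessageChangedException;

    // Same logic as the quoted DEFAULT: compare the fully qualified root name
    // against every earlier version; fields are never inspected.
    RootNameValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
        for (RootDescriptor fromDescriptor : fromDescriptors) {
            if (!fromDescriptor.fullName().equals(toDescriptor.fullName())) {
                throw new RootMessageChangedException("Protobuf root message isn't allowed to change: "
                        + fromDescriptor.fullName() + " -> " + toDescriptor.fullName());
            }
        }
    };
}
```

Because only the name is compared, a change such as retyping an existing field passes this check as long as the root message keeps its fully qualified name.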
> > > >
> > > > (2) Refactor `ProtobufNativeSchemaCompatibilityCheck` to build different `ProtobufNativeSchemaValidator` implementations based on the flag (`ValidatorClassName`).
> > > > Isn't that just a plug-in as well? Users can develop and choose a different `ProtobufNativeSchemaValidator`. I don't think it changes the logic; it just allows users to extend it.
> > > >
> > > >
> > > > I think this PIP is an enhancement and a supplement to the existing functionality; it is not unnecessary or meaningless.
> > > >
> > > >
> > > > Thanks,
> > > > sinan
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:53:
> > > >
> > > > > I think we have two ways to do that.
> > > > >
> > > > > First way: we need to push for this improvement in protobuf's Java library. Ask them whether they have plans to improve it.
> > > > >
> > > > > Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should be implemented as a separate plugin, without changing any existing plugin logic (it's simple and already in use). I don't recommend adding flags for rollback; it adds configuration and makes little sense.
> > > > >
> > > > > Thanks,
> > > > > Bo
> > > > >
> > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Mar 6, 2023 at 23:00:
> > > > >
> > > > > >
> > > > > > Can you convert the code block at the beginning of the PIP, which is actually a quote, into something that doesn't require so much horizontal scrolling?
> > > > > > Use https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > > > > >
> > > > > > Let's improve the clarity of what you wrote:
> > > > > >
> > > > > > "the PROTOBUF uses avro struct to store."
> > > > > > -->
> > > > > > When schema type PROTOBUF is used, the Pulsar client assumes the object given to it as message data is an auto-generated POJO containing the annotations encoding the schema. The client uses a converter which converts a Protobuf schema descriptor into an Avro schema, and sends that as the schema of the producer/consumer.
> > > > > >
> > > > > > "On the broker side, protobuf and avro both use SchemaData converted to org.apache.avro.Schema."
> > > > > > -->
> > > > > > Since the schema is an Avro schema, the broker-side compatibility check simply re-uses the compatibility check of the AVRO schema type.
> > > > > >
> > > > > > "ProtobufSchema is different from ProtobufNativeSchema in schema compatibility check it uses avro-protobuf.
> > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > But the current implementation of ProtobufNative schema compatibility check only checked if the root message name is changed."
> > > > > >
> > > > > > -->
> > > > > > PROTOBUF_NATIVE schema type is different.
> > > > > > The client actually uses the Protobuf Descriptor as the schema, as opposed to the Avro schema used by the PROTOBUF schema type. In the broker, the PROTOBUF_NATIVE compatibility check hasn't actually implemented any rule besides one: checking whether the root message name has changed.
> > > > > >
> > > > > >
> > > > > > > >    1. For now, there is no official or third-party solution for ProtoBuf compatibility. If in the future have better solutions of a third party or the official, we develop new ProtobufNativeSchemaValidator and use, so add a flag.
> > > > > > >
> > > > > > Why do you need to make that configurable? Once you find a third party, just switch to it. Who knows, maybe you never will. Introduce it when you find it, not now.
> > > > > >
> > > > > >
> > > > > > > We improve in ProtobufNativeSchemaCompatibilityCheck BACKWARD, FORWARD these strategies. As with the AVRO implementation, protobuf compatibility checking need implementing the canRead method. *This will check that the writtenschema can be read by readSchema.*
> > > > > >
> > > > > >
> > > > > > I completely disagree.
> > > > > > The Avro implementation is confusing for our use case. Don't copy it.
> > > > > >
> > > > > > You have
> > > > > >
> > > > > > public void checkCompatible(SchemaData from, SchemaData to,
> > > > > >         SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {
> > > > > >     Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
> > > > > >     Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
> > > > > >     switch (strategy) {
> > > > > >         case BACKWARD_TRANSITIVE:
> > > > > >         case BACKWARD:
> > > > > >         case FORWARD_TRANSITIVE:
> > > > > >         case FORWARD:
> > > > > >         case FULL_TRANSITIVE:
> > > > > >         case FULL:
> > > > > >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
> > > > > >             return;
> > > > > >         case ALWAYS_COMPATIBLE:
> > > > > >             return;
> > > > > >         default:
> > > > > >             throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
> > > > > >     }
> > > > > > }
> > > > > >
> > > > > > I would rename :
> > > > > > from --> currentSchema
> > > > > > to --> newSchema
> > > > > >
> > > > > > Use that switch and have a method for each strategy, like:
> > > > > > validateBackwardsCompatibility(currentSchema, newSchema)
> > > > > >
> > > > > > I dislike canRead and the usage of writtenSchema, since you have two completely different use cases: the producing side and the consuming side.
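The structure suggested here (one plain validation method per strategy, named with `currentSchema`/`newSchema` instead of `canRead`/`writtenSchema`) could look roughly like the sketch below. It is a minimal, self-contained illustration, not the PIP's implementation: the schema is reduced to a hypothetical `MessageSchema` record (root message name plus field number to type/required flag), and the two example rules (a reused field number must keep its type; a proto2 `required` field on the reading side must exist on the writing side) are simplifications chosen for the example. `Strategy` and `IncompatibleChangeException` are stand-ins for Pulsar's `SchemaCompatibilityStrategy` and `IncompatibleSchemaException`.

```java
import java.util.Map;

// Simplified schema model (assumption, not a Pulsar or protobuf type).
record FieldInfo(String type, boolean required) {}

record MessageSchema(String rootName, Map<Integer, FieldInfo> fields) {}

// Mirrors the strategy names of SchemaCompatibilityStrategy.
enum Strategy {
    ALWAYS_COMPATIBLE, BACKWARD, BACKWARD_TRANSITIVE,
    FORWARD, FORWARD_TRANSITIVE, FULL, FULL_TRANSITIVE
}

// Hypothetical stand-in for IncompatibleSchemaException.
class IncompatibleChangeException extends Exception {
    IncompatibleChangeException(String message) {
        super(message);
    }
}

final class ProtobufNativeRules {

    // Transitive variants reuse the pairwise rule; checking every earlier
    // version is assumed to happen in the caller.
    static void checkCompatible(MessageSchema currentSchema, MessageSchema newSchema,
                                Strategy strategy) throws IncompatibleChangeException {
        switch (strategy) {
            case BACKWARD, BACKWARD_TRANSITIVE -> validateBackwardCompatibility(currentSchema, newSchema);
            case FORWARD, FORWARD_TRANSITIVE -> validateForwardCompatibility(currentSchema, newSchema);
            case FULL, FULL_TRANSITIVE -> {
                validateBackwardCompatibility(currentSchema, newSchema);
                validateForwardCompatibility(currentSchema, newSchema);
            }
            case ALWAYS_COMPATIBLE -> { }
        }
    }

    // Backward: consumers using newSchema must read data produced with currentSchema.
    static void validateBackwardCompatibility(MessageSchema currentSchema, MessageSchema newSchema)
            throws IncompatibleChangeException {
        checkCommonRules(currentSchema, newSchema);
        requireFieldsPresent(newSchema, currentSchema);
    }

    // Forward: consumers still using currentSchema must read data produced with newSchema.
    static void validateForwardCompatibility(MessageSchema currentSchema, MessageSchema newSchema)
            throws IncompatibleChangeException {
        checkCommonRules(currentSchema, newSchema);
        requireFieldsPresent(currentSchema, newSchema);
    }

    private static void checkCommonRules(MessageSchema currentSchema, MessageSchema newSchema)
            throws IncompatibleChangeException {
        if (!currentSchema.rootName().equals(newSchema.rootName())) {
            throw new IncompatibleChangeException("Root message changed: "
                    + currentSchema.rootName() + " -> " + newSchema.rootName());
        }
        // Simplification: any type change on a reused field number is rejected,
        // which is stricter than protobuf's own "updating a message" guidance.
        for (Map.Entry<Integer, FieldInfo> e : currentSchema.fields().entrySet()) {
            FieldInfo updated = newSchema.fields().get(e.getKey());
            if (updated != null && !updated.type().equals(e.getValue().type())) {
                throw new IncompatibleChangeException("Field " + e.getKey() + " changed type from "
                        + e.getValue().type() + " to " + updated.type());
            }
        }
    }

    // Every field the reader marks as required must exist in the writer's schema.
    private static void requireFieldsPresent(MessageSchema reader, MessageSchema writer)
            throws IncompatibleChangeException {
        for (Map.Entry<Integer, FieldInfo> e : reader.fields().entrySet()) {
            if (e.getValue().required() && !writer.fields().containsKey(e.getKey())) {
                throw new IncompatibleChangeException("Required field " + e.getKey()
                        + " is missing from the writer schema");
            }
        }
    }
}
```

The direction is visible in each method name, so a reader of the code never has to work out which argument plays "writtenSchema" under a given strategy.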
> > > > > >
> > > > > > > schemaValidatorBuilder
> > > > > >
> > > > > > I dislike this proposal. IMO the Avro implementation is way too complicated.
> > > > > > Why not have a simple validation function for each switch case above?
> > > > > > Why do we need a strategy, a builder, and all this complexity?
> > > > > >
> > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > >
> > > > > >
> > > > > > IMO it's impossible to read the validation rules as you described them.
> > > > > > I wrote how they should be structured numerous times above.
> > > > > > I can't validate them.
> > > > > >
> > > > > >
> > > > > > IMO, the current design is very hard to read.
> > > > > > Please try to avoid jumping into code sections.
> > > > > > Write a high level design section, in which you describe in words what you plan to do.
> > > > > > Write the validation rules in a structure that is easy to understand: rules per compatibility check, using proper words (current schema, new schema), since the new schema can be used once for reading and once for writing.
> > > > > >
> > > > > > In its current form it takes too much time to understand the design, and that shouldn't be the case.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Asaf
> > > > > >
> > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > >
> > > > > > > Hi! I updated the explanation of some things in the PIP issue. I also added a new "flag" in the conf to select among different ProtobufNativeSchemaValidator implementations, and set the default ProtobufNativeSchemaValidator to only check whether the name of the root message is the same.
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > sinan
> > > > > > >
> > > > > > >
> > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Mar 5, 2023 at 20:21:
> > > > > > >
> > > > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be validated for backward compatibility check using Avro based compatibility rules?
> > > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf descriptor?
> > > > > > > > > > Is there some translation happening?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 1. *You can take a quick look at the previous design, the PROTOBUF uses avro struct to store.*
> > > > > > > > > https://github.com/apache/pulsar/pull/1954
> > > > > > > > >
> > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > > > > >
> > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > > > > > >
> > > > > > > >
> > > > > > > > Ok. So to summarize your code (easier to write it than send links):
> > > > > > > > * Pulsar Client, when used with Protobuf Schema, actually converts the Protobuf descriptor into an Avro Schema (using code found inside the Avro library) and saves that Avro schema as the schema. It's not saving the protobuf descriptor at all. Very confusing, I have to add - I never expected that.
> > > > > > > > This explains why the ProtobufSchemaCompatibilityCheck just extends the Avro check without doing any translation.
> > > > > > > >
> > > > > > > > Thanks for that.
> > > > > > > >
> > > > > > > > Now that I finally understand this, I can say that you *must* explain it in the motivation part of your PIP.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 2. *On the broker side, protobuf and avro both use `SchemaData` converted to `org.apache.avro.Schema`.*
> > > > > > > > >
> > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > > > > >
> > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > > > > >
> > > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > > > > > >
> > > > > > > >
> > > > > > > > Actually those links don't really help.
> > > > > > > > The main link that helps is:
> > > > > > > >
> > > > > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I'm sorry - I don't understand.
> > > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > > If you just spell them out here, then as you say, just translate the Protobuf Descriptor into an Avro schema and run the Avro compatibility validation, no?
> > > > > > > > > > I believe the answer is no, since you may want to verify different things when it comes to Protobuf, which are different than Avro.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 1. *ProtobufSchema is different from ProtobufNativeSchema in that it uses avro-protobuf.*
> > > > > > > > >
> > > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > > *ProtobufNativeSchema needs a native compatibility check, but there is no official or third-party implementation. So this PIP does not use avro-protobuf for protobuf compatibility checking.*
> > > > > > > > >
> > > > > > > > > 2. *By the way, this is implemented in much the same way that Apache Avro does compatibility checking.*
> > > > > > > > >
> > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > > > > > `canReadStrategy`, `canBeReadStrategy`, `mutualReadStrategy`
> > > > > > > > >
> > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > > > > >
> > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > > > > >
> > > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are writtenSchema and readSchema. We only need to change the order of the arguments we pass to `canRead()`.*
> > > > > > > > > ```java
> > > > > > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > > > > > > > >         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
> > > > > > > > >     switch (strategy) {
> > > > > > > > >         // The new schema must read data written with the existing schema.
> > > > > > > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > > > > > > > >         // The existing schema must read data written with the new schema.
> > > > > > > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> > > > > > > > >         // Both directions.
> > > > > > > > >         case CanBeReadMutualStrategy -> {
> > > > > > > > >             canRead(toValidate, fromDescriptor);
> > > > > > > > >             canRead(fromDescriptor, toValidate);
> > > > > > > > >         }
> > > > > > > > >     }
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > > > > > > >         Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> > > > > > > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
> > > > > > > > > }
> > > > > > > > > ```
> > > > > > > > >
> > > > > > > > >
> > > > > > > > I get that you want to take inspiration from the existing Avro schema compatibility check for your code design.
> > > > > > > > I also understand you *won't* use any existing Avro code for that.
> > > > > > > > I also understand you have to write the validation check on your own, since there is no 3rd-party implementation to rely on.
> > > > > > > >
> > > > > > > > The only thing I can't understand is the actual rules, as I wrote before, since they are written confusingly.
> > > > > > > > So, I repeat what I asked before:
> > > > > > > >
> > > > > > > > I think you should structure the validation rules
> differently:
> > > > > > > >
> > > > > > > > * Backward checks
> > > > > > > > ** List down rules, where use newSchema (the schema used by
> > > producer
> > > > > or
> > > > > > > > consumer) and existingSchema (last schema used)
> > > > > > > > * Forward
> > > > > > > > ** List down rules, where use newSchema (the schema used by
> > > producer
> > > > > or
> > > > > > > > consumer) and existingSchema (last schema used)
> > > > > > > >
> > > > > > > > Once that's accomplished I will be able to understand the
> > > different
> > > > > > > > validation rules for each compatibility check.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > sinan
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Mar 1, 2023 at 21:19:
> > > > > > > > > >
> > > > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > > > >
> > > > > > > > > > > Current avro, json, and protobuf schemas are all implemented based on AVRO.
> > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the protobuf schema is converted into an Avro Schema, and then you use Avro compatibility validation?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, so their `checkCompatible()` is the same implementation as for `AVRO`.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be validated for backward compatibility check using Avro based compatibility rules?
> > > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf descriptor?
> > > > > > > > > > Is there some translation happening?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The compatibility check strategies are described on the website:
> > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > > > > > 1. BACKWARD (CanReadExistingStrategy): Consumers using schema V3 can process data written by producers using the last schema version V2. So V2 is "writtenSchema" and V3 is "readSchema".
> > > > > > > > > > > 2. FORWARD (CanBeReadByExistingStrategy): Consumers using the last schema version V2 can process data written by producers using a new schema V3, even though they may not be able to use the full capabilities of the new schema. So V3 is "writtenSchema" and V2 is "readSchema".
> > > > > > > > > > > 3. FULL (CanBeReadMutualStrategy): Schemas are both backward and forward compatible. The schema can evolve, and each of the old and new schema versions must be able to read data written with the other.
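The list above can be restated so that "writtenSchema" and "readSchema" are always expressed as either the existing or the new schema version. Below is a minimal Java sketch. `Role`, `Strategy`, `ReadCheck` and `StrategyRoles` are hypothetical names for illustration, not Pulsar classes (Pulsar's own enum is `SchemaCompatibilityStrategy`). The *_TRANSITIVE variants, which repeat the same checks against every earlier version, are left out.

```java
import java.util.List;

// Hypothetical names for illustration; these are not Pulsar classes.
enum Role { EXISTING, NEW }

enum Strategy { BACKWARD, FORWARD, FULL }

// One check: can a consumer on `reader` decode data produced with `writer`?
record ReadCheck(Role writer, Role reader) { }

final class StrategyRoles {
    private StrategyRoles() { }

    static List<ReadCheck> checksFor(Strategy strategy) {
        return switch (strategy) {
            // New consumers read data written with the existing schema.
            case BACKWARD -> List.of(new ReadCheck(Role.EXISTING, Role.NEW));
            // Existing consumers read data written with the new schema.
            case FORWARD -> List.of(new ReadCheck(Role.NEW, Role.EXISTING));
            // Both directions must hold.
            case FULL -> List.of(
                    new ReadCheck(Role.EXISTING, Role.NEW),
                    new ReadCheck(Role.NEW, Role.EXISTING));
        };
    }
}
```

For example, `checksFor(Strategy.FORWARD)` returns a single check where the new schema is the writer and the existing schema is the reader. This is the "V3 writes, V2 reads" case in item 2.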
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > I'm sorry - I don't understand.
> > > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > > If you just spell them out here, then, as you say, you could just translate the Protobuf descriptor into an Avro schema and run the Avro compatibility validation, no?
> > > > > > > > > > I believe the answer is no, since you may want to verify different things for Protobuf than for Avro.
> > > > > > > > > >
> > > > > > > > > > In its current state, I can't understand your design at all. Please help clarify it.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As we've just described, we only need to change the order of the arguments we pass to `canRead()`.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > sinan
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Feb 27, 2023 at 20:49:
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > > > > > >
> > > > > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > > > > >
> > > > > > > > > > > > Current avro, json, and protobuf schemas are all implemented based on AVRO.
> > > > > > > > > > > >
> > > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the protobuf schema is converted into an Avro Schema, and then you use Avro compatibility validation?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > > >
> > > > > > > > > > > > * Backward checks
> > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or consumer) and existingSchema (the last schema used)
> > > > > > > > > > > > * Forward checks
> > > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or consumer) and existingSchema (the last schema used)
> > > > > > > > > > > >
> > > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > > >
> > > > > > > > > > > > I'm saying this because you used the word "writtenSchema", but it represents something completely different in a backward check than in a forward check.
> > > > > > > > > > > >
> > > > > > > > > > > > Once it's structured like that, I personally will be able to read and understand it.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > The motivation and problem statement are now good - thanks for improving them.
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some background and some explanations.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The compatibility check rules are already written in the Implementation section.
> > > > > > > > > > > > > ProtoBufNative implements the same canRead method as Apache Avro.
> > > > > > > > > > > > > It does this by checking whether the schemas for writing and reading are compatible. I also indicate whether the writtenSchema and readSchema of the Backward, Forward, and Full strategies are the old or the new version of the schema.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > sinan
> > > > > > > > > > > > >
> > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Feb 26, 2023 at 23:24:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, which IMO you need to add for people to understand it. You don't need to explain all of Pulsar in this PIP, but at the very least add a few paragraphs detailing everything you need to know to put you in context:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >    - Start by saying Pulsar has a built-in schema registry inside the Pulsar broker.
> > > > > > > > > > > > > >       - Every time the client updates the schema, it uploads it to the broker. When that happens, the broker has a feature which validates whether the new schema version is compatible with the previous versions. There are 4 types of compatibility: Full, ... (complete and explain each one briefly)
> > > > > > > > > > > > > >    - Also explain that the Pulsar schema registry supports various schema protocols: Avro, protobuf native, ... (complete the rest). Each protocol has a schema which dictates how to serialize and deserialize the message content into a typed object.
> > > > > > > > > > > > > >    - Explain in short what protobuf native is (compared to non-native protobuf).
> > > > > > > > > > > > > >    - Please don't paste code instead of explaining.
> > > > > > > > > > > > > >       - Explain that the current protobuf native validation check only consists of checking that the root message name is the same between the current schema version and the new version.
> > > > > > > > > > > > > >          - Explain briefly what a root message and its name are.
> > > > > > > > > > > > > >       - Explain the problems (list scenarios) that we have because the protobuf native schema only supports FULL compatibility validation.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Regarding the high level design - as in what you plan to do:
> > > > > > > > > > > > > > I suggest you add a "High Level Design" section and in it detail how you plan to validate, per protobuf version, per compatibility check (backward, forward, full, ...).
> > > > > > > > > > > > > > I tried reading the implementation - for me, it's all over the place. Can you please list, in order, what I wrote above, and list the validation rules with a good explanation of why you validate them like that?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Lastly, once you have all the validation rules clearly stated, you can use them to document this properly so users know what validation to expect.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Asaf
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Sorry, my mistake. I removed the code and described the design to improve the PROTOBUF_NATIVE schema compatibility checks. You can have a look.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Feb 22, 2023 at 21:16:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I read it, but you're almost directly diving into the code - it will take me hours just to reverse engineer your design.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Can you please include a "High Level Design" section in which you explain how you plan to tackle each issue?
> > > > > > > > > > > > > > > > If I can read that section and explain to someone else how this will work, it means the section is complete.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19565.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > We can talk about the current design here. Especially for the field type change check rules, please give your valuable advice.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > Sinan
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > >
>

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by 丛搏 <bo...@apache.org>.
Hi SiNan,

From my point of view, it is just a plug-in, so I don't think it is
necessary to add configuration for it.
Such configuration adds no value, and it will make the plugin harder for users to use.


SiNan Liu <li...@gmail.com> wrote on Wed, Mar 8, 2023 at 15:54:
>
> Hi, Bo.
>
> 1. I understand what you're saying: develop a new
> `ProtobufNativeAdvancedSchemaCompatibilityCheck` rather than change the
> existing `ProtobufNativeSchemaCompatibilityCheck`. But I found a few small
> problems:
>
> (1) The schemaType of both ProtobufNativeAdvancedSchemaCompatibilityCheck and
> ProtobufNativeSchemaCompatibilityCheck is PROTOBUF_NATIVE. Both checkers
> handle native PROTOBUF rather than AVRO-PROTOBUF, which leads to some
> problems or "unreasonable" behavior and raises some further questions for me.
>
`CompatibilityCheck` is only a plugin.
`ProtobufNativeSchemaCompatibilityCheck` may sooner or later leave the
stage; when `ProtobufNativeAdvancedSchemaCompatibilityCheck` is
stable, we can make it the default checker.

It is just a plug-in: users can change it at will, as long as they make
sure it is used correctly.

> (2) In broker.conf:
>
> `schemaRegistryCompatibilityCheckers`: if both
> ProtobufNativeSchemaCompatibilityCheck and
> ProtobufNativeAdvancedSchemaCompatibilityCheck are set, they will
> overwrite each other, because the checkers are stored in a map keyed by
> schema type:
> https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryService.java#L36-L44
>
> ```java
> Map<SchemaType, SchemaCompatibilityCheck> checkers = new HashMap<>();
> for (String className : checkerClasses) {
>     SchemaCompatibilityCheck schemaCompatibilityCheck = Reflections.createInstance(className,
>             SchemaCompatibilityCheck.class, Thread.currentThread().getContextClassLoader());
>     // Keyed by schema type: a later checker for the same type replaces an earlier one.
>     checkers.put(schemaCompatibilityCheck.getSchemaType(), schemaCompatibilityCheck);
> }
> ```
>
> Is this a big problem or a small one? Is it worth handling, or unnecessary?
> Maybe we can state in the documentation that only one of the two
> protobufNative checkers can be chosen? Why are there two checkers with
> different implementations for the same schemaType? Why not have the
> checker create different validators, so we don't have to change
> schemaRegistryCompatibilityCheckers?
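A minimal sketch of why configuring both checkers in `schemaRegistryCompatibilityCheckers` cannot work: the loop quoted above keys the map by schema type, so whichever class is listed last wins. `SchemaType`, `CompatibilityCheck` and the two checker classes below are simplified, hypothetical stand-ins for the Pulsar types, and a plain list replaces the reflective loading.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical stand-ins for Pulsar's SchemaType and SchemaCompatibilityCheck.
enum SchemaType { AVRO, JSON, PROTOBUF_NATIVE }

interface CompatibilityCheck {
    SchemaType getSchemaType();
}

final class BasicProtobufNativeCheck implements CompatibilityCheck {
    @Override
    public SchemaType getSchemaType() {
        return SchemaType.PROTOBUF_NATIVE;
    }
}

final class AdvancedProtobufNativeCheck implements CompatibilityCheck {
    @Override
    public SchemaType getSchemaType() {
        return SchemaType.PROTOBUF_NATIVE;
    }
}

final class CheckerRegistry {
    private CheckerRegistry() { }

    // Same shape as the broker loop: one map entry per schema type, so a later
    // checker for the same type silently replaces an earlier one.
    static Map<SchemaType, CompatibilityCheck> build(List<CompatibilityCheck> configured) {
        Map<SchemaType, CompatibilityCheck> checkers = new HashMap<>();
        for (CompatibilityCheck check : configured) {
            checkers.put(check.getSchemaType(), check);
        }
        return checkers;
    }
}
```

Building the registry from `[basic, advanced]` leaves only the advanced checker; reversing the order leaves only the basic one. In this sketch, `HashMap.put` raises no error or warning either way.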

Users can only use one of them, not both; allowing two would bring complexity to users.

>
> (3) And after switching to ProtobufNativeAdvancedSchemaCompatibilityCheck:
> existing topics were previously checked only on the name of the root
> message, not on the content of the protobuf schema.
>
> What if the user wants both checkers?
>
> Will setting ProtobufNativeAdvancedSchemaCompatibilityCheck affect topics
> that already have a schema?
>
> Could older topics still use the old checker, while newer topics or certain
> older topics use the new advanced checker?
>
When `ProtobufNativeAdvancedSchemaCompatibilityCheck` is stable,
users will not choose `ProtobufNativeSchemaCompatibilityCheck`,
because it is not a complete checker.

> (4) So should each schemaType have one checker? protobufNativeChecker could
> have as many different implementation classes as needed. As for the
> classname configuration in the PIP, let's see whether it can be set at the
> topic level. In the current PIP design, I just load this parameter into the
> checker when the broker starts and the checkers map is set up. If I want to
> support the topic level, can I do it in a new, normal PR, or should I
> complete it here?
>
> Should we add a PROTOBUF_NATIVE_ADVANCE schemaType corresponding to
> ProtobufNativeAdvancedSchemaCompatibilityCheck? (That seems to be more
> trouble.)
>
> Sorry, I can't use a computer or the network at my company, so I am
> replying from my mobile phone and the format may be a bit messy. Please
> bear with me.
>
> Thanks,
>
> sinan
>
>
> 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:39 PM:
>
> > SiNan Liu <li...@gmail.com> wrote on Tue, Mar 7, 2023 at 13:22:
> > >
> > > Great to see your comment, Bo!
> > >
> > > 1. The first way: the protobuf website describes the rules, but there are
> > > no plans to implement them.
> > > https://protobuf.dev/programming-guides/proto/#updating
> >
> > Maybe ask here: https://groups.google.com/g/protobuf
> >
> > >
> > > 2. I think this PIP can be divided into two parts.
> > > (1) Add a flag (`ValidatorClassName`) and load it into
> > > `ProtobufNativeSchemaCompatibilityCheck` when the broker starts.
> > > ValidatorClassName is empty by default, and the implementation behaves
> > > as before, with no change for the user.
> >
> > `ProtobufNativeSchemaCompatibilityCheck` is a plugin configured in `broker.conf`:
> > ```
> > schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
> > ```
> > I do not recommend that we directly modify this plugin and keep adding
> > configuration items, which will cause trouble for users.
> > We have a lot of configs and it's getting very unwieldy.
> > In my opinion, we shouldn't change
> > `org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`:
> > it is a simple implementation, it rarely goes wrong, and most users will
> > use it. We can add another PROTOBUF_NATIVE checker named
> > `ProtobufNativeAdvancedSchemaCompatibilityCheck` or similar. That way, we
> > don't need to add this flag. There is no need to consider compatibility:
> > it is just a plug-in and will not affect the current logic.
> > Users who need it just switch the plugin to the new implementation.
> >
> > > ```java
> > > ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
> > >     for (Descriptors.Descriptor fromDescriptor : fromDescriptors) {
> > >         // The default implementation only checks if the root message has changed.
> > >         if (!fromDescriptor.getFullName().equals(toDescriptor.getFullName())) {
> > >             throw new ProtoBufCanReadCheckException("Protobuf root message isn't allowed to change!");
> > >         }
> > >     }
> > > };
> > > ```
> > > The `ValidatorClassName` value can also be set to the implementation
> > > added by this PIP:
> > > `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
> > >
> > > (2) Recode `ProtobufNativeSchemaCompatibilityCheck` so that it builds a
> > > different `ProtobufNativeSchemaValidator` depending on the flag
> > > (`ValidatorClassName`).
> > > Isn't that just a plug-in too? Users can develop and choose a different
> > > `ProtobufNativeSchemaValidator`. I don't think it changes the logic; it
> > > just allows users to extend it.
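A minimal sketch of part (2). It assumes the `ValidatorClassName` flag is read as a plain string at broker startup. An empty value keeps the root-message-name check, and any other value is loaded by reflection. `ValidatorLoader` and `AcceptAllValidator` are hypothetical. To keep the example self-contained, root message full names (strings) stand in for `Descriptors.Descriptor`, and `IllegalArgumentException` stands in for `ProtoBufCanReadCheckException`.

```java
import java.util.List;

// Hypothetical validator contract named after the PIP draft; root message full
// names (Strings) stand in for protobuf Descriptors to keep this self-contained.
interface ProtobufNativeSchemaValidator {
    void validate(List<String> existingRootNames, String newRootName);

    // Default behavior: only reject a change of the root message name.
    ProtobufNativeSchemaValidator DEFAULT = (existingRootNames, newRootName) -> {
        for (String existing : existingRootNames) {
            if (!existing.equals(newRootName)) {
                throw new IllegalArgumentException("Protobuf root message isn't allowed to change!");
            }
        }
    };
}

// Hypothetical example of a pluggable alternative that accepts every change.
final class AcceptAllValidator implements ProtobufNativeSchemaValidator {
    @Override
    public void validate(List<String> existingRootNames, String newRootName) {
        // Intentionally empty.
    }
}

final class ValidatorLoader {
    private ValidatorLoader() { }

    // An empty flag keeps today's behavior; otherwise instantiate the named class.
    static ProtobufNativeSchemaValidator load(String validatorClassName) {
        if (validatorClassName == null || validatorClassName.isBlank()) {
            return ProtobufNativeSchemaValidator.DEFAULT;
        }
        try {
            return Class.forName(validatorClassName)
                    .asSubclass(ProtobufNativeSchemaValidator.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load validator " + validatorClassName, e);
        }
    }
}
```

This shows the extra configuration surface that Bo objects to: the broker needs a flag, a loading path, and a failure mode for bad class names. The alternative is a second checker class that is selected through the existing `schemaRegistryCompatibilityCheckers` list.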
> > >
> > >
> > > I think this PIP is an enhancement and supplement to the existing
> > > functionality, not something unnecessary or meaningless.
> > >
> > >
> > > Thanks,
> > > sinan
> > >
> > >
> > >
> > >
> > >
> > 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:53:
> > >
> > > > I think we have two ways to do that.
> > > >
> > > > First way: push upstream for improvements to protobuf's Java support.
> > > > Ask whether they have plans to improve it.
> > > >
> > > > Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should
> > > > be implemented as a plugin, without changing any existing plugin logic
> > > > (it's simple and already in use). I don't recommend adding flags for
> > > > rollback; it adds configuration and makes little sense.
> > > >
> > > > Thanks,
> > > > Bo
> > > >
> > > Asaf Mesika <as...@gmail.com> wrote on Mon, Mar 6, 2023 at 23:00:
> > > >
> > > > >
> > > > > Can you convert the code block, which is actually a quote at the
> > > > > beginning of the PIP, into something that doesn't require so much
> > > > > horizontal scrolling?
> > > > > Use
> > > > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > > > >
> > > > > Let's improve the clarity of what you wrote:
> > > > >
> > > > > "the PROTOBUF uses avro struct to store."
> > > > > -->
> > > > > When schema type PROTOBUF is used, the Pulsar client assumes the object
> > > > > given to it as message data is an auto-generated POJO containing the
> > > > > annotations encoding the schema. The client uses a converter which
> > > > > converts a Protobuf schema descriptor into an Avro schema and sends that
> > > > > as the schema of the producer/consumer.
> > > > >
> > > > > "On the broker side, protobuf and avro both use SchemaData converted to
> > > > > org.apache.avro.Schema."
> > > > > -->
> > > > > Since the schema is an Avro schema, the broker-side compatibility check
> > > > > simply re-uses the compatibility check of the AVRO schema type.
> > > > >
> > > > > "ProtobufSchema is different from ProtobufNativeSchema in schema
> > > > > compatibility check it uses avro-protobuf.
> > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > But the current implementation of ProtobufNative schema compatibility
> > > > > check only checked if the root message name is changed."
> > > > >
> > > > > -->
> > > > > The PROTOBUF_NATIVE schema type is different.
> > > > > The client actually uses the Protobuf descriptor as the schema, as
> > > > > opposed to the Avro schema of the PROTOBUF schema type. In the broker,
> > > > > the PROTOBUF_NATIVE compatibility check doesn't actually implement any
> > > > > rule besides one: checking whether the root message name has changed.
> > > > >
> > > > >
> > > > >
> > > > > >    1. For now, there is no official or third-party solution for
> > > > > >    ProtoBuf compatibility. If a better third-party or official
> > > > > >    solution appears in the future, we can develop a new
> > > > > >    ProtobufNativeSchemaValidator and use it, so we add a flag.
> > > > > >
> > > > > Why do you need to make that configurable? Once you find a third-party
> > > > > solution, just switch to it. Who knows, maybe you never will. Introduce
> > > > > it when you find it, not now.
> > > > >
> > > > >
> > > > > > We improve the BACKWARD and FORWARD strategies in
> > > > > > ProtobufNativeSchemaCompatibilityCheck. As with the AVRO implementation,
> > > > > > protobuf compatibility checking needs to implement the canRead method.
> > > > > > *This will check that the writtenSchema can be read by the readSchema.*
> > > > >
> > > > >
> > > > > I completely disagree.
> > > > > The Avro implementation is confusing for our use case. Don't copy it.
> > > > >
> > > > > You have:
> > > > >
> > > > > public void checkCompatible(SchemaData from, SchemaData to,
> > > > >         SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {
> > > > >     Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
> > > > >     Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
> > > > >     switch (strategy) {
> > > > >         case BACKWARD_TRANSITIVE:
> > > > >         case BACKWARD:
> > > > >         case FORWARD_TRANSITIVE:
> > > > >         case FORWARD:
> > > > >         case FULL_TRANSITIVE:
> > > > >         case FULL:
> > > > >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
> > > > >             return;
> > > > >         case ALWAYS_COMPATIBLE:
> > > > >             return;
> > > > >         default:
> > > > >             throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
> > > > >     }
> > > > > }
> > > > >
> > > > > I would rename:
> > > > > from --> currentSchema
> > > > > to --> newSchema
> > > > >
> > > > > Use that switch case and have a method for each case, like:
> > > > > validateBackwardsCompatibility(currentSchema, newSchema)
> > > > >
> > > > > I dislike canRead and the use of writtenSchema, since you have two
> > > > > completely different use cases: the producing side and the consuming side.
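The suggested structure, with one method per strategy named in terms of currentSchema and newSchema, could look like the sketch below. The `Field`/`MessageSchema` model and the rules are illustrative assumptions, not a complete protobuf rule set:

- a proto2 required field must not be added under BACKWARD,
- a proto2 required field must not be removed under FORWARD,
- any type change on a reused field number is reported.

The last rule is stricter than protobuf needs, since some type changes are wire-compatible. Each method returns a list of violations instead of throwing. FULL would run both methods, and the *_TRANSITIVE variants would run them against every earlier version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model: field number -> declared type and proto2 "required" flag.
record Field(String type, boolean required) { }

record MessageSchema(Map<Integer, Field> fields) { }

final class ProtobufNativeRules {
    private ProtobufNativeRules() { }

    // BACKWARD: consumers using newSchema read data produced with currentSchema.
    static List<String> validateBackwardsCompatibility(MessageSchema currentSchema, MessageSchema newSchema) {
        List<String> errors = typeChanges(currentSchema, newSchema);
        newSchema.fields().forEach((number, field) -> {
            // Old messages never contain a field that newSchema now requires.
            if (field.required() && !currentSchema.fields().containsKey(number)) {
                errors.add("new required field " + number);
            }
        });
        return errors;
    }

    // FORWARD: consumers using currentSchema read data produced with newSchema.
    static List<String> validateForwardCompatibility(MessageSchema currentSchema, MessageSchema newSchema) {
        List<String> errors = typeChanges(currentSchema, newSchema);
        currentSchema.fields().forEach((number, field) -> {
            // New messages omit a field that existing consumers still require.
            if (field.required() && !newSchema.fields().containsKey(number)) {
                errors.add("removed required field " + number);
            }
        });
        return errors;
    }

    // Conservative: any type change on a field number used by both versions is reported.
    private static List<String> typeChanges(MessageSchema currentSchema, MessageSchema newSchema) {
        List<String> errors = new ArrayList<>();
        currentSchema.fields().forEach((number, field) -> {
            Field updated = newSchema.fields().get(number);
            if (updated != null && !updated.type().equals(field.type())) {
                errors.add("type change on field " + number);
            }
        });
        return errors;
    }
}
```

With this model, adding a required field 2 fails BACKWARD but passes FORWARD. That asymmetry shows up in the method names instead of in the argument order of a shared canRead.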
> > > > >
> > > > > > schemaValidatorBuilder
> > > > >
> > > > > I dislike this proposal. IMO the Avro implementation is way too
> > > > > complicated.
> > > > > Why not have a simple validation function for each switch case above?
> > > > > Why do we need a strategy, a builder, and all this complexity?
> > > > >
> > > > >
> > > > > > *Here are the basic compatibility rules we've defined:*
> > > > >
> > > > >
> > > > > IMO it's impossible to read the validation rules as you described them.
> > > > > I have written how they should be structured numerous times above.
> > > > > I can't validate them.
> > > > >
> > > > >
> > > > > IMO, the current design is very hard to read.
> > > > > Please try to avoid jumping into code sections.
> > > > > Write a high level design section in which you describe in words what
> > > > > you plan to do.
> > > > > Write the validation rules in a structure that is easy to understand:
> > > > > rules for each compatibility check, using proper terms (current schema,
> > > > > new schema), since the new schema is sometimes used for reading and
> > > > > sometimes for writing.
> > > > >
> > > > > In its current form, it takes too much time to understand the design,
> > > > > and that shouldn't be the case.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Asaf
> > > > >
> > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <li...@gmail.com> wrote:
> > > > >
> > > > > > Hi! I updated the explanation of some things in the PIP issue. I also
> > > > > > added a new "flag" in the conf to select different
> > > > > > ProtobufNativeSchemaValidator implementations, and made the default
> > > > > > ProtobufNativeSchemaValidator only check whether the name of the root
> > > > > > message is the same.
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > sinan
> > > > > >
> > > > > >
> > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Mar 5, 2023 at 20:21:
> > > > > >
> > > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > >
> > > > > > > > >
> > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be validated
> > > > > > > > > for backward compatibility check using Avro based compatibility rules?
> > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > > > > > > > descriptor?
> > > > > > > > > Is there some translation happening?
> > > > > > > >
> > > > > > > >
> > > > > > > > 1. *You can take a quick look at the previous design: PROTOBUF uses an
> > > > > > > > Avro struct to store its schema.*
> > > > > > > > https://github.com/apache/pulsar/pull/1954
> > > > > > > >
> > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > > > >
> > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > > > > >
> > > > > > >
> > > > > > > Ok. So to summarize your code (easier to write it than send links):
> > > > > > > * Pulsar Client, when used with Protobuf Schema, actually converts the
> > > > > > > Protobuf descriptor into an Avro Schema (using code found inside the Avro
> > > > > > > library) and saves that Avro schema as the schema. It's not saving the
> > > > > > > protobuf descriptor at all. Very confusing, I have to add - never expected
> > > > > > > that.
> > > > > > > This explains why the ProtobufSchemaCompatibilityCheck just extends the
> > > > > > > Avro check without doing any translation.
> > > > > > >
> > > > > > > Thanks for that.
> > > > > > >
> > > > > > > Now that I finally understand this, I can say that you *must* explain that
> > > > > > > in the motivation section of your PIP.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > 2. *On the broker side, both protobuf and avro convert `SchemaData` into
> > > > > > > > `org.apache.avro.Schema`.*
> > > > > > > >
> > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > > > >
> > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > > > >
> > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > > > > >
> > > > > > >
> > > > > > > Actually those links don't really help.
> > > > > > > The main link that helps is:
> > > > > > >
> > > > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > I'm sorry - I don't understand.
> > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > > > compatibility validation, no?
> > > > > > > > > I believe the answer is no, since you may want to verify different things
> > > > > > > > > when it comes to Protobuf, which are different than Avro.
> > > > > > > >
> > > > > > > >
> > > > > > > > 1. *ProtobufSchema is different from ProtobufNativeSchema in that it uses
> > > > > > > > avro-protobuf.*
> > > > > > > >
> > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > *ProtobufNativeSchema needs a native compatibility check, but there is no
> > > > > > > > official or third-party implementation. So this PIP does not use
> > > > > > > > avro-protobuf for protobuf compatibility checking.*
> > > > > > > >
> > > > > > > > 2. *By the way, this is implemented in much the same way that Apache Avro
> > > > > > > > does compatibility checking.*
> > > > > > > >
> > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > > > > `canReadStrategy`, `canBeReadStrategy`, `mutualReadStrategy`
> > > > > > > >
> > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > > > >
> > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > > > >
> > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > > > > > > > writtenSchema and readSchema. We only need to change the order of the
> > > > > > > > arguments we pass to `canRead()`.*
> > > > > > > > ```java
> > > > > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > > > > > > >         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
> > > > > > > >     switch (strategy) {
> > > > > > > >         // BACKWARD: data written with the existing schema is read with the new one.
> > > > > > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > > > > > > >         // FORWARD: data written with the new schema is read with the existing one.
> > > > > > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> > > > > > > >         // FULL: both directions must succeed.
> > > > > > > >         case CanBeReadMutualStrategy -> {
> > > > > > > >             canRead(toValidate, fromDescriptor);
> > > > > > > >             canRead(fromDescriptor, toValidate);
> > > > > > > >         }
> > > > > > > >     }
> > > > > > > > }
> > > > > > > >
> > > > > > > > // Throws if data serialized with writtenSchema cannot be parsed with readSchema.
> > > > > > > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > > > > > >         Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> > > > > > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
> > > > > > > > }
> > > > > > > > ```
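To make the directional `canRead(writtenSchema, readSchema)` idea concrete, here is a minimal, self-contained sketch. It does not use the protobuf `Descriptors` API; it assumes a hypothetical, simplified model in which a message is just a map from field number to scalar type name, and `ToyProtoCanRead` is an illustrative name. The type groups are a subset of those listed in the protobuf "Updating A Message Type" guide, not the rules proposed in this PIP.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model: a message is a map from field number to scalar
// type name. Real descriptors also carry nested messages, labels, enums, etc.
class ToyProtoCanRead {

    // A subset of the scalar groups the protobuf guide describes as wire-compatible
    // (subject to the truncation/casting caveats stated in that guide).
    private static final Set<Set<String>> COMPATIBLE_GROUPS = Set.of(
            Set.of("int32", "uint32", "int64", "uint64", "bool"),
            Set.of("sint32", "sint64"),
            Set.of("fixed32", "sfixed32"),
            Set.of("fixed64", "sfixed64"),
            Set.of("string", "bytes"));

    static boolean typesCompatible(String writerType, String readerType) {
        if (writerType.equals(readerType)) {
            return true;
        }
        for (Set<String> group : COMPATIBLE_GROUPS) {
            if (group.contains(writerType) && group.contains(readerType)) {
                return true;
            }
        }
        return false;
    }

    // Throws if data written with `writer` cannot be read with `reader`.
    // Fields present on only one side are tolerated: the reader skips unknown
    // fields, and fields missing from the data take their default values.
    static void canRead(Map<Integer, String> writer, Map<Integer, String> reader) {
        for (Map.Entry<Integer, String> field : reader.entrySet()) {
            String writerType = writer.get(field.getKey());
            if (writerType != null && !typesCompatible(writerType, field.getValue())) {
                throw new IllegalStateException("Field " + field.getKey() + " changed type from "
                        + writerType + " to " + field.getValue());
            }
        }
    }
}
```

Because the check is directional, BACKWARD and FORWARD differ only in which schema is passed as the writer, which is exactly what swapping the arguments in `validateWithStrategy` achieves.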
> > > > > > > >
> > > > > > > >
> > > > > > > I get that you want to take inspiration from the existing Avro schema
> > > > > > > compatibility check for your code design.
> > > > > > > I also understand you *won't* use any existing Avro code for that.
> > > > > > > I also understand that you have to write the validation check on your own,
> > > > > > > since there is no third-party implementation to rely on.
> > > > > > >
> > > > > > > What I can't understand are the actual rules, as I wrote before, since
> > > > > > > they are written confusingly.
> > > > > > > So, I repeat what I asked before:
> > > > > > >
> > > > > > > I think you should structure the validation rules differently:
> > > > > > >
> > > > > > > * Backward checks
> > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > * Forward
> > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > >
> > > > > > > Once that's accomplished, I will be able to understand the different
> > > > > > > validation rules for each compatibility check.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > sinan
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Mar 1, 2023 at 21:19:
> > > > > > > > >
> > > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > > >
> > > > > > > > > > Current avro, json, and protobuf schemas are all implemented based on AVRO.
> > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > > > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > > > > > > > > > compatibility validation?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, so
> > > > > > > > > > `checkCompatible()` is the same implementation as for `AVRO`.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be validated
> > > > > > > > > for backward compatibility check using Avro based compatibility rules?
> > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > > > > > > > descriptor?
> > > > > > > > > Is there some translation happening?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > The compatibility check strategy is described on the website:
> > > > > > > > > >
> > > > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > > > > 1. BACKWARD (CanReadExistingStrategy): Consumers using schema V3 can
> > > > > > > > > > process data written by producers using the last schema version V2. So
> > > > > > > > > > V2 is "writtenSchema" and V3 is "readSchema".
> > > > > > > > > > 2. FORWARD (CanBeReadByExistingStrategy): Consumers using the last schema
> > > > > > > > > > version V2 can process data written by producers using a new schema V3,
> > > > > > > > > > even though they may not be able to use the full capabilities of the new
> > > > > > > > > > schema. So V3 is "writtenSchema" and V2 is "readSchema".
> > > > > > > > > > 3. FULL (CanBeReadMutualStrategy): Schemas are both backward and forward
> > > > > > > > > > compatible. The schema can evolve, and the old and new schema versions
> > > > > > > > > > should each be able to read data written with the other.
> > > > > > > > > >
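The role swap described for BACKWARD, FORWARD, and FULL (which version is the writer and which is the reader) can be written out with the existingSchema/newSchema names requested in this thread. This is an illustrative sketch only: `StrategyRoles`, `Check`, and `checksFor` are hypothetical names, and the transitive variants (for example BACKWARD_TRANSITIVE) would repeat the same pairs against every earlier schema version rather than only the last one.

```java
import java.util.List;

// Illustration only: which schema plays writer and which plays reader per strategy.
class StrategyRoles {

    enum Strategy { BACKWARD, FORWARD, FULL }

    // One canRead(writer, reader) check to run.
    record Check(String writer, String reader) { }

    static List<Check> checksFor(Strategy strategy, String existingSchema, String newSchema) {
        return switch (strategy) {
            // BACKWARD: consumers on the new schema read data written with the existing one.
            case BACKWARD -> List.of(new Check(existingSchema, newSchema));
            // FORWARD: consumers on the existing schema read data written with the new one.
            case FORWARD -> List.of(new Check(newSchema, existingSchema));
            // FULL: both directions must hold.
            case FULL -> List.of(new Check(existingSchema, newSchema),
                    new Check(newSchema, existingSchema));
        };
    }
}
```

Naming the pair explicitly per strategy avoids overloading "writtenSchema", which refers to the old version in a backward check and to the new version in a forward check.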
> > > > > > > > > >
> > > > > > > > > I'm sorry - I don't understand.
> > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > > > compatibility validation, no?
> > > > > > > > > I believe the answer is no, since you may want to verify different things
> > > > > > > > > when it comes to Protobuf, which are different than Avro.
> > > > > > > > >
> > > > > > > > > At the current state, I can't understand your design at all. Please help
> > > > > > > > > clarify that.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As we've
> > > > > > > > > > just described, we only need to change the order of the arguments we
> > > > > > > > > > pass to `canRead()`.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > sinan
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Feb 27, 2023 at 20:49:
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > > > > > > > > > >
> > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > > > > >
> > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > > > > >
> > > > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > > > >
> > > > > > > > > > > Current avro, json, and protobuf schemas are all implemented based on
> > > > > > > > > > > AVRO.
> > > > > > > > > > >
> > > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > > > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > > > > > > > > > compatibility validation?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > >
> > > > > > > > > > > * Backward checks
> > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > > > > * Forward
> > > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > > > >
> > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > >
> > > > > > > > > > > I'm saying this since you used the word "writtenSchema", but it
> > > > > > > > > > > represents something completely different in a backward check than in a
> > > > > > > > > > > forward check.
> > > > > > > > > > >
> > > > > > > > > > > Once you have it structured like that, I personally will be able to read
> > > > > > > > > > > and understand it.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The motivation and problem statement are now good - thanks for improving
> > > > > > > > > > > it.
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some background
> > > > > > > > > > > > and some explanations.
> > > > > > > > > > > >
> > > > > > > > > > > > The compatibility check rules are already written in the Implementation
> > > > > > > > > > > > section. ProtoBufNative implements the same canRead method as Apache
> > > > > > > > > > > > Avro. It does this by checking whether the schemas for writing and
> > > > > > > > > > > > reading are compatible. I also indicate whether the writtenSchema and
> > > > > > > > > > > > readSchema of the Backward, Forward, and Full strategies are the old or
> > > > > > > > > > > > the new version of the schema.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > sinan
> > > > > > > > > > > >
> > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Feb 26, 2023 at 23:24:
> > > > > > > > > > > >
> > > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, which IMO
> > > > > > > > > > > > > you need to add for people to understand it. You don't need to explain
> > > > > > > > > > > > > the entire Pulsar in this PIP, but at the very least add a few
> > > > > > > > > > > > > paragraphs detailing all you need to know, to put you in context:
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >    - Start by saying Pulsar has a built-in schema registry inside the
> > > > > > > > > > > > >    Pulsar broker.
> > > > > > > > > > > > >       - Every time the client updates the schema, it uploads it to the
> > > > > > > > > > > > >       broker. When that happens, the broker validates whether the new
> > > > > > > > > > > > >       schema version is compatible with the previous versions. There
> > > > > > > > > > > > >       are 4 types of compatibility: Full, ... (complete and explain
> > > > > > > > > > > > >       each one briefly)
> > > > > > > > > > > > >    - Also explain that the Pulsar schema registry supports various
> > > > > > > > > > > > >    schema protocols: Avro, protobuf native, ... (complete the rest);
> > > > > > > > > > > > >    each protocol has a schema which dictates how to serialize and
> > > > > > > > > > > > >    deserialize the message content into a typed object.
> > > > > > > > > > > > >    - Explain in short what protobuf native is (compared to protobuf
> > > > > > > > > > > > >    non-native)
> > > > > > > > > > > > >    - Please don't paste code instead of explaining.
> > > > > > > > > > > > >       - Explain that the current protobuf native validation check is
> > > > > > > > > > > > >       only composed of checking that the root message name is the same
> > > > > > > > > > > > >       between the current schema version and the new version.
> > > > > > > > > > > > >          - Explain briefly what a root message and its name are.
> > > > > > > > > > > > >       - Explain the problem (list scenarios) that we have because the
> > > > > > > > > > > > >       protobuf native schema only supports FULL compatibility
> > > > > > > > > > > > >       validation.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regarding high-level design - as in what you plan to do:
> > > > > > > > > > > > > I suggest you add a "High Level Design" section and in it detail how you
> > > > > > > > > > > > > plan to validate, per protobuf version, per compatibility check
> > > > > > > > > > > > > (backward, forward, full, ...).
> > > > > > > > > > > > > I tried reading the implementation - for me, it's all over the place.
> > > > > > > > > > > > > Can you please list in order what I wrote above, and list the validation
> > > > > > > > > > > > > rules with a good explanation of why you validate it like that?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Lastly, once you have all the validation rules clearly stated, you can
> > > > > > > > > > > > > use them to document it properly so users can know what validation to
> > > > > > > > > > > > > expect.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Asaf
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Sorry, my mistake. I removed the code and described the design to
> > > > > > > > > > > > > > improve the PROTOBUF_NATIVE schema compatibility checks. You can have a
> > > > > > > > > > > > > > look.
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Feb 22, 2023 at 21:16:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I read it but you're almost directly diving into the code - it will
> > > > > > > > > > > > > > > take me hours just to reverse engineer your design.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Can you please include a "High Level Design" section in which you
> > > > > > > > > > > > > > > explain how you plan to tackle any issue?
> > > > > > > > > > > > > > > If I can read that section and explain to someone else how this will
> > > > > > > > > > > > > > > work, it means the section is complete.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19565.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > We can talk about the current design here. Especially for the field
> > > > > > > > > > > > > > > > type change check rules, please give your valuable advice.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > Sinan
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> >

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by SiNan Liu <li...@gmail.com>.
Hi, bo.

1. I understand what you're suggesting: develop a new
`ProtobufNativeAdvancedSchemaCompatibilityCheck` rather than changing the
existing `ProtobufNativeSchemaCompatibilityCheck`. But I found a few small
problems:

(1) Both ProtobufNativeAdvancedSchemaCompatibilityCheck and
ProtobufNativeSchemaCompatibilityCheck have the schemaType PROTOBUF_NATIVE.
So both checkers would be "native" PROTOBUF implementations that don't use
AVRO-PROTOBUF, which leads to some problems (or at least something
"unreasonable") and raises some further questions for me.

(2) In broker.conf, if both ProtobufNativeSchemaCompatibilityCheck and
ProtobufNativeAdvancedSchemaCompatibilityCheck are set in
`schemaRegistryCompatibilityCheckers`, they will overwrite each other,
because the checkers are stored in a map keyed by schema type:

https://github.com/apache/pulsar/blob/af1360fb167c1f9484fda5771df3ea9b21d1440b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/SchemaRegistryService.java#L36-L44

```java
// Checkers are keyed by schema type, so two classes that report the same type
// (e.g. PROTOBUF_NATIVE) cannot both be registered: the last one put wins.
Map<SchemaType, SchemaCompatibilityCheck> checkers = new HashMap<>();
for (String className : checkerClasses) {
    SchemaCompatibilityCheck schemaCompatibilityCheck = Reflections.createInstance(className,
            SchemaCompatibilityCheck.class, Thread.currentThread().getContextClassLoader());
    checkers.put(schemaCompatibilityCheck.getSchemaType(), schemaCompatibilityCheck);
}
```

Is this a big problem or a small one? Is it something we need to handle, or is
it unnecessary? Maybe we can state in the documentation that only one of the
two protobufNative checkers can be chosen? But why have two checkers for
different implementations of the same schemaType? Why not let the checker
create different validators, so that we do not have to change
schemaRegistryCompatibilityCheckers?
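The overwrite problem in (2) can be reproduced without a broker. The sketch below uses hypothetical stand-in types (`Checker`, `NamedChecker`) in place of `SchemaCompatibilityCheck` and mirrors only the map-keyed registration loop: when two configured checkers report the same schema type, whichever is processed last is the only one kept.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the real classes: only getSchemaType() matters here.
class CheckerOverwriteDemo {

    interface Checker {
        String getSchemaType();
    }

    record NamedChecker(String name, String schemaType) implements Checker {
        @Override
        public String getSchemaType() {
            return schemaType;
        }
    }

    // Same shape as the registration loop: keyed by schema type, so a later
    // checker with the same type silently replaces an earlier one.
    static Map<String, Checker> register(List<Checker> configured) {
        Map<String, Checker> checkers = new HashMap<>();
        for (Checker checker : configured) {
            checkers.put(checker.getSchemaType(), checker);
        }
        return checkers;
    }
}
```

With both a "basic" and an "advanced" checker for PROTOBUF_NATIVE configured, the resulting map holds a single entry for that type.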

(3) What happens after switching to
ProtobufNativeAdvancedSchemaCompatibilityCheck? Existing topics previously
had only the name of the root message checked, not the content of the
protobuf schema.

What if the user wants both checkers?

Does switching to ProtobufNativeAdvancedSchemaCompatibilityCheck affect
topics that already have a schema?

For example, older topics could keep using the old checker, while newer
topics (or certain older topics) use the new advanced checker.

(4) So should we have one checker per schemaType? The protobufNative checker
could then have as many different implementation classes as needed. As for
the class name configuration in this PIP, let's see whether it can be set at
the topic level. In the current PIP design, I just load this parameter into
the checker when the broker starts and the checkers map is set up. If I want
to support the topic level, can I do that in a separate, regular PR, or
should I perfect it here?

Or should we add a PROTOBUF_NATIVE_ADVANCE schemaType corresponding to
ProtobufNativeAdvancedSchemaCompatibilityCheck? (That seems to be more
trouble.)

Sorry, I can't use a computer or the network at my company, so I'm replying
from my mobile phone and the format may be a bit messy. Please bear with me.

Thanks,

sinan


丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:39 PM:

> SiNan Liu <li...@gmail.com> wrote on Tue, Mar 7, 2023 at 13:22:
> >
> > Great to see your comment, bo!
> >
> > 1. The first way: the protobuf website has a description of the rules,
> > but no plans to implement them.
> > https://protobuf.dev/programming-guides/proto/#updating
>
> Maybe ask here: https://groups.google.com/g/protobuf
>
> >
> > 2. I think this PIP can be divided into two parts.
> > (1) Add a flag (`ValidatorClassName`) and load it into
> > `ProtobufNativeSchemaCompatibilityCheck` when the broker starts.
> > ValidatorClassName is empty by default, and the implementation continues
> > as before, with no change for the user.
>
> `ProtobufNativeSchemaCompatibilityCheck` is a plugin in `broker.conf`:
> ```
> schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
> ```
> I do not recommend that we directly modify this plugin and keep adding
> configuration items, which will cause trouble for users.
> We have a lot of configs and it's getting very unwieldy.
> In my opinion, we should not change
> `org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`:
> it is a simple implementation, it rarely goes wrong, and most users will
> use it. We can add another ProtobufNative check named
> `ProtobufNativeAdvancedSchemaCompatibilityCheck` or something else. This
> way, we don't need to add this flag. There is no need to consider
> compatibility: it is just a plug-in and will not affect the current logic.
> If users need it, they just switch the plugin to the new implementation.
>
> > ```java
> >     ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
> >         for (Descriptors.Descriptor fromDescriptor : fromDescriptors) {
> >             // The default implementation only checks if the root message has changed.
> >             if (!fromDescriptor.getFullName().equals(toDescriptor.getFullName())) {
> >                 throw new ProtoBufCanReadCheckException("Protobuf root message isn't allowed to change!");
> >             }
> >         }
> >     };
> > ```
> > The `ValidatorClassName` value can also be set to the implementation added
> > by this PIP:
> > `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
> >
> > (2) Recode `ProtobufNativeSchemaCompatibilityCheck` so that it builds a
> > different `ProtobufNativeSchemaValidator` based on the flag
> > (`ValidatorClassName`).
> > Isn't that just a plug-in? Users can develop and choose a different
> > `ProtobufNativeSchemaValidator`. I don't think it changes the logic; it
> > just allows users to extend it.
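The flag-based alternative amounts to loading a validator by class name once at startup and falling back to the root-message-name check when the flag is empty. A minimal sketch, assuming a hypothetical, simplified `Validator` interface that sees only root message full names (the real `ProtobufNativeSchemaValidator` works on `Descriptors.Descriptor`); `ValidatorLoader` and `PermissiveValidator` are illustrative names, not Pulsar classes.

```java
import java.util.List;

class ValidatorLoader {

    // Hypothetical, simplified stand-in for ProtobufNativeSchemaValidator: schemas
    // are represented only by the full names of their root messages.
    interface Validator {
        void validate(List<String> fromRootNames, String toRootName);
    }

    // Same rule as the DEFAULT lambda in this thread: reject any root message change.
    static final Validator DEFAULT = (fromRootNames, toRootName) -> {
        for (String fromRootName : fromRootNames) {
            if (!fromRootName.equals(toRootName)) {
                throw new IllegalStateException("Protobuf root message isn't allowed to change!");
            }
        }
    };

    // Resolves the configured class name once (e.g. at broker startup); empty means DEFAULT.
    static Validator load(String className) {
        if (className == null || className.isBlank()) {
            return DEFAULT;
        }
        try {
            return Class.forName(className)
                    .asSubclass(Validator.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load validator " + className, e);
        }
    }

    // Example of a pluggable implementation: accepts every change.
    static class PermissiveValidator implements Validator {
        @Override
        public void validate(List<String> fromRootNames, String toRootName) {
            // intentionally no checks
        }
    }
}
```

The trade-off against Bo's proposal is visible here: this keeps a single checker per schema type but adds a configuration item, whereas a second checker class needs no new flag but runs into the one-checker-per-schema-type limit of `schemaRegistryCompatibilityCheckers`.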
> >
> >
> > I think this PIP is an enhancement and supplement to the existing
> > functionality, not something unnecessary or meaningless.
> >
> >
> > Thanks,
> > sinan
> >
> >
> >
> >
> >
> > 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:53:
> >
> > > I think we have two ways to do that.
> > >
> > > First way: we need to push protobuf to improve its Java implementation.
> > > Ask them if they have plans to improve it.
> > >
> > > Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should
> > > be implemented as a plugin; don't change any existing plugin logic
> > > (it's simple and already used). I don't recommend adding flags for
> > > rollback; it adds configuration and makes little sense.
> > >
> > > Thanks,
> > > Bo
> > >
> > > Asaf Mesika <as...@gmail.com> wrote on Mon, Mar 6, 2023 at 23:00:
> > >
> > > >
> > > > Can you convert the code block which is actually a quote at the
> > > > beginning of the PIP into something which doesn't require scrolling
> > > > horizontally so much?
> > > > Use
> > > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > > >
> > > > Let's improve the clarity of what you wrote:
> > > >
> > > > "the PROTOBUF uses avro struct to store."
> > > > -->
> > > > When Schema type PROTOBUF is used, Pulsar Client assumes the object given
> > > > to it as message data is an auto-generated POJO containing the annotations
> > > > encoding the schema. The client uses a converter, which converts a
> > > > Protobuf schema descriptor into an Avro schema and sends that as the Schema
> > > > of the producer/consumer.
> > > >
> > > > "On the broker side, protobuf and avro both use SchemaData converted
> to
> > > > org.apache.avro.Schema."
> > > > -->
> > > > Since the schema is an Avro schema, the compatibility check on the broker
> > > > side simply re-uses the compatibility check of the AVRO schema type.
> > > >
> > > > "ProtobufSchema is different from ProtobufNativeSchema in schema
> > > > compatibility check it uses avro-protobuf.
> > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > But the current implementation of ProtobufNative schema compatibility
> > > > check only checked if the root message name is changed."
> > > >
> > > > -->
> > > > PROTOBUF_NATIVE schema type is different.
> > > > The client is actually using Protobuf Descriptor as the schema, as
> > > opposed
> > > > to Avro schema of PROTOBUF schema type. In the broker, the
> > > PROTOBUF_NATIVE
> > > > compatibility check actually hasn't implemented any rule, besides
> one:
> > > > checking if the root message name has changed.
> > > >
> > > >
> > > >
> > > > >    1. For now, there is no official or third-party solution for ProtoBuf
> > > > >    compatibility. If a better third-party or official solution appears in
> > > > >    the future, we can develop a new ProtobufNativeSchemaValidator and use
> > > > >    it, so we add a flag.
> > > > >
> > > > Why do you need to make that configurable? Once you find a third party,
> > > > just switch to it? Who knows, maybe you never will. Introduce it when you
> > > > find it, not now.
> > > >
> > > >
> > > > > We improve the BACKWARD and FORWARD strategies in
> > > > > ProtobufNativeSchemaCompatibilityCheck. As with the AVRO implementation,
> > > > > protobuf compatibility checking needs to implement the canRead method.
> > > > > *This will check that the writtenSchema can be read by readSchema.*
> > > >
> > > >
> > > > I completely disagree.
> > > > Avro implementation is confusing for our use case. Don't copy that.
> > > >
> > > > You have
> > > >
> > > > public void checkCompatible(SchemaData from, SchemaData to,
> > > >         SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {
> > > >     Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
> > > >     Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
> > > >     switch (strategy) {
> > > >         case BACKWARD_TRANSITIVE:
> > > >         case BACKWARD:
> > > >         case FORWARD_TRANSITIVE:
> > > >         case FORWARD:
> > > >         case FULL_TRANSITIVE:
> > > >         case FULL:
> > > >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
> > > >             return;
> > > >         case ALWAYS_COMPATIBLE:
> > > >             return;
> > > >         default:
> > > >             throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
> > > >     }
> > > > }
> > > >
> > > > I would rename :
> > > > from --> currentSchema
> > > > to --> newSchema
> > > >
> > > > Use that switch case and have a method for each like:
> > > > validateBackwardsCompatibility(currentSchema, newSchema)
> > > >
> > > > I dislike canRead and usage of writtenSchema, since you have two completely
> > > > different use cases: from the producing side and the consumer side.
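The restructuring suggested here (rename `from`/`to` to `currentSchema`/`newSchema` and give each strategy its own validation method) could look roughly like the following self-contained sketch. It is an illustration, not Pulsar code: `SimpleSchema`, `SketchStrategy`, `SketchCompatibilityCheck` and the single rule about proto2 `required` fields are hypothetical, chosen only because that rule behaves differently for BACKWARD and FORWARD. A real implementation would operate on protobuf `Descriptor`s and cover many more rules.

```java
import java.util.Set;

// Hypothetical, simplified schema model for this sketch only (not a Pulsar or
// protobuf type): the root message full name plus the numbers of the fields
// declared proto2 `required`.
final class SimpleSchema {
    final String rootMessageName;
    final Set<Integer> requiredFieldNumbers;

    SimpleSchema(String rootMessageName, Set<Integer> requiredFieldNumbers) {
        this.rootMessageName = rootMessageName;
        this.requiredFieldNumbers = requiredFieldNumbers;
    }
}

// Hypothetical subset of the strategies discussed in the thread.
enum SketchStrategy { ALWAYS_COMPATIBLE, BACKWARD, FORWARD, FULL }

final class SketchCompatibilityCheck {
    // One validation method per strategy, always phrased as (currentSchema, newSchema).
    static void checkCompatible(SimpleSchema currentSchema, SimpleSchema newSchema,
                                SketchStrategy strategy) {
        switch (strategy) {
            case BACKWARD -> validateBackwardCompatibility(currentSchema, newSchema);
            case FORWARD -> validateForwardCompatibility(currentSchema, newSchema);
            case FULL -> {
                validateBackwardCompatibility(currentSchema, newSchema);
                validateForwardCompatibility(currentSchema, newSchema);
            }
            case ALWAYS_COMPATIBLE -> {
                // nothing to check
            }
        }
    }

    // BACKWARD: consumers on newSchema read data produced with currentSchema, so
    // newSchema must not require a field that currentSchema did not require.
    static void validateBackwardCompatibility(SimpleSchema currentSchema, SimpleSchema newSchema) {
        checkRootMessageName(currentSchema, newSchema);
        for (int field : newSchema.requiredFieldNumbers) {
            if (!currentSchema.requiredFieldNumbers.contains(field)) {
                throw new IllegalStateException("BACKWARD: field " + field
                        + " is required by the new schema but not by the current one");
            }
        }
    }

    // FORWARD: consumers still on currentSchema read data produced with newSchema,
    // so newSchema must keep requiring every field that currentSchema requires.
    static void validateForwardCompatibility(SimpleSchema currentSchema, SimpleSchema newSchema) {
        checkRootMessageName(currentSchema, newSchema);
        for (int field : currentSchema.requiredFieldNumbers) {
            if (!newSchema.requiredFieldNumbers.contains(field)) {
                throw new IllegalStateException("FORWARD: field " + field
                        + " is required by the current schema but not by the new one");
            }
        }
    }

    // The only rule the existing PROTOBUF_NATIVE check applies: the root message
    // full name must not change.
    static void checkRootMessageName(SimpleSchema currentSchema, SimpleSchema newSchema) {
        if (!currentSchema.rootMessageName.equals(newSchema.rootMessageName)) {
            throw new IllegalStateException("root message changed from "
                    + currentSchema.rootMessageName + " to " + newSchema.rootMessageName);
        }
    }
}
```

With `v1` requiring field 1 and `v2` requiring fields 1 and 2, `checkCompatible(v1, v2, SketchStrategy.FORWARD)` passes while `BACKWARD` throws, because consumers on `v2` would expect field 2 in messages written with `v1`. Stating each rule in terms of `currentSchema` and `newSchema` keeps that asymmetry visible without tracking which side `writtenSchema` stands for.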
> > > >
> > > > > schemaValidatorBuilder
> > > > >
> > > > I dislike this proposal. IMO Avro implementation is way too complicated.
> > > > Why not have a simple function for validation for each switch case above?
> > > > Why do we need strategy and builder, and all this complexity?
> > > >
> > > >
> > > > *Here are the basic compatibility rules we've defined:*
> > > >
> > > >
> > > > IMO it's impossible to read the validation rules as you described them.
> > > > I wrote how they should be structured numerous times above.
> > > > I can't validate them.
> > > >
> > > >
> > > > IMO, the current design is very hard to read.
> > > > Please try to avoid jumping into code sections.
> > > > Write a high level design section, in which you describe in words what you
> > > > plan to do.
> > > > Write the validation rules in a structure that is easy to understand:
> > > > rules per each compatibility check, and use proper words (current schema,
> > > > new schema), since the new schema is sometimes used for reading and
> > > > sometimes for writing.
> > > >
> > > > In its current form it takes too much time to understand the design, and it
> > > > shouldn't be the case.
> > > >
> > > > Thanks,
> > > >
> > > > Asaf
> > > >
> > > >
> > > > >
> > > >
> > > >
> > > >
> > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <li...@gmail.com> wrote:
> > > >
> > > > > Hi! I updated the explanation of some things in the PIP issue. I also
> > > > > added a new "flag" in the conf that selects a different
> > > > > ProtobufNativeSchemaValidator implementation, and set the default
> > > > > ProtobufNativeSchemaValidator to only check whether the name of the root
> > > > > message is the same.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > sinan
> > > > >
> > > > >
> > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Mar 5, 2023 at 20:21:
> > > > >
> > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > >
> > > > > > > >
> > > > > > > > Can you please explain how a Protobuf Schema descriptor can be validated
> > > > > > > > for backward compatibility check using Avro based compatibility rules?
> > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > > > > > > descriptor?
> > > > > > > > Is there some translation happening?
> > > > > > >
> > > > > > >
> > > > > > > 1. *You can take a quick look at the previous design, the PROTOBUF uses
> > > > > > > avro struct to store.*
> > > > > > > https://github.com/apache/pulsar/pull/1954
> > > > > > >
> > > > > > >
> > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > > >
> > > > > > >
> > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > > > >
> > > > > >
> > > > > > Ok. So to summarize your code (easier to write it than send links):
> > > > > > * Pulsar Client, when used with Protobuf Schema, actually converts the
> > > > > > Protobuf descriptor into an Avro Schema (using code found inside Avro
> > > > > > library) and saves that Avro schema as the schema. It's not saving the
> > > > > > protobuf descriptor at all. Very confusing I have to add - never expected
> > > > > > that.
> > > > > > This explains why in the ProtobufSchemaCompatibilityCheck they just extend
> > > > > > the Avro without doing any translation.
> > > > > >
> > > > > > Thanks for that.
> > > > > >
> > > > > > Now that I finally understand this, I can say that: you *must* explain that
> > > > > > in the motivation part in your PIP.
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > 2. *On the broker side, protobuf and avro both use `SchemaData` converted
> > > > > > > to `org.apache.avro.Schema`.*
> > > > > > >
> > > > > > >
> > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > > >
> > > > > > >
> > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > > >
> > > > > > >
> > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > > > >
> > > > > >
> > > > > > Actually those links don't really help.
> > > > > > The main link that helps is:
> > > > > >
> > > > > >
> > > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > I'm sorry - I don't understand.
> > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > > compatibility validation, no?
> > > > > > > > I believe the answer is no, since you may want to verify different things
> > > > > > > > when it comes to Protobuf, which are different than Avro.
> > > > > > >
> > > > > > >
> > > > > > > 1. *ProtobufSchema is different from ProtobufNativeSchema in that it uses
> > > > > > > avro-protobuf.*
> > > > > > >
> > > > > > >
> > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > *ProtobufNativeSchema needs a native compatibility check, but there is no
> > > > > > > official or third party implementation. So this PIP does not use
> > > > > > > avro-protobuf for protobuf compatibility checking.*
> > > > > > >
> > > > > > > 2. *By the way, this is implemented in much the same way that Apache Avro
> > > > > > > does compatibility checking.*
> > > > > > >
> > > > > > >
> > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > > > `canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy`
> > > > > > >
> > > > > > >
> > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > > >
> > > > > > >
> > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > > >
> > > > > > >
> > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > > > > > > writtenSchema and readSchema. We only need to change the order of arguments
> > > > > > > we pass to `canRead()`.*
> > > > > > > ```java
> > > > > > > // Dispatches to canRead(writtenSchema, readSchema) with the argument order
> > > > > > > // each strategy needs; fromDescriptor is the existing schema and
> > > > > > > // toValidate is the new one.
> > > > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > > > > > >         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
> > > > > > >     switch (strategy) {
> > > > > > >         // BACKWARD: the new schema reads data written with the existing one.
> > > > > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > > > > > >         // FORWARD: the existing schema reads data written with the new one.
> > > > > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> > > > > > >         // FULL: both directions must hold.
> > > > > > >         case CanBeReadMutualStrategy -> {
> > > > > > >             canRead(toValidate, fromDescriptor);
> > > > > > >             canRead(fromDescriptor, toValidate);
> > > > > > >         }
> > > > > > >     }
> > > > > > > }
> > > > > > >
> > > > > > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > > > > >         Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> > > > > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
> > > > > > > }
> > > > > > > ```
> > > > > > >
> > > > > > >
> > > > > > I get that you want to take inspiration from the existing Avro Schema
> > > > > > compatibility check for your code design.
> > > > > > I also understand you *won't* use any existing avro code for that.
> > > > > > I also understand you have to write the validation check on your own,
> > > > > > since there is no 3rd party to explain that.
> > > > > >
> > > > > > The only thing I can't understand is the actual rules as I wrote before,
> > > > > > since they are written confusingly.
> > > > > > So, I repeat what I asked before:
> > > > > >
> > > > > > I think you should structure the validation rules differently:
> > > > > >
> > > > > > * Backward checks
> > > > > > ** List down rules, where use newSchema (the schema used by producer or
> > > > > > consumer) and existingSchema (last schema used)
> > > > > > * Forward
> > > > > > ** List down rules, where use newSchema (the schema used by producer or
> > > > > > consumer) and existingSchema (last schema used)
> > > > > >
> > > > > > Once that's accomplished I will be able to understand the different
> > > > > > validation rules for each compatibility check.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > sinan
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Mar 1, 2023 at 21:19:
> > > > > > > >
> > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > >
> > > > > > > > > Current avro, json, and protobuf schemas are all implemented based on AVRO.
> > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > > > > > > > > compatibility validation?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, so `checkCompatible()`
> > > > > > > > > is the same implementation as for `AVRO`.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Can you please explain how a Protobuf Schema descriptor can be validated
> > > > > > > > for backward compatibility check using Avro based compatibility rules?
> > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > > > > > > descriptor?
> > > > > > > > Is there some translation happening?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > The Compatibility check strategy is described on the website
> > > > > > > > >
> > > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > > > 1. BACKWARD(CanReadExistingStrategy): Consumers using schema V3 can process
> > > > > > > > > data written by producers using the last schema version V2. So V2 is
> > > > > > > > > "writtenSchema" and V3 is "readSchema".
> > > > > > > > > 2. FORWARD(CanBeReadByExistingStrategy): Consumers using the last schema
> > > > > > > > > version V2 can process data written by producers using a new schema V3,
> > > > > > > > > even though they may not be able to use the full capabilities of the new
> > > > > > > > > schema. So V3 is "writtenSchema" and V2 is "readSchema".
> > > > > > > > > 3. FULL(CanBeReadMutualStrategy): Schemas are both backward and forward
> > > > > > > > > compatible.
> > > > > > > > > Schema can evolve. The old version schema and the new version schema should
> > > > > > > > > be well understood.
> > > > > > > > >
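The writer/reader roles in the list above can be made concrete with a small sketch that only returns the (writtenSchema, readSchema) pairs a `canRead`-style check would be called with for each strategy. The enum constants mirror the strategy names used in this thread; `ValidationStrategy`, `CanReadPairs` and `pairsFor` are hypothetical, and schemas are plain version labels such as `"V2"` and `"V3"`.

```java
import java.util.List;

// Hypothetical enum mirroring the strategy names used in this thread.
enum ValidationStrategy { CAN_READ_EXISTING, CAN_BE_READ_BY_EXISTING, CAN_BE_READ_MUTUAL }

final class CanReadPairs {
    // Returns the {writtenSchema, readSchema} pairs that canRead would be called with.
    // `existing` is the last stored version (V2 above), `newSchema` the incoming one (V3).
    static List<String[]> pairsFor(ValidationStrategy strategy, String existing, String newSchema) {
        return switch (strategy) {
            // BACKWARD: consumers on the new schema read data written with the existing one.
            case CAN_READ_EXISTING -> List.<String[]>of(new String[] {existing, newSchema});
            // FORWARD: consumers on the existing schema read data written with the new one.
            case CAN_BE_READ_BY_EXISTING -> List.<String[]>of(new String[] {newSchema, existing});
            // FULL: both directions must hold.
            case CAN_BE_READ_MUTUAL -> List.<String[]>of(
                    new String[] {existing, newSchema},
                    new String[] {newSchema, existing});
        };
    }
}
```

`pairsFor(ValidationStrategy.CAN_BE_READ_BY_EXISTING, "V2", "V3")` returns a single pair whose writtenSchema is `"V3"` and readSchema is `"V2"`, matching the FORWARD description in the list. The explicit `List.<String[]>of` type witness avoids `List.of` treating a single array argument as varargs elements.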
> > > > > > > > >
> > > > > > > > I'm sorry - I don't understand.
> > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > > compatibility validation, no?
> > > > > > > > I believe the answer is no, since you may want to verify different things
> > > > > > > > when it comes to Protobuf, which are different than Avro.
> > > > > > > >
> > > > > > > > At the current state, I can't understand your design at all. Please help
> > > > > > > > clarify that.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > So each strategy should have its own section.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As we've
> > > > > > > > > just described, we only need to change the order of arguments we pass to
> > > > > > > > > `canRead()`.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > sinan
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Feb 27, 2023 at 20:49:
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > And you can see the difference between ProtoBuf and
> > > > > > ProtoBufNative:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > >
> > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > > > >
> > > > > > > > > >  I read it and they look identical. What's the difference
> > > between
> > > > > > > them?
> > > > > > > > > >
> > > > > > > > > > > Current avro, json, and protobuf schemas are all implemented based on
> > > > > > > > > > > AVRO.
> > > > > > > > > >
> > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > > > > > > > > compatibility validation?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > >
> > > > > > > > > > * Backward checks
> > > > > > > > > > ** List down rules, where use newSchema (the schema used by producer or
> > > > > > > > > > consumer) and existingSchema (last schema used)
> > > > > > > > > > * Forward
> > > > > > > > > > ** List down rules, where use newSchema (the schema used by producer or
> > > > > > > > > > consumer) and existingSchema (last schema used)
> > > > > > > > > >
> > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > >
> > > > > > > > > > I'm saying this since you used the word "writtenSchema" but it represents
> > > > > > > > > > something completely different if it's a backward or forward check.
> > > > > > > > > >
> > > > > > > > > > Once you have that structure, I personally will be able to read and
> > > > > > > > > > understand it.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > The motivation and problem statement are now good - thanks for improving
> > > > > > > > > > it.
> > > > > > > > > >
> > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some background and
> > > > > > > > > > > some explanations.
> > > > > > > > > > >
> > > > > > > > > > > The compatibility check rules are already written in the Implementation.
> > > > > > > > > > > ProtoBufNative implements the same canRead method as Apache Avro.
> > > > > > > > > > > It does this by checking whether the schema for writing and reading is
> > > > > > > > > > > compatible. I also indicate whether the writtenSchema and readSchema of the
> > > > > > > > > > > Backward, Forward, and Full strategies are the old or the new version of
> > > > > > > > > > > the schema.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > sinan
> > > > > > > > > > >
> > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Feb 26, 2023 at 23:24:
> > > > > > > > > > >
> > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, so IMO you need
> > > > > > > > > > > > to add it for people to understand it. You don't need to explain the entire
> > > > > > > > > > > > pulsar in this PIP, but at the very least a few paragraphs detailing all
> > > > > > > > > > > > you need to know, to put you in context:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >    - Start by saying Pulsar has a built-in schema registry inside Pulsar
> > > > > > > > > > > >    broker.
> > > > > > > > > > > >       - Every time the client updates the schema, it uploads it to the
> > > > > > > > > > > >       broker. When that happens, it has a feature which validates if the new
> > > > > > > > > > > >       schema version is compatible with the previous versions. There are 4
> > > > > > > > > > > >       types of compatibility: Full, ... (complete and explain each one
> > > > > > > > > > > >       briefly)
> > > > > > > > > > > >    - Also explain Pulsar Schema registry supports various schema
> > > > > > > > > > > >    protocols: Avro, protobuf native, ... (complete the rest), each protocol
> > > > > > > > > > > >    has a schema which dictates how to serialize and deserialize the message
> > > > > > > > > > > >    content into typed object.
> > > > > > > > > > > >    - Explain in short what is protobuf native (compare protobuf non-native)
> > > > > > > > > > > >    - Please don't paste code instead of explaining.
> > > > > > > > > > > >       - Explain that protobuf native current validation check is only
> > > > > > > > > > > >       composed of checking the root message name is the same between the
> > > > > > > > > > > >       current schema version and the new version.
> > > > > > > > > > > >          - Explain briefly what is a root message and its name.
> > > > > > > > > > > >       - Explain the problem (list scenarios) that we have because protobuf
> > > > > > > > > > > >       native schema only supports FULL compatibility validation.
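The "validates if the new schema version is compatible with the previous versions" step depends on the strategy: per the Pulsar documentation, the plain BACKWARD/FORWARD/FULL strategies compare the new schema with the latest stored version, while the `*_TRANSITIVE` variants compare it with all stored versions. A minimal generic sketch of that selection; `TransitiveSketch`, `checkAgainstHistory` and the oldest-first ordering of `previousVersions` are assumptions made for illustration, not Pulsar API.

```java
import java.util.List;
import java.util.function.BiConsumer;

final class TransitiveSketch {
    // previousVersions is ordered oldest first; pairCheck receives (existing, newSchema)
    // and is expected to throw when the pair is incompatible.
    static <S> void checkAgainstHistory(List<S> previousVersions, S newSchema,
                                        boolean transitive, BiConsumer<S, S> pairCheck) {
        if (previousVersions.isEmpty()) {
            return; // first schema version: nothing to compare against
        }
        // *_TRANSITIVE strategies: compare with every stored version.
        // Plain strategies: compare only with the latest stored version.
        List<S> toCheck = transitive
                ? previousVersions
                : List.of(previousVersions.get(previousVersions.size() - 1));
        for (S existing : toCheck) {
            pairCheck.accept(existing, newSchema);
        }
    }
}
```

The pairwise rule itself (root message name, field types, and so on) stays independent of how many stored versions it is applied to.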
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Regarding high level design - as in what you plan to do.
> > > > > > > > > > > > I suggest you add "High Level Design" and in it detail how you plan to
> > > > > > > > > > > > validate, per protobuf version, per compatibility check (backward, forward,
> > > > > > > > > > > > full,...).
> > > > > > > > > > > > I tried reading the implementation - for me, it's all over the place. Can
> > > > > > > > > > > > you please list in order what I wrote above, and list the validation rules
> > > > > > > > > > > > with a good explanation why you validate it like that?
> > > > > > > > > > > >
> > > > > > > > > > > > Lastly, once you have all the validation rules clearly stated, you can use
> > > > > > > > > > > > it to document it properly so users can know what validation to expect.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Asaf
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Sorry, my mistake. I removed the code and described the design to improve
> > > > > > > > > > > > > the PROTOBUF_NATIVE schema compatibility checks. You can have a look.
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Feb 22, 2023 at 21:16:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I read it but you're almost directly diving into the code - it will take me
> > > > > > > > > > > > > > hours just to reverse engineer your design.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Can you please include a "High Level Design" section in which you explain
> > > > > > > > > > > > > > how you plan to tackle any issue?
> > > > > > > > > > > > > > If I can read that section and explain to someone else how this will work,
> > > > > > > > > > > > > > it means the section is complete.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19565.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > We can talk about the current design here. Especially for the field type
> > > > > > > > > > > > > > > change check rules, please give your valuable advice.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Sinan
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
>

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by 丛搏 <bo...@apache.org>.
SiNan Liu <li...@gmail.com> wrote on Tue, Mar 7, 2023 at 13:22:
>
> Great to see your comment, bo!
>
> 1. The first way. The protobuf website has a description of the rules, but
> there are no plans to implement them.
> https://protobuf.dev/programming-guides/proto/#updating

Maybe ask here:
https://groups.google.com/g/protobuf
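For reference, the protobuf guide linked above lists which field type changes keep the wire format compatible. The sketch below encodes a subset of those rules as I read the guide (int32/uint32/int64/uint64/bool interchangeable; sint32 with sint64; fixed32 with sfixed32; fixed64 with sfixed64; string with bytes when the bytes are valid UTF-8; enum with int32/uint32/int64/uint64). `ProtoFieldTypeRules` and its string type names are hypothetical; a real check would inspect protobuf field descriptors and would also have to handle nested messages, `repeated`, `oneof`, and removed or renumbered fields, while the guide's caveats about value truncation still apply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ProtoFieldTypeRules {
    // Groups of scalar types described as wire-compatible in the protobuf
    // "Updating A Message Type" guide (a subset; caveats such as integer
    // truncation and the UTF-8 requirement for bytes -> string still apply).
    private static final List<Set<String>> COMPATIBLE_GROUPS = List.of(
            Set.of("int32", "uint32", "int64", "uint64", "bool"),
            Set.of("sint32", "sint64"),
            Set.of("fixed32", "sfixed32"),
            Set.of("fixed64", "sfixed64"),
            Set.of("string", "bytes"));

    // enum is varint-encoded like these integer types.
    private static final Set<String> ENUM_COMPATIBLE = Set.of("int32", "uint32", "int64", "uint64");

    static boolean isWireCompatible(String oldType, String newType) {
        if (oldType.equals(newType)) {
            return true;
        }
        if ((oldType.equals("enum") && ENUM_COMPATIBLE.contains(newType))
                || (newType.equals("enum") && ENUM_COMPATIBLE.contains(oldType))) {
            return true;
        }
        for (Set<String> group : COMPATIBLE_GROUPS) {
            if (group.contains(oldType) && group.contains(newType)) {
                return true;
            }
        }
        return false;
    }

    // Maps are field number -> type name. Reports fields that keep their number
    // but change to a type that is not wire-compatible.
    static List<String> incompatibleTypeChanges(Map<Integer, String> currentFields,
                                                Map<Integer, String> newFields) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : currentFields.entrySet()) {
            String newType = newFields.get(entry.getKey());
            if (newType != null && !isWireCompatible(entry.getValue(), newType)) {
                problems.add("field " + entry.getKey() + ": " + entry.getValue() + " -> " + newType);
            }
        }
        return problems;
    }
}
```

Changing field 1 from `int32` to `sint64` is reported, while `string` to `bytes` on field 2 is not, since the two use different integer encodings (plain varint vs ZigZag) but share the length-delimited encoding respectively.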

>
> 2. I think this PIP can be divided into two parts.
> (1) Add a flag (`ValidatorClassName`) and load it into
> `ProtobufNativeSchemaCompatibilityCheck` when the broker starts.
> `ValidatorClassName` is empty by default, and the implementation continues as
> before, with no change for the user.

`ProtobufNativeSchemaCompatibilityCheck` is a plugin in `broker.conf`
```
schemaRegistryCompatibilityCheckers=org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck,org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck
```
I do not recommend that we directly modify this plugin and keep adding
configuration items, which will cause trouble for users.
We have a lot of configs and it's getting very unwieldy.
In my opinion, we should not change
`org.apache.pulsar.broker.service.schema.ProtobufNativeSchemaCompatibilityCheck`:
it is a simple implementation, it rarely goes wrong, and most
users will use it. We can add another ProtobufNative check named
`ProtobufNativeAdvancedSchemaCompatibilityCheck` or similar. In this
way, we don't need to add this flag. There is no need to consider
compatibility; it is just a plug-in and will not affect current logic.
If users need it, they just change the plugin to the new implementation.
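Bo's suggestion keeps the existing checker untouched and lets users opt in by listing a different class in `schemaRegistryCompatibilityCheckers`. A minimal, self-contained sketch of that selection step, assuming a simplified plugin interface: `CompatibilityChecker`, `BasicProtobufNativeCheck`, `AdvancedProtobufNativeCheck` and `CheckerRegistrySketch` are hypothetical names, not Pulsar classes, and the "later entry wins" rule for two checkers of the same schema type is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a broker-side compatibility check plugin.
interface CompatibilityChecker {
    String schemaType();
}

// The existing simple checker (root message name only) stays as it is.
class BasicProtobufNativeCheck implements CompatibilityChecker {
    @Override
    public String schemaType() {
        return "PROTOBUF_NATIVE";
    }
}

// A separately named, stricter checker that users opt into via configuration.
class AdvancedProtobufNativeCheck implements CompatibilityChecker {
    @Override
    public String schemaType() {
        return "PROTOBUF_NATIVE";
    }
}

final class CheckerRegistrySketch {
    // Instantiates each class named in a comma-separated list (in the spirit of
    // schemaRegistryCompatibilityCheckers) and indexes it by schema type.
    // A later entry for the same schema type replaces an earlier one.
    static Map<String, CompatibilityChecker> load(String commaSeparatedClassNames) {
        Map<String, CompatibilityChecker> checkers = new HashMap<>();
        for (String name : commaSeparatedClassNames.split(",")) {
            String className = name.trim();
            if (className.isEmpty()) {
                continue;
            }
            try {
                CompatibilityChecker checker = (CompatibilityChecker) Class.forName(className)
                        .getDeclaredConstructor()
                        .newInstance();
                checkers.put(checker.schemaType(), checker);
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot load checker " + className, e);
            }
        }
        return checkers;
    }
}
```

Swapping the configured class name from `BasicProtobufNativeCheck` to `AdvancedProtobufNativeCheck` changes which checker handles `PROTOBUF_NATIVE` without any new configuration key.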

> ```java
>     ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
>         for (Descriptors.Descriptor fromDescriptor : fromDescriptors) {
>             // The default implementation only checks if the root message has changed.
>             if (!fromDescriptor.getFullName().equals(toDescriptor.getFullName())) {
>                 throw new ProtoBufCanReadCheckException("Protobuf root message isn't allowed to change!");
>             }
>         }
>     };
> ```
> `ValidatorClassName` can also be set to the implementation added by this PIP,
> `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
>
> (2) Refactor `ProtobufNativeSchemaCompatibilityCheck` to build a different
> `ProtobufNativeSchemaValidator` depending on the flag (`ValidatorClassName`).
> Isn't it just a plug-in? The user can develop and choose a different
> `ProtobufNativeSchemaValidator`. I think it doesn't change the logic; it
> just allows users to extend it.
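SiNan's flag proposal can be sketched as follows: an empty `ValidatorClassName` keeps the current root-message-name check, and a non-empty value selects another validator. This is a simplified illustration: root message names stand in for protobuf `Descriptor`s, a map lookup stands in for loading the class by name, and `ValidatorFlagSketch`, `RootNameValidator` and `select` are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class ValidatorFlagSketch {
    // Hypothetical validator shape: root message names of the previous schemas
    // and of the new schema stand in for protobuf Descriptors.
    interface RootNameValidator {
        void validate(List<String> previousRootNames, String newRootName);
    }

    // Mirrors the quoted DEFAULT validator: every previous schema must have the
    // same root message full name as the new one.
    static final RootNameValidator DEFAULT = (previousRootNames, newRootName) -> {
        for (String previous : previousRootNames) {
            if (!previous.equals(newRootName)) {
                throw new IllegalStateException("Protobuf root message isn't allowed to change!");
            }
        }
    };

    // An empty ValidatorClassName keeps today's behaviour; any other value picks
    // a named validator (a map lookup here instead of loading a class by name).
    static RootNameValidator select(String validatorClassName,
                                    Map<String, RootNameValidator> available) {
        if (validatorClassName == null || validatorClassName.isEmpty()) {
            return DEFAULT;
        }
        RootNameValidator validator = available.get(validatorClassName);
        if (validator == null) {
            throw new IllegalArgumentException("Unknown validator: " + validatorClassName);
        }
        return validator;
    }
}
```

With the flag left empty, nothing changes for existing users, which is the property the proposal relies on; the cost Bo points out is the extra configuration key.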
>
>
> I think this PIP is an enhancement and supplement to the existing
> functionality, and it is neither unnecessary nor meaningless.
>
>
> Thanks,
> sinan
>
>
>
>
>
> 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:53:
>
> > I think we have two ways to do that.
> >
> > First way: We need to advance the improvement of java in protobuf. Ask
> > if they have plans to improve.
> >
> > Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should
> > be implemented as a plugin, don't change any existing plugin logic
> > (it's simple and already used). I don't recommend adding flags for
> > rollback, it adds configuration and makes little sense.
> >
> > Thanks,
> > Bo
> >
> > Asaf Mesika <as...@gmail.com> wrote on Mon, Mar 6, 2023 at 23:00:
> >
> > >
> > > Can you convert the code block which is actually a quote in the
> > > beginning of the PIP to something that doesn't require scrolling
> > > horizontally so much?
> > > Use
> > >
> > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > >
> > > Let's improve the clarity of what you wrote:
> > >
> > > "the PROTOBUF uses avro struct to store."
> > > -->
> > > When Schema type PROTOBUF is used, Pulsar Client assumes the object given
> > > to it as message data is an auto-generated POJO containing the annotations
> > > encoding the schema. The client is using a converter, which converts a
> > > Protobuf schema descriptor into an Avro schema and sends that as the Schema
> > > of the producer/consumer.
> > >
> > > "On the broker side, protobuf and avro both use SchemaData converted to
> > > org.apache.avro.Schema."
> > > -->
> > > Since the schema is an Avro schema, the implementation of compatibility
> > > check on the broker side is to simply re-use the compatibility check of the
> > > AVRO schema type.
> > >
> > > "ProtobufSchema is different from ProtobufNativeSchema in schema
> > > compatibility check it uses avro-protobuf.
> > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > But the current implementation of ProtobufNative schema compatibility
> > > check only checked if the root message name is changed."
> > >
> > > -->
> > > PROTOBUF_NATIVE schema type is different.
> > > The client is actually using Protobuf Descriptor as the schema, as opposed
> > > to Avro schema of PROTOBUF schema type. In the broker, the PROTOBUF_NATIVE
> > > compatibility check actually hasn't implemented any rule, besides one:
> > > checking if the root message name has changed.
> > >
> > >
> > >
> > > >    1. For now, there is no official or third-party solution for ProtoBuf
> > > >    compatibility. If in the future have better solutions of a third party or
> > > >    the official, we develop new ProtobufNativeSchemaValidator and use, so
> > > >    add a flag.
> > >
> > > Why do you need to make that configurable? Once you find a third-party
> > > solution, why not just switch to it? Who knows, maybe you never will.
> > > Introduce it when you find it, not now.
> > >
> > >
> > > > We improve in ProtobufNativeSchemaCompatibilityCheck BACKWARD, FORWARD
> > > > these strategies. As with the AVRO implementation, protobuf compatibility
> > > > checking need implementing the canRead method. *This will check that
> > > > the writtenschema can be read by readSchema.*
> > >
> > >
> > > I completely disagree.
> > > Avro implementation is confusing for our use case. Don't copy that.
> > >
> > > You have
> > >
> > > public void checkCompatible(SchemaData from, SchemaData to,
> > >                             SchemaCompatibilityStrategy strategy)
> > >         throws IncompatibleSchemaException {
> > >     Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
> > >     Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
> > >     switch (strategy) {
> > >         case BACKWARD_TRANSITIVE:
> > >         case BACKWARD:
> > >         case FORWARD_TRANSITIVE:
> > >         case FORWARD:
> > >         case FULL_TRANSITIVE:
> > >         case FULL:
> > >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
> > >             return;
> > >         case ALWAYS_COMPATIBLE:
> > >             return;
> > >         default:
> > >             throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
> > >     }
> > > }
> > >
> > > I would rename:
> > > from --> currentSchema
> > > to --> newSchema
> > >
> > > Use that switch case and have a method for each strategy, like:
> > > validateBackwardsCompatibility(currentSchema, newSchema)
> > >
> > > I dislike canRead and the usage of writtenSchema, since you have two
> > > completely different use cases: the producing side and the consuming side.
> > >
> > > > schemaValidatorBuilder
> > >
> > > I dislike this proposal. IMO the Avro implementation is way too complicated.
> > > Why not have a simple validation function for each switch case above?
> > > Why do we need a strategy, a builder, and all this complexity?
> > >
> > >
> > > > *Here are the basic compatibility rules we've defined:*
> > >
> > >
> > > IMO it's impossible to read the validation rules as you described them.
> > > I wrote how they should be structured numerous times above.
> > > I can't validate them.
> > >
> > >
> > > IMO, the current design is very hard to read.
> > > Please try to avoid jumping into code sections.
> > > Write a high-level design section in which you describe in words what
> > > you plan to do.
> > > Write the validation rules in a structure that is easy to understand:
> > > rules for each compatibility check, using precise terms (current schema,
> > > new schema), since the new schema is sometimes used for reading and
> > > sometimes for writing.
> > >
> > > In its current form it takes too much time to understand the design, and
> > > that shouldn't be the case.
> > >
> > > Thanks,
> > >
> > > Asaf
> > >
> > >
> > > >
> > >
> > >
> > >
> > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <li...@gmail.com> wrote:
> > >
> > > > Hi! I updated the explanation of some things in the PIP issue. I also
> > > > added a new “flag” in the conf to select between different
> > > > ProtobufNativeSchemaValidator implementations, and set the default
> > > > ProtobufNativeSchemaValidator to only check whether the name of the
> > > > root message is the same.
> > > >
> > > >
> > > > Thanks,
> > > > sinan
> > > >
> > > >
> > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Mar 5, 2023 at 20:21:
> > > >
> > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <li...@gmail.com> wrote:
> > > > >
> > > > > > >
> > > > > > > Can you please explain how a Protobuf Schema descriptor can be validated
> > > > > > > for backward compatibility check using Avro based compatibility rules?
> > > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > > > > > descriptor?
> > > > > > > Is there some translation happening?
> > > > > >
> > > > > >
> > > > > > 1. *You can take a quick look at the previous design, the PROTOBUF uses
> > > > > > avro struct to store.*
> > > > > > https://github.com/apache/pulsar/pull/1954
> > > > > >
> > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > >
> > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > > >
> > > > >
> > > > > Ok. So to summarize your code (easier to write it than send links):
> > > > > * Pulsar Client, when used with Protobuf Schema, actually converts the
> > > > > Protobuf descriptor into an Avro Schema (using code found inside the Avro
> > > > > library) and saves that Avro schema as the schema. It's not saving the
> > > > > protobuf descriptor at all. Very confusing, I have to add; I never
> > > > > expected that.
> > > > > This explains why ProtobufSchemaCompatibilityCheck just extends the
> > > > > Avro check without doing any translation.
> > > > >
> > > > > Thanks for that.
> > > > >
> > > > > Now that I finally understand this, I can say that you *must* explain
> > > > > that in the motivation part of your PIP.
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > 2. *On the broker side, protobuf and avro both use `SchemaData` converted
> > > > > > to `org.apache.avro.Schema`.*
> > > > > >
> > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > >
> > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > >
> > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > > >
> > > > >
> > > > > Actually those links don't really help.
> > > > > The main link that helps is:
> > > > >
> > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > I'm sorry - I don't understand.
> > > > > > > I understand the different compatibility check strategies.
> > > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > compatibility validation, no?
> > > > > > > I believe the answer is no, since you may want to verify different things
> > > > > > > when it comes to Protobuf, which are different than Avro.
> > > > > >
> > > > > >
> > > > > > 1.
> > > > > > *ProtobufSchema is different from ProtobufNativeSchema in that it uses
> > > > > > avro-protobuf.*
> > > > > >
> > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > *ProtobufNativeSchema needs a native compatibility check, but there is no
> > > > > > official or third party implementation. So this PIP does not use
> > > > > > avro-protobuf for protobuf compatibility checking.*
> > > > > >
> > > > > > 2. *By the way, this is implemented in much the same way that Apache Avro
> > > > > > does compatibility checking.*
> > > > > >
> > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > > `canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy`
> > > > > >
> > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > >
> > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > >
> > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > > > > > writtenSchema and readSchema. We only need to change the order of
> > > > > > arguments we pass to `canRead()`.*
> > > > > > ```java
> > > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > > > > >                                   Descriptors.Descriptor fromDescriptor)
> > > > > >         throws ProtoBufCanReadCheckException {
> > > > > >     switch (strategy) {
> > > > > >         // BACKWARD: the new schema must read data written with the existing one.
> > > > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > > > > >         // FORWARD: the existing schema must read data written with the new one.
> > > > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> > > > > >         // FULL: both directions must hold.
> > > > > >         case CanBeReadMutualStrategy -> {
> > > > > >             canRead(toValidate, fromDescriptor);
> > > > > >             canRead(fromDescriptor, toValidate);
> > > > > >         }
> > > > > >     }
> > > > > > }
> > > > > >
> > > > > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > > > >                      Descriptors.Descriptor readSchema)
> > > > > >         throws ProtoBufCanReadCheckException {
> > > > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
> > > > > > }
> > > > > > ```
> > > > > >
> > > > > >
> > > > > I get that you want to take inspiration from the existing Avro Schema
> > > > > compatibility check for your code design.
> > > > > I also understand you *won't* use any existing Avro code for that.
> > > > > I also understand you have to write the validation check on your own,
> > > > > since there is no third-party implementation to rely on.
> > > > >
> > > > > The only thing I can't understand is the actual rules, as I wrote before,
> > > > > since they are written confusingly.
> > > > > So, I repeat what I asked before:
> > > > >
> > > > > I think you should structure the validation rules differently:
> > > > >
> > > > > * Backward checks
> > > > > ** List down rules, using newSchema (the schema used by the producer or
> > > > > consumer) and existingSchema (the last schema used)
> > > > > * Forward
> > > > > ** List down rules, using newSchema (the schema used by the producer or
> > > > > consumer) and existingSchema (the last schema used)
> > > > >
> > > > > Once that's accomplished I will be able to understand the different
> > > > > validation rules for each compatibility check.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > sinan
> > > > > >
> > > > > >
> > > > > >
> > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Mar 1, 2023 at 21:19:
> > > > > > >
> > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > >
> > > > > > > > >
> > > > > > > > > I read it and they look identical. What's the difference between
> > > > > > > > > them?
> > > > > > > >
> > > > > > > > Currently, Avro, JSON, and Protobuf schemas are all implemented based
> > > > > > > > on AVRO.
> > > > > > > > > What do you mean, they are all implemented based on Avro? You mean
> > > > > > > > > the protobuf schema is converted into an Avro Schema, and then you
> > > > > > > > > use Avro compatibility validation?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, and
> > > > > > > > `checkCompatible()` has the same implementation as for `AVRO`.
> > > > > > > >
> > > > > > >
> > > > > > > Can you please explain how a Protobuf Schema descriptor can be validated
> > > > > > > for backward compatibility check using Avro based compatibility rules?
> > > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > > > > > descriptor?
> > > > > > > Is there some translation happening?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > I think you should structure the validation rules differently:
> > > > > > > >
> > > > > > > >
> > > > > > > > The compatibility check strategies are described on the website:
> > > > > > > >
> > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > > 1. BACKWARD(CanReadExistingStrategy): Consumers using schema V3 can process
> > > > > > > > data written by producers using the last schema version V2. So V2 is
> > > > > > > > "writtenSchema" and V3 is "readSchema".
> > > > > > > > 2. FORWARD(CanBeReadByExistingStrategy): Consumers using the last schema
> > > > > > > > version V2 can process data written by producers using a new schema V3,
> > > > > > > > even though they may not be able to use the full capabilities of the new
> > > > > > > > schema. So V3 is "writtenSchema" and V2 is "readSchema".
> > > > > > > > 3. FULL(CanBeReadMutualStrategy): Schemas are both backward and forward
> > > > > > > > compatible.
> > > > > > > > The schema can evolve, and the old and new schema versions must each be
> > > > > > > > able to read data written with the other.
> > > > > > > >
> > > > > > > >
> > > > > > > I'm sorry - I don't understand.
> > > > > > > I understand the different compatibility check strategies.
> > > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > compatibility validation, no?
> > > > > > > I believe the answer is no, since you may want to verify different things
> > > > > > > when it comes to Protobuf, which are different than Avro.
> > > > > > >
> > > > > > > At the current state, I can't understand your design at all. Please help
> > > > > > > clarify that.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > So each strategy should have its own section.
> > > > > > > >
> > > > > > > >
> > > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As we've
> > > > > > > > just described, we only need to change the order of arguments we pass
> > > > > > > > to `canRead()`.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > sinan
> > > > > > > >
> > > > > > > >
> > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Feb 27, 2023 at 20:49:
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > > > > > > > >
> > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > > >
> > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > > >
> > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > >
> > > > > > > > > > Currently, Avro, JSON, and Protobuf schemas are all implemented based
> > > > > > > > > > on AVRO.
> > > > > > > > >
> > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > > > > > > > compatibility validation?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > >
> > > > > > > > > * Backward checks
> > > > > > > > > ** List down rules, using newSchema (the schema used by the producer or
> > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > > * Forward
> > > > > > > > > ** List down rules, using newSchema (the schema used by the producer or
> > > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > >
> > > > > > > > > So each strategy should have its own section.
> > > > > > > > >
> > > > > > > > > I'm saying this because you used the word "writtenSchema", but it
> > > > > > > > > represents something completely different in a backward check than in
> > > > > > > > > a forward check.
> > > > > > > > >
> > > > > > > > > Once you have it structured like that, I personally will be able to
> > > > > > > > > read and understand it.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > The motivation and problem statement are now good; thanks for
> > > > > > > > > improving it.
> > > > > > > > >
> > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some
> > > > > > > > > > background and some explanations.
> > > > > > > > > >
> > > > > > > > > > The compatibility check rules are already written in the
> > > > > > > > > > Implementation section. ProtoBufNative implements the same canRead
> > > > > > > > > > method as Apache Avro. It does this by checking whether the schemas
> > > > > > > > > > for writing and reading are compatible. I also indicate whether the
> > > > > > > > > > writtenSchema and readSchema of the Backward, Forward, and Full
> > > > > > > > > > strategies are the old or the new version of the schema.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > sinan
> > > > > > > > > >
> > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Feb 26, 2023 at 23:24:
> > > > > > > > > >
> > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, so IMO
> > > > > > > > > > > you need to add it for people to understand the PIP. You don't need
> > > > > > > > > > > to explain all of Pulsar in this PIP, but at the very least add a few
> > > > > > > > > > > paragraphs detailing all you need to know, to put you in context:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >    - Start by saying Pulsar has a built-in schema registry inside the
> > > > > > > > > > >    Pulsar broker.
> > > > > > > > > > >       - Every time the client updates the schema, it uploads it to the
> > > > > > > > > > >       broker. When that happens, a feature validates whether the new
> > > > > > > > > > >       schema version is compatible with the previous versions. There
> > > > > > > > > > >       are 4 types of compatibility: Full, ... (complete and explain
> > > > > > > > > > >       each one briefly)
> > > > > > > > > > >    - Also explain that the Pulsar Schema registry supports various
> > > > > > > > > > >    schema protocols: Avro, protobuf native, ... (complete the rest);
> > > > > > > > > > >    each protocol has a schema which dictates how to serialize and
> > > > > > > > > > >    deserialize the message content into a typed object.
> > > > > > > > > > >    - Explain in short what protobuf native is (compared to protobuf
> > > > > > > > > > >    non-native)
> > > > > > > > > > >    - Please don't paste code instead of explaining.
> > > > > > > > > > >       - Explain that the current protobuf native validation check is
> > > > > > > > > > >       only composed of checking that the root message name is the
> > > > > > > > > > >       same between the current schema version and the new version.
> > > > > > > > > > >          - Explain briefly what a root message and its name are.
> > > > > > > > > > >       - Explain the problem (list scenarios) that we have because the
> > > > > > > > > > >       protobuf native schema only supports FULL compatibility
> > > > > > > > > > >       validation.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Regarding high-level design, as in what you plan to do:
> > > > > > > > > > > I suggest you add a "High Level Design" section and in it detail how
> > > > > > > > > > > you plan to validate, per protobuf version, per compatibility check
> > > > > > > > > > > (backward, forward, full, ...).
> > > > > > > > > > > I tried reading the implementation; for me, it's all over the place.
> > > > > > > > > > > Can you please list in order what I wrote above, and list the
> > > > > > > > > > > validation rules with a good explanation of why you validate them
> > > > > > > > > > > like that?
> > > > > > > > > > >
> > > > > > > > > > > Lastly, once you have all the validation rules clearly stated, you
> > > > > > > > > > > can use them to document this properly so users know what
> > > > > > > > > > > validation to expect.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Asaf
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Sorry, my mistake. I removed the code and described the design to
> > > > > > > > > > > > improve the PROTOBUF_NATIVE schema compatibility checks. You can
> > > > > > > > > > > > have a look.
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Feb 22, 2023 at 21:16:
> > > > > > > > > > > >
> > > > > > > > > > > > > I read it, but you're almost directly diving into the code; it will
> > > > > > > > > > > > > take me hours just to reverse engineer your design.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Can you please include a "High Level Design" section in which you
> > > > > > > > > > > > > explain how you plan to tackle each issue?
> > > > > > > > > > > > > If I can read that section and explain to someone else how this
> > > > > > > > > > > > > will work, it means the section is complete.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19565.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We can talk about the current design here. Especially for the field
> > > > > > > > > > > > > > type change check rules, please give your valuable advice.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Sinan
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> >
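To make the writer/reader swap described in the quoted strategy definitions concrete (BACKWARD: data written with the existing schema V2 must be readable with the new schema V3; FORWARD: the reverse; FULL: both), here is a minimal, dependency-free Java sketch. It is not the PIP's implementation: `Field`, `MessageSchema`, `Strategy` and `CanReadCheck` are hypothetical stand-ins for `Descriptors.Descriptor` and the proposed classes, and `canRead` applies only three simplified rules (the root message name is unchanged, a field number present on both sides keeps its type, and the reader must not require a proto2 `required` field that the writer lacks).

```java
import java.util.Map;

// Hypothetical, simplified stand-in for a protobuf message descriptor:
// field number -> field.
record Field(String type, boolean required) {}
record MessageSchema(String fullName, Map<Integer, Field> fields) {}

enum Strategy { BACKWARD, FORWARD, FULL }

final class CanReadCheck {
    private CanReadCheck() {}

    // True if data written with `writer` can be parsed by a consumer using `reader`.
    static boolean canRead(MessageSchema writer, MessageSchema reader) {
        if (!writer.fullName().equals(reader.fullName())) {
            return false; // root message changed
        }
        for (Map.Entry<Integer, Field> entry : reader.fields().entrySet()) {
            Field written = writer.fields().get(entry.getKey());
            if (written == null) {
                if (entry.getValue().required()) {
                    return false; // reader requires a field the writer never sets
                }
            } else if (!written.type().equals(entry.getValue().type())) {
                return false; // same field number, different type
            }
        }
        return true; // fields only the writer knows are skipped as unknown fields
    }

    // The strategy only decides which schema plays the writer and which the reader.
    static boolean isCompatible(Strategy strategy, MessageSchema existing, MessageSchema candidate) {
        return switch (strategy) {
            case BACKWARD -> canRead(existing, candidate); // new consumers read old data
            case FORWARD -> canRead(candidate, existing);  // old consumers read new data
            case FULL -> canRead(existing, candidate) && canRead(candidate, existing);
        };
    }
}
```

With these rules, adding a `required` field in V3 fails BACKWARD (new consumers expect a field that old producers never write) but passes FORWARD (old consumers skip the unknown field), which is why the argument order passed to `canRead` matters.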

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by SiNan Liu <li...@gmail.com>.
Great to see your comment, Bo!

1. Regarding the first way: the protobuf website describes the compatibility
rules, but there are no plans to implement them.
https://protobuf.dev/programming-guides/proto/#updating

2. I think this PIP can be divided into two parts.
(1) Add a flag (`ValidatorClassName`) that is loaded into
`ProtobufNativeSchemaCompatibilityCheck` when the broker starts.
`ValidatorClassName` is empty by default, in which case the check behaves
exactly as before, so nothing changes for users.
```java
    ProtobufNativeSchemaValidator DEFAULT = (fromDescriptors, toDescriptor) -> {
        for (Descriptors.Descriptor fromDescriptor : fromDescriptors) {
            // The default implementation only checks whether the root message has changed.
            if (!fromDescriptor.getFullName().equals(toDescriptor.getFullName())) {
                throw new ProtoBufCanReadCheckException("Protobuf root message isn't allowed to change!");
            }
        }
    };
```
`ValidatorClassName` can also be set to the implementation added by this PIP,
`org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.

(2) Rewrite `ProtobufNativeSchemaCompatibilityCheck` so that it builds a
different `ProtobufNativeSchemaValidator` depending on the flag
(`ValidatorClassName`). Isn't this just a plug-in? Users can develop and
choose their own `ProtobufNativeSchemaValidator`. I don't think it changes
the existing logic; it just allows users to extend it.
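A minimal sketch of how a `ValidatorClassName`-style setting could be resolved at broker startup, assuming (as in (1) and (2) above) that an empty value keeps today's root-message-name check and any other value names a class with an accessible no-arg constructor. `SchemaDescriptor`, `SchemaValidator` and `ValidatorLoader` are hypothetical, dependency-free stand-ins: the real validator would take `Descriptors.Descriptor` and throw `ProtoBufCanReadCheckException`. Only the reflection calls are standard JDK API.

```java
import java.util.List;

// Hypothetical stand-in for Descriptors.Descriptor: only the root message's full name is modelled.
record SchemaDescriptor(String fullName) {}

// Hypothetical plug-in interface in the spirit of ProtobufNativeSchemaValidator.
interface SchemaValidator {
    void validate(List<SchemaDescriptor> existing, SchemaDescriptor candidate);

    // Default behaviour: reject only a change of the root message name.
    SchemaValidator DEFAULT = (existing, candidate) -> {
        for (SchemaDescriptor previous : existing) {
            if (!previous.fullName().equals(candidate.fullName())) {
                // IllegalArgumentException stands in for ProtoBufCanReadCheckException.
                throw new IllegalArgumentException("Protobuf root message isn't allowed to change: "
                        + previous.fullName() + " -> " + candidate.fullName());
            }
        }
    };
}

final class ValidatorLoader {
    private ValidatorLoader() {}

    // An empty setting keeps the default check; otherwise the named class is
    // instantiated reflectively through its no-arg constructor.
    static SchemaValidator load(String validatorClassName) {
        if (validatorClassName == null || validatorClassName.isBlank()) {
            return SchemaValidator.DEFAULT;
        }
        try {
            return Class.forName(validatorClassName)
                    .asSubclass(SchemaValidator.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load validator " + validatorClassName, e);
        }
    }
}
```

Keeping the default path identical to the current check is what makes the flag safe to ship; whether the extra configuration is worth it is exactly what this thread is debating.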


I think this PIP is an enhancement of and a supplement to the existing
functionality; it is neither unnecessary nor meaningless.


Thanks,
sinan





丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:53:

> I think we have two ways to do that.
>
> First way: push the protobuf project to improve its Java support for
> compatibility checks, and ask whether they plan to do so.
>
> Second way: implement the new PROTOBUF_NATIVE `SchemaCompatibilityCheck`
> as a plugin, without changing any existing plugin logic (it's simple and
> already used). I don't recommend adding flags for rollback; they add
> configuration and make little sense.
>
> Thanks,
> Bo
>
> Asaf Mesika <as...@gmail.com> wrote on Mon, Mar 6, 2023 at 23:00:
>
> >
> > Can you convert the code block at the beginning of the PIP, which is
> > actually a quote, into something that doesn't require so much
> > horizontal scrolling?
> > Use
> > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> >
> > Let's improve the clarity of what you wrote:
> >
> > "the PROTOBUF uses avro struct to store."
> > -->
> > When Schema type PROTOBUF is used, Pulsar Client assumes the object given
> > to it as message data is an auto-generated POJO containing the annotations
> > encoding the schema. The client uses a converter which converts a
> > Protobuf schema descriptor into an Avro schema and sends that as the Schema
> > of the producer/consumer.
> >
> > "On the broker side, protobuf and avro both use SchemaData converted to
> > org.apache.avro.Schema."
> > -->
> > Since the schema is an Avro schema, the broker-side compatibility check
> > simply reuses the compatibility check of the AVRO schema type.
> >
> > "ProtobufSchema is different from ProtobufNativeSchema in schema
> > compatibility check it uses avro-protobuf.
> > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > But the current implementation of ProtobufNative schema compatibility
> > check only checked if the root message name is changed."
> >
> > -->
> > PROTOBUF_NATIVE schema type is different.
> > The client actually uses the Protobuf Descriptor as the schema, as opposed
> > to the Avro schema of the PROTOBUF schema type. In the broker, the
> > PROTOBUF_NATIVE compatibility check hasn't actually implemented any rule
> > besides one: checking whether the root message name has changed.
> >
> >
> >
> > >    1. For now, there is no official or third-party solution for ProtoBuf
> > >    compatibility. If in the future have better solutions of a third party or
> > >    the official, we develop new ProtobufNativeSchemaValidator and use, so
> > >    add a flag.
> >
> > Why do you need to make that configurable? Once you find a third-party
> > solution, why not just switch to it? Who knows, maybe you never will.
> > Introduce it when you find it, not now.
> >
> >
> > > We improve in ProtobufNativeSchemaCompatibilityCheck BACKWARD, FORWARD
> > > these strategies. As with the AVRO implementation, protobuf compatibility
> > > checking need implementing the canRead method. *This will check that
> > > the writtenschema can be read by readSchema.*
> >
> >
> > I completely disagree.
> > Avro implementation is confusing for our use case. Don't copy that.
> >
> > You have
> >
> > public void checkCompatible(SchemaData from, SchemaData to,
> >                             SchemaCompatibilityStrategy strategy)
> >         throws IncompatibleSchemaException {
> >     Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
> >     Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
> >     switch (strategy) {
> >         case BACKWARD_TRANSITIVE:
> >         case BACKWARD:
> >         case FORWARD_TRANSITIVE:
> >         case FORWARD:
> >         case FULL_TRANSITIVE:
> >         case FULL:
> >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
> >             return;
> >         case ALWAYS_COMPATIBLE:
> >             return;
> >         default:
> >             throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
> >     }
> > }
> >
> > I would rename:
> > from --> currentSchema
> > to --> newSchema
> >
> > Use that switch case and have a method for each strategy, like:
> > validateBackwardsCompatibility(currentSchema, newSchema)
> >
> > I dislike canRead and the usage of writtenSchema, since you have two
> > completely different use cases: the producing side and the consuming side.
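Following the structure suggested here (rename `from`/`to` to `currentSchema`/`newSchema` and give each strategy its own validation method), a sketch could look like the following. The rules inside the methods are illustrative assumptions, not the PIP's rule set: they use a hypothetical simplified schema model (`ProtoSchema`, `ProtoField`), compare field types by exact name even though the protobuf "Updating A Message Type" guide allows some changes (for example `int32` to `int64`), and treat proto2 `required` fields as the only asymmetric rule. `CompatibilityStrategy` mirrors the strategy names in the quoted `switch`, `history` is assumed to hold earlier versions oldest first, and `SchemaIncompatibleException` is an unchecked stand-in for Pulsar's `IncompatibleSchemaException`.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a protobuf message: field number -> field.
record ProtoField(String type, boolean required) {}
record ProtoSchema(String rootMessage, Map<Integer, ProtoField> fields) {}

// Mirrors the strategy names handled in the quoted switch.
enum CompatibilityStrategy {
    ALWAYS_COMPATIBLE, BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE, FULL, FULL_TRANSITIVE
}

// Unchecked stand-in for Pulsar's IncompatibleSchemaException.
class SchemaIncompatibleException extends RuntimeException {
    SchemaIncompatibleException(String message) {
        super(message);
    }
}

final class ProtobufNativeRules {
    private ProtobufNativeRules() {}

    // history: earlier schema versions, oldest first (assumed non-empty).
    static void checkCompatible(List<ProtoSchema> history, ProtoSchema newSchema,
                                CompatibilityStrategy strategy) {
        ProtoSchema currentSchema = history.get(history.size() - 1);
        switch (strategy) {
            case ALWAYS_COMPATIBLE -> { }
            case BACKWARD -> validateBackwardCompatibility(currentSchema, newSchema);
            case FORWARD -> validateForwardCompatibility(currentSchema, newSchema);
            case FULL -> validateFullCompatibility(currentSchema, newSchema);
            case BACKWARD_TRANSITIVE -> history.forEach(s -> validateBackwardCompatibility(s, newSchema));
            case FORWARD_TRANSITIVE -> history.forEach(s -> validateForwardCompatibility(s, newSchema));
            case FULL_TRANSITIVE -> history.forEach(s -> validateFullCompatibility(s, newSchema));
        }
    }

    // BACKWARD: consumers using newSchema must read data written with currentSchema.
    static void validateBackwardCompatibility(ProtoSchema currentSchema, ProtoSchema newSchema) {
        checkRootMessage(currentSchema, newSchema);
        checkSharedFieldTypes(currentSchema, newSchema);
        newSchema.fields().forEach((number, field) -> {
            if (field.required() && !currentSchema.fields().containsKey(number)) {
                throw new SchemaIncompatibleException(
                        "new required field " + number + " is never written by the current schema");
            }
        });
    }

    // FORWARD: consumers using currentSchema must read data written with newSchema.
    static void validateForwardCompatibility(ProtoSchema currentSchema, ProtoSchema newSchema) {
        checkRootMessage(currentSchema, newSchema);
        checkSharedFieldTypes(currentSchema, newSchema);
        currentSchema.fields().forEach((number, field) -> {
            if (field.required() && !newSchema.fields().containsKey(number)) {
                throw new SchemaIncompatibleException(
                        "required field " + number + " was removed in the new schema");
            }
        });
    }

    static void validateFullCompatibility(ProtoSchema currentSchema, ProtoSchema newSchema) {
        validateBackwardCompatibility(currentSchema, newSchema);
        validateForwardCompatibility(currentSchema, newSchema);
    }

    private static void checkRootMessage(ProtoSchema currentSchema, ProtoSchema newSchema) {
        if (!currentSchema.rootMessage().equals(newSchema.rootMessage())) {
            throw new SchemaIncompatibleException("root message changed from "
                    + currentSchema.rootMessage() + " to " + newSchema.rootMessage());
        }
    }

    // Conservative rule: a field number present in both schemas must keep exactly the same type.
    private static void checkSharedFieldTypes(ProtoSchema currentSchema, ProtoSchema newSchema) {
        currentSchema.fields().forEach((number, field) -> {
            ProtoField other = newSchema.fields().get(number);
            if (other != null && !other.type().equals(field.type())) {
                throw new SchemaIncompatibleException("field " + number + " changed type from "
                        + field.type() + " to " + other.type());
            }
        });
    }
}
```

Because each method names its direction (new consumers reading old data, or old consumers reading new data), the rules can be listed per strategy without the `writtenSchema`/`readSchema` indirection.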
> >
> > > schemaValidatorBuilder
> >
> > I dislike this proposal. IMO the Avro implementation is way too complicated.
> > Why not have a simple validation function for each switch case above?
> > Why do we need a strategy, a builder, and all this complexity?
> >
> >
> > > *Here are the basic compatibility rules we've defined:*
> >
> >
> > IMO it's impossible to read the validation rules as you described them.
> > I wrote how they should be structured numerous times above.
> > I can't validate them.
> >
> >
> > IMO, the current design is very hard to read.
> > Please try to avoid jumping into code sections.
> > Write a high-level design section, in which you describe in words what you
> > plan to do.
> > Write the validation rules in a structure that is easy to understand:
> > rules per compatibility check, using precise terms (current schema, new
> > schema), since the new schema is used for reading in one case and for
> > writing in another.
> >
> > In its current form it takes too much time to understand the design, and it
> > shouldn't be the case.
> >
> > Thanks,
> >
> > Asaf
> >
> >
> > >
> >
> >
> >
> > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <li...@gmail.com> wrote:
> >
> > > Hi! I updated the explanation of some things in the PIP issue. I also
> > > added a new “flag” in the conf that selects a different
> > > ProtobufNativeSchemaValidator implementation, and set the default
> > > ProtobufNativeSchemaValidator to only check whether the name of the root
> > > message is the same.
> > >
> > >
> > > Thanks,
> > > sinan
> > >
> > >
> > > Asaf Mesika <as...@gmail.com> wrote on Sun, Mar 5, 2023 at 20:21:
> > >
> > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <li...@gmail.com> wrote:
> > > >
> > > > > >
> > > > > > Can you please explain how a Protobuf Schema descriptor can be validated
> > > > > > for backward compatibility check using Avro based compatibility rules?
> > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > > > > descriptor?
> > > > > > Is there some translation happening?
> > > > >
> > > > >
> > > > > 1. *You can take a quick look at the previous design, the PROTOBUF uses
> > > > > avro struct to store.*
> > > > > https://github.com/apache/pulsar/pull/1954
> > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > >
> > > >
> > > > Ok. So to summarize your code (easier to write it than send links):
> > > > * Pulsar Client, when used with Protobuf Schema, actually converts the
> > > > Protobuf descriptor into an Avro Schema (using code found inside the Avro
> > > > library) and saves that Avro schema as the schema. It's not saving the
> > > > protobuf descriptor at all. Very confusing, I have to add - never
> > > > expected that.
> > > > This explains why ProtobufSchemaCompatibilityCheck just extends the Avro
> > > > check without doing any translation.
> > > >
> > > > Thanks for that.
> > > >
> > > > Now that I finally understand this, I can say that you *must* explain
> > > > that in the motivation part of your PIP.
> > > >
> > > >
> > > >
> > > > >
> > > > >
> > > > > 2. *On the broker side, protobuf and avro both use `SchemaData` converted
> > > > > to `org.apache.avro.Schema`.*
> > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > >
> > > >
> > > > Actually those links don't really help.
> > > > The main link that helps is:
> > > >
> > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > >
> > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > I'm sorry - I don't understand.
> > > > > > I understand the different compatibility check strategies.
> > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > compatibility validation, no?
> > > > > > I believe the answer is no, since you may want to verify different
> > > > > > things when it comes to Protobuf, which are different than Avro.
> > > > >
> > > > >
> > > > > 1. *ProtobufSchema is different from ProtobufNativeSchema in that it uses
> > > > > avro-protobuf.*
> > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > *ProtobufNativeSchema needs a native compatibility check, but there is no
> > > > > official or third-party implementation. So this PIP does not use
> > > > > avro-protobuf for protobuf compatibility checking.*
> > > > >
> > > > > 2. *By the way, this is implemented in much the same way that Apache Avro
> > > > > does compatibility checking.*
> > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > `canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy`
> > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > > > > writtenSchema and readSchema. We only need to change the order of the
> > > > > arguments we pass to `canRead()`.*
> > > > > ```java
> > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > > > >         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
> > > > >     switch (strategy) {
> > > > >         // BACKWARD: existing schema is the writer, new schema (toValidate) is the reader
> > > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > > > >         // FORWARD: new schema is the writer, existing schema is the reader
> > > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> > > > >         // FULL: both directions must pass
> > > > >         case CanBeReadMutualStrategy -> {
> > > > >             canRead(toValidate, fromDescriptor);
> > > > >             canRead(fromDescriptor, toValidate);
> > > > >         }
> > > > >     }
> > > > > }
> > > > >
> > > > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > > >         Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> > > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
> > > > > }
> > > > > ```
> > > > >
> > > > >
> > > > I get that you want to take inspiration from the existing Avro schema
> > > > compatibility check for your code design.
> > > > I also understand you *won't* use any existing Avro code for that.
> > > > I also understand that you have to write the validation check on your
> > > > own, since there is no 3rd-party implementation of it.
> > > >
> > > > The only thing I can't understand is the actual rules, as I wrote
> > > > before, since they are written confusingly.
> > > > So, I repeat what I asked before:
> > > >
> > > > I think you should structure the validation rules differently:
> > > >
> > > > * Backward checks
> > > > ** List down rules, using newSchema (the schema used by the producer or
> > > > consumer) and existingSchema (the last schema used)
> > > > * Forward
> > > > ** List down rules, using newSchema (the schema used by the producer or
> > > > consumer) and existingSchema (the last schema used)
> > > >
> > > > Once that's accomplished I will be able to understand the different
> > > > validation rules for each compatibility check.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > >
> > > > > Thanks,
> > > > > sinan
> > > > >
> > > > >
> > > > >
> > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Mar 1, 2023 at 21:19:
> > > > > >
> > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > >
> > > > > > > >
> > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > >
> > > > > > > Current avro, json, and protobuf schemas are all implemented based on AVRO.
> > > > > > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > > > > > > compatibility validation?
> > > > > > >
> > > > > > >
> > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, and
> > > > > > > `checkCompatible()` is the same implementation as for `AVRO`.
> > > > > > >
> > > > > >
> > > > > > Can you please explain how a Protobuf Schema descriptor can be validated
> > > > > > for backward compatibility check using Avro based compatibility rules?
> > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > > > > descriptor?
> > > > > > Is there some translation happening?
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > I think you should structure the validation rules differently:
> > > > > > >
> > > > > > >
> > > > > > > The Compatibility check strategy is described on the website
> > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > 1. BACKWARD(CanReadExistingStrategy): Consumers using schema V3 can process
> > > > > > > data written by producers using the last schema version V2. So V2 is
> > > > > > > "writtenSchema" and V3 is "readSchema".
> > > > > > > 2. FORWARD(CanBeReadByExistingStrategy): Consumers using the last schema
> > > > > > > version V2 can process data written by producers using a new schema V3,
> > > > > > > even though they may not be able to use the full capabilities of the new
> > > > > > > schema. So V3 is "writtenSchema" and V2 is "readSchema".
> > > > > > > 3. FULL(CanBeReadMutualStrategy): Schemas are both backward and forward
> > > > > > > compatible.
> > > > > > > Schema can evolve. The old version schema and the new version schema
> > > > > > > should be well understood.
> > > > > > >
> > > > > > >
> > > > > > I'm sorry - I don't understand.
> > > > > > I understand the different compatibility check strategies.
> > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > compatibility validation, no?
> > > > > > I believe the answer is no, since you may want to verify different
> > > > > > things when it comes to Protobuf, which are different than Avro.
> > > > > >
> > > > > > At the current state, I can't understand your design at all. Please help
> > > > > > clarify that.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > So each strategy should have its own section.
> > > > > > >
> > > > > > >
> > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As we've
> > > > > > > just described, we only need to change the order of the arguments we
> > > > > > > pass to `canRead()`.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > sinan
> > > > > > >
> > > > > > >
> > > > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Feb 27, 2023 at 20:49:
> > > > > > >
> > > > > > > > >
> > > > > > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > > > > > > >
> > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > >
> > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > >
> > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > >
> > > > > > > > Current avro, json, and protobuf schemas are all implemented based on
> > > > > > > > AVRO.
> > > > > > > >
> > > > > > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > > > > > > compatibility validation?
> > > > > > > >
> > > > > > > >
> > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > >
> > > > > > > >
> > > > > > > > I think you should structure the validation rules differently:
> > > > > > > >
> > > > > > > > * Backward checks
> > > > > > > > ** List down rules, using newSchema (the schema used by the producer or
> > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > > * Forward
> > > > > > > > ** List down rules, using newSchema (the schema used by the producer or
> > > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > >
> > > > > > > > So each strategy should have its own section.
> > > > > > > >
> > > > > > > > I'm saying this since you used the word "writtenSchema", but it
> > > > > > > > represents something completely different in a backward check than in
> > > > > > > > a forward check.
> > > > > > > >
> > > > > > > > Once you have it structured like that, I personally will be able to
> > > > > > > > read and understand it.
> > > > > > > >
> > > > > > > >
> > > > > > > > The motivation and problem statement are now good - thanks for
> > > > > > > > improving it.
> > > > > > > >
> > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi! I updated the PIP issue again. This time I've added some
> > > > > > > > > background and some explanations.
> > > > > > > > >
> > > > > > > > > The compatibility check rules are already written in the
> > > > > > > > > Implementation.
> > > > > > > > > ProtoBufNative implements the same canRead method as Apache Avro.
> > > > > > > > > It does this by checking whether the schemas for writing and reading
> > > > > > > > > are compatible. I also indicate whether the writtenSchema and
> > > > > > > > > readSchema of the Backward, Forward, and Full strategies are the old
> > > > > > > > > or the new version of the schema.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > sinan
> > > > > > > > >
> > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Feb 26, 2023 at 23:24:
> > > > > > > > >
> > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, so IMO you
> > > > > > > > > > need to add it for people to understand it. You don't need to explain
> > > > > > > > > > the entire Pulsar in this PIP, but at the very least add a few
> > > > > > > > > > paragraphs detailing all you need to know, to put you in context:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >    - Start by saying Pulsar has a built-in schema registry inside the
> > > > > > > > > >    Pulsar broker.
> > > > > > > > > >       - Every time the client updates the schema, it uploads it to the
> > > > > > > > > >       broker. When that happens, it has a feature which validates if the
> > > > > > > > > >       new schema version is compatible with the previous versions. There
> > > > > > > > > >       are 4 types of compatibility: Full, ... (complete and explain each
> > > > > > > > > >       one briefly)
> > > > > > > > > >    - Also explain that the Pulsar schema registry supports various
> > > > > > > > > >    schema protocols: Avro, protobuf native, ... (complete the rest);
> > > > > > > > > >    each protocol has a schema which dictates how to serialize and
> > > > > > > > > >    deserialize the message content into a typed object.
> > > > > > > > > >    - Explain in short what protobuf native is (compared to protobuf
> > > > > > > > > >    non-native).
> > > > > > > > > >    - Please don't paste code instead of explaining.
> > > > > > > > > >       - Explain that the current protobuf native validation check is
> > > > > > > > > >       only composed of checking that the root message name is the same
> > > > > > > > > >       between the current schema version and the new version.
> > > > > > > > > >          - Explain briefly what a root message and its name are.
> > > > > > > > > >       - Explain the problem (list scenarios) that we have because the
> > > > > > > > > >       protobuf native schema only supports FULL compatibility
> > > > > > > > > >       validation.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Regarding high-level design - as in what you plan to do.
> > > > > > > > > > I suggest you add a "High Level Design" section and in it detail how
> > > > > > > > > > you plan to validate, per protobuf version, per compatibility check
> > > > > > > > > > (backward, forward, full, ...).
> > > > > > > > > > I tried reading the implementation - for me, it's all over the place.
> > > > > > > > > > Can you please list in order what I wrote above, and list the
> > > > > > > > > > validation rules with a good explanation of why you validate them
> > > > > > > > > > like that?
> > > > > > > > > >
> > > > > > > > > > Lastly, once you have all the validation rules clearly stated, you
> > > > > > > > > > can use them to document the behavior properly so users know what
> > > > > > > > > > validation to expect.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Asaf
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Sorry, my mistake. I removed the code and described the design to
> > > > > > > > > > > improve the PROTOBUF_NATIVE schema compatibility checks. You can
> > > > > > > > > > > have a look.
> > > > > > > > > > >
> > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Feb 22, 2023 at 21:16:
> > > > > > > > > > >
> > > > > > > > > > > > I read it but you're almost directly diving into the code - it
> > > > > > > > > > > > will take me hours just to reverse engineer your design.
> > > > > > > > > > > >
> > > > > > > > > > > > Can you please include a "High Level Design" section in which you
> > > > > > > > > > > > explain how you plan to tackle any issue?
> > > > > > > > > > > > If I can read that section and explain to someone else how this
> > > > > > > > > > > > will work, it means the section is complete.
> > > > > > > > > > > >
> > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19565.
> > > > > > > > > > > > >
> > > > > > > > > > > > > We can talk about the current design here. Especially for the field
> > > > > > > > > > > > > type change check rules, please give your valuable advice.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Sinan
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
>

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by Asaf Mesika <as...@gmail.com>.
On Tue, Mar 7, 2023 at 1:01 PM SiNan Liu <li...@gmail.com> wrote:

> >
> > Ok.
> > First, the name is confusing. Flags are normally true/false; in your case
> > it's a string, so it should be called a configuration property.
> > Second, I agree - you're basically saying we must allow users to keep the
> > existing implementation or switch to the new implementation. Just using a
> > boolean feature flag might suffice now, but how will we improve the rules
> > without breaking existing users? So we choose to encode the new code as a
> > new class and have the user select it via config, by specifying the class
> > name.
> > Third, now that I understand that, you need to revise your explanation.
> > It's completely unclear from the current explanation.
>
>
> The description here should not be "flag", but "configuration". I will
> modify it.
>
> > Of course you shouldn't make a gigantic class having all validation
> > methods. I'm fine with having a
> > `ProtobufNativeBackwardCompatibilityValidation` class, and each type of
> > compatibility strategy can have its own class, but keep it simple:
> > each strategy is mapped to one simple validation method. I don't see any
> > need for the added layers you have there of the builders, etc. - the
> > current Avro design is too complicated, so let's not copy it.
> > I understand
> > As I wrote before, using `writtenSchema` and `readSchema` is confusing.
> > Example: You write: .....
> > This I can read and understand.
>
>
> 1. Everything you said, asking me to explain the checking rules for each
> schema-compatibility-check-strategy separately, indicates that you do not
> understand the current design. I've explained this many times.
> In `Implementation` 1:
> ```
> Schema compatibility checking can be understood as producer (write) and
> consumer (read). If a producer uses one schema to write data to the topic
> and a consumer can read it, then the two schemas are compatible. A
> "writtenSchema" can be read by a "readSchema".
> ```
> Two schemas are compatible if they can be *read* by each other. If they
> cannot be read, they are incompatible!!!

I perfectly understand that, but as I said before:

1. A new schema can be introduced by a producer, which from then on will use
that new schema to write messages --> written schema. Or a new schema can be
introduced by a consumer: the consumer will register this new schema and use
it to read messages written with the previous schema. Now, what do you think
we should call the new schema in this case?

In the first case I described:
the new schema is the written schema, and the existing schema is the read
schema.

In the second case I described:
the new schema is the read schema, and the existing schema is the written
schema.

So while I understand what you're saying, you need to realize that there
are two ways to register a new schema with the schema registry. Hence we
shouldn't call the new schema writtenSchema, because, as I explained, that is
wrong when a new consumer registers it.
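To make the two cases concrete, here is a minimal Java sketch. `Registrant`, `SchemaRoles` and `RoleExample.rolesFor` are hypothetical names used only for illustration, not Pulsar APIs; the sketch just shows that the role of the new schema flips depending on who registers it, which is why a fixed `writtenSchema` parameter name is misleading.

```java
// Hypothetical illustration: who registers the new schema decides the role it plays.
enum Registrant { PRODUCER, CONSUMER }

// Which schema version decodes messages (reader) and which one encoded them (writer).
record SchemaRoles(String readerSchema, String writerSchema) { }

final class RoleExample {
    private RoleExample() { }

    static SchemaRoles rolesFor(Registrant registrant) {
        return switch (registrant) {
            // A producer writes new messages with the new schema; existing consumers read them.
            case PRODUCER -> new SchemaRoles("existingSchema", "newSchema");
            // A consumer reads previously written messages with the new schema.
            case CONSUMER -> new SchemaRoles("newSchema", "existingSchema");
        };
    }
}
```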

Regarding saying I don't understand the current design - you are correct.
I'm still trying, and spending HUGE amounts of time doing so.
I have vast experience in distributed systems, and you can ask the PMC
members here, such as Matteo (PMC chair) or Penghui, about the level of my
knowledge.
Believe me when I say that if *I* can't understand your design, a big chunk
of Pulsar users and developers won't.

So the time I spend here is meant to verify not only that you and 2 other
people can read and understand it. That's easy.
It's to make sure that *the rest of us* in the community, both users and
devs, will understand it.
It ain't the student's fault, my friend.



>
> 2. Instead of discussing what the validation rules are for each
> schema-compatibility-check-strategy, find out what the schema compatibility
> checking behavior looks like and build the Validator! All it takes is three
> parameters!
> In Implementation 2, ProtobufNativeSchemaValidatorBuilder builds a different
> Validator according to these parameters:
> (1) ProtobufNativeSchemaValidationStrategy: based on the description of
> `schema-compatibility-check-strategy` on the Pulsar website, and the
> description of "writtenSchema" (schema used by the producer) and
> "readSchema" (schema used by the consumer) in Implementation 1, each
> schema-compatibility-check-strategy is mapped to one of the three
> `ProtobufNativeSchemaValidationStrategy` strategies.
> (2) isOnlyValidateLatest: this parameter is explained in
> `Implementation 2 (2)`. It captures the TRANSITIVE checking behavior
> (whether to check only the latest existing schema or all of them).
> (3) validatorClassName: this is the validator implementation class, which
> by default checks only the name of the root message.
>
> 3. Once the Validator has been created, don't worry about every
> schema-compatibility-check-strategy at this point! You only need to consider
> how to implement the three `ProtobufNativeSchemaValidationStrategy` values!
> ```
> So the existsSchema here is the V1 and V2 schema, and the newSchema is
> schema V3. We just need to change the order of the arguments that are
> passed in to the canRead check.
> ```
> Here we map existsSchema and newSchema directly to writtenSchema and
> readSchema (V1 and V2 are existsSchema; V3 is newSchema).
>
> 4. Finally, the implementation of `canRead`. And don't say anything about
> what the rules for each schema-compatibility-check-strategy are! The
> inspection rules are the same for all strategies, as described on the
> protobuf website.
> `According to the protobuf official website
> https://protobuf.dev/programming-guides/proto/#updating for compatibility.
> Here are the basic compatibility rules we've defined:`
>
> 5.
>
> > What you linked (https://protobuf.dev/programming-guides/proto/#updating)
> > are a bunch of guidelines and rules.
> > It's not what you are actually going to validate.
>
>
> *That's the most important thing. canRead applies the rules defined on the
> official website. It only checks whether two protobuf schemas are
> compatible; the schema-compatibility-check-strategy is not considered there!
> That part is described in Implementation 2 and on the Pulsar official
> website.*
>
> 6. Why implement each schema-compatibility-check-strategy separately? Isn't
> it better to find what the compatibility checking behavior of multiple
> strategies has in common?
>

Yes, it is good to find the commonalities.
But it is also good to write code people can understand.

First, thanks for the explanation - now I understand what you wanted to
achieve.
The problem is I still think it's complicated :)
You basically say: I want to encode something like:
* Should I compare only the last version or all of them?
* I will input the read schema and the write schema - sometimes one of them
will be a list and sometimes a single schema, depending on which direction
you're looking at.
And then run the validation.

So you encode all of that into a validator as its "arguments" and then you
just call validate.
Seems fine, except that it hides the logic and makes it hard for newcomers
to understand.
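For comparison, the parameterized approach summarized above could look roughly like the Java sketch below. It is a reconstruction under assumptions, not SiNan's code: the three direction names reuse `CanReadExistingStrategy`, `CanBeReadByExistingStrategy` and `CanBeReadMutualStrategy` from earlier in the thread, `onlyValidateLatest` stands in for `isOnlyValidateLatest`, schemas are plain strings, and a `BiPredicate` plays the role of `canRead(writtenSchema, readSchema)`.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Direction names taken from the thread; everything else here is hypothetical.
enum ValidationStrategy { CanReadExistingStrategy, CanBeReadByExistingStrategy, CanBeReadMutualStrategy }

final class ParameterizedValidator {
    private final ValidationStrategy strategy;
    private final boolean onlyValidateLatest;
    // canRead.test(writtenSchema, readSchema): can readSchema decode data written with writtenSchema?
    private final BiPredicate<String, String> canRead;

    ParameterizedValidator(ValidationStrategy strategy, boolean onlyValidateLatest,
                           BiPredicate<String, String> canRead) {
        this.strategy = strategy;
        this.onlyValidateLatest = onlyValidateLatest;
        this.canRead = canRead;
    }

    /** existing is ordered oldest to newest; returns true if newSchema passes every check. */
    boolean validate(List<String> existing, String newSchema) {
        List<String> targets = onlyValidateLatest && !existing.isEmpty()
                ? List.of(existing.get(existing.size() - 1))
                : existing;
        for (String old : targets) {
            boolean ok = switch (strategy) {
                case CanReadExistingStrategy -> canRead.test(old, newSchema);
                case CanBeReadByExistingStrategy -> canRead.test(newSchema, old);
                case CanBeReadMutualStrategy -> canRead.test(old, newSchema) && canRead.test(newSchema, old);
            };
            if (!ok) {
                return false;
            }
        }
        return true;
    }
}
```

The logic is compact, but the direction of each check is only visible inside the `switch`, which is the readability concern raised here.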

Here is what I suggest - after this snippet you will find a detailed
explanation of it.

interface ProtobufCompatibilityValidator {
    readerSchemaCanReadWriterSchema(readerSchema, writerSchema)
}

BACKWARD:
newSchema canRead existingSchema
==>
protobufCompatibilityValidator.readerSchemaCanReadWriterSchema(
    readerSchema = newSchema
    writerSchema = existingSchema
)

BACKWARD_TRANSITIVE:
newSchema canRead all existingSchemas
==>
for (existingSchema : existingSchemas) {
    protobufCompatibilityValidator.readerSchemaCanReadWriterSchema(
        readerSchema = newSchema
        writerSchema = existingSchema
    )
}

FORWARD:
existingSchema canRead newSchema
==>
protobufCompatibilityValidator.readerSchemaCanReadWriterSchema(
    readerSchema = existingSchema
    writerSchema = newSchema
)

FORWARD_TRANSITIVE:
all existingSchemas canRead newSchema
==>
for (existingSchema : existingSchemas) {
    protobufCompatibilityValidator.readerSchemaCanReadWriterSchema(
        readerSchema = existingSchema
        writerSchema = newSchema
    )
}

FULL:
newSchema canRead existingSchema
existingSchema canRead newSchema
==>
backward()
forward()

FULL_TRANSITIVE:
all existingSchemas canRead newSchema
newSchema canRead all existingSchemas

==>
backwardTransitive()
forwardTransitive()





ProtobufCompatibilityValidator is the interface you wanted to expose as a
configuration option, so people can choose among different implementations
of it.
I tried giving it a reasonable name and argument names so it's easy to
figure out what it does. This is your encapsulation.
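Selecting the implementation by class name, as discussed earlier in the thread, could look roughly like the sketch below. The interface, the default class and the loader are hypothetical; the thread does not fix a configuration key (SiNan's example value is `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`), and a broker would presumably read the class name from its configuration rather than take it as a method argument.

```java
// Hypothetical plug-in interface standing in for the validator interface proposed above.
interface SchemaCompatibilityValidator {
    String description();
}

// Hypothetical default: stands in for the existing root-message-name-only check.
final class RootNameOnlyValidator implements SchemaCompatibilityValidator {
    public RootNameOnlyValidator() { }

    @Override
    public String description() {
        return "root-message-name-only";
    }
}

final class ValidatorLoader {
    private ValidatorLoader() { }

    /** Instantiates the configured class through its public no-arg constructor. */
    static SchemaCompatibilityValidator load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            // asSubclass throws ClassCastException if the class does not implement the interface.
            return clazz.asSubclass(SchemaCompatibilityValidator.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate validator " + className, e);
        }
    }
}
```

If loading happens when the broker starts, a misconfigured class name fails fast there instead of at the first schema upload.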

Before that, a small explanation:

When you define a topic, you configure a *single* compatibility check
strategy. This strategy dictates what exactly to check when a new schema is
registered.
Sometimes it is the consumer who registers the new schema, and sometimes it
is the producer registering the new schema.
One thing is always true: messages are written using a certain schema
(version), and later read using a different schema (version).

The checks those strategies dictate can be summarized as follows:

   - BACKWARD strategy
      - newSchema can read existingSchema
   - BACKWARD_TRANSITIVE strategy
      - newSchema can read all existingSchema
   - FORWARD
      - existingSchema can read newSchema
   - FORWARD_TRANSITIVE
      - all existingSchema can read newSchema
   - FULL
      - newSchema can read existingSchema
      - existingSchema can read newSchema
   - FULL_TRANSITIVE
      - newSchema can read all existingSchema
      - all existingSchema can read newSchema


Then, I'll explain.

We introduce a new interface called `ProtobufCompatibilityValidator`, which
will have two implementations. Its goal is to implement all the validation
rules needed to verify whether messages written using a certain schema can
be read by a different schema.

The interface will look like this:
interface ProtobufCompatibilityValidator {
    readerSchemaCanReadWriterSchema(readerSchema, writerSchema)
}
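In Java, the interface proposed above might look like the following sketch. To keep it self-contained, `ProtoSchema` is a hypothetical stand-in for a protobuf `Descriptor` (root message name plus a field-number-to-type map), and returning `boolean` is an assumption; a broker implementation would more likely throw an `IncompatibleSchemaException` carrying the reason. `RootMessageNameValidator` mirrors the current default behavior described in this thread, which only compares root message names.

```java
import java.util.Map;

// Hypothetical stand-in for a protobuf Descriptor: root message name and field number -> type.
record ProtoSchema(String rootMessageName, Map<Integer, String> fieldTypes) { }

// Shape of the proposed interface; implementations would be selected through configuration.
interface ProtobufCompatibilityValidator {
    /** Returns true if data written with writerSchema can be decoded using readerSchema. */
    boolean readerSchemaCanReadWriterSchema(ProtoSchema readerSchema, ProtoSchema writerSchema);
}

// Mirrors the current default check: only the root message name must match.
final class RootMessageNameValidator implements ProtobufCompatibilityValidator {
    @Override
    public boolean readerSchemaCanReadWriterSchema(ProtoSchema readerSchema, ProtoSchema writerSchema) {
        return readerSchema.rootMessageName().equals(writerSchema.rootMessageName());
    }
}
```

The name-only check accepts a change of field 1 from `int32` to `string`, which protobuf does not treat as a compatible change; that gap is what the new implementation is meant to close.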

One implementation will encode the current logic [EXPAND], and the new
implementation we will add will encode a list of validation rules, specified
below, based on the best practices on the protobuf site (place link here).
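As an illustration of how explicit such a rule can be, the sketch below encodes one small subset of the "Updating A Message Type" guidance on the protobuf site: for every field number present in both schemas, the two declared types must be identical or belong to the same interchangeable group (for example, `int32`, `uint32`, `int64`, `uint64` and `bool` are listed there as compatible). The type groups as written, the `Map<Integer, String>` field representation and the `FieldTypeRule` name are assumptions for illustration; this is not the rule set proposed in the PIP, and real rules would also need to cover enums, `oneof`, nested messages, field labels and reserved numbers.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FieldTypeRule {
    private FieldTypeRule() { }

    // Type groups the protobuf "Updating A Message Type" guide describes as interchangeable.
    private static final List<Set<String>> COMPATIBLE_GROUPS = List.of(
            // Wire-compatible, though values may be truncated when read as a narrower type.
            Set.of("int32", "uint32", "int64", "uint64", "bool"),
            Set.of("sint32", "sint64"),
            Set.of("fixed32", "sfixed32"),
            Set.of("fixed64", "sfixed64"),
            // Only safe if every written bytes value is valid UTF-8; a schema check cannot verify that.
            Set.of("string", "bytes"));

    static boolean typesCompatible(String a, String b) {
        if (a.equals(b)) {
            return true;
        }
        return COMPATIBLE_GROUPS.stream().anyMatch(g -> g.contains(a) && g.contains(b));
    }

    /**
     * Field numbers present in only one schema are ignored: unknown fields are skipped by the
     * reader and missing ones take default values (this ignores proto2 required fields).
     * Every field number present in both schemas must have compatible types.
     */
    static boolean readerCanReadWriter(Map<Integer, String> readerFields, Map<Integer, String> writerFields) {
        return writerFields.entrySet().stream()
                .filter(e -> readerFields.containsKey(e.getKey()))
                .allMatch(e -> typesCompatible(readerFields.get(e.getKey()), e.getValue()));
    }
}
```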

Once we need to validate the new schema, we will act according to the
strategy configured by the user for this topic:

BACKWARD:
protobufCompatibilityValidator.readerSchemaCanReadWriterSchema(
    readerSchema = newSchema
    writerSchema = existingSchema
)

BACKWARD_TRANSITIVE:
for (existingSchema : existingSchemas) {
    protobufCompatibilityValidator.readerSchemaCanReadWriterSchema(
        readerSchema = newSchema
        writerSchema = existingSchema
    )
}

FORWARD:
protobufCompatibilityValidator.readerSchemaCanReadWriterSchema(
    readerSchema = existingSchema
    writerSchema = newSchema
)

FORWARD_TRANSITIVE:
for (existingSchema : existingSchemas) {
    protobufCompatibilityValidator.readerSchemaCanReadWriterSchema(
        readerSchema = existingSchema
        writerSchema = newSchema
    )
}

FULL:
// what's written above
backward()
forward()

FULL_TRANSITIVE:
// what's written above
backwardTransitive()
forwardTransitive()

Essentially a switch case.

Easy, concise, straightforward.

I find it hard to believe someone will have a hard time understanding the
design with this explanation.
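The switch above, written out as compilable Java. This is a sketch under assumptions: `Strategy`, `IncompatibleSchemaException` and the generic `ProtobufCompatibilityValidator<S>` are local stand-ins (the real code would use Pulsar's `SchemaCompatibilityStrategy` and Protobuf `Descriptor`s), `CompatibilityDispatcher` is a hypothetical name, and `existingSchemas` is assumed to be ordered oldest to newest, so its last element is the latest schema.

```java
import java.util.List;

// Local stand-ins so the sketch compiles on its own; the real code would use Pulsar's
// SchemaCompatibilityStrategy, IncompatibleSchemaException and Protobuf Descriptors.
enum Strategy { BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE, FULL, FULL_TRANSITIVE }

class IncompatibleSchemaException extends Exception {
    IncompatibleSchemaException(String message) {
        super(message);
    }
}

@FunctionalInterface
interface ProtobufCompatibilityValidator<S> {
    // Throws if data written with writerSchema cannot be read with readerSchema.
    void readerSchemaCanReadWriterSchema(S readerSchema, S writerSchema) throws IncompatibleSchemaException;
}

final class CompatibilityDispatcher {
    private CompatibilityDispatcher() {}

    // existingSchemas is assumed to be ordered oldest to newest.
    static <S> void check(Strategy strategy, S newSchema, List<S> existingSchemas,
                          ProtobufCompatibilityValidator<S> validator) throws IncompatibleSchemaException {
        if (existingSchemas.isEmpty()) {
            return; // first schema for the topic: nothing to compare with
        }
        List<S> latestOnly = List.of(existingSchemas.get(existingSchemas.size() - 1));
        switch (strategy) {
            case BACKWARD -> backward(newSchema, latestOnly, validator);
            case BACKWARD_TRANSITIVE -> backward(newSchema, existingSchemas, validator);
            case FORWARD -> forward(newSchema, latestOnly, validator);
            case FORWARD_TRANSITIVE -> forward(newSchema, existingSchemas, validator);
            case FULL -> {
                backward(newSchema, latestOnly, validator);
                forward(newSchema, latestOnly, validator);
            }
            case FULL_TRANSITIVE -> {
                backward(newSchema, existingSchemas, validator);
                forward(newSchema, existingSchemas, validator);
            }
        }
    }

    // newSchema must read data written with each of the given existing schemas.
    private static <S> void backward(S newSchema, List<S> existing, ProtobufCompatibilityValidator<S> validator)
            throws IncompatibleSchemaException {
        for (S existingSchema : existing) {
            validator.readerSchemaCanReadWriterSchema(newSchema, existingSchema);
        }
    }

    // Each of the given existing schemas must read data written with newSchema.
    private static <S> void forward(S newSchema, List<S> existing, ProtobufCompatibilityValidator<S> validator)
            throws IncompatibleSchemaException {
        for (S existingSchema : existing) {
            validator.readerSchemaCanReadWriterSchema(existingSchema, newSchema);
        }
    }
}
```

Making the validator generic over the schema type keeps the dispatch logic testable without Protobuf on the classpath; the transitive variants differ from the plain ones only in which list is iterated.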


Now, based on everything I wrote above, you must be *explicit* and write a
detailed explanation of the rules you plan to implement in your
implementation of `ProtobufCompatibilityValidator`, based on those best
practices.

Thanks!

Asaf




>
>
> Thanks,
> sinan
>
>
>
>
> On Tue, Mar 7, 2023 at 17:45 Asaf Mesika <as...@gmail.com> wrote:
>
> > On Tue, Mar 7, 2023 at 6:51 AM SiNan Liu <li...@gmail.com> wrote:
> >
> > > Thanks for the advice, Asaf.
> > >
> > > 1.
> > >
> > > > For now, there is no official or third-party solution for ProtoBuf
> > > > compatibility. If in the future have better solutions of a third
> party
> > or
> > > > the official, we develop new ProtobufNativeSchemaValidator and use,
> so
> > > add
> > > > a flag.
> > >
> > > Flag defaults not set, where the schema Compatibility Checking Rule
> > checks
> > > only the name of the root message. If you want to use the current PIP
> > > implementation, Can be set to
> > >
> > >
> >
> `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
> > > This explains what the flag does, and I'm not going to delete it. The
> > user
> > > can choose whether to use the previous implementation (just check that
> > the
> > > root message name is the same), but this may not be enough, so you can
> > > choose the current PIP implementation. If there is a better third party
> > or
> > > official solution in the future, it can be well developed and replaced.
> > It
> > > is necessary to add the flag to the PIP and keep the implementation
> > > extensible.
> > >
> >
> > Ok.
> > First, the name is confusing. Flags are normally true/false; in your case
> > it's a string, so it should be called a configuration property.
> > Second, I agree - you're basically saying we must allow users to keep the
> > existing implementation or switch to the new one. Just using a boolean
> > feature flag might suffice now, but how would we improve the rules later
> > without breaking existing users? So we choose to encode the new code as a
> > new class and have the user select it via config, by specifying the class
> > name.
> > Third, now that I understand that, you need to revise your explanation. It's
> > completely unclear from the current explanation.
> >
> >
> >
> >
> >
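Selecting the implementation by class name could look roughly like this. The configuration key is not specified in this thread, so the hypothetical `ValidatorLoader.load` simply takes the class name string (for example the `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl` name proposed above) plus the interface the class must implement.

```java
// Hypothetical helper: instantiates the implementation whose fully qualified class
// name is read from broker configuration. The configuration key itself is not shown.
final class ValidatorLoader {
    private ValidatorLoader() {}

    static <T> T load(String className, Class<T> expectedType) {
        final Class<?> clazz;
        try {
            clazz = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Validator class not found: " + className, e);
        }
        // Fail fast with a clear message if the configured class has the wrong type.
        if (!expectedType.isAssignableFrom(clazz)) {
            throw new IllegalArgumentException(className + " does not implement " + expectedType.getName());
        }
        try {
            // Requires an accessible no-argument constructor.
            return expectedType.cast(clazz.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}
```

With this shape, keeping the old behaviour and opting into new rules are both just a different class name in the configuration, and a future implementation needs no new configuration property.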
> > >
> > > 2.
> > >
> > > > Why not have a simple function for validation for each switch case above?
> > > > Why do we need strategy and builder, and all this complexity?
> > >
> > > I don't see how it's complicated. It's easy to understand and it's not
> > > redundant. The only function of the validator is to check whether two
> > > Protobufs are compatible. The builder builds checkers based on the
> > > different compatibility checking strategies. If all the implementation is
> > > done in the validator, it will be messy and there will be a lot of
> > > duplication. And if a new validator is added later, it won't extend well.
> > > So discarding encapsulation is more complex and loses scalability. I won't
> > > change this design.
> > >
> >
> > Of course you shouldn't make a gigantic class with all the validation
> > methods. I'm fine with having a
> > `ProtobufNativeBackwardCompatibilityValidation` class, and each type of
> > compatibility strategy can have its own class, but keep it simple: each
> > strategy is mapped to one simple validation method. I don't see any need
> > for the added layers you have there (the builders, etc.) - the current Avro
> > design is too complicated, so let's not copy it.
> >
> >
> >
> > >
> > > 3. *Here are the basic compatibility rules we've defined:*
> > > https://protobuf.dev/programming-guides/proto/#updating
> > > These are the rules formulated by the official website, not rules I
> > > made up myself.
> > >
> >
> > I understand
> >
> > As I wrote before, using `writtenSchema` and `readSchema` is confusing.
> >
> > Example: You write:
> >
> > > (1) Create:
> > >
> > >    - The writtenSchema cannot add required fields, but optional or
> > >    duplicate fields can be added (The field number must be new).
> >
> > You said writtenSchema is the schema of the producer.
> > What do you mean by "create"? Do you mean the producer is publishing the
> > first schema ever to the registry?
> > If so, how about explaining it like this:
> >
> > newSchema - a new schema registered either by a producer or a consumer.
> > currentSchema - the current latest schema in the schema registry
> >
> > BACKWARD:
> >
> >    - New fields added in newSchema compared with currentSchema
> >       - New fields are defined as fields with field numbers that don't
> >       exist in the existing schema.
> >       - The new fields must be optional and not required.
> >    - Modifying fields in newSchema which already exist (update) in
> >    currentSchema
> >       - Modifying is defined as changing field details for the same field
> >       number.
> >       - Changing the name is permitted.
> >       - The type can be changed only if the type is compatible. The allowed
> >       changes are ....
> >    - Removing fields in newSchema which exist in currentSchema
> >       - Removing a field is defined as removing the usage of a field number
> >       used by a field in currentSchema, so it won't exist in the schema.
> >       - ...
> >
> >
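The BACKWARD structure above maps directly onto a comparison keyed by field number. A minimal sketch under stated assumptions: fields are modelled by a hypothetical `Field` record rather than a real Protobuf `FieldDescriptor`, `required` only exists in proto2, and because the list of allowed type changes is still open ("...."), the sketch conservatively rejects every type change. Removal rules are left out for the same reason.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical field model; a real check would use Protobuf FieldDescriptors.
// "required" only exists in proto2.
record Field(String name, String type, boolean required) {}

final class BackwardRules {
    private BackwardRules() {}

    /** Returns rule violations for BACKWARD; an empty list means these rules pass. */
    static List<String> check(Map<Integer, Field> currentSchema, Map<Integer, Field> newSchema) {
        List<String> violations = new ArrayList<>();
        for (Map.Entry<Integer, Field> entry : newSchema.entrySet()) {
            int number = entry.getKey();
            Field newField = entry.getValue();
            Field currentField = currentSchema.get(number);
            if (currentField == null) {
                // New field: its number does not exist in currentSchema, so it must not be required.
                if (newField.required()) {
                    violations.add("new field " + number + " (" + newField.name() + ") must not be required");
                }
            } else if (!currentField.type().equals(newField.type())) {
                // Modified field: renaming is allowed, but the list of compatible type changes
                // is still open, so this sketch conservatively rejects every type change.
                violations.add("field " + number + " changed type from "
                        + currentField.type() + " to " + newField.type());
            }
        }
        // Removed fields (numbers only in currentSchema) are not checked: those rules are still "...".
        return violations;
    }
}
```

Returning a list of violations rather than failing on the first one lets the broker report every problem with a new schema in a single rejection message.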
> > This I can read and understand.
> >
> > You wrote
> >
> > >
> > >    - The writtenSchema do not change the field number of any field in
> > >    'readSchema' (the field name is the same, but the field number is
> > >    different).
> >
> > So if I have
> >
> > oldSchema
> > 1 customerName string
> > 2 customerAge int32
> >
> > newSchema
> > reserved 1
> > 2 customerAge int32
> > 3 customerName string
> >
> > This is perfectly valid.
> >
> > I actually think we should enforce that: when you remove a field (its field
> > number is removed) but add the same definition using a different field
> > number, we should force you to declare the old number as reserved.
> >
> >
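The rule proposed above (a removed field number whose definition re-appears under a new number must be declared `reserved`) could be checked as follows. `Field`, `MessageSchema` and `ReservedNumberRule` are hypothetical names; a real implementation would read field numbers and reserved ranges from the Protobuf descriptors instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical stand-ins for a Protobuf message descriptor.
record Field(String name, String type) {}

// fields: field number -> field; reservedNumbers: numbers declared with `reserved`.
record MessageSchema(Map<Integer, Field> fields, Set<Integer> reservedNumbers) {}

final class ReservedNumberRule {
    private ReservedNumberRule() {}

    /**
     * Returns the field numbers that were removed from currentSchema, whose definition
     * (same name and type) re-appears in newSchema under another number, and which
     * newSchema does not declare as reserved.
     */
    static List<Integer> violations(MessageSchema currentSchema, MessageSchema newSchema) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Field> entry : currentSchema.fields().entrySet()) {
            int number = entry.getKey();
            if (newSchema.fields().containsKey(number) || newSchema.reservedNumbers().contains(number)) {
                continue; // number still in use, or properly reserved
            }
            // The number is gone: was the same definition moved to a different number?
            if (newSchema.fields().containsValue(entry.getValue())) {
                result.add(number);
            }
        }
        result.sort(null); // natural order, for a stable error message
        return result;
    }
}
```

With the customerName example above, the newSchema that declares `reserved 1` passes, while the same schema without that line reports field number 1. A stricter variant would require every removed number to be reserved, whether or not its definition re-appears.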
> > What you linked (https://protobuf.dev/programming-guides/proto/#updating)
> > is a bunch of guidelines and rules.
> > It's not what you are actually going to validate.
> >
> > So how can I review your validation rules if you don't write them
> > explicitly?
> > 1. Organize them by compatibility check strategy.
> > 2. Use correct naming: not read and write schema, since both the consumer
> > and the producer can publish a new schema. Use current and new to
> > differentiate what you are comparing.
> > 3. Write the rules *explicitly*. All the rules you will use. For example,
> > the types you allow changing (you wrote that under alternatives).
> >
> >
> >
> > >
> > > 4. Other parts have been updated with explanations.
> > >
> >
> > Sinan, listen - I spent *a lot* of time reviewing this PIP. Many rounds.
> > You write "all parts updated", yet I go to the PIP and I see my first
> > comment from the previous email was neither implemented nor replied to
> > here.
> >
> > Can you please review all of my previous comments and reply to each one of
> > them?
> >
> >
> > Please bear in mind - all the work we're doing here in the review is meant
> > to:
> > 1. Help future engineers understand the code better
> > 2. Help future users understand this feature better
> > 3. Make Pulsar look good. When we have designs which are super hard to read
> > and understand, it reflects badly on the community as a whole.
> > 4. Once the doc is crystal clear, you can actually spend the time finding
> > the faults in it.
> > Thanks!
> >
> > Asaf
> >
> >
> >
> > >
> > >
> > > Thanks,
> > > sinan
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Mar 7, 2023 at 11:53 丛搏 <bo...@apache.org> wrote:
> > >
> > > > I think we have two ways to do that.
> > > >
> > > > First way: We need to push forward the improvement of Java in protobuf.
> > > > Ask whether they have plans to improve it.
> > > >
> > > > Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should
> > > > be implemented as a plugin, don't change any existing plugin logic
> > > > (it's simple and already used). I don't recommend adding flags for
> > > > rollback, it adds configuration and makes little sense.
> > > >
> > > > Thanks,
> > > > Bo
> > > >
> > > > On Mon, Mar 6, 2023 at 23:00 Asaf Mesika <as...@gmail.com> wrote:
> > > >
> > > > >
> > > > > Can you convert the code block, which is actually a quote, at the
> > > > > beginning of the PIP into something that doesn't require scrolling
> > > > > horizontally so much?
> > > > > Use
> > > > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > > > >
> > > > > Let's improve the clarity of what you wrote:
> > > > >
> > > > > "the PROTOBUF uses avro struct to store."
> > > > > -->
> > > > > When Schema type PROTOBUF is used, Pulsar Client assumes the object given
> > > > > to it as message data is an auto-generated POJO containing the
> > > > > annotations encoding the schema. The client uses a converter, which
> > > > > converts a Protobuf schema descriptor into an Avro schema, and sends that
> > > > > as the Schema of the producer/consumer.
> > > > >
> > > > > "On the broker side, protobuf and avro both use SchemaData converted to
> > > > > org.apache.avro.Schema."
> > > > > -->
> > > > > Since the schema is an Avro schema, the broker-side compatibility check
> > > > > simply re-uses the compatibility check of the AVRO schema type.
> > > > >
> > > > > "ProtobufSchema is different from ProtobufNativeSchema in schema
> > > > > compatibility check it uses avro-protobuf.
> > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > But the current implementation of ProtobufNative schema compatibility
> > > > > check only checked if the root message name is changed."
> > > > >
> > > > > -->
> > > > > PROTOBUF_NATIVE schema type is different.
> > > > > The client actually uses the Protobuf Descriptor as the schema, as
> > > > > opposed to the Avro schema of the PROTOBUF schema type. In the broker,
> > > > > the PROTOBUF_NATIVE compatibility check hasn't implemented any rule
> > > > > besides one: checking whether the root message name has changed.
> > > > >
> > > > >
> > > > >
> > > > > >    1. For now, there is no official or third-party solution for
> > > > > >    ProtoBuf compatibility. If a better third-party or official
> > > > > >    solution appears in the future, we can develop a new
> > > > > >    ProtobufNativeSchemaValidator and use it, so we add a flag.
> > > > > >
> > > > > Why do you need to make that configurable? Once you find a third
> > > > > party, just switch to it? Who knows, maybe you never will. Introduce it
> > > > > when you find it, not now.
> > > > >
> > > > >
> > > > > > We improve the BACKWARD and FORWARD strategies in
> > > > > > ProtobufNativeSchemaCompatibilityCheck. As with the AVRO
> > > > > > implementation, protobuf compatibility checking needs to implement the
> > > > > > canRead method. *This will check that the writtenSchema can be read by
> > > > > > readSchema.*
> > > > >
> > > > >
> > > > > I completely disagree.
> > > > > Avro implementation is confusing for our use case. Don't copy that.
> > > > >
> > > > > You have
> > > > >
> > > > > public void checkCompatible(SchemaData from, SchemaData to,
> > > > >         SchemaCompatibilityStrategy strategy)
> > > > >         throws IncompatibleSchemaException {
> > > > >     Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
> > > > >     Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
> > > > >     switch (strategy) {
> > > > >         case BACKWARD_TRANSITIVE:
> > > > >         case BACKWARD:
> > > > >         case FORWARD_TRANSITIVE:
> > > > >         case FORWARD:
> > > > >         case FULL_TRANSITIVE:
> > > > >         case FULL:
> > > > >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
> > > > >             return;
> > > > >         case ALWAYS_COMPATIBLE:
> > > > >             return;
> > > > >         default:
> > > > >             throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
> > > > >     }
> > > > > }
> > > > >
> > > > > I would rename:
> > > > > from --> currentSchema
> > > > > to --> newSchema
> > > > >
> > > > > Use that switch case and have a method for each, like:
> > > > > validateBackwardsCompatibility(currentSchema, newSchema)
> > > > >
> > > > > I dislike canRead and the usage of writtenSchema, since you have two
> > > > > completely different use cases: from the producing side and from the
> > > > > consuming side.
> > > > >
> > > > > > schemaValidatorBuilder
> > > > > >
> > > > > I dislike this proposal. IMO the Avro implementation is way too
> > > > > complicated.
> > > > > Why not have a simple function for validation for each switch case above?
> > > > > Why do we need strategy and builder, and all this complexity?
> > > > >
> > > > >
> > > > > > *Here are the basic compatibility rules we've defined:*
> > > > >
> > > > >
> > > > > IMO it's impossible to read the validation rules as you described them.
> > > > > I wrote how they should be structured numerous times above.
> > > > > I can't validate them.
> > > > >
> > > > >
> > > > > IMO, the current design is very hard to read.
> > > > > Please try to avoid jumping into code sections.
> > > > > Write a high-level design section, in which you describe in words what
> > > > > you plan to do.
> > > > > Write the validation rules in a structure that is easy to understand:
> > > > > rules per compatibility check, using proper words (current schema, new
> > > > > schema), since the new schema is sometimes used for reading and
> > > > > sometimes for writing.
> > > > >
> > > > > In its current form it takes too much time to understand the design, and
> > > > > it shouldn't be the case.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Asaf
> > > > >
> > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <li...@gmail.com> wrote:
> > > > >
> > > > > > Hi! I updated the explanation of some things in the PIP issue. I also
> > > > > > added a new "flag" in the conf that selects the
> > > > > > ProtobufNativeSchemaValidator implementation, and set the default
> > > > > > ProtobufNativeSchemaValidator to only check whether the name of the
> > > > > > root message is the same.
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > sinan
> > > > > >
> > > > > >
> > > > > > On Sun, Mar 5, 2023 at 20:21 Asaf Mesika <as...@gmail.com> wrote:
> > > > > >
> > > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > >
> > > > > > > > > >
> > > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be
> > > > > > > > > > validated for backward compatibility check using Avro based
> > > > > > > > > > compatibility rules?
> > > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a
> > > > > > > > > > Protobuf descriptor?
> > > > > > > > > > Is there some translation happening?
> > > > > > > >
> > > > > > > >
> > > > > > > > 1. *You can take a quick look at the previous design, the PROTOBUF
> > > > > > > > uses avro struct to store.*
> > > > > > > > https://github.com/apache/pulsar/pull/1954
> > > > > > > >
> > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > > > >
> > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > > > > >
> > > > > > >
> > > > > > > Ok. So to summarize your code (easier to write it than send links):
> > > > > > > * Pulsar Client, when used with Protobuf Schema, actually converts the
> > > > > > > Protobuf descriptor into an Avro Schema (using code found inside the
> > > > > > > Avro library) and saves that Avro schema as the schema. It's not saving
> > > > > > > the protobuf descriptor at all. Very confusing, I have to add - I never
> > > > > > > expected that.
> > > > > > > This explains why the ProtobufSchemaCompatibilityCheck just extends
> > > > > > > the Avro check without doing any translation.
> > > > > > >
> > > > > > > Thanks for that.
> > > > > > >
> > > > > > > Now that I finally understand this, I can say that you *must* explain
> > > > > > > it in the motivation part of your PIP.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > 2. *On the broker side, protobuf and avro both use `SchemaData`
> > > > > > > > converted to `org.apache.avro.Schema`.*
> > > > > > > >
> > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > > > >
> > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > > > >
> > > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > > > > >
> > > > > > >
> > > > > > > Actually, those links don't really help.
> > > > > > > The main link that helps is:
> > > > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > I'm sorry - I don't understand.
> > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > If you just spell them out here, then as you say, just translate
> > > > > > > > > the Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > > > compatibility validation, no?
> > > > > > > > > I believe the answer is no, since you may want to verify different
> > > > > > > > > things when it comes to Protobuf, which are different from Avro.
> > > > > > > >
> > > > > > > >
> > > > > > > > 1.
> > > > > > > > *ProtobufSchema is different from ProtobufNativeSchema in that it uses
> > > > > > > > avro-protobuf.*
> > > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > > *ProtobufNativeSchema needs a native compatibility check, but there is
> > > > > > > > no official or third party implementation. So this PIP does not use
> > > > > > > > avro-protobuf for protobuf compatibility checking.*
> > > > > > > >
> > > > > > > > 2. *By the way, this is implemented in much the same way that Apache
> > > > > > > > avro does compatibility checking.*
> > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > > > > `canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy`
> > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > > > > > > > writtenSchema and readSchema. We only need to change the order of
> > > > > > > > arguments we pass to `canRead()`.*
> > > > > > > > ```java
> > > > > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > > > > > > >         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
> > > > > > > >     switch (strategy) {
> > > > > > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > > > > > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> > > > > > > >         case CanBeReadMutualStrategy -> {
> > > > > > > >             canRead(toValidate, fromDescriptor);
> > > > > > > >             canRead(fromDescriptor, toValidate);
> > > > > > > >         }
> > > > > > > >     }
> > > > > > > > }
> > > > > > > >
> > > > > > > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > > > > > >         Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> > > > > > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
> > > > > > > > }
> > > > > > > > ```
> > > > > > > >
> > > > > > > >
> > > > > > > I get that you want to take inspiration from the existing Avro schema
> > > > > > > compatibility check for your code design.
> > > > > > > I also understand you *won't* use any existing Avro code for that.
> > > > > > > I also understand you have to write the validation check on your own,
> > > > > > > since there is no third-party implementation of it.
> > > > > > >
> > > > > > > The only thing I can't understand is the actual rules, as I wrote
> > > > > > > before, since they are written confusingly.
> > > > > > > So, I repeat what I asked before:
> > > > > > >
> > > > > > > I think you should structure the validation rules differently:
> > > > > > >
> > > > > > > * Backward checks
> > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > > * Forward
> > > > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > > > consumer) and existingSchema (the last schema used)
> > > > > > >
> > > > > > > Once that's accomplished I will be able to understand the different
> > > > > > > validation rules for each compatibility check.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > sinan
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Mar 1, 2023 at 21:19 Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I read it and they look identical. What's the difference
> > > > > > > > > > > between them?
> > > > > > > > > >
> > > > > > > > > > Current avro, json, and protobuf schemas are all implemented
> > > > > > > > > > based on AVRO.
> > > > > > > > > > > What do you mean, they are all implemented based on Avro? You
> > > > > > > > > > > mean the protobuf schema is converted into an Avro Schema, and
> > > > > > > > > > > then you use Avro compatibility validation?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, so
> > > > > > > > > > `checkCompatible()` has the same implementation as for `AVRO`.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Can you please explain how a Protobuf Schema descriptor can be
> > > > > > > > > validated for backward compatibility check using Avro based
> > > > > > > > > compatibility rules?
> > > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a
> > > > > > > > > Protobuf descriptor?
> > > > > > > > > Is there some translation happening?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > The Compatibility check strategy is described on the website
> > > > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > > > > 1. BACKWARD(CanReadExistingStrategy): Consumers using schema V3 can
> > > > > > > > > > process data written by producers using the last schema version V2.
> > > > > > > > > > So V2 is "writtenSchema" and V3 is "readSchema".
> > > > > > > > > > 2. FORWARD(CanBeReadByExistingStrategy): Consumers using the last
> > > > > > > > > > schema version V2 can process data written by producers using a new
> > > > > > > > > > schema V3, even though they may not be able to use the full
> > > > > > > > > > capabilities of the new schema. So V3 is "writtenSchema" and V2 is
> > > > > > > > > > "readSchema".
> > > > > > > > > > 3. FULL(CanBeReadMutualStrategy): Schemas are both backward and
> > > > > > > > > > forward compatible.
> > > > > > > > > > Schema can evolve. The old version schema and the new version
> > > > > > > > > > schema should be well understood.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > I'm sorry - I don't understand.
> > > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > > If you just spell them out here, then as you say, just translate
> > > > > > > > > the Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > > > compatibility validation, no?
> > > > > > > > > I believe the answer is no, since you may want to verify different
> > > > > > > > > things when it comes to Protobuf, which are different from Avro.
> > > > > > > > >
> > > > > > > > > In its current state, I can't understand your design at all. Please
> > > > > > > > > help clarify that.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As
> > > > > > > > > > we've just described, we only need to change the order of
> > > > > > > > > > arguments we pass to `canRead()`.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > sinan
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Feb 27, 2023 at 20:49 Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > And you can see the difference between ProtoBuf and
> > > > > > > > > > > > ProtoBufNative:
> > > > > > > > > > > >
> > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > > > > >
> > > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > > > > >
> > > > > > > > > > > I read it and they look identical. What's the difference between
> > > > > > > > > > > them?
> > > > > > > > > > >
> > > > > > > > > > > > Current avro, json, and protobuf schemas are all implemented
> > > > > > > > > > > > based on AVRO.
> > > > > > > > > > >
> > > > > > > > > > > What do you mean, they are all implemented based on Avro? You
> > > > > > > > > > > mean the protobuf schema is converted into an Avro Schema, and
> > > > > > > > > > > then you use Avro compatibility validation?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > > >
> > > > > > > > > > > * Backward checks
> > > > > > > > > > > ** List the rules, using newSchema (the schema used by the
> > > > > > > > > > > producer or consumer) and existingSchema (the last schema used)
> > > > > > > > > > > * Forward
> > > > > > > > > > > ** List the rules, using newSchema (the schema used by the
> > > > > > > > > > > producer or consumer) and existingSchema (the last schema used)
> > > > > > > > > > >
> > > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > > >
> > > > > > > > > > > I'm saying this since you used the word "writtenSchema", but it
> > > > > > > > > > > represents something completely different in a backward check
> > > > > > > > > > > than in a forward check.
> > > > > > > > > > >
> > > > > > > > > > > Once you have that structure, I personally will be able to read
> > > > > > > > > > > and understand it.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The motivation and problem statement are now good - thanks for
> > > > > > > > > > > improving it.
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some
> > > > > > > > > > > > background and some explanations.
> > > > > > > > > > > >
> > > > > > > > > > > > The compatibility check rules are already written in the
> > > > > > > > > > > > Implementation section. ProtoBufNative implements the same
> > > > > > > > > > > > canRead method as Apache Avro.
> > > > > > > > > > > > It does this by checking whether the schemas for writing and
> > > > > > > > > > > > reading are compatible. I also indicate whether the writtenSchema
> > > > > > > > > > > > and readSchema of the Backward, Forward, and Full strategies are
> > > > > > > > > > > > the old or the new version of the schema.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > sinan
> > > > > > > > > > > >
> > > > > > > > > > > > Asaf Mesika <as...@gmail.com> 于2023年2月26日周日
> > > 23:24写道:
> > > > > > > > > > > >
> > > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, which IMO you need to add so that people can understand it. You don't need to explain the whole of Pulsar in this PIP, but at the very least add a few paragraphs detailing everything you need to know to put you in context:
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >    - Start by saying Pulsar has a built-in schema registry inside the Pulsar broker.
> > > > > > > > > > > > >       - Every time the client updates the schema, it uploads it to the broker. When that happens, a feature validates whether the new schema version is compatible with the previous versions. There are 4 types of compatibility: Full, ... (complete and explain each one briefly)
> > > > > > > > > > > > >    - Also explain that the Pulsar schema registry supports various schema protocols: Avro, protobuf native, ... (complete the rest). Each protocol has a schema which dictates how to serialize and deserialize the message content into a typed object.
> > > > > > > > > > > > >    - Explain in short what protobuf native is (compared with protobuf non-native).
> > > > > > > > > > > > >    - Please don't paste code instead of explaining.
> > > > > > > > > > > > >       - Explain that the current protobuf native validation check only consists of checking that the root message name is the same between the current schema version and the new version.
> > > > > > > > > > > > >          - Explain briefly what a root message and its name are.
> > > > > > > > > > > > >       - Explain the problem (list scenarios) that we have because the protobuf native schema only supports FULL compatibility validation.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regarding the high level design - as in what you plan to do:
> > > > > > > > > > > > > I suggest you add a "High Level Design" section and in it detail how you plan to validate, per protobuf version, per compatibility check (backward, forward, full, ...).
> > > > > > > > > > > > > I tried reading the implementation - for me, it's all over the place. Can you please list in order what I wrote above, and list the validation rules with a good explanation of why you validate them like that?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Lastly, once you have all the validation rules clearly stated, you can use them to document the feature properly so users know what validation to expect.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Asaf
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Sorry, my mistake. I removed the code and described the design for improving the PROTOBUF_NATIVE schema compatibility checks. You can have a look.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Feb 22, 2023 at 21:16:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I read it but you're almost directly diving into the code - it will take me hours just to reverse engineer your design.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Can you please include a "High Level Design" section in which you explain how you plan to tackle any issue?
> > > > > > > > > > > > > > > If I can read that section and explain to someone else how this will work, it means the section is complete.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19565.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > We can talk about the current design here. Especially for the field type change check rules, please give your valuable advice.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > Sinan
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by SiNan Liu <li...@gmail.com>.
>
> Ok.
> First, the name is confusing. Flags are normally true/false, in your case
> it's a string, so the name should be a Configuration property.
> Second, I agree - you're basically saying we must allow users to keep
> existing implementation, or switch to new implementation. Just using a
> boolean feature flag might suffice now, but how will we improve the rules
> without breaking existing users. So we choose to encode the new code as a
> new class and have the user select it via config, by specifying the class
> name.
> 3rd, now that I understand that, you need to revise your explanation. It's
> completely unclear from the current explanation.


You're right: it should be described as a configuration property, not a
flag. I will modify it.

> Of course you shouldn't make a gigantic class having all validation
> methods. I'm fine with having
> `ProtobufNativeBackwardComptabilityValidation` class and each type of
> compatibility strategy can have its own class, but just have it simple:
> each strategy is mapped to one simple validation method. I don't see any
> need for the added layers you have there of the builders, etc - current
> Avro design is too complicated, so let's not copy it.
> I understand
> As I wrote before, using `writtenSchema` and `readSchema` is confusing.
> Example: You write: .....
> This I can read and understand.


1. Regarding everything you said: asking for the checking rules of each
schema-compatibility-check-strategy to be explained separately shows that
the current design has not been understood. I've explained this many times.
In `Implementation` 1:
```
Schema compatibility checking can be understood as producer (write)
consumer (read). If a producer uses one schema to write data to the topic
and a consumer can read it, then the two schemas are compatible. A
"writtenSchema" canRead by a "readSchema".
```
Whether two schemas are compatible comes down to whether data written with
one can be *read* with the other. If it cannot be read, they are incompatible!

2. Instead of discussing what the validation rules are for each
schema-compatibility-check-strategy, look at what the schema compatibility
checking behavior is and build the Validator from it! All it takes is three
parameters!
In Implementation 2, ProtobufNativeSchemaValidatorBuilder builds a different
Validator according to these parameters:
(1) ProtobufNativeSchemaValidationStrategy: based on the description of
`schema-compatibility-check-strategy` on the Pulsar website, and on the
description of "writtenSchema" (the schema used by the producer) and
"readSchema" (the schema used by the consumer) in Implementation 1, each
schema-compatibility-check-strategy maps to one of the three
`ProtobufNativeSchemaValidationStrategy` strategies.
(2) isOnlyValidateLatest: this parameter is explained in `Implementation 2
(2)`. It maps to the checking behavior of the TRANSITIVE strategies (whether
to check only the latest existing schema or all existing schemas).
(3) validatorClassName: the validator implementation class, which by default
checks only the name of the root message.
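The mapping described in (1) and (2) can be sketched as a lookup from each compatibility strategy to a pair of builder parameters. This is a minimal sketch, not the PIP's code: `CompatibilityStrategy` is a local stand-in for Pulsar's `SchemaCompatibilityStrategy` (only the six values that run real checks), `ValidatorConfig` is a hypothetical holder, and `validatorClassName` is left out.

```java
import java.util.Objects;

// Local stand-in for Pulsar's SchemaCompatibilityStrategy (only the values relevant here).
enum CompatibilityStrategy {
    BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE, FULL, FULL_TRANSITIVE
}

// The three read-direction strategies, named as in this thread.
enum ValidationStrategy {
    CanReadExistingStrategy,     // the new schema reads data written with the existing schema(s)
    CanBeReadByExistingStrategy, // the existing schema(s) read data written with the new schema
    CanBeReadMutualStrategy      // both directions must hold
}

// Hypothetical pair of builder parameters (validatorClassName omitted).
record ValidatorConfig(ValidationStrategy strategy, boolean onlyValidateLatest) {

    static ValidatorConfig from(CompatibilityStrategy s) {
        Objects.requireNonNull(s, "strategy");
        // Non-transitive strategies compare against the latest schema only.
        return switch (s) {
            case BACKWARD -> new ValidatorConfig(ValidationStrategy.CanReadExistingStrategy, true);
            case BACKWARD_TRANSITIVE -> new ValidatorConfig(ValidationStrategy.CanReadExistingStrategy, false);
            case FORWARD -> new ValidatorConfig(ValidationStrategy.CanBeReadByExistingStrategy, true);
            case FORWARD_TRANSITIVE -> new ValidatorConfig(ValidationStrategy.CanBeReadByExistingStrategy, false);
            case FULL -> new ValidatorConfig(ValidationStrategy.CanBeReadMutualStrategy, true);
            case FULL_TRANSITIVE -> new ValidatorConfig(ValidationStrategy.CanBeReadMutualStrategy, false);
        };
    }
}
```

With this shape, six Pulsar strategies collapse into three read directions plus one boolean, which is the commonality the message argues for.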

3. Once the Validator has been created, there is no need to worry about every
schema-compatibility-check-strategy anymore! We only need to consider how to
implement the three `ProtobufNativeSchemaValidationStrategy` strategies!
```
So the existsSchema here is V1 and V2 schema, and the newSchema is schema
V3. We just need to change the arguments that are passed in to canRead
check.
```
Here we map existsSchema and newSchema directly to writtenSchema and
readSchema (V1 and V2 are existsSchema; V3 is newSchema).
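The argument swapping described above, combined with `isOnlyValidateLatest`, can be sketched generically. This is a hedged illustration under assumptions: `Direction` and `TransitiveCheck` are hypothetical names, a schema can be any type `S`, and the protobuf rules themselves are abstracted into a `canRead(written, read)` predicate passed in by the caller.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical stand-in for the three validation strategies.
enum Direction { CAN_READ_EXISTING, CAN_BE_READ_BY_EXISTING, MUTUAL }

final class TransitiveCheck {
    private TransitiveCheck() {}

    /**
     * Returns true if newSchema is compatible with the existing schemas.
     * canRead.test(written, read) means "data written with `written` can be read with `read`".
     * existing is ordered oldest to newest, e.g. [V1, V2].
     */
    static <S> boolean isCompatible(S newSchema, List<S> existing, Direction direction,
                                    boolean onlyValidateLatest, BiPredicate<S, S> canRead) {
        if (existing.isEmpty()) {
            return true; // nothing to compare against
        }
        // Non-transitive: only the latest schema. Transitive: every existing schema.
        List<S> toCheck = onlyValidateLatest ? List.of(existing.get(existing.size() - 1)) : existing;
        for (S old : toCheck) {
            boolean ok = switch (direction) {
                case CAN_READ_EXISTING -> canRead.test(old, newSchema);       // old is written, new reads
                case CAN_BE_READ_BY_EXISTING -> canRead.test(newSchema, old); // new is written, old reads
                case MUTUAL -> canRead.test(old, newSchema) && canRead.test(newSchema, old);
            };
            if (!ok) {
                return false;
            }
        }
        return true;
    }
}
```

For example, with a toy rule where a schema can only read data from an adjacent version, V3 passes against the latest schema V2 but fails the transitive check because of V1.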

4. Finally, the implementation of `canRead`. It says nothing about the rules
for each schema-compatibility-check-strategy! The inspection rules are the
same for all of them, as described on the protobuf website:
`According to the protobuf official website
https://protobuf.dev/programming-guides/proto/#updating for compatibility.
Here are the basic compatibility rules we've defined:`
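To make the canRead idea concrete, here is a deliberately simplified sketch of a `canRead(writtenSchema, readSchema)` check built on a few rules from the protobuf "Updating A Message Type" guide: a reader must not require a field the writer lacks, and a field number present in both schemas must keep a wire-compatible type. Assumptions: `Field` and `SimpleCanRead` are hypothetical, fields are plain records instead of `Descriptors.Descriptor`, and only a subset of the guide's compatible type groups is listed. This is not the rule set the PIP defines.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical, simplified view of a protobuf field (not com.google.protobuf.Descriptors).
record Field(int number, String name, String type, boolean required) {}

final class SimpleCanRead {
    private SimpleCanRead() {}

    // A subset of the wire-compatible type groups from the protobuf updating guide.
    private static final List<Set<String>> COMPATIBLE_GROUPS = List.of(
            Set.of("int32", "uint32", "int64", "uint64", "bool"),
            Set.of("sint32", "sint64"),
            Set.of("string", "bytes"), // only safe if the bytes are valid UTF-8
            Set.of("fixed32", "sfixed32"),
            Set.of("fixed64", "sfixed64"));

    static boolean typesCompatible(String a, String b) {
        return a.equals(b) || COMPATIBLE_GROUPS.stream().anyMatch(g -> g.contains(a) && g.contains(b));
    }

    /** True if data written with `written` can be parsed by a reader using `read`. */
    static boolean canRead(List<Field> written, List<Field> read) {
        Map<Integer, Field> writtenByNumber = written.stream()
                .collect(Collectors.toMap(Field::number, Function.identity()));
        for (Field r : read) {
            Field w = writtenByNumber.get(r.number());
            if (w == null) {
                if (r.required()) {
                    return false; // the reader requires a field the writer never sends
                }
            } else if (!typesCompatible(w.type(), r.type())) {
                return false; // same field number, incompatible wire type
            }
        }
        return true; // fields unknown to the reader are simply skipped
    }
}
```

Note that the check is directional: it says nothing about strategies, which matches the point that strategies only decide which schema is passed as the writer and which as the reader.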

5.

> What you linked (https://protobuf.dev/programming-guides/proto/#updating)
> are a bunch of guidelines and rules.
> It's not what you are actually going to validate.


*That's the most important thing. canRead follows the rules defined on the
official website; it only checks whether two protobuf schemas are compatible.*
*The schema-compatibility-check-strategy is not considered there! That is
described in Implementation 2 and on the Pulsar website.*

6. Why implement each schema-compatibility-check-strategy separately? Isn't
it better to find the compatibility checking behavior that multiple
strategies have in common?


Thanks,
sinan




Asaf Mesika <as...@gmail.com> wrote on Tue, Mar 7, 2023 at 17:45:

> On Tue, Mar 7, 2023 at 6:51 AM SiNan Liu <li...@gmail.com> wrote:
>
> > Thanks for the advice, Asaf.
> >
> > 1.
> >
> > > For now, there is no official or third-party solution for ProtoBuf
> > > compatibility. If a better third-party or official solution appears in
> > > the future, we will develop a new ProtobufNativeSchemaValidator and use
> > > it, so we add a flag.
> >
> > The flag is not set by default, in which case the schema compatibility
> > check only checks the name of the root message. If you want to use the
> > implementation from this PIP, set it to
> > `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
> > This explains what the flag does, and I'm not going to delete it. Users
> > can choose whether to keep the previous implementation (which only checks
> > that the root message name is the same), but since this may not be enough,
> > they can choose the implementation from this PIP instead. If a better
> > third-party or official solution appears in the future, it can be
> > developed and swapped in. It is necessary to add the flag to the PIP and
> > keep the implementation extensible.
> >
>
> Ok.
> First, the name is confusing. Flags are normally true/false, in your case
> it's a string, so the name should be a Configuration property.
> Second, I agree - you're basically saying we must allow users to keep
> existing implementation, or switch to new implementation. Just using a
> boolean feature flag might suffice now, but how will we improve the rules
> without breaking existing users. So we choose to encode the new code as a
> new class and have the user select it via config, by specifying the class
> name.
> 3rd, now that I understand that, you need to revise your explanation. It's
> completely unclear from the current explanation.
>
>
>
>
>
> >
> > 2.
> >
> > > Why not have a simple function for validation for each switch case above?
> > > Why do we need strategy and builder, and all this complexity?
> >
> > I don't see how it's complicated. It's easy to understand and it's not
> > redundant. The only job of the validator is to check whether two Protobuf
> > schemas are compatible. The builder builds validators based on the
> > different compatibility checking strategies. If everything were
> > implemented in the validator, it would be messy, there would be a lot of
> > duplication, and a validator added later would not extend well. So
> > discarding encapsulation is more complex and loses scalability. I won't
> > change this design.
> >
>
> Of course you shouldn't make a gigantic class having all validation
> methods. I'm fine with having
> `ProtobufNativeBackwardComptabilityValidation` class and each type of
> compatibility strategy can have its own class, but just have it simple:
> each strategy is mapped to one simple validation method. I don't see any
> need for the added layers you have there of the builders, etc - current
> Avro design is too complicated, so let's not copy it.
>
>
>
> >
> > 3. *Here are the basic compatibility rules we've defined:*
> > https://protobuf.dev/programming-guides/proto/#updating
> > These follow the rules formulated on the official website; they are not
> > customized by me.
> >
>
> I understand
>
> As I wrote before, using `writtenSchema` and `readSchema` is confusing.
>
> Example: You write:
>
> > (1) Create:
> >
> >    - The writtenSchema cannot add required fields, but optional or
> >    duplicate fields can be added (The field number must be new).
> >
> You said writtenSchema is the schema of the producer.
> What do you mean by create? Do you mean the producer is publishing the
> first schema ever to the registry?
> If so, what if you explain:
>
> newSchema - a new schema registered either by a producer or a consumer.
> currentSchema - the current latest schema in the schema registry
>
> BACKWARD:
>
>    - new fields added in newSchema compared with currentSchema
>       - New fields are defined as fields with field numbers that don't
>       exist in the existing schema
>       - The new fields must be optional and not required.
>    - Modifying fields in the newSchema which already exists (update) in the
>    currentSchema
>       - Modifying is defined as changing field details for the same field
>       number.
>       - Changing the name is permitted.
>       - The type can be changed only if the type is compatible. The allowed
>       changes are ....
>    - Removing fields in the newSchema which exists in the currentSchema
>       - Removing a field is defined as removing the usage of the field
>       number, used by a field in the currentSchema, so it won't exist in the
>       schema.
>       - ...
>
>
> This I can read and understand.
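Written this way, the BACKWARD rules above translate almost directly into one method that compares currentSchema and newSchema by field number. A minimal sketch under stated assumptions: `SchemaField` and `BackwardRules` are hypothetical, the label is a plain string, and because the list of allowed type changes and the removal rules are left open above ("...."), the sketch conservatively rejects every type change and does not check removals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical, simplified field model; label is "optional", "required" or "repeated".
record SchemaField(int number, String name, String type, String label) {}

final class BackwardRules {
    private BackwardRules() {}

    /** Returns human-readable violations of the BACKWARD rules; an empty list means compatible. */
    static List<String> validateBackwardsCompatibility(List<SchemaField> currentSchema,
                                                       List<SchemaField> newSchema) {
        Map<Integer, SchemaField> current = currentSchema.stream()
                .collect(Collectors.toMap(SchemaField::number, Function.identity()));
        List<String> violations = new ArrayList<>();
        for (SchemaField f : newSchema) {
            SchemaField old = current.get(f.number());
            if (old == null) {
                // New field: its number does not exist in currentSchema, so it must not be required.
                if (f.label().equals("required")) {
                    violations.add("new field " + f.number() + " (" + f.name() + ") must not be required");
                }
            } else if (!old.type().equals(f.type())) {
                // Modified field: renaming is allowed. The allowed type changes are not listed yet,
                // so this sketch treats every type change as a violation.
                violations.add("field " + f.number() + " changed type " + old.type() + " -> " + f.type());
            }
        }
        // Removed-field rules are still open in the discussion and are not checked here.
        return violations;
    }
}
```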
>
> You wrote
>
> >
> >    - The writtenSchema do not change the field number of any field in
> >    'readSchema' (the field name is the same, but the field number is
> >    different).
> >
> So if I have
>
> oldSchema
> 1 customerName String
> 2 customerAge integer
>
> newSchema
> reserved 1
> 2 customerAge integer
> 3 customerName string
>
> This is perfectly valid.
>
> I actually think we should force that - when you remove a field (field
> number removed), but add the same definition using a different field
> number, we should force you to declare reserved.
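The proposed "force reserved" rule can be sketched as follows: a field whose number disappears, but whose name and type reappear under a new number, is flagged unless the old number is declared `reserved`. Assumptions: `ProtoField`, `ProtoMessage` and `ReservedRule` are hypothetical simplified models (not the protobuf Descriptor API), and matching on name plus type is only one possible reading of "the same definition".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified field model (not the protobuf Descriptor API).
record ProtoField(int number, String name, String type) {}

// Hypothetical message model: its fields plus the field numbers declared `reserved`.
record ProtoMessage(List<ProtoField> fields, Set<Integer> reservedNumbers) {

    boolean hasNumber(int number) {
        return fields.stream().anyMatch(f -> f.number() == number);
    }
}

final class ReservedRule {
    private ReservedRule() {}

    /** Flags fields that moved to a new number without reserving their old number. */
    static List<String> check(ProtoMessage current, ProtoMessage next) {
        List<String> violations = new ArrayList<>();
        for (ProtoField old : current.fields()) {
            if (next.hasNumber(old.number()) || next.reservedNumbers().contains(old.number())) {
                continue; // number still in use, or explicitly reserved
            }
            boolean moved = next.fields().stream()
                    .anyMatch(f -> f.name().equals(old.name()) && f.type().equals(old.type()));
            if (moved) {
                violations.add("field '" + old.name() + "' moved from number " + old.number()
                        + " without reserving it");
            }
        }
        return violations;
    }
}
```

Applied to the example above (types written in lowercase), the newSchema with `reserved 1` passes, while the same schema without the `reserved` declaration produces one violation.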
>
>
> What you linked (https://protobuf.dev/programming-guides/proto/#updating)
> are a bunch of guidelines and rules.
> It's not what you are actually going to validate.
>
> So how can I review your validation rules, if you don't write them
> explicitly?
> 1. By compatibility check strategy?
> 2. use correct naming, and not read and write schema, as the consumer can
> publish new schema and producer can publish new schema as well. Use current
> and new to differentiate what you are comparing.
> 3. Write the rules *explicitly*. All rules you will use. For example, types
> you allow changing (you wrote that under alternatives).
>
>
>
> >
> > 4. Other parts have been updated with explanations.
> >
>
> Sinan, listen - I spend *a lot* of time reviewing this PIP. Many rounds.
> You write "all parts updated", yet I go to the PIP and I see my first
> comment from the previous email was not implemented, nor got any reply
> here.
>
> Can you please review all of my previous comments and reply to each one of
> them?
>
>
> Please bear in mind - all the work we're doing here in the review is:
> 1. Help future engineers understand the code better
> 2. Help future users understand this feature better
> 3. Make Pulsar look good. When we have designs which are super hard to read
> and understand, it reflects badly on the community as a whole.
> 4. Once the doc is crystal clear, you can actually spend the time to find
> the faults in it.
>
> Thanks!
>
> Asaf
>
>
>
> >
> >
> > Thanks,
> > sinan
> >
> >
> >
> >
> >
> >
> >
> > 丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:53:
> >
> > > I think we have two ways to do that.
> > >
> > > First way: push the protobuf project to improve its Java support, and
> > > ask whether they have plans to do so.
> > >
> > > Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should
> > > be implemented as a plugin, without changing any existing plugin logic
> > > (it's simple and already in use). I don't recommend adding flags for
> > > rollback; it adds configuration and makes little sense.
> > >
> > > Thanks,
> > > Bo
> > >
> > > Asaf Mesika <as...@gmail.com> wrote on Mon, Mar 6, 2023 at 23:00:
> > >
> > > >
> > > > Can you convert the code block, which is actually a quote, at the
> > > > beginning of the PIP into something that doesn't require scrolling
> > > > horizontally so much?
> > > > Use
> > > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > > >
> > > > Let's improve the clarity of what you wrote:
> > > >
> > > > "the PROTOBUF uses avro struct to store."
> > > > -->
> > > > When Schema type PROTOBUF is used, Pulsar Client assumes the object given
> > > > to it as message data is an auto-generated POJO containing the annotations
> > > > encoding the schema. The client is using a converter, which converts a
> > > > Protobuf schema descriptor into an Avro schema and sends that as the Schema
> > > > of the producer/consumer.
> > > >
> > > > "On the broker side, protobuf and avro both use SchemaData converted to
> > > > org.apache.avro.Schema."
> > > > -->
> > > > Since the schema is an Avro schema, the implementation of compatibility
> > > > check on the broker side is to simply re-use the compatibility check of the
> > > > AVRO schema type.
> > > >
> > > > "ProtobufSchema is different from ProtobufNativeSchema in schema
> > > > compatibility check it uses avro-protobuf.
> > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > But the current implementation of ProtobufNative schema compatibility
> > > > check only checked if the root message name is changed."
> > > >
> > > > -->
> > > > PROTOBUF_NATIVE schema type is different.
> > > > The client is actually using Protobuf Descriptor as the schema, as opposed
> > > > to Avro schema of PROTOBUF schema type. In the broker, the PROTOBUF_NATIVE
> > > > compatibility check actually hasn't implemented any rule, besides one:
> > > > checking if the root message name has changed.
> > > >
> > > >
> > > >
> > > > >    1. For now, there is no official or third-party solution for
> > > > >    ProtoBuf compatibility. If a better third-party or official solution
> > > > >    appears in the future, we will develop a new
> > > > >    ProtobufNativeSchemaValidator and use it, so we add a flag.
> > > > >
> > > > Why do you need to make that configurable? Once you find a third party,
> > > > just switch to it. Who knows, maybe you never will. Introduce it when
> > > > you find it, not now.
> > > >
> > > >
> > > > > We improve the BACKWARD and FORWARD strategies in
> > > > > ProtobufNativeSchemaCompatibilityCheck. As with the AVRO implementation,
> > > > > protobuf compatibility checking needs to implement the canRead method.
> > > > > *This will check that the writtenSchema can be read by readSchema.*
> > > >
> > > >
> > > > I completely disagree.
> > > > Avro implementation is confusing for our use case. Don't copy that.
> > > >
> > > > You have
> > > >
> > > > public void checkCompatible(SchemaData from, SchemaData to,
> > > >                             SchemaCompatibilityStrategy strategy)
> > > >         throws IncompatibleSchemaException {
> > > >     Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
> > > >     Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
> > > >     switch (strategy) {
> > > >         case BACKWARD_TRANSITIVE:
> > > >         case BACKWARD:
> > > >         case FORWARD_TRANSITIVE:
> > > >         case FORWARD:
> > > >         case FULL_TRANSITIVE:
> > > >         case FULL:
> > > >             // Every strategy above currently runs the same single check:
> > > >             // the root message name must not change.
> > > >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
> > > >             return;
> > > >         case ALWAYS_COMPATIBLE:
> > > >             return;
> > > >         default:
> > > >             throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
> > > >     }
> > > > }
> > > >
> > > > I would rename :
> > > > from --> currentSchema
> > > > to --> newSchema
> > > >
> > > > Use that switch case and have a method for each like:
> > > > validateBackwardsCompatibility(currentSchema, newSchema)
> > > >
> > > > I dislike canRead and the usage of writtenSchema, since you have two
> > > > completely different use cases: from the producing side and the
> > > > consumer side.
> > > >
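For comparison, the suggested shape (one plain method per strategy behind the existing switch, named with currentSchema and newSchema) can be sketched without a builder. Assumptions: `Strategy` and the local `IncompatibleSchemaException` are stand-ins for Pulsar's types, only the non-transitive strategies are shown, and the protobuf rules are hidden behind a hypothetical `canRead(written, read)` predicate.

```java
import java.util.function.BiPredicate;

// Local stand-ins; Pulsar's real types are SchemaCompatibilityStrategy and IncompatibleSchemaException.
enum Strategy { ALWAYS_COMPATIBLE, BACKWARD, FORWARD, FULL }

class IncompatibleSchemaException extends Exception {
    IncompatibleSchemaException(String message) { super(message); }
}

/** One plain method per strategy, no builder; canRead.test(written, read) is the shared rule. */
final class SimpleProtobufNativeCheck<S> {
    private final BiPredicate<S, S> canRead;

    SimpleProtobufNativeCheck(BiPredicate<S, S> canRead) { this.canRead = canRead; }

    void checkCompatible(S currentSchema, S newSchema, Strategy strategy) throws IncompatibleSchemaException {
        switch (strategy) {
            case ALWAYS_COMPATIBLE -> { }
            case BACKWARD -> validateBackwardsCompatibility(currentSchema, newSchema);
            case FORWARD -> validateForwardCompatibility(currentSchema, newSchema);
            case FULL -> {
                validateBackwardsCompatibility(currentSchema, newSchema);
                validateForwardCompatibility(currentSchema, newSchema);
            }
        }
    }

    // Consumers using newSchema must be able to read data produced with currentSchema.
    void validateBackwardsCompatibility(S currentSchema, S newSchema) throws IncompatibleSchemaException {
        if (!canRead.test(currentSchema, newSchema)) {
            throw new IncompatibleSchemaException("newSchema cannot read data written with currentSchema");
        }
    }

    // Consumers still using currentSchema must be able to read data produced with newSchema.
    void validateForwardCompatibility(S currentSchema, S newSchema) throws IncompatibleSchemaException {
        if (!canRead.test(newSchema, currentSchema)) {
            throw new IncompatibleSchemaException("currentSchema cannot read data written with newSchema");
        }
    }
}
```

Both designs end up calling the same directional check; the difference under discussion is whether the direction is chosen by a builder-configured strategy object or spelled out in one method per compatibility strategy.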
> > > > > schemaValidatorBuilder
> > > > >
> > > > I dislike this proposal. IMO the Avro implementation is way too
> > > > complicated.
> > > > Why not have a simple function for validation for each switch case above?
> > > > Why do we need strategy and builder, and all this complexity?
> > > >
> > > >
> > > > > *Here are the basic compatibility rules we've defined:*
> > > > >
> > > > >
> > > > IMO it's impossible to read the validation rules as you described them.
> > > > I wrote how they should be structured numerous times above.
> > > > I can't validate them.
> > > >
> > > >
> > > > IMO, the current design is very hard to read.
> > > > Please try to avoid jumping into code sections.
> > > > Write a high level design section, in which you describe in words what
> > > > you plan to do.
> > > > Write the validation rules in a structure that is easy to understand:
> > > > rules per compatibility check, using proper words (current schema, new
> > > > schema), since the new schema can be used once for reading and once for
> > > > writing.
> > > >
> > > > In its current form it takes too much time to understand the design, and
> > > > it shouldn't be the case.
> > > >
> > > > Thanks,
> > > >
> > > > Asaf
> > > >
> > > >
> > > > >
> > > >
> > > >
> > > >
> > > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <li...@gmail.com> wrote:
> > > >
> > > > > Hi! I updated the explanation of some things in the PIP issue. I also
> > > > > added a new "flag" in the conf that selects the
> > > > > ProtobufNativeSchemaValidator implementation; by default,
> > > > > ProtobufNativeSchemaValidator only checks whether the name of the root
> > > > > message is the same.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > sinan
> > > > >
> > > > >
> > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Mar 5, 2023 at 20:21:
> > > > >
> > > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > >
> > > > > > > >
> > > > > > > > Can you please explain how a Protobuf Schema descriptor can be
> > > > > > > > validated for backward compatibility check using Avro based
> > > > > > > > compatibility rules?
> > > > > > > > Doesn't it expect the schema to be Avro, but it is actually a
> > > > > > > > Protobuf descriptor?
> > > > > > > > Is there some translation happening?
> > > > > > >
> > > > > > >
> > > > > > > 1. *You can take a quick look at the previous design, the PROTOBUF uses
> > > > > > > avro struct to store.*
> > > > > > > https://github.com/apache/pulsar/pull/1954
> > > > > > >
> > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > > >
> > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > > > >
> > > > > >
> > > > > >
> > > > > > Ok. So to summarize your code (easier to write it than send links):
> > > > > > * Pulsar Client, when used with Protobuf Schema, actually converts the
> > > > > > Protobuf descriptor into an Avro Schema (using code found inside the Avro
> > > > > > library) and saves that Avro schema as the schema. It's not saving the
> > > > > > protobuf descriptor at all. Very confusing I have to add - never expected
> > > > > > that.
> > > > > > This explains why in the ProtobufSchemaCompatibilityCheck they just extend
> > > > > > the Avro check without doing any translation.
> > > > > >
> > > > > > Thanks for that.
> > > > > >
> > > > > > Now that I finally understand this, I can say that you *must* explain
> > > > > > that in the motivation part of your PIP.
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > 2. *On the broker side, protobuf and avro both use `SchemaData`
> > > > > > > converted to `org.apache.avro.Schema`.*
> > > > > > >
> > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > > >
> > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > > >
> > > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > > > >
> > > > > >
> > > > > > Actually those links don't really help.
> > > > > > The main link that helps is:
> > > > > >
> > > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > I'm sorry - I don't understand.
> > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > > compatibility validation, no?
> > > > > > > > I believe the answer is no, since you may want to verify different
> > > > > > > > things when it comes to Protobuf, which are different than Avro.
> > > > > > >
> > > > > > >
> > > > > > > 1.
> > > > > > > *ProtobufSchema is different from ProtobufNativeSchema in that it uses
> > > > > > > avro-protobuf.*
> > > > > > >
> > > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > > *ProtobufNativeSchema needs a native compatibility check, but there is no
> > > > > > > official or third party implementation. So this PIP does not use
> > > > > > > avro-protobuf for protobuf compatibility checking.*
> > > > > > >
> > > > > > > 2. *By the way, this is implemented in much the same way that Apache
> > > > > > > Avro does compatibility checking.*
> > > > > > >
> > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > > > `canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy`
> > > > > > >
> > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > > >
> > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > > >
> > > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > > > > > > writtenSchema and readSchema. We only need to change the order of
> > > > > > > arguments we pass to `canRead()`.*
> > > > > > > ```java
> > > > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > > > > > >         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
> > > > > > >     switch (strategy) {
> > > > > > >         // BACKWARD: the new schema (toValidate) reads data written with the existing schema.
> > > > > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > > > > > >         // FORWARD: the existing schema reads data written with the new schema.
> > > > > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> > > > > > >         // FULL: both directions must hold.
> > > > > > >         case CanBeReadMutualStrategy -> {
> > > > > > >             canRead(toValidate, fromDescriptor);
> > > > > > >             canRead(fromDescriptor, toValidate);
> > > > > > >         }
> > > > > > >     }
> > > > > > > }
> > > > > > >
> > > > > > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > > > > >         Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> > > > > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
> > > > > > > }
> > > > > > > ```
> > > > > > >
> > > > > > >
> > > > > > I get that you want to take inspiration from the existing Avro schema compatibility check for your code design.
> > > > > > I also understand you *won't* use any existing Avro code for that.
> > > > > > I also understand you have to write the validation check on your own, since there is no third-party implementation to rely on.
> > > > > >
> > > > > > The only thing I can't understand is the actual rules, as I wrote before, since they are written confusingly.
> > > > > > So, I repeat what I asked before:
> > > > > >
> > > > > > I think you should structure the validation rules differently:
> > > > > >
> > > > > > * Backward checks
> > > > > > ** List the rules, using newSchema (the schema used by the producer or consumer) and existingSchema (the last schema used)
> > > > > > * Forward checks
> > > > > > ** List the rules, using newSchema (the schema used by the producer or consumer) and existingSchema (the last schema used)
> > > > > >
> > > > > > Once that's accomplished, I will be able to understand the different validation rules for each compatibility check.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > > sinan
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Mar 1, 2023 at 21:19 Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > > Current Avro, JSON, and Protobuf schemas are all implemented based on Avro.
> > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the protobuf schema is converted into an Avro Schema, and then you use Avro compatibility validation?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, so `checkCompatible()` is the same implementation as for `AVRO`.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Can you please explain how a Protobuf schema descriptor can be validated for backward compatibility using Avro-based compatibility rules?
> > > > > > > > Doesn't it expect the schema to be Avro, while it is actually a Protobuf descriptor?
> > > > > > > > Is there some translation happening?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > The compatibility check strategies are described on the website:
> > > > > > > > >
> > > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > > > 1. BACKWARD (CanReadExistingStrategy): Consumers using schema V3 can process data written by producers using the last schema version V2. So V2 is "writtenSchema" and V3 is "readSchema".
> > > > > > > > > 2. FORWARD (CanBeReadByExistingStrategy): Consumers using the last schema version V2 can process data written by producers using a new schema V3, even though they may not be able to use the full capabilities of the new schema. So V3 is "writtenSchema" and V2 is "readSchema".
> > > > > > > > > 3. FULL (CanBeReadMutualStrategy): Schemas are both backward and forward compatible.
> > > > > > > > > The schema can evolve, and the old and new schema versions should each be able to read data written with the other.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > I'm sorry - I don't understand.
> > > > > > > > I understand the different compatibility check strategies.
> > > > > > > > If you just spell them out here, then as you say, just translate the Protobuf Descriptor into an Avro schema and run the Avro compatibility validation, no?
> > > > > > > > I believe the answer is no, since you may want to verify different things when it comes to Protobuf, which differ from Avro.
> > > > > > > >
> > > > > > > > In its current state, I can't understand your design at all. Please help clarify it.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > So each strategy should have its own section.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As we've just described, we only need to change the order of the arguments we pass to `canRead()`.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > sinan
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Feb 27, 2023 at 20:49 Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > > > > > > > > >
> > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > > > >
> > > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > > > >
> > > > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > > > > >
> > > > > > > > > > Current Avro, JSON, and Protobuf schemas are all implemented based on Avro.
> > > > > > > > > >
> > > > > > > > > > What do you mean, they are all implemented based on Avro? You mean the protobuf schema is converted into an Avro Schema, and then you use Avro compatibility validation?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > > >
> > > > > > > > > > * Backward checks
> > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or consumer) and existingSchema (the last schema used)
> > > > > > > > > > * Forward checks
> > > > > > > > > > ** List the rules, using newSchema (the schema used by the producer or consumer) and existingSchema (the last schema used)
> > > > > > > > > >
> > > > > > > > > > So each strategy should have its own section.
> > > > > > > > > >
> > > > > > > > > > I'm saying this because you used the word "writtenSchema", but it represents something completely different in a backward check than in a forward check.
> > > > > > > > > >
> > > > > > > > > > Once it's structured like that, I will personally be able to read and understand it.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > The motivation and problem statement are now good - thanks for improving them.
> > > > > > > > > >
> > > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some background and some explanations.
> > > > > > > > > > >
> > > > > > > > > > > The compatibility check rules are already written in the Implementation section. ProtoBufNative implements the same canRead method as Apache Avro: it checks whether the schemas used for writing and reading are compatible. I also indicate whether the writtenSchema and readSchema of the Backward, Forward, and Full strategies are the old or the new version of the schema.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > sinan
> > > > > > > > > > >
> > > > > > > > > > > On Sun, Feb 26, 2023 at 23:24 Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, which IMO you need to add for people to understand it. You don't need to explain all of Pulsar in this PIP, but at the very least add a few paragraphs detailing everything a reader needs to know to be in context:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >    - Start by saying Pulsar has a built-in schema registry inside the Pulsar broker.
> > > > > > > > > > > >       - Every time the client updates the schema, it uploads it to the broker. When that happens, the broker validates whether the new schema version is compatible with the previous versions. There are 4 types of compatibility: Full, ... (complete and explain each one briefly)
> > > > > > > > > > > >    - Also explain that the Pulsar schema registry supports various schema protocols: Avro, protobuf native, ... (complete the rest). Each protocol has a schema which dictates how to serialize and deserialize the message content into a typed object.
> > > > > > > > > > > >    - Explain in short what protobuf native is (compared with non-native protobuf).
> > > > > > > > > > > >    - Please don't paste code instead of explaining.
> > > > > > > > > > > >       - Explain that the current protobuf native validation check only consists of checking that the root message name is the same between the current schema version and the new version.
> > > > > > > > > > > >          - Explain briefly what a root message and its name are.
> > > > > > > > > > > >       - Explain the problem (list scenarios) that we have because the protobuf native schema only supports FULL compatibility validation.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Regarding the high-level design, as in what you plan to do:
> > > > > > > > > > > > I suggest you add a "High Level Design" section and in it detail how you plan to validate, per protobuf version, per compatibility check (backward, forward, full, ...).
> > > > > > > > > > > > I tried reading the implementation - for me, it's all over the place. Can you please list in order what I wrote above, and list the validation rules with a good explanation of why you validate them that way?
> > > > > > > > > > > >
> > > > > > > > > > > > Lastly, once you have all the validation rules clearly stated, you can use them to document this properly so users know what validation to expect.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Asaf
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Sorry, my mistake. I removed the code and described the design to improve the PROTOBUF_NATIVE schema compatibility checks. You can have a look.
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Feb 22, 2023 at 21:16 Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I read it, but you're diving almost directly into the code - it will take me hours just to reverse-engineer your design.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Can you please include a "High Level Design" section in which you explain how you plan to tackle each issue?
> > > > > > > > > > > > > > If I can read that section and explain to someone else how this will work, it means the section is complete.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19565.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > We can talk about the current design here. For the field type change check rules especially, please give your valuable advice.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > Sinan
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
>

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by Asaf Mesika <as...@gmail.com>.
On Tue, Mar 7, 2023 at 6:51 AM SiNan Liu <li...@gmail.com> wrote:

> Thanks for the advice, Asaf.
>
> 1.
>
> > For now, there is no official or third-party solution for Protobuf compatibility. If a better official or third-party solution becomes available in the future, we can develop a new ProtobufNativeSchemaValidator and use it, hence the flag.
>
> By default the flag is not set, in which case the schema compatibility check only checks the name of the root message. To use the implementation from this PIP, set it to
>
> `org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
> This explains what the flag does, and I'm not going to remove it. Users can choose to keep the previous implementation (which only checks that the root message name is the same), but since that may not be enough, they can also choose the implementation from this PIP. If a better third-party or official solution appears in the future, it can be developed and swapped in. The flag must be added to the PIP to keep the implementation extensible.
>

Ok.
First, the name is confusing. Flags are normally true/false; in your case it's a string, so it should be named as a configuration property.
Second, I agree: you're basically saying we must allow users to keep the existing implementation or switch to the new one. A boolean feature flag might suffice now, but how would we improve the rules later without breaking existing users? So we encode the new code as a new class and have the user select it via configuration, by specifying the class name.
Third, now that I understand that, you need to revise your explanation. It's completely unclear from the current explanation.
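To make the "select the implementation by class name" idea concrete, here is a minimal Java sketch. Everything in it is hypothetical and for illustration only: the shape of the `ProtobufNativeSchemaValidator` interface, the `MessageSchema` stand-in (which models only the root message name, not a real Protobuf descriptor), and the property name `protobufNativeSchemaValidatorClassName` are assumptions, not Pulsar's actual API. When the property is unset, the sketch falls back to the existing behavior described in this thread, comparing root message names only.

```java
import java.util.Objects;
import java.util.Properties;

// Hypothetical sketch: none of these types are real Pulsar APIs.
public class ValidatorSelection {

    /** Stand-in for a PROTOBUF_NATIVE schema; only the root message name is modelled. */
    public record MessageSchema(String rootMessageName) {}

    /** Assumed shape of a pluggable validator. */
    public interface ProtobufNativeSchemaValidator {
        void validate(MessageSchema currentSchema, MessageSchema newSchema);
    }

    /** Default: mirrors the existing rule of only comparing root message names. */
    public static class RootMessageNameValidator implements ProtobufNativeSchemaValidator {
        @Override
        public void validate(MessageSchema currentSchema, MessageSchema newSchema) {
            if (!Objects.equals(currentSchema.rootMessageName(), newSchema.rootMessageName())) {
                throw new IllegalArgumentException("Root message name changed from "
                        + currentSchema.rootMessageName() + " to " + newSchema.rootMessageName());
            }
        }
    }

    /** Hypothetical configuration property holding the implementation class name. */
    public static final String VALIDATOR_CLASS_PROPERTY = "protobufNativeSchemaValidatorClassName";

    /** Instantiates the configured validator, or the default when the property is unset. */
    public static ProtobufNativeSchemaValidator load(Properties config) {
        String className = config.getProperty(VALIDATOR_CLASS_PROPERTY, "").trim();
        if (className.isEmpty()) {
            return new RootMessageNameValidator();
        }
        try {
            Class<?> clazz = Class.forName(className);
            return (ProtobufNativeSchemaValidator) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot load validator " + className, e);
        }
    }
}
```

With this shape, an improved rule set ships as a new class that users opt into by setting the property, while users who leave it unset keep the current root-message-name check.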





>
> 2.
>
> > Why not have a simple function for validation for each switch case above?
> > Why do we need strategy and builder, and all this complexity?
>
> I don't see how it's complicated. It's easy to understand and it's not redundant. The only job of the validator is to check whether two Protobuf schemas are compatible. The builder builds checkers based on the different compatibility checking strategies. If all the implementation is done in the validator, it will be messy and there will be a lot of duplication, and a new validator added later won't extend well. Discarding this encapsulation makes the code more complex and less extensible. I won't change this design.
>

Of course you shouldn't make a gigantic class with all the validation methods. I'm fine with having a `ProtobufNativeBackwardCompatibilityValidation` class, and each type of compatibility strategy can have its own class, but keep it simple: each strategy maps to one simple validation method. I don't see any need for the extra layers you have there (builders, etc.). The current Avro design is too complicated, so let's not copy it.
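A minimal Java sketch of "each strategy maps to one simple validation method", with no builder layer. Assumptions: the `Strategy` enum mirrors a subset of the strategy names used in this thread, `FieldDef` (keyed by field number) is a hypothetical stand-in for a Protobuf field descriptor, `validateBackwardsCompatibility` takes its name from the thread while `validateForwardsCompatibility` is a hypothetical counterpart, and the rules inside are simplified placeholders (BACKWARD rejects newly added required fields, FORWARD rejects removed required fields, shared field numbers must keep their type), not the PIP's final rule set.

```java
import java.util.Map;
import java.util.Objects;

// Sketch only: Strategy mirrors a subset of strategy names; FieldDef is hypothetical.
public class StrategyDispatch {

    public enum Strategy { BACKWARD, FORWARD, FULL, ALWAYS_COMPATIBLE }

    public record FieldDef(String name, String type, boolean required) {}

    /** One entry point; each strategy maps to one plain method. */
    public static void checkCompatible(Map<Integer, FieldDef> currentSchema,
                                       Map<Integer, FieldDef> newSchema,
                                       Strategy strategy) {
        switch (strategy) {
            case BACKWARD -> validateBackwardsCompatibility(currentSchema, newSchema);
            case FORWARD -> validateForwardsCompatibility(currentSchema, newSchema);
            case FULL -> {
                validateBackwardsCompatibility(currentSchema, newSchema);
                validateForwardsCompatibility(currentSchema, newSchema);
            }
            case ALWAYS_COMPATIBLE -> { }
        }
    }

    /** newSchema must read data written with currentSchema: no newly added required fields. */
    static void validateBackwardsCompatibility(Map<Integer, FieldDef> currentSchema,
                                               Map<Integer, FieldDef> newSchema) {
        newSchema.forEach((number, field) -> {
            if (!currentSchema.containsKey(number) && field.required()) {
                throw new IllegalArgumentException("New required field #" + number);
            }
        });
        checkSharedFieldTypes(currentSchema, newSchema);
    }

    /** currentSchema must read data written with newSchema: no removed required fields. */
    static void validateForwardsCompatibility(Map<Integer, FieldDef> currentSchema,
                                              Map<Integer, FieldDef> newSchema) {
        currentSchema.forEach((number, field) -> {
            if (!newSchema.containsKey(number) && field.required()) {
                throw new IllegalArgumentException("Removed required field #" + number);
            }
        });
        checkSharedFieldTypes(currentSchema, newSchema);
    }

    /** Simplification: a field number kept in both schemas must keep the same type. */
    private static void checkSharedFieldTypes(Map<Integer, FieldDef> currentSchema,
                                              Map<Integer, FieldDef> newSchema) {
        currentSchema.forEach((number, field) -> {
            FieldDef updated = newSchema.get(number);
            if (updated != null && !Objects.equals(field.type(), updated.type())) {
                throw new IllegalArgumentException("Type of field #" + number + " changed");
            }
        });
    }
}
```

Each strategy reads as a single method call, and FULL is simply both calls, which is the flat structure suggested above.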



>
> 3. *Here are the basic compatibility rules we've defined:*
> https://protobuf.dev/programming-guides/proto/#updating
> These follow the rules defined on the official website; I did not make them up myself.
>

I understand.

As I wrote before, using `writtenSchema` and `readSchema` is confusing.

For example, you wrote:

> (1) Create:
>
>    - The writtenSchema cannot add required fields, but optional or repeated fields can be added (the field number must be new).

You said writtenSchema is the schema of the producer.
What do you mean by create? Do you mean the producer is publishing the first schema ever to the registry?
If so, how about explaining it like this:

newSchema - a new schema registered either by a producer or a consumer.
currentSchema - the current latest schema in the schema registry

BACKWARD:

   - New fields added in newSchema compared with currentSchema
      - New fields are defined as fields with field numbers that don't exist in currentSchema.
      - The new fields must be optional, not required.
   - Modifying fields in newSchema that already exist in currentSchema (update)
      - Modifying is defined as changing field details for the same field number.
      - Changing the name is permitted.
      - The type can be changed only if the new type is compatible. The allowed changes are ....
   - Removing fields in newSchema that exist in currentSchema
      - Removing a field is defined as no longer using a field number that a field in currentSchema uses, so that number won't exist in newSchema.
      - ...


This I can read and understand.
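The BACKWARD structure above maps naturally onto a comparison keyed by field number. A minimal Java sketch, under these assumptions: `FieldDef` is a hypothetical stand-in for a Protobuf field descriptor; the list of allowed type changes is still open in the rules above, so it is passed in as a predicate; and removed fields are skipped, because their rules are not spelled out yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.BiPredicate;

// Sketch of the BACKWARD rules, comparing currentSchema with newSchema by field number.
// FieldDef is a hypothetical stand-in for a Protobuf field descriptor.
public class BackwardRules {

    public record FieldDef(String name, String type, boolean required) {}

    /** Returns the rule violations found; an empty list means the change passes. */
    public static List<String> validateBackward(Map<Integer, FieldDef> currentSchema,
                                                Map<Integer, FieldDef> newSchema,
                                                BiPredicate<String, String> typeChangeAllowed) {
        List<String> violations = new ArrayList<>();
        TreeSet<Integer> fieldNumbers = new TreeSet<>(currentSchema.keySet());
        fieldNumbers.addAll(newSchema.keySet());
        for (int number : fieldNumbers) {
            FieldDef current = currentSchema.get(number);
            FieldDef updated = newSchema.get(number);
            if (current == null) {
                // New field: its number does not exist in currentSchema, so it must be optional.
                if (updated.required()) {
                    violations.add("new field #" + number + " must be optional, not required");
                }
            } else if (updated != null) {
                // Modified field (same number): renaming is permitted; type changes must be allowed.
                if (!current.type().equals(updated.type())
                        && !typeChangeAllowed.test(current.type(), updated.type())) {
                    violations.add("field #" + number + ": type change " + current.type()
                            + " -> " + updated.type() + " is not allowed");
                }
            }
            // Removed field (present only in currentSchema): rules still to be defined, so skipped.
        }
        return violations;
    }
}
```

Returning a list of violations instead of failing on the first one lets the broker report every broken rule at once; passing a predicate built from an explicit type-change table would make the "allowed changes" rule concrete.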

You wrote

>
>    - The writtenSchema must not change the field number of any field in 'readSchema' (the field name is the same, but the field number is different).

So if I have

oldSchema
1 customerName string
2 customerAge integer

newSchema
reserved 1
2 customerAge integer
3 customerName string

This is perfectly valid.

I actually think we should enforce that: when you remove a field (its field number is removed) but add the same definition under a different field number, we should force you to declare the old number as reserved.
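The proposed "force reserved" rule could be a separate, small check. A minimal Java sketch, assuming hypothetical `FieldDef` and `ProtoSchema` types (a field map keyed by number plus the set of reserved numbers) instead of real Protobuf descriptors; "same definition" is assumed to mean the same name and the same type. Using the `customerName`/`customerAge` example above (with `integer` written as the Protobuf type `int32`), the new schema passes only when it declares `reserved 1`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the proposed rule: a field moved to a new number must leave its old number reserved.
// FieldDef and ProtoSchema are hypothetical stand-ins, not Protobuf or Pulsar types.
public class ReservedRule {

    public record FieldDef(String name, String type) {}

    public record ProtoSchema(Map<Integer, FieldDef> fields, Set<Integer> reservedNumbers) {}

    public static List<String> checkMovedFieldsAreReserved(ProtoSchema currentSchema, ProtoSchema newSchema) {
        List<String> violations = new ArrayList<>();
        currentSchema.fields().forEach((oldNumber, field) -> {
            boolean numberRemoved = !newSchema.fields().containsKey(oldNumber);
            // "Same definition" is assumed to mean equal name and type (record equality).
            boolean redefinedElsewhere = newSchema.fields().containsValue(field);
            if (numberRemoved && redefinedElsewhere && !newSchema.reservedNumbers().contains(oldNumber)) {
                violations.add("field '" + field.name() + "' moved away from #" + oldNumber
                        + ": declare 'reserved " + oldNumber + ";'");
            }
        });
        return violations;
    }
}
```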


What you linked (https://protobuf.dev/programming-guides/proto/#updating) is a set of guidelines and rules, not what you are actually going to validate.

So how can I review your validation rules if you don't write them explicitly? Please:
1. Organize them by compatibility check strategy.
2. Use correct naming, not read and write schema, since a consumer can publish a new schema and so can a producer. Use current and new to differentiate what you are comparing.
3. Write the rules *explicitly*: all the rules you will use. For example, the type changes you allow (you wrote that under Alternatives).
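As an illustration of writing the allowed type changes explicitly, here is a Java sketch of the wire-compatible scalar changes listed in the "Updating A Message Type" section of the Protobuf language guide linked above. Whether the PIP should allow all of these or a stricter subset is a design decision. The guide's caveats (values may be truncated when narrowing, and `bytes` must be valid UTF-8 to be read as `string`) are not checked here, and `enum` is treated as a single pseudo type name, which is an assumption of this sketch.

```java
import java.util.List;
import java.util.Set;

// Sketch: wire-compatible scalar type changes, following the Protobuf language guide's
// "Updating A Message Type" section. Truncation and UTF-8 caveats are not checked.
public final class ProtoTypeChanges {

    private static final List<Set<String>> COMPATIBLE_GROUPS = List.of(
            Set.of("int32", "uint32", "int64", "uint64", "bool"),
            Set.of("sint32", "sint64"),
            Set.of("string", "bytes"),
            Set.of("fixed32", "sfixed32"),
            Set.of("fixed64", "sfixed64"));

    // enum is wire-compatible with these integer types (but not with bool).
    private static final Set<String> ENUM_COMPATIBLE = Set.of("int32", "uint32", "int64", "uint64");

    private ProtoTypeChanges() {}

    public static boolean isAllowedTypeChange(String currentType, String newType) {
        if (currentType.equals(newType)) {
            return true;
        }
        if (currentType.equals("enum")) {
            return ENUM_COMPATIBLE.contains(newType);
        }
        if (newType.equals("enum")) {
            return ENUM_COMPATIBLE.contains(currentType);
        }
        return COMPATIBLE_GROUPS.stream()
                .anyMatch(group -> group.contains(currentType) && group.contains(newType));
    }
}
```

A table like this is easy to review line by line, which is the point of stating the rules explicitly.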



>
> 4. Other parts have been updated with explanations.
>

Sinan, listen - I've spent *a lot* of time reviewing this PIP, over many rounds.
You write "all parts updated", yet when I go to the PIP I see that my first comment from the previous email was neither implemented nor answered here.

Can you please review all of my previous comments and reply to each one of them?


Please bear in mind the purpose of all the work we're doing here in the review:
1. Help future engineers understand the code better
2. Help future users understand this feature better
3. Make Pulsar look good. When we have designs that are super hard to read and understand, it reflects badly on the community as a whole.
4. Once the doc is crystal clear, you can actually spend the time finding the faults in it.

Thanks!

Asaf



>
>
> Thanks,
> sinan
>
>
>
>
>
>
>
> On Tue, Mar 7, 2023 at 11:53 丛搏 <bo...@apache.org> wrote:
>
> > I think we have two ways to do that.
> >
> > First way: push for this improvement in the Protobuf Java library, and ask the Protobuf maintainers whether they plan to add it.
> >
> > Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should be implemented as a plugin, without changing any existing plugin logic (it's simple and already in use). I don't recommend adding flags for rollback; it adds configuration and makes little sense.
> >
> > Thanks,
> > Bo
> >
> > On Mon, Mar 6, 2023 at 23:00 Asaf Mesika <as...@gmail.com> wrote:
> >
> > >
> > > Can you convert the code block at the beginning of the PIP, which is actually a quote, into something that doesn't require so much horizontal scrolling?
> > > Use
> > > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> > >
> > > Let's improve the clarity of what you wrote:
> > >
> > > "the PROTOBUF uses avro struct to store."
> > > -->
> > > When schema type PROTOBUF is used, the Pulsar client assumes the object given to it as message data is an auto-generated POJO containing the annotations encoding the schema. The client uses a converter, which converts a Protobuf schema descriptor into an Avro schema and sends that as the schema of the producer/consumer.
> > >
> > > "On the broker side, protobuf and avro both use SchemaData converted to
> > > org.apache.avro.Schema."
> > > -->
> > > Since the schema is an Avro schema, the implementation of the compatibility check on the broker side simply re-uses the compatibility check of the AVRO schema type.
> > >
> > > "ProtobufSchema is different from ProtobufNativeSchema in schema compatibility check it uses avro-protobuf.
> > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > But the current implementation of ProtobufNative schema compatibility check only checked if the root message name is changed."
> > >
> > > -->
> > > PROTOBUF_NATIVE schema type is different.
> > > The client actually uses the Protobuf Descriptor as the schema, as opposed to the Avro schema of the PROTOBUF schema type. In the broker, the PROTOBUF_NATIVE compatibility check hasn't actually implemented any rule besides one: checking whether the root message name has changed.
> > >
> > >
> > >
> > > >    1. For now, there is no official or third-party solution for ProtoBuf compatibility. If a better official or third-party solution becomes available in the future, we can develop a new ProtobufNativeSchemaValidator and use it, hence the flag.
> > > >
> > > Why do you need to make that configurable? Once you find a third party, just switch to it. Who knows, maybe you never will. Introduce it when you find it, not now.
> > >
> > >
> > > > We improve the BACKWARD and FORWARD strategies in ProtobufNativeSchemaCompatibilityCheck. As with the AVRO implementation, Protobuf compatibility checking needs to implement the canRead method. *This will check that the writtenSchema can be read by readSchema.*
> > >
> > >
> > > I completely disagree.
> > > The Avro implementation is confusing for our use case. Don't copy it.
> > >
> > > You have
> > >
> > > public void checkCompatible(SchemaData from, SchemaData to,
> > >         SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {
> > >     Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
> > >     Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
> > >     switch (strategy) {
> > >         case BACKWARD_TRANSITIVE:
> > >         case BACKWARD:
> > >         case FORWARD_TRANSITIVE:
> > >         case FORWARD:
> > >         case FULL_TRANSITIVE:
> > >         case FULL:
> > >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
> > >             return;
> > >         case ALWAYS_COMPATIBLE:
> > >             return;
> > >         default:
> > >             throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
> > >     }
> > > }
> > >
> > > I would rename:
> > > from --> currentSchema
> > > to --> newSchema
> > >
> > > Use that switch case and have a method for each case, like:
> > > validateBackwardsCompatibility(currentSchema, newSchema)
> > >
> > > I dislike canRead and the usage of writtenSchema, since you have two completely different use cases: the producer side and the consumer side.
> > >
> > > > schemaValidatorBuilder
> > > >
> > > I dislike this proposal. IMO the Avro implementation is way too complicated.
> > > Why not have a simple validation function for each switch case above?
> > > Why do we need strategy and builder, and all this complexity?
> > >
> > >
> > > *Here are the basic compatibility rules we've defined:*
> > >
> > >
> > > IMO it's impossible to read the validation rules as you described them.
> > > I wrote how they should be structured numerous times above.
> > > I can't validate them.
> > >
> > >
> > > IMO, the current design is very hard to read.
> > > Please try to avoid jumping into code sections.
> > > Write a high-level design section, in which you describe in words what you plan to do.
> > > Write the validation rules in a structure that is easy to understand: rules per compatibility check, using proper words (current schema, new schema), since the new schema is used for reading in one case and for writing in another.
> > >
> > > In its current form it takes too much time to understand the design, and it shouldn't be the case.
> > >
> > > Thanks,
> > >
> > > Asaf
> > >
> > >
> > > >
> > >
> > >
> > >
> > > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <li...@gmail.com> wrote:
> > >
> > > > Hi! I updated the explanation of some things in the PIP issue. I also added a new “flag” in the conf that selects the ProtobufNativeSchemaValidator implementation, and set the default ProtobufNativeSchemaValidator to only check whether the name of the root message is the same.
> > > >
> > > >
> > > > Thanks,
> > > > sinan
> > > >
> > > >
> > > > On Sun, Mar 5, 2023 at 20:21 Asaf Mesika <as...@gmail.com> wrote:
> > > >
> > > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <li...@gmail.com> wrote:
> > > > >
> > > > > > >
> > > > > > > Can you please explain how a Protobuf schema descriptor can be validated for backward compatibility using Avro-based compatibility rules?
> > > > > > > Doesn't it expect the schema to be Avro, while it is actually a Protobuf descriptor?
> > > > > > > Is there some translation happening?
> > > > > >
> > > > > >
> > > > > > 1. *You can take a quick look at the previous design: the PROTOBUF schema type uses an Avro struct for storage.*
> > > > > > https://github.com/apache/pulsar/pull/1954
> > > > > >
> > > > > >
> > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > >
> > > > > >
> > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > > >
> > > > >
> > > > > Ok. So to summarize your code (easier to write it than send links):
> > > > > * Pulsar Client, when used with Protobuf Schema, actually converts the Protobuf descriptor into an Avro Schema (using code found inside the Avro library) and saves that Avro schema as the schema. It's not saving the protobuf descriptor at all. Very confusing, I have to add - never expected that.
> > > > > This explains why ProtobufSchemaCompatibilityCheck just extends the Avro check without doing any translation.
> > > > >
> > > > > Thanks for that.
> > > > >
> > > > > Now that I finally understand this, I can say that you *must* explain that in the motivation part of your PIP.
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > 2. *On the broker side, protobuf and avro both convert `SchemaData`
> > > > > > to `org.apache.avro.Schema`.*
> > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > > >
> > > > >
> > > > > Actually those links don't really help.
> > > > > The main link that helps is:
> > > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > I'm sorry, I don't understand.
> > > > > > > I understand the different compatibility check strategies.
> > > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > compatibility validation, no?
> > > > > > > I believe the answer is no, since you may want to verify different
> > > > > > > things when it comes to Protobuf, which are different from Avro.
> > > > > >
> > > > > >
> > > > > > 1. *ProtobufSchema is different from ProtobufNativeSchema in that it
> > > > > > uses avro-protobuf.*
> > > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > > *ProtobufNativeSchema needs a native compatibility check, but there is
> > > > > > no official or third-party implementation. So this PIP does not use
> > > > > > avro-protobuf for protobuf compatibility checking.*
> > > > > >
> > > > > > 2. *By the way, this is implemented in much the same way that Apache
> > > > > > Avro does compatibility checking.*
> > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > > `canReadStrategy`, `canBeReadStrategy`, `mutualReadStrategy`
> > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > > > > > writtenSchema and readSchema. We only need to change the order of the
> > > > > > arguments we pass to `canRead()`.*
> > > > > > ```java
> > > > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > > > > >         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
> > > > > >     switch (strategy) {
> > > > > >         // BACKWARD: the new schema (toValidate) reads data written with the existing one
> > > > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > > > > >         // FORWARD: the existing schema reads data written with the new one
> > > > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> > > > > >         // FULL: both directions must hold
> > > > > >         case CanBeReadMutualStrategy -> {
> > > > > >             canRead(toValidate, fromDescriptor);
> > > > > >             canRead(fromDescriptor, toValidate);
> > > > > >         }
> > > > > >     }
> > > > > > }
> > > > > >
> > > > > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > > > >         Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> > > > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
> > > > > > }
> > > > > > ```
> > > > > >
> > > > > >
> > > > > I get that you want to take inspiration from the existing Avro Schema
> > > > > compatibility check for your code design.
> > > > > I also understand you *won't* use any existing avro code for that.
> > > > > I also understand that you have to write the validation check on your
> > > > > own, since there is no third-party implementation for it.
> > > > >
> > > > > The only thing I can't understand is the actual rules, as I wrote
> > > > > before, since they are written confusingly.
> > > > > So, I repeat what I asked before:
> > > > >
> > > > > I think you should structure the validation rules differently:
> > > > >
> > > > > * Backward checks
> > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > consumer) and existingSchema (the last schema used)
> > > > > * Forward
> > > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > > consumer) and existingSchema (the last schema used)
> > > > >
> > > > > Once that's accomplished I will be able to understand the different
> > > > > validation rules for each compatibility check.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > sinan
> > > > > >
> > > > > >
> > > > > >
> > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Mar 1, 2023 at 21:19:
> > > > > > >
> > > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > >
> > > > > > > > >
> > > > > > > > > I read it and they look identical. What's the difference between
> > > > > > > > > them?
> > > > > > > >
> > > > > > > > Current avro, json, and protobuf schemas are all implemented based
> > > > > > > > on AVRO.
> > > > > > > > > What do you mean, they are all implemented based on Avro? You mean
> > > > > > > > > the protobuf schema is converted into an Avro Schema, and then you
> > > > > > > > > use Avro compatibility validation?
> > > > > > > >
> > > > > > > >
> > > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, so
> > > > > > > > `checkCompatible()` is the same implementation as for `AVRO`.
> > > > > > > >
> > > > > > >
> > > > > > > Can you please explain how a Protobuf Schema descriptor can be
> > > > > > > validated for backward compatibility check using Avro based
> > > > > > > compatibility rules?
> > > > > > > Doesn't it expect the schema to be Avro, but it is actually a
> > > > > > > Protobuf descriptor?
> > > > > > > Is there some translation happening?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > I think you should structure the validation rules differently:
> > > > > > > >
> > > > > > > >
> > > > > > > > The Compatibility check strategy is described on the website
> > > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > > 1. BACKWARD (CanReadExistingStrategy): Consumers using schema V3 can
> > > > > > > > process data written by producers using the last schema version V2.
> > > > > > > > So V2 is "writtenSchema" and V3 is "readSchema".
> > > > > > > > 2. FORWARD (CanBeReadByExistingStrategy): Consumers using the last
> > > > > > > > schema version V2 can process data written by producers using a new
> > > > > > > > schema V3, even though they may not be able to use the full
> > > > > > > > capabilities of the new schema. So V3 is "writtenSchema" and V2 is
> > > > > > > > "readSchema".
> > > > > > > > 3. FULL (CanBeReadMutualStrategy): Schemas are both backward and
> > > > > > > > forward compatible.
> > > > > > > > Schemas can evolve: the old and the new schema versions must each be
> > > > > > > > able to read data written with the other.
> > > > > > > >
> > > > > > > >
> > > > > > > I'm sorry, I don't understand.
> > > > > > > I understand the different compatibility check strategies.
> > > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > > compatibility validation, no?
> > > > > > > I believe the answer is no, since you may want to verify different
> > > > > > > things when it comes to Protobuf, which are different from Avro.
> > > > > > >
> > > > > > > At the current state, I can't understand your design at all. Please
> > > > > > > help clarify that.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > So each strategy should have its own section.
> > > > > > > >
> > > > > > > >
> > > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As
> > > > > > > > we've just described, we only need to change the order of the
> > > > > > > > arguments we pass to `canRead()`.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > sinan
> > > > > > > >
> > > > > > > >
> > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Mon, Feb 27, 2023 at 20:49:
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > > > > > > > >
> > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > > >
> > > > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > > >
> > > > > > > > > I read it and they look identical. What's the difference between
> > > > > > > > > them?
> > > > > > > > >
> > > > > > > > > Current avro, json, and protobuf schemas are all implemented based
> > > > > > > > > on AVRO.
> > > > > > > > >
> > > > > > > > > What do you mean, they are all implemented based on Avro? You mean
> > > > > > > > > the protobuf schema is converted into an Avro Schema, and then you
> > > > > > > > > use Avro compatibility validation?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I think you should structure the validation rules differently:
> > > > > > > > >
> > > > > > > > > * Backward checks
> > > > > > > > > ** List the rules, using newSchema (the schema used by the producer
> > > > > > > > > or consumer) and existingSchema (the last schema used)
> > > > > > > > > * Forward
> > > > > > > > > ** List the rules, using newSchema (the schema used by the producer
> > > > > > > > > or consumer) and existingSchema (the last schema used)
> > > > > > > > >
> > > > > > > > > So each strategy should have its own section.
> > > > > > > > >
> > > > > > > > > I'm saying this because you used the word "writtenSchema", but it
> > > > > > > > > represents something completely different in a backward check than
> > > > > > > > > in a forward check.
> > > > > > > > >
> > > > > > > > > Once you have it structured like that, I personally will be able to
> > > > > > > > > read and understand it.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > The motivation and problem statement are now good. Thanks for
> > > > > > > > > improving it.
> > > > > > > > >
> > > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi! I updated the PIP issue again. This time I've added some background
> > > > > > > > > > and some explanations.
> > > > > > > > > >
> > > > > > > > > > The compatibility check rules are already written in the Implementation
> > > > > > > > > > section. ProtoBufNative implements the same canRead method as Apache Avro.
> > > > > > > > > > It does this by checking whether the schemas for writing and reading are
> > > > > > > > > > compatible. I also indicate whether the writtenSchema and readSchema of the
> > > > > > > > > > Backward, Forward, and Full strategies are the old or the new version of
> > > > > > > > > > the schema.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > sinan
> > > > > > > > > >
> > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Feb 26, 2023 at 23:24:
> > > > > > > > > >
> > > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, which IMO you
> > > > > > > > > > > need to add for people to understand it. You don't need to explain the
> > > > > > > > > > > entire Pulsar in this PIP, but at the very least add a few paragraphs
> > > > > > > > > > > detailing all you need to know, to put you in context:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >    - Start by saying Pulsar has a built-in schema registry inside the Pulsar
> > > > > > > > > > >    broker.
> > > > > > > > > > >       - Every time the client updates the schema, it uploads it to the
> > > > > > > > > > >       broker. When that happens, it has a feature which validates if the new
> > > > > > > > > > >       schema version is compatible with the previous versions. There are 4
> > > > > > > > > > >       types of compatibility: Full, ... (complete and explain each one
> > > > > > > > > > >       briefly)
> > > > > > > > > > >    - Also explain that the Pulsar schema registry supports various schema
> > > > > > > > > > >    protocols: Avro, protobuf native, ... (complete the rest); each protocol
> > > > > > > > > > >    has a schema which dictates how to serialize and deserialize the message
> > > > > > > > > > >    content into a typed object.
> > > > > > > > > > >    - Explain in short what protobuf native is (compare with protobuf
> > > > > > > > > > >    non-native).
> > > > > > > > > > >    - Please don't paste code instead of explaining.
> > > > > > > > > > >       - Explain that the current protobuf native validation check only
> > > > > > > > > > >       consists of checking that the root message name is the same between
> > > > > > > > > > >       the current schema version and the new version.
> > > > > > > > > > >          - Explain briefly what a root message is and what its name is.
> > > > > > > > > > >       - Explain the problem (list scenarios) that we have because the
> > > > > > > > > > >       protobuf native schema only supports FULL compatibility validation.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Regarding high level design, as in what you plan to do:
> > > > > > > > > > > I suggest you add "High Level Design" and in it detail how you plan to
> > > > > > > > > > > validate, per protobuf version, per compatibility check (backward,
> > > > > > > > > > > forward, full, ...).
> > > > > > > > > > > I tried reading the implementation; for me, it's all over the place. Can
> > > > > > > > > > > you please list in order what I wrote above, and list the validation rules
> > > > > > > > > > > with a good explanation of why you validate them like that?
> > > > > > > > > > >
> > > > > > > > > > > Lastly, once you have all the validation rules clearly stated, you can use
> > > > > > > > > > > them to document it properly so users can know what validation to expect.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Asaf
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Sorry, my mistake. I removed the code and described the design to improve
> > > > > > > > > > > > the PROTOBUF_NATIVE schema compatibility checks. You can have a look.
> > > > > > > > > > > >
> > > > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Feb 22, 2023 at 21:16:
> > > > > > > > > > > >
> > > > > > > > > > > > > I read it, but you're almost directly diving into the code; it will take
> > > > > > > > > > > > > me hours just to reverse engineer your design.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Can you please include a "High Level Design" section in which you explain
> > > > > > > > > > > > > how you plan to tackle each issue?
> > > > > > > > > > > > > If I can read that section and explain to someone else how this will
> > > > > > > > > > > > > work, it means the section is complete.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19565.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We can talk about the current design here, especially the field type
> > > > > > > > > > > > > > change check rules. Please give your valuable advice.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Sinan
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > >
> > > >
> >
>

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by SiNan Liu <li...@gmail.com>.
Thanks for the advice, Asaf.

1.

> For now, there is no official or third-party solution for ProtoBuf
> compatibility. If a better third-party or official solution appears in
> the future, we can develop a new ProtobufNativeSchemaValidator and use it,
> so we add a flag.

By default the flag is not set, and the schema compatibility check only
verifies the name of the root message. To use the implementation from this
PIP, set it to
`org.apache.pulsar.broker.service.schema.validator.ProtobufNativeSchemaBreakValidatorImpl`.
That is what the flag does, and I'm not going to delete it. Users can keep
the previous implementation (which only checks that the root message name
is unchanged), but since that may not be enough, they can choose the
implementation from this PIP instead. If a better third-party or official
solution appears in the future, it can be developed and swapped in. The flag
needs to be part of the PIP to keep the implementation extensible.
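A minimal sketch of the flag-based selection described above, under stated assumptions: the schema is reduced to its root message name; `SchemaValidator`, `RootMessageNameValidator` and `select` are hypothetical names, since the PIP's actual `ProtobufNativeSchemaValidator` interface and the configuration key are not shown in this thread; and the flag value is treated as a fully-qualified class name loaded by reflection. With the flag unset, it falls back to the existing root-message-name check.

```java
import java.util.Optional;

public class ValidatorSelection {

    // Hypothetical, simplified view of a Protobuf descriptor: only the root message name.
    public record SimpleSchema(String rootMessageName) {}

    // Hypothetical validator contract; the PIP's real interface is not shown in this thread.
    public interface SchemaValidator {
        boolean isCompatible(SimpleSchema current, SimpleSchema candidate);
    }

    // Default behavior described in the thread: only the root message name is compared.
    public static class RootMessageNameValidator implements SchemaValidator {
        @Override
        public boolean isCompatible(SimpleSchema current, SimpleSchema candidate) {
            return current.rootMessageName().equals(candidate.rootMessageName());
        }
    }

    // Resolves the validator from an optional fully-qualified class name (the "flag").
    // Unset keeps the current behavior; a set value is instantiated reflectively.
    public static SchemaValidator select(Optional<String> configuredClassName) {
        if (configuredClassName.isEmpty()) {
            return new RootMessageNameValidator();
        }
        try {
            return Class.forName(configuredClassName.get())
                    .asSubclass(SchemaValidator.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                    "Cannot load validator " + configuredClassName.get(), e);
        }
    }

    public static void main(String[] args) {
        SchemaValidator validator = select(Optional.empty());
        System.out.println(validator.isCompatible(new SimpleSchema("Order"), new SimpleSchema("Order")));   // true
        System.out.println(validator.isCompatible(new SimpleSchema("Order"), new SimpleSchema("Invoice"))); // false
    }
}
```

Pointing the flag at a class such as the `ProtobufNativeSchemaBreakValidatorImpl` named above would, in this sketch, require that class to implement the same interface and expose a public no-argument constructor.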

2.

> Why not have a simple function for validation for each switch case above?
> Why do we need strategy and builder, and all this complexity?

I don't see how it's complicated. It's easy to understand and it's not
redundant. The validator's only job is to check whether two Protobuf
schemas are compatible. The builder builds checkers based on the different
compatibility check strategies. If everything were implemented in the
validator, it would be messy and there would be a lot of duplication, and a
validator added later would not extend well. So dropping the encapsulation
would make the code more complex and less extensible. I won't change this
design.

3. *Here are the basic compatibility rules we've defined:*
https://protobuf.dev/programming-guides/proto/#updating
These rules come from the official Protobuf documentation; I did not define
them myself.
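A minimal sketch of how a few of the rules on that page could be checked, assuming each message version is flattened to a map from field number to scalar type name. Nested messages, field labels, reserved numbers and enum value changes are left out, and `ProtoFieldRules` is a hypothetical name, not the PIP's `ProtobufNativeSchemaBreakCheckUtils`. Because these particular rules are symmetric, the same check serves both directions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ProtoFieldRules {

    // Type groups the Protobuf "Updating A Message Type" guide lists as wire-compatible.
    // Narrowing (e.g. int64 -> int32) is allowed on the wire but may truncate values.
    private static final List<List<String>> COMPATIBLE_GROUPS = List.of(
            List.of("int32", "uint32", "int64", "uint64", "bool"),
            List.of("enum", "int32", "uint32", "int64", "uint64"),
            List.of("sint32", "sint64"),
            List.of("string", "bytes"), // the guide adds: only if the bytes are valid UTF-8
            List.of("fixed32", "sfixed32"),
            List.of("fixed64", "sfixed64"));

    // Same type, or both types appear together in one of the groups above.
    public static boolean typesCompatible(String a, String b) {
        if (a.equals(b)) {
            return true;
        }
        return COMPATIBLE_GROUPS.stream().anyMatch(g -> g.contains(a) && g.contains(b));
    }

    /**
     * Compares two versions of one message, each given as field number -> type name.
     * Added or removed field numbers are accepted; a number present in both versions
     * must keep a compatible type. Returns the violations found (empty if none).
     */
    public static List<String> violations(Map<Integer, String> current, Map<Integer, String> candidate) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<Integer, String> field : current.entrySet()) {
            String newType = candidate.get(field.getKey());
            if (newType != null && !typesCompatible(field.getValue(), newType)) {
                problems.add("field " + field.getKey() + ": " + field.getValue() + " -> " + newType);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<Integer, String> v2 = Map.of(1, "string", 2, "int32");
        Map<Integer, String> v3 = Map.of(1, "string", 2, "int64", 3, "bool"); // widen 2, add 3
        Map<Integer, String> v4 = Map.of(1, "string", 2, "sint32");           // re-encode 2
        System.out.println(violations(v2, v3)); // []
        System.out.println(violations(v2, v4)); // [field 2: int32 -> sint32]
    }
}
```

In the `main` example, the change from `int32` to `sint32` keeps the field number and the root message name, so a check that compares only root message names would accept it, even though the two types use different varint encodings (plain versus ZigZag).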

4. Other parts have been updated with explanations.


Thanks,
sinan







丛搏 <bo...@apache.org> wrote on Tue, Mar 7, 2023 at 11:53:

> I think we have two ways to do this.
>
> First way: push the protobuf project to improve its Java support in this
> area. Ask if they have plans to improve it.
>
> Second way: the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` should
> be implemented as a plugin, without changing any existing plugin logic
> (it's simple and already in use). I don't recommend adding flags for
> rollback; it adds configuration and makes little sense.
>
> Thanks,
> Bo
>
> Asaf Mesika <as...@gmail.com> wrote on Mon, Mar 6, 2023 at 23:00:
>
> >
> > Can you convert the code block, which is actually a quote, at the
> > beginning of the PIP to something that doesn't require scrolling
> > horizontally so much?
> > Use
> > https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
> >
> > Let's improve the clarity of what you wrote:
> >
> > "the PROTOBUF uses avro struct to store."
> > -->
> > When Schema type PROTOBUF is used, Pulsar Client assumes the object given
> > to it as message data is an auto-generated POJO containing the
> > annotations encoding the schema. The client uses a converter, which
> > converts a Protobuf schema descriptor into an Avro schema, and sends that
> > as the Schema of the producer/consumer.
> >
> > "On the broker side, protobuf and avro both use SchemaData converted to
> > org.apache.avro.Schema."
> > -->
> > Since the schema is an Avro schema, the compatibility check on the broker
> > side simply re-uses the compatibility check of the AVRO schema type.
> >
> > "ProtobufSchema is different from ProtobufNativeSchema in schema
> > compatibility check it uses avro-protobuf.
> > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > But the current implementation of ProtobufNative schema compatibility
> > check only checked if the root message name is changed."
> >
> > -->
> > The PROTOBUF_NATIVE schema type is different.
> > The client actually uses the Protobuf Descriptor as the schema, as opposed
> > to the Avro schema of the PROTOBUF schema type. In the broker, the
> > PROTOBUF_NATIVE compatibility check hasn't actually implemented any rule
> > besides one: checking whether the root message name has changed.
> >
> >
> >
> > >    1. For now, there is no official or third-party solution for ProtoBuf
> > >    compatibility. If a better third-party or official solution appears in
> > >    the future, we can develop a new ProtobufNativeSchemaValidator and use
> > >    it, so we add a flag.
> > >
> > Why do you need to make that configurable? Once you find a third party,
> > just switch to it? Who knows, maybe you never will. Introduce it when you
> > find it, not now.
> >
> >
> > > We improve the BACKWARD and FORWARD strategies in
> > > ProtobufNativeSchemaCompatibilityCheck. As with the AVRO implementation,
> > > protobuf compatibility checking needs to implement the canRead method.
> > > *This will check that the writtenSchema can be read by readSchema.*
> >
> >
> > I completely disagree.
> > The Avro implementation is confusing for our use case. Don't copy that.
> >
> > You have
> >
> > public void checkCompatible(SchemaData from, SchemaData to,
> >         SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {
> >     Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
> >     Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
> >     switch (strategy) {
> >         case BACKWARD_TRANSITIVE:
> >         case BACKWARD:
> >         case FORWARD_TRANSITIVE:
> >         case FORWARD:
> >         case FULL_TRANSITIVE:
> >         case FULL:
> >             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
> >             return;
> >         case ALWAYS_COMPATIBLE:
> >             return;
> >         default:
> >             throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
> >     }
> > }
> >
> > I would rename:
> > from --> currentSchema
> > to --> newSchema
> >
> > Use that switch case and have a method for each, like:
> > validateBackwardsCompatibility(currentSchema, newSchema)
> >
> > I dislike canRead and the usage of writtenSchema, since you have two
> > completely different use cases: the producing side and the consuming side.
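A minimal sketch of the alternative suggested above: one plain method per direction, named with currentSchema/newSchema, selected by a switch. The `Strategy` enum only mirrors the names in the quoted `checkCompatible` switch, `validateBackward`/`validateForward` are hypothetical, and the Protobuf-specific rules are passed in as a `readable(writer, reader)` predicate since they are still being defined in the PIP. The `_TRANSITIVE` cases compare the new schema with every stored version, following Pulsar's documented transitive strategies, while the others look only at the latest.

```java
import java.util.List;
import java.util.function.BiPredicate;

public class PerStrategyValidation {

    // Mirrors only the strategy names in the quoted checkCompatible switch.
    public enum Strategy {
        ALWAYS_COMPATIBLE, BACKWARD, BACKWARD_TRANSITIVE,
        FORWARD, FORWARD_TRANSITIVE, FULL, FULL_TRANSITIVE
    }

    /**
     * Returns true if newSchema is acceptable under the given strategy.
     * history holds the stored versions, oldest first; readable.test(writer, reader)
     * answers "can data written with writer be read with reader?".
     */
    public static <S> boolean isCompatible(Strategy strategy, List<S> history, S newSchema,
                                           BiPredicate<S, S> readable) {
        if (history.isEmpty()) {
            return true; // first version: nothing to compare against
        }
        S currentSchema = history.get(history.size() - 1); // latest stored version
        return switch (strategy) {
            case ALWAYS_COMPATIBLE -> true;
            case BACKWARD -> validateBackward(currentSchema, newSchema, readable);
            case FORWARD -> validateForward(currentSchema, newSchema, readable);
            case FULL -> validateBackward(currentSchema, newSchema, readable)
                    && validateForward(currentSchema, newSchema, readable);
            // Transitive variants check the new schema against every stored version.
            case BACKWARD_TRANSITIVE ->
                    history.stream().allMatch(old -> validateBackward(old, newSchema, readable));
            case FORWARD_TRANSITIVE ->
                    history.stream().allMatch(old -> validateForward(old, newSchema, readable));
            case FULL_TRANSITIVE -> history.stream().allMatch(old ->
                    validateBackward(old, newSchema, readable) && validateForward(old, newSchema, readable));
        };
    }

    // Backward: consumers on newSchema must read data produced with currentSchema.
    static <S> boolean validateBackward(S currentSchema, S newSchema, BiPredicate<S, S> readable) {
        return readable.test(currentSchema, newSchema);
    }

    // Forward: consumers still on currentSchema must read data produced with newSchema.
    static <S> boolean validateForward(S currentSchema, S newSchema, BiPredicate<S, S> readable) {
        return readable.test(newSchema, currentSchema);
    }

    public static void main(String[] args) {
        // Toy rule for illustration only: adjacent versions can read each other's data.
        BiPredicate<Integer, Integer> adjacent = (writer, reader) -> Math.abs(reader - writer) <= 1;
        List<Integer> history = List.of(1, 2);
        System.out.println(isCompatible(Strategy.BACKWARD, history, 3, adjacent));            // true
        System.out.println(isCompatible(Strategy.BACKWARD_TRANSITIVE, history, 3, adjacent)); // false
    }
}
```

The toy predicate in `main` only shows the difference between BACKWARD and BACKWARD_TRANSITIVE: version 3 is checked against version 2 alone in the first case and against versions 1 and 2 in the second.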
> >
> > > schemaValidatorBuilder
> >
> > I dislike this proposal. IMO the Avro implementation is way too
> > complicated. Why not have a simple function for validation for each
> > switch case above? Why do we need strategy and builder, and all this
> > complexity?
> >
> >
> > > *Here are the basic compatibility rules we've defined:*
> >
> >
> > IMO it's impossible to read the validation rules as you described them.
> > I wrote how they should be structured numerous times above.
> > I can't validate them.
> >
> >
> > IMO, the current design is very hard to read.
> > Please try to avoid jumping into code sections.
> > Write a high level design section, in which you describe in words what
> > you plan to do.
> > Write the validation rules in a structure that is easy to understand:
> > rules for each compatibility check, using proper words (current schema,
> > new schema), since the new schema is used for reading in one check and
> > for writing in the other.
> >
> > In its current form it takes too much time to understand the design, and
> > it shouldn't be the case.
> >
> > Thanks,
> >
> > Asaf
> >
> >
> > >
> >
> >
> >
> > On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <li...@gmail.com> wrote:
> >
> > > Hi! I updated the explanation of some things in the PIP issue. I also
> > > added a new "flag" in the conf that selects the
> > > ProtobufNativeSchemaValidator implementation, and set the default
> > > ProtobufNativeSchemaValidator to only check whether the name of the
> > > root message is the same.
> > >
> > >
> > > Thanks,
> > > sinan
> > >
> > >
> > > Asaf Mesika <as...@gmail.com> wrote on Sun, Mar 5, 2023 at 20:21:
> > >
> > > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <li...@gmail.com> wrote:
> > > >
> > > > > >
> > > > > > Can you please explain how a Protobuf Schema descriptor can be
> > > > > > validated for backward compatibility check using Avro based
> > > > > > compatibility rules?
> > > > > > Doesn't it expect the schema to be Avro, but it is actually a
> > > > > > Protobuf descriptor?
> > > > > > Is there some translation happening?
> > > > >
> > > > >
> > > > > 1. *You can take a quick look at the previous design: the PROTOBUF
> > > > > schema uses an Avro struct for storage.*
> > > > > https://github.com/apache/pulsar/pull/1954
> > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > > >
> > > >
> > > > Ok. So to summarize your code (easier to write it than send links):
> > > > * Pulsar Client, when used with Protobuf Schema, actually converts the
> > > > Protobuf descriptor into an Avro Schema (using code found inside the Avro
> > > > library) and saves that Avro schema as the schema. It's not saving the
> > > > protobuf descriptor at all. Very confusing, I have to add; I never
> > > > expected that.
> > > > This explains why ProtobufSchemaCompatibilityCheck just extends the
> > > > Avro check without doing any translation.
> > > >
> > > > Thanks for that.
> > > >
> > > > Now that I finally understand this, I can say that you *must* explain
> > > > that in the motivation part of your PIP.
> > > >
> > > >
> > > >
> > > > >
> > > > >
> > > > > 2. *On the broker side, protobuf and avro both convert `SchemaData`
> > > > > to `org.apache.avro.Schema`.*
> > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > > >
> > > >
> > > > Actually those links don't really help.
> > > > The main link that helps is:
> > > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > > >
> > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > I'm sorry, I don't understand.
> > > > > > I understand the different compatibility check strategies.
> > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > compatibility validation, no?
> > > > > > I believe the answer is no, since you may want to verify different
> > > > > > things when it comes to Protobuf, which are different from Avro.
> > > > >
> > > > >
> > > > > 1. *ProtobufSchema is different from ProtobufNativeSchema in that it
> > > > > uses avro-protobuf.*
> > > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > > *ProtobufNativeSchema needs a native compatibility check, but there is
> > > > > no official or third-party implementation. So this PIP does not use
> > > > > avro-protobuf for protobuf compatibility checking.*
> > > > >
> > > > > 2. *By the way, this is implemented in much the same way that Apache
> > > > > Avro does compatibility checking.*
> > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > > `canReadStrategy`, `canBeReadStrategy`, `mutualReadStrategy`
> > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > > > > writtenSchema and readSchema. We only need to change the order of the
> > > > > arguments we pass to `canRead()`.*
> > > > > ```java
> > > > > private void validateWithStrategy(Descriptors.Descriptor
> toValidate,
> > > > > Descriptors.Descriptor fromDescriptor) throws
> > > > ProtoBufCanReadCheckException
> > > > > {
> > > > > switch (strategy) {
> > > > > case CanReadExistingStrategy -> canRead(fromDescriptor,
> toValidate);
> > > > > case CanBeReadByExistingStrategy -> canRead(toValidate,
> > > fromDescriptor);
> > > > > case CanBeReadMutualStrategy -> {
> > > > > canRead(toValidate, fromDescriptor);
> > > > > canRead(fromDescriptor, toValidate);
> > > > > }
> > > > > }
> > > > > }
> > > > >
> > > > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > > > Descriptors.Descriptor readSchema) throws
> > > ProtoBufCanReadCheckException {
> > > > >
> > > >
> > >
> ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema,
> > > > > readSchema);
> > > > > }
> > > > > ```
> > > > >
> > > > I get that you want to take inspiration from the existing Avro schema
> > > > compatibility check for your code design.
> > > > I also understand you *won't* use any existing Avro code for that.
> > > > I also understand you have to write the validation check on your own,
> > > > since there is no third-party implementation to rely on.
> > > >
> > > > The only thing I can't understand is the actual rules, as I wrote before,
> > > > since they are written confusingly.
> > > > So, I repeat what I asked before:
> > > >
> > > > I think you should structure the validation rules differently:
> > > >
> > > > * Backward checks
> > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > consumer) and existingSchema (the last schema used)
> > > > * Forward
> > > > ** List the rules, using newSchema (the schema used by the producer or
> > > > consumer) and existingSchema (the last schema used)
> > > >
> > > > Once that's accomplished, I will be able to understand the different
> > > > validation rules for each compatibility check.
> > > >
> > > > >
> > > > > Thanks,
> > > > > sinan
> > > > >
> > > > > On Wed, Mar 1, 2023 at 9:19 PM Asaf Mesika <as...@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > >
> > > > > > > >
> > > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > >
> > > > > > > Currently, the Avro, JSON, and Protobuf schemas are all implemented based
> > > > > > > on Avro.
> > > > > > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > > > > > > compatibility validation?
> > > > > > >
> > > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, and their
> > > > > > > `checkCompatible()` is the same implementation as for `AVRO`.
> > > > > > >
> > > > > >
> > > > > > Can you please explain how a Protobuf Schema descriptor can be validated
> > > > > > for backward compatibility check using Avro based compatibility rules?
> > > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > > > > descriptor?
> > > > > > Is there some translation happening?
> > > > > >
> > > > > > >
> > > > > > > I think you should structure the validation rules differently:
> > > > > > >
> > > > > > > The compatibility check strategy is described on the website:
> > > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > > 1. BACKWARD (CanReadExistingStrategy): Consumers using schema V3 can process
> > > > > > > data written by producers using the last schema version V2. So V2 is
> > > > > > > "writtenSchema" and V3 is "readSchema".
> > > > > > > 2. FORWARD (CanBeReadByExistingStrategy): Consumers using the last schema
> > > > > > > version V2 can process data written by producers using a new schema V3,
> > > > > > > even though they may not be able to use the full capabilities of the new
> > > > > > > schema. So V3 is "writtenSchema" and V2 is "readSchema".
> > > > > > > 3. FULL (CanBeReadMutualStrategy): Schemas are both backward and forward
> > > > > > > compatible. The schema can evolve, and data written with either the old
> > > > > > > or the new schema version can be read with the other.
> > > > > > >
> > > > > >
> > > > > > I'm sorry - I don't understand.
> > > > > > I understand the different compatibility check strategies.
> > > > > > If you just spell them out here, then as you say, just translate the
> > > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > > compatibility validation, no?
> > > > > > I believe the answer is no, since you may want to verify different things
> > > > > > when it comes to Protobuf, which are different than Avro.
> > > > > >
> > > > > > At the current state, I can't understand your design at all. Please help
> > > > > > clarify that.
> > > > > >
> > > > > > >
> > > > > > > So each strategy should have its own section.
> > > > > > >
> > > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As we've
> > > > > > > just described, we only need to change the order of the arguments we
> > > > > > > pass to `canRead()`.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > sinan
> > > > > > >
> > > > > > > On Mon, Feb 27, 2023 at 8:49 PM Asaf Mesika <as...@gmail.com> wrote:
> > > > > > >
> > > > > > > > >
> > > > > > > > > And you can see the difference between ProtoBuf and
> > > > ProtoBufNative:
> > > > > > > > >
> > > > > > > > >
> > > > >
> https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > > >
> > > > > > > > >
> > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > > >
> > > > > > > >  I read it and they look identical. What's the difference
> between
> > > > > them?
> > > > > > > >
> > > > > > > > Current avro,json, and protobuf schemas are all implemented
> based
> > > > on
> > > > > > > AVRO.
> > > > > > > >
> > > > > > > > What do you mean, they are all implemented based on Avro? You
> > > mean
> > > > > the
> > > > > > > > protobuf schema is converted into an Avro Schema, and then
> you
> > > use
> > > > > Avro
> > > > > > > > compatibility validation?
> > > > > > > >
> > > > > > > >
> > > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > > >
> > > > > > > >
> > > > > > > > I think you should structure the validation rules
> differently:
> > > > > > > >
> > > > > > > > * Backward checks
> > > > > > > > ** List down rules, where use newSchema (the schema used by
> > > > producer
> > > > > or
> > > > > > > > consumer) and existingSchema (last schema used)
> > > > > > > > * Forward
> > > > > > > > ** List down rules, where use newSchema (the schema used by
> > > > producer
> > > > > or
> > > > > > > > consumer) and existingSchema (last schema used)
> > > > > > > >
> > > > > > > > So each strategy should have its own section.
> > > > > > > >
> > > > > > > > I'm saying this since you used "writttenSchema" word but it
> > > > > represents
> > > > > > > > something completely different if it's backward or forward
> check.
> > > > > > > >
> > > > > > > > Once you'll have that structure like that, I personally will
> be
> > > > able
> > > > > to
> > > > > > > > read and understand it.
> > > > > > > >
> > > > > > > >
> > > > > > > > The motivation and problem statement are now good - thanks
> for
> > > > > improving
> > > > > > > > it.
> > > > > > > >
> > > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <
> > > liusinan1998@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi! I updated the PIP issue again. This time I've added
> some
> > > > > background
> > > > > > > > and
> > > > > > > > > some explanations.
> > > > > > > > >
> > > > > > > > > The compatibility check rules are already written in the
> > > > > > > Implementation.
> > > > > > > > > ProtoBufNative implements the same canRead method as Apache
> > > Avro.
> > > > > > > > > It does this by checking whether the schema for writing and
> > > > reading
> > > > > is
> > > > > > > > > compatible. I also indicate whether the writtenSchema and
> > > > > readSchema of
> > > > > > > > the
> > > > > > > > > Backward, Forward, and Full strategies are the old or the
> new
> > > > > version
> > > > > > > of
> > > > > > > > > the schema.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > sinan
> > > > > > > > >
> > > > > > > > > Asaf Mesika <as...@gmail.com> 于2023年2月26日周日 23:24写道:
> > > > > > > > >
> > > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, which IMO
> > > > > > > > > > you need to add for people to understand it. You don't need to explain
> > > > > > > > > > all of Pulsar in this PIP, but at the very least add a few paragraphs
> > > > > > > > > > detailing everything you need to know to put you in context:
> > > > > > > > > >
> > > > > > > > > >    - Start by saying Pulsar has a built-in schema registry inside the
> > > > > > > > > >    Pulsar broker.
> > > > > > > > > >       - Every time the client updates the schema, it uploads it to the
> > > > > > > > > >       broker. When that happens, a feature validates whether the new
> > > > > > > > > >       schema version is compatible with the previous versions. There
> > > > > > > > > >       are 4 types of compatibility: Full, ... (complete and explain
> > > > > > > > > >       each one briefly)
> > > > > > > > > >    - Also explain that the Pulsar schema registry supports various
> > > > > > > > > >    schema protocols: Avro, protobuf native, ... (complete the rest);
> > > > > > > > > >    each protocol has a schema which dictates how to serialize and
> > > > > > > > > >    deserialize the message content into a typed object.
> > > > > > > > > >    - Explain in short what protobuf native is (compared to protobuf
> > > > > > > > > >    non-native).
> > > > > > > > > >    - Please don't paste code instead of explaining.
> > > > > > > > > >       - Explain that the current protobuf native validation check is
> > > > > > > > > >       only composed of checking that the root message name is the same
> > > > > > > > > >       between the current schema version and the new version.
> > > > > > > > > >          - Explain briefly what a root message and its name are.
> > > > > > > > > >       - Explain the problem (list scenarios) that we have because the
> > > > > > > > > >       protobuf native schema only supports FULL compatibility
> > > > > > > > > >       validation.
> > > > > > > > > >
> > > > > > > > > > Regarding high-level design - as in what you plan to do.
> > > > > > > > > > I suggest you add a "High Level Design" section and in it detail how
> > > > > > > > > > you plan to validate, per protobuf version, per compatibility check
> > > > > > > > > > (backward, forward, full, ...).
> > > > > > > > > > I tried reading the implementation - for me, it's all over the place.
> > > > > > > > > > Can you please list in order what I wrote above, and list the
> > > > > > > > > > validation rules with a good explanation of why you validate them
> > > > > > > > > > like that?
> > > > > > > > > >
> > > > > > > > > > Lastly, once you have all the validation rules clearly stated, you can
> > > > > > > > > > use them to document the check properly so users know what validation
> > > > > > > > > > to expect.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Asaf
> > > > > > > > > >
> > > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Sorry, my mistake. I removed the code and described the design to
> > > > > > > > > > > improve the PROTOBUF_NATIVE schema compatibility checks. You can
> > > > > > > > > > > have a look.
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Feb 22, 2023 at 9:16 PM Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I read it, but you're almost directly diving into the code - it
> > > > > > > > > > > > will take me hours just to reverse engineer your design.
> > > > > > > > > > > >
> > > > > > > > > > > > Can you please include a "High Level Design" section in which you
> > > > > > > > > > > > explain how you plan to tackle each issue?
> > > > > > > > > > > > If I can read that section and explain to someone else how this
> > > > > > > > > > > > will work, it means the section is complete.
> > > > > > > > > > > >
> > > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > > >
> > > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19565.
> > > > > > > > > > > > >
> > > > > > > > > > > > > We can talk about the current design here, especially the field
> > > > > > > > > > > > > type change check rules. Please give your valuable advice.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Sinan
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
>
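The BACKWARD, FORWARD, and FULL descriptions quoted above reduce to a choice of which schema acts as the writer and which as the reader in a single `canRead(writtenSchema, readSchema)` call. The following is a minimal, self-contained Java sketch of that argument ordering, phrased with the existingSchema/newSchema naming requested in the thread. The `ReadStrategy` enum, the `StrategyDispatch` class, and the recording `canRead` stub are hypothetical illustrations, not Pulsar or Avro APIs; a real check would compare two Protobuf descriptors instead of recording strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names for the three strategies discussed in the thread.
enum ReadStrategy { CAN_READ_EXISTING, CAN_BE_READ_BY_EXISTING, MUTUAL_READ }

final class StrategyDispatch {
    // Every (writer -> reader) pair that would be checked, in call order.
    final List<String> checkedPairs = new ArrayList<>();

    // Stand-in for canRead(writtenSchema, readSchema): it only records the call.
    private void canRead(String writtenSchema, String readSchema) {
        checkedPairs.add(writtenSchema + " -> " + readSchema);
    }

    void validate(ReadStrategy strategy, String existingSchema, String newSchema) {
        switch (strategy) {
            // BACKWARD: consumers on the new schema read data written with the existing one.
            case CAN_READ_EXISTING -> canRead(existingSchema, newSchema);
            // FORWARD: consumers on the existing schema read data written with the new one.
            case CAN_BE_READ_BY_EXISTING -> canRead(newSchema, existingSchema);
            // FULL: both directions must hold.
            case MUTUAL_READ -> {
                canRead(existingSchema, newSchema);
                canRead(newSchema, existingSchema);
            }
        }
    }
}
```

With V2 as the existing schema and V3 as the new one, BACKWARD records the single pair `V2 -> V3`, FORWARD records `V3 -> V2`, and FULL records both, which matches the writtenSchema/readSchema assignment listed for each strategy.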

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by 丛搏 <bo...@apache.org>.
I think we have two ways to do that.

First way: push the Protobuf project to improve compatibility checking in its
Java library, and ask whether they have plans to do so.

Second way: implement the new PROTOBUF_NATIVE `SchemaCompatibilityCheck` as
a plugin, without changing any existing plugin logic (it's simple and
already in use). I don't recommend adding a flag for rollback; it adds
configuration and makes little sense.
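To illustrate the plugin approach suggested above, here is a minimal Java sketch in which checkers are looked up per schema type and a new PROTOBUF_NATIVE checker is registered without modifying the existing checker classes. `SchemaKind`, `CompatibilityChecker`, `RootNameChecker`, and `CheckerRegistry` are hypothetical simplifications made for this example; they are not Pulsar's `SchemaCompatibilityCheck` interface or its broker configuration, and schemas are reduced to their root message names.

```java
import java.util.EnumMap;
import java.util.Map;

enum SchemaKind { AVRO, JSON, PROTOBUF, PROTOBUF_NATIVE }

// Hypothetical, simplified checker contract (not Pulsar's actual interface).
interface CompatibilityChecker {
    SchemaKind schemaKind();

    // Returns true when newSchema may replace existingSchema.
    boolean isCompatible(String existingSchema, String newSchema);
}

// Stand-in for the existing behavior: only the root message name must match.
record RootNameChecker(SchemaKind schemaKind) implements CompatibilityChecker {
    @Override
    public boolean isCompatible(String existingSchema, String newSchema) {
        return existingSchema.equals(newSchema);
    }
}

final class CheckerRegistry {
    private final Map<SchemaKind, CompatibilityChecker> checkers = new EnumMap<>(SchemaKind.class);

    // Registering a checker replaces only the entry for its own schema kind.
    void register(CompatibilityChecker checker) {
        checkers.put(checker.schemaKind(), checker);
    }

    CompatibilityChecker forKind(SchemaKind kind) {
        CompatibilityChecker checker = checkers.get(kind);
        if (checker == null) {
            throw new IllegalArgumentException("No compatibility checker for " + kind);
        }
        return checker;
    }
}
```

In this sketch, a deployment that wants stricter PROTOBUF_NATIVE rules registers its own checker for that kind; the other entries, and the existing checker class itself, stay untouched, and re-registering the old checker restores the previous behavior without a dedicated rollback flag.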

Thanks,
Bo

On Mon, Mar 6, 2023 at 11:00 PM Asaf Mesika <as...@gmail.com> wrote:

>
> Can you convert the code block, which is actually a quote, at the beginning
> of the PIP into something that doesn't require scrolling horizontally so
> much?
> Use
> https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text
>
> Let's improve the clarity of what you wrote:
>
> "the PROTOBUF uses avro struct to store."
> -->
> When Schema type PROTOBUF is used, Pulsar Client assumes the object given
> to it as message data is an auto-generated POJO containing the annotations
> encoding the schema. The client is using a converter, which converts a
> Protobuf schema descriptor into an Avro schema and sends that as the Schema
> of the producer/consumer.
>
> "On the broker side, protobuf and avro both use SchemaData converted to
> org.apache.avro.Schema."
> -->
> Since the schema is an Avro schema, the implementation of compatibility
> check on the broker side is to simply re-use the compatibility check of the
> AVRO schema type.
>
> "ProtobufSchema is different from ProtobufNativeSchema in schema
> compatibility check it uses avro-protobuf.
> https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> But the current implementation of ProtobufNative schema compatibility
> check only checked if the root message name is changed."
>
> -->
> PROTOBUF_NATIVE schema type is different.
> The client is actually using Protobuf Descriptor as the schema, as opposed
> to Avro schema of PROTOBUF schema type. In the broker, the PROTOBUF_NATIVE
> compatibility check actually hasn't implemented any rule, besides one:
> checking if the root message name has changed.
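As a point of reference for the single rule described above, the following minimal Java sketch shows a root-message-name-only check. `RootMessage`, `IncompatibleRootMessageException`, and `RootMessageCheck` are hypothetical stand-ins for a deserialized Protobuf descriptor and the broker-side check, and comparing fully qualified names (package plus message name) is an assumption of this sketch rather than a description of Pulsar's code.

```java
// Hypothetical stand-in for a deserialized root message descriptor.
record RootMessage(String protoPackage, String name) {
    // Fully qualified name, e.g. "demo.Order"; without a package it is just the name.
    String fullName() {
        return protoPackage.isEmpty() ? name : protoPackage + "." + name;
    }
}

final class IncompatibleRootMessageException extends RuntimeException {
    IncompatibleRootMessageException(String message) {
        super(message);
    }
}

final class RootMessageCheck {
    private RootMessageCheck() {}

    // The only rule: the new schema's root message must keep the same name.
    // Field additions, removals, and type changes are not inspected at all.
    static void checkRootMessageUnchanged(RootMessage currentSchema, RootMessage newSchema) {
        if (!currentSchema.fullName().equals(newSchema.fullName())) {
            throw new IncompatibleRootMessageException("Root message changed from "
                    + currentSchema.fullName() + " to " + newSchema.fullName());
        }
    }
}
```

Because only names are compared, a new schema that keeps `demo.Order` as its root message passes even if it retypes or removes fields, which is the gap the PIP sets out to close.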
>
>
>
> >    1. For now, there is no official or third-party solution for ProtoBuf
> >    compatibility. If in the future have better solutions of a third party or
> >    the official, we develop new ProtobufNativeSchemaValidator and use, so
> >    add a flag.
>
> Why do you need to make that configurable? Once you find a third party,
> just switch to it? Who knows, maybe you never will. Introduce it when you
> find it, not now.
>
>
> > We improve in ProtobufNativeSchemaCompatibilityCheck BACKWARD, FORWARD
> > these strategies. As with the AVRO implementation, protobuf compatibility
> > checking need implementing the canRead method. *This will check that
> > the writtenschema can be read by readSchema.*
>
>
> I completely disagree.
> The Avro implementation is confusing for our use case. Don't copy that.
>
> You have
>
> public void checkCompatible(SchemaData from, SchemaData to,
>         SchemaCompatibilityStrategy strategy) throws IncompatibleSchemaException {
>     Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
>     Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
>     switch (strategy) {
>         // Apart from ALWAYS_COMPATIBLE, every strategy only checks the root message name.
>         case BACKWARD_TRANSITIVE:
>         case BACKWARD:
>         case FORWARD_TRANSITIVE:
>         case FORWARD:
>         case FULL_TRANSITIVE:
>         case FULL:
>             checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
>             return;
>         case ALWAYS_COMPATIBLE:
>             return;
>         default:
>             throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
>     }
> }
>
> I would rename:
> from --> currentSchema
> to --> newSchema
>
> Use that switch case and have a method for each case, like:
> validateBackwardsCompatibility(currentSchema, newSchema)
>
> I dislike canRead and the usage of writtenSchema, since you have two
> completely different use cases: the producing side and the consuming side.
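The structure suggested above (rename `from`/`to` to `currentSchema`/`newSchema` and give each strategy its own method) can be sketched in self-contained Java as follows. The `ProtoField`/`ProtoMessage` model and the rules it applies (the root message name must not change, a field number present in both schemas must keep the same type, and a proto2-style required field the reader expects must be written) are simplified, hypothetical examples chosen only to show the shape; they are not the PIP's rule set, and real Protobuf accepts some type changes (for example between certain integer types) that this sketch rejects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified view of a Protobuf message: field number -> field.
record ProtoField(String name, String type, boolean required) {}

record ProtoMessage(String fullName, Map<Integer, ProtoField> fields) {}

enum Strategy { BACKWARD, FORWARD, FULL }

final class ProtobufNativeRules {
    private ProtobufNativeRules() {}

    // Returns human-readable violations; an empty list means compatible.
    static List<String> validate(Strategy strategy, ProtoMessage currentSchema, ProtoMessage newSchema) {
        return switch (strategy) {
            case BACKWARD -> validateBackwardCompatibility(currentSchema, newSchema);
            case FORWARD -> validateForwardCompatibility(currentSchema, newSchema);
            case FULL -> {
                List<String> errors = new ArrayList<>(validateBackwardCompatibility(currentSchema, newSchema));
                errors.addAll(validateForwardCompatibility(currentSchema, newSchema));
                yield errors;
            }
        };
    }

    // BACKWARD: consumers on newSchema must read data produced with currentSchema.
    static List<String> validateBackwardCompatibility(ProtoMessage currentSchema, ProtoMessage newSchema) {
        return readerAccepts(newSchema, currentSchema);
    }

    // FORWARD: consumers still on currentSchema must read data produced with newSchema.
    static List<String> validateForwardCompatibility(ProtoMessage currentSchema, ProtoMessage newSchema) {
        return readerAccepts(currentSchema, newSchema);
    }

    private static List<String> readerAccepts(ProtoMessage reader, ProtoMessage writer) {
        List<String> errors = new ArrayList<>();
        if (!reader.fullName().equals(writer.fullName())) {
            errors.add("root message differs: " + writer.fullName() + " vs " + reader.fullName());
        }
        for (Map.Entry<Integer, ProtoField> entry : reader.fields().entrySet()) {
            ProtoField readerField = entry.getValue();
            ProtoField writerField = writer.fields().get(entry.getKey());
            if (writerField == null) {
                // In proto2, parsing fails when a required field is absent from the data.
                if (readerField.required()) {
                    errors.add("required field #" + entry.getKey() + " (" + readerField.name() + ") is never written");
                }
            } else if (!readerField.type().equals(writerField.type())) {
                errors.add("field #" + entry.getKey() + " type differs: "
                        + writerField.type() + " vs " + readerField.type());
            }
        }
        return errors;
    }
}
```

For example, adding a required field is reported under BACKWARD (new consumers cannot parse old data that lacks it) but not under FORWARD, while changing a field's type is reported in both directions, so FULL lists it twice.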
>
> > schemaValidatorBuilder
>
> I dislike this proposal. IMO the Avro implementation is way too complicated.
> Why not have a simple validation function for each switch case above?
> Why do we need a strategy, a builder, and all this complexity?
>
>
> > *Here are the basic compatibility rules we've defined:*
>
> IMO it's impossible to read the validation rules as you described them.
> I wrote how they should be structured numerous times above.
> I can't validate them.
>
> IMO, the current design is very hard to read.
> Please try to avoid jumping into code sections.
> Write a high-level design section, in which you describe in words what you
> plan to do.
> Write the validation rules in a structure that is easy to understand:
> rules for each compatibility check, using proper terms (current schema,
> new schema), since the new schema is sometimes used for reading and
> sometimes for writing.
>
> In its current form, it takes too much time to understand the design, and
> that shouldn't be the case.
>
> Thanks,
>
> Asaf
>
> On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <li...@gmail.com> wrote:
>
> > Hi! I updated the explanation of some things in the PIP issue. I also
> > added a new "flag" to the conf that selects which
> > ProtobufNativeSchemaValidator implementation is used, and set the default
> > ProtobufNativeSchemaValidator to check only whether the name of the root
> > message is the same.
> >
> >
> > Thanks,
> > sinan
> >
> >
> > On Sun, Mar 5, 2023 at 8:21 PM Asaf Mesika <as...@gmail.com> wrote:
> >
> > > On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <li...@gmail.com> wrote:
> > >
> > > > >
> > > > > Can you please explain how a Protobuf Schema descriptor can be validated
> > > > > for backward compatibility check using Avro based compatibility rules?
> > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > > > descriptor?
> > > > > Is there some translation happening?
> > > >
> > > > 1. *You can take a quick look at the previous design, the PROTOBUF uses
> > > > avro struct to store.*
> > > > https://github.com/apache/pulsar/pull/1954
> > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
> > >
> > > Ok. So to summarize your code (easier to write it than send links):
> > > * Pulsar Client, when used with Protobuf Schema, actually converts the
> > > Protobuf descriptor into an Avro Schema (using code found inside the Avro
> > > library) and saves that Avro schema as the schema. It's not saving the
> > > protobuf descriptor at all. Very confusing, I have to add - never expected
> > > that.
> > > This explains why, in the ProtobufSchemaCompatibilityCheck, they just
> > > extend the Avro check without doing any translation.
> > >
> > > Thanks for that.
> > >
> > > Now that I finally understand this, I can say that you *must* explain that
> > > in the motivation part of your PIP.
> > >
> > > > 2. *On the broker side, protobuf and avro both use `SchemaData` converted
> > > > to `org.apache.avro.Schema`.*
> > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> > > > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
> > >
> > > Actually those links don't really help.
> > > The main link that helps is:
> > >
> > > https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
> > >
> > > >
> > > > > I'm sorry - I don't understand.
> > > > > I understand the different compatibility check strategies.
> > > > > If you just spell them out here, then as you say, just translate the
> > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > compatibility validation, no?
> > > > > I believe the answer is no, since you may want to verify different things
> > > > > when it comes to Protobuf, which are different than Avro.
> > > >
> > > > 1. *ProtobufSchema is different from ProtobufNativeSchema in that it uses
> > > > avro-protobuf.*
> > > > https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > > > *ProtobufNativeSchema needs a native compatibility check, but there is no
> > > > official or third-party implementation. So this PIP does not use
> > > > avro-protobuf for Protobuf compatibility checking.*
> > > >
> > > > 2. *By the way, this is implemented in much the same way that Apache Avro
> > > > does compatibility checking.*
> > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > > > `canReadStrategy`, `canBeReadStrategy`, `mutualReadStrategy`
> > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> > > > https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > > > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > > > writtenSchema and readSchema. We only need to change the order of the
> > > > arguments we pass to `canRead()`.*
> > > > ```java
> > > > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> > > >         Descriptors.Descriptor fromDescriptor) throws ProtoBufCanReadCheckException {
> > > >     switch (strategy) {
> > > >         // BACKWARD: the new schema (toValidate) reads data written with the existing one.
> > > >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> > > >         // FORWARD: the existing schema reads data written with the new one.
> > > >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> > > >         // FULL: both directions must succeed.
> > > >         case CanBeReadMutualStrategy -> {
> > > >             canRead(toValidate, fromDescriptor);
> > > >             canRead(fromDescriptor, toValidate);
> > > >         }
> > > >     }
> > > > }
> > > >
> > > > // writtenSchema produced the data; readSchema is the one used to consume it.
> > > > private void canRead(Descriptors.Descriptor writtenSchema,
> > > >         Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
> > > >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema,
> > > >             readSchema);
> > > > }
> > > > ```
> > > >
> > > I get that you want to take inspiration from the existing Avro schema
> > > compatibility check for your code design.
> > > I also understand you *won't* use any existing Avro code for that.
> > > I also understand you have to write the validation check on your own,
> > > since there is no third-party implementation to rely on.
> > >
> > > The only thing I can't understand is the actual rules, as I wrote before,
> > > since they are written confusingly.
> > > So, I repeat what I asked before:
> > >
> > > I think you should structure the validation rules differently:
> > >
> > > * Backward checks
> > > ** List the rules, using newSchema (the schema used by the producer or
> > > consumer) and existingSchema (the last schema used)
> > > * Forward
> > > ** List the rules, using newSchema (the schema used by the producer or
> > > consumer) and existingSchema (the last schema used)
> > >
> > > Once that's accomplished, I will be able to understand the different
> > > validation rules for each compatibility check.
> > >
> > > >
> > > > Thanks,
> > > > sinan
> > > >
> > > > On Wed, Mar 1, 2023 at 9:19 PM Asaf Mesika <as...@gmail.com> wrote:
> > > > >
> > > > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <li...@gmail.com> wrote:
> > > > >
> > > > > > >
> > > > > > > I read it and they look identical. What's the difference between them?
> > > > > >
> > > > > > Currently, the Avro, JSON, and Protobuf schemas are all implemented based
> > > > > > on Avro.
> > > > > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > > > > > compatibility validation?
> > > > > >
> > > > > > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > > > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > > > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > > > They all extend `AvroSchemaBasedCompatibilityCheck`, and their
> > > > > > `checkCompatible()` is the same implementation as for `AVRO`.
> > > > > >
> > > > >
> > > > > Can you please explain how a Protobuf Schema descriptor can be validated
> > > > > for backward compatibility check using Avro based compatibility rules?
> > > > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > > > descriptor?
> > > > > Is there some translation happening?
> > > > >
> > > > > >
> > > > > > I think you should structure the validation rules differently:
> > > > > >
> > > > > > The compatibility check strategy is described on the website:
> > > > > > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > > > 1. BACKWARD (CanReadExistingStrategy): Consumers using schema V3 can process
> > > > > > data written by producers using the last schema version V2. So V2 is
> > > > > > "writtenSchema" and V3 is "readSchema".
> > > > > > 2. FORWARD (CanBeReadByExistingStrategy): Consumers using the last schema
> > > > > > version V2 can process data written by producers using a new schema V3,
> > > > > > even though they may not be able to use the full capabilities of the new
> > > > > > schema. So V3 is "writtenSchema" and V2 is "readSchema".
> > > > > > 3. FULL (CanBeReadMutualStrategy): Schemas are both backward and forward
> > > > > > compatible. The schema can evolve, and data written with either the old
> > > > > > or the new schema version can be read with the other.
> > > > > >
> > > > >
> > > > > I'm sorry - I don't understand.
> > > > > I understand the different compatibility check strategies.
> > > > > If you just spell them out here, then as you say, just translate the
> > > > > Protobuf Descriptor into an Avro schema and run the Avro
> > > > > compatibility validation, no?
> > > > > I believe the answer is no, since you may want to verify different things
> > > > > when it comes to Protobuf, which are different than Avro.
> > > > >
> > > > > At the current state, I can't understand your design at all. Please help
> > > > > clarify that.
> > > > >
> > > > > >
> > > > > > So each strategy should have its own section.
> > > > > >
> > > > > > The arguments of `canRead()` are writtenSchema and readSchema. As we've
> > > > > > just described, we only need to change the order of the arguments we
> > > > > > pass to `canRead()`.
> > > > > >
> > > > > > Thanks,
> > > > > > sinan
> > > > > >
> > > > > > On Mon, Feb 27, 2023 at 8:49 PM Asaf Mesika <as...@gmail.com> wrote:
> > > > > >
> > > > > > > >
> > > > > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > > > > > >
> > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > > > >
> > > > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > > > >
> > > > > > > I read it and they look identical. What's the difference between them?
> > > > > > >
> > > > > > > Current avro,json, and protobuf schemas are all implemented based on AVRO.
> > > > > > >
> > > > > > > What do you mean, they are all implemented based on Avro? You mean the
> > > > > > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > > > > > compatibility validation?
> > > > > > >
> > > > > > >
> > > > > > > > *Here are the basic compatibility rules we've defined:*
> > > > > > >
> > > > > > >
> > > > > > > I think you should structure the validation rules differently:
> > > > > > >
> > > > > > > * Backward checks
> > > > > > > ** List down rules, where use newSchema (the schema used by producer or
> > > > > > > consumer) and existingSchema (last schema used)
> > > > > > > * Forward
> > > > > > > ** List down rules, where use newSchema (the schema used by producer or
> > > > > > > consumer) and existingSchema (last schema used)
> > > > > > >
> > > > > > > So each strategy should have its own section.
> > > > > > >
> > > > > > > I'm saying this since you used the "writtenSchema" word, but it represents
> > > > > > > something completely different in a backward check than in a forward check.
> > > > > > >
> > > > > > > Once you have it structured like that, I personally will be able to
> > > > > > > read and understand it.
> > > > > > >
> > > > > > >
> > > > > > > The motivation and problem statement are now good - thanks for improving
> > > > > > > it.
> > > > > > >
> > > > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi! I updated the PIP issue again. This time I've added some background
> > > > > > > > and some explanations.
> > > > > > > >
> > > > > > > > The compatibility check rules are already written in the Implementation.
> > > > > > > > ProtoBufNative implements the same canRead method as Apache Avro.
> > > > > > > > It does this by checking whether the schema for writing and reading is
> > > > > > > > compatible. I also indicate whether the writtenSchema and readSchema of
> > > > > > > > the Backward, Forward, and Full strategies are the old or the new version
> > > > > > > > of the schema.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > sinan
> > > > > > > >
> > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Sun, Feb 26, 2023 at 23:24:
> > > > > > > >
> > > > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, so IMO you
> > > > > > > > > need to add it for people to understand it. You don't need to explain
> > > > > > > > > the entire pulsar in this PIP, but at the very least a few paragraphs
> > > > > > > > > detailing all you need to know, to put you in context:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >    - Start by saying Pulsar has a built-in schema registry inside Pulsar
> > > > > > > > >    broker.
> > > > > > > > >       - Every time the client updates the schema, it uploads it to the
> > > > > > > > >       broker. When that happens, it has a feature which validates if the
> > > > > > > > >       new schema version is compatible with the previous versions. There
> > > > > > > > >       are 4 types of compatibility: Full, ... (complete and explain each
> > > > > > > > >       one briefly)
> > > > > > > > >    - Also explain Pulsar Schema registry supports various schema
> > > > > > > > >    protocols:  Avro, protobuf native, ... (complete the rest), each
> > > > > > > > >    protocol has a schema which dictates how to serialize and deserialize
> > > > > > > > >    the message content into typed object.
> > > > > > > > >    - Explain in short what is protobuf native (compare protobuf
> > > > > > > > >    non-native)
> > > > > > > > >    - Please don't paste code instead of explaining.
> > > > > > > > >       - Explain that protobuf native current validation check is only
> > > > > > > > >       composed of checking the root message name is the same between
> > > > > > > > >       the current schema version and the new version.
> > > > > > > > >          - Explain briefly what is a root message and its name.
> > > > > > > > >       - Explain the problem (list scenarios) that we have because
> > > > > > > > >       protobuf native schema only supports FULL compatibility
> > > > > > > > >       validation.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Regarding high level design - as in what you plan to do.
> > > > > > > > > I suggest you add "High Level Design" and in it detail how you plan to
> > > > > > > > > validate, per protobuf version, per compatibility check (backward,
> > > > > > > > > forward, full,...).
> > > > > > > > > I tried reading the implementation - for me, it's all over the place.
> > > > > > > > > Can you please list in order what I wrote above, and list the
> > > > > > > > > validation rules with a good explanation why you validate it like that?
> > > > > > > > >
> > > > > > > > > Lastly, once you have all the validation rules clearly stated, you can
> > > > > > > > > use it to document it properly so users can know what validation to
> > > > > > > > > expect.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Asaf
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Sorry, my mistake. I removed the code and described the design to
> > > > > > > > > > improve the PROTOBUF_NATIVE schema compatibility checks. You can have
> > > > > > > > > > a look.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Asaf Mesika <as...@gmail.com> wrote on Wed, Feb 22, 2023 at 21:16:
> > > > > > > > > >
> > > > > > > > > > > I read it but you're almost directly diving into the code - it will
> > > > > > > > > > > take me hours just to reverse engineer your design.
> > > > > > > > > > >
> > > > > > > > > > > Can you please include a "High Level Design" section in which you
> > > > > > > > > > > explain how you plan to tackle any issue?
> > > > > > > > > > > If I can read that section and explain to someone else how this will
> > > > > > > > > > > work, it means the section is complete.
> > > > > > > > > > >
> > > > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi all,
> > > > > > > > > > > >
> > > > > > > > > > > > I made a PIP to discuss:
> > > > > > > > > > > > https://github.com/apache/pulsar/issues/19565
> > > > > > > > > > > >
> > > > > > > > > > > > We can talk about the current design here. Especially for the field
> > > > > > > > > > > > type change check rules, please give your valuable advice.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Sinan
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by Asaf Mesika <as...@gmail.com>.
Can you convert the code block at the beginning of the PIP, which is
actually a quote, into something that doesn't require so much horizontal
scrolling?
Use GitHub's quoting syntax:
https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax#quoting-text

Let's improve the clarity of what you wrote:

"the PROTOBUF uses avro struct to store."
-->
When the PROTOBUF schema type is used, the Pulsar client assumes the object
given to it as message data is an auto-generated Protobuf class, whose
descriptor encodes the schema. The client uses a converter that turns the
Protobuf schema descriptor into an Avro schema and sends that as the schema
of the producer/consumer.

"On the broker side, protobuf and avro both use SchemaData converted to
org.apache.avro.Schema."
-->
Since the schema is an Avro schema, the broker-side compatibility check
simply reuses the compatibility check of the AVRO schema type.

"ProtobufSchema is different from ProtobufNativeSchema in schema
compatibility check it uses avro-protobuf.
https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
But the current implementation of ProtobufNative schema compatibility check
only checked if the root message name is changed."

-->
The PROTOBUF_NATIVE schema type is different.
The client actually uses the Protobuf Descriptor as the schema, as opposed
to the Avro schema used by the PROTOBUF schema type. In the broker, the
PROTOBUF_NATIVE compatibility check implements only one rule:
checking if the root message name has changed.
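As a point of reference, that single rule can be sketched as follows. This is a minimal illustration, not Pulsar's implementation: `MessageSchema` is a hypothetical stand-in for a protobuf `Descriptor` that models only the root message's full name, and `SchemaIncompatibleException` and `RootMessageNameCheck` are names local to the sketch.

```java
import java.util.Objects;

/** Hypothetical stand-in for a protobuf Descriptor: only the root message's full name is modeled. */
record MessageSchema(String rootMessageFullName) {}

/** Exception local to this sketch; not Pulsar's IncompatibleSchemaException. */
class SchemaIncompatibleException extends Exception {
    SchemaIncompatibleException(String message) {
        super(message);
    }
}

final class RootMessageNameCheck {
    private RootMessageNameCheck() {}

    /** Fails only when the root message name differs between the current and the new schema. */
    static void check(MessageSchema currentSchema, MessageSchema newSchema)
            throws SchemaIncompatibleException {
        String currentName = currentSchema.rootMessageFullName();
        String newName = newSchema.rootMessageFullName();
        if (!Objects.equals(currentName, newName)) {
            throw new SchemaIncompatibleException(
                    "Root message changed from " + currentName + " to " + newName);
        }
        // Field numbers, field types and required/optional changes are not inspected at all.
    }
}
```

Two schemas that keep the root message name but renumber or retype a field pass this check unchanged, which is the gap the PIP is about.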



>    1. For now, there is no official or third-party solution for ProtoBuf
>    compatibility. If a better third-party or official solution appears in
>    the future, we can develop a new ProtobufNativeSchemaValidator and use
>    it, so we add a flag.

Why do you need to make that configurable? Once you find a third-party
solution, just switch to it. Who knows, maybe you never will. Introduce it
when you find it, not now.


> We improve the BACKWARD and FORWARD strategies in
> ProtobufNativeSchemaCompatibilityCheck. As with the AVRO implementation,
> the protobuf compatibility check needs to implement the canRead method.
> *This will check that the writtenSchema can be read by readSchema.*


I completely disagree.
The Avro implementation is confusing for our use case. Don't copy it.

You have:

public void checkCompatible(SchemaData from, SchemaData to,
                            SchemaCompatibilityStrategy strategy)
        throws IncompatibleSchemaException {
    // Deserialize both stored schemas back into protobuf Descriptors.
    Descriptor fromDescriptor = ProtobufNativeSchemaUtils.deserialize(from.getData());
    Descriptor toDescriptor = ProtobufNativeSchemaUtils.deserialize(to.getData());
    switch (strategy) {
        case BACKWARD_TRANSITIVE:
        case BACKWARD:
        case FORWARD_TRANSITIVE:
        case FORWARD:
        case FULL_TRANSITIVE:
        case FULL:
            // Every strategy above currently runs the same single rule.
            checkRootMessageChange(fromDescriptor, toDescriptor, strategy);
            return;
        case ALWAYS_COMPATIBLE:
            return;
        default:
            throw new IncompatibleSchemaException("Unknown SchemaCompatibilityStrategy.");
    }
}

I would rename:
from --> currentSchema
to --> newSchema

Use that switch case and have a method for each strategy, like:
validateBackwardsCompatibility(currentSchema, newSchema)
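A minimal sketch of that structure: one switch over the strategies, delegating to one method per direction, with currentSchema/newSchema naming. Everything here is hypothetical illustration rather than Pulsar code: the local `CompatibilityStrategy` enum only mirrors the cases in the quoted switch, and `DirectionalRule`, `CompatibilityDispatcher` and `validateForwardsCompatibility` are invented names. The `_TRANSITIVE` handling follows the documented meaning of those strategies (compare against all earlier versions rather than only the latest), but how it would be wired in is an assumption.

```java
import java.util.List;

/** Local mirror of the strategies handled in the quoted switch (not Pulsar's enum). */
enum CompatibilityStrategy {
    ALWAYS_COMPATIBLE,
    BACKWARD, BACKWARD_TRANSITIVE,
    FORWARD, FORWARD_TRANSITIVE,
    FULL, FULL_TRANSITIVE
}

/** A single-direction rule set; implementations throw IllegalStateException on a violation. */
@FunctionalInterface
interface DirectionalRule<S> {
    void validate(S currentSchema, S newSchema);
}

final class CompatibilityDispatcher<S> {
    // Backward: consumers on newSchema must read data written with currentSchema.
    private final DirectionalRule<S> backwardRules;
    // Forward: consumers on currentSchema must read data written with newSchema.
    private final DirectionalRule<S> forwardRules;

    CompatibilityDispatcher(DirectionalRule<S> backwardRules, DirectionalRule<S> forwardRules) {
        this.backwardRules = backwardRules;
        this.forwardRules = forwardRules;
    }

    /** history is ordered oldest first, so its last element is the current schema. */
    void checkCompatible(List<S> history, S newSchema, CompatibilityStrategy strategy) {
        if (history.isEmpty() || strategy == CompatibilityStrategy.ALWAYS_COMPATIBLE) {
            return; // nothing to compare against, or checks disabled
        }
        // Transitive strategies compare against every earlier version, the others only the latest.
        List<S> versionsToCheck = switch (strategy) {
            case BACKWARD_TRANSITIVE, FORWARD_TRANSITIVE, FULL_TRANSITIVE -> history;
            default -> List.of(history.get(history.size() - 1));
        };
        for (S currentSchema : versionsToCheck) {
            switch (strategy) {
                case BACKWARD, BACKWARD_TRANSITIVE -> validateBackwardsCompatibility(currentSchema, newSchema);
                case FORWARD, FORWARD_TRANSITIVE -> validateForwardsCompatibility(currentSchema, newSchema);
                case FULL, FULL_TRANSITIVE -> {
                    validateBackwardsCompatibility(currentSchema, newSchema);
                    validateForwardsCompatibility(currentSchema, newSchema);
                }
                default -> throw new IllegalArgumentException("Unhandled strategy: " + strategy);
            }
        }
    }

    void validateBackwardsCompatibility(S currentSchema, S newSchema) {
        backwardRules.validate(currentSchema, newSchema);
    }

    void validateForwardsCompatibility(S currentSchema, S newSchema) {
        forwardRules.validate(currentSchema, newSchema);
    }
}
```

With this shape, FULL is simply both directional checks, and each direction's rules can be documented on their own.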

I dislike canRead and the usage of writtenSchema, since you have two completely
different use cases: the producer side and the consumer side.
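The ambiguity fits in a few lines. Assuming a single `canRead(writtenSchema, readSchema)` helper as in the proposal, this hypothetical lookup records which schema fills each argument: "writtenSchema" is the current schema in a backward check but the new schema in a forward check (FULL would run both). `CanReadArguments`, `Direction` and `CanReadRoles` are illustrative names only.

```java
/** Which schema is passed as writtenSchema and which as readSchema to a shared canRead(written, read). */
record CanReadArguments(String writtenSchema, String readSchema) {}

/** Local enum for the two directional checks; FULL would simply run both. */
enum Direction { BACKWARD, FORWARD }

final class CanReadRoles {
    private CanReadRoles() {}

    static CanReadArguments argumentsFor(Direction direction) {
        return switch (direction) {
            // New consumers read data that producers wrote with the current schema.
            case BACKWARD -> new CanReadArguments("currentSchema", "newSchema");
            // Existing consumers read data that producers write with the new schema.
            case FORWARD -> new CanReadArguments("newSchema", "currentSchema");
        };
    }
}
```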

> schemaValidatorBuilder

I dislike this proposal. IMO the Avro implementation is way too complicated.
Why not have a simple validation function for each switch case above?
Why do we need strategies, a builder, and all this complexity?


> *Here are the basic compatibility rules we've defined:*


IMO it's impossible to read the validation rules as you described them.
I wrote how they should be structured numerous times above.
I can't validate them.


IMO, the current design is very hard to read.
Please try to avoid jumping into code sections.
Write a high level design section, in which you describe in words what you
plan to do.
Write the validation rules in a structure that is easy to understand:
rules for each compatibility check, using precise terms (current schema,
new schema), since the new schema is used for reading in one check and for
writing in the other.

In its current form it takes too much time to understand the design, and
that shouldn't be the case.
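To make the requested structure concrete, here is a minimal sketch of rules grouped per check and phrased with currentSchema/newSchema. The model (`FieldSpec`, `MessageSpec`, `DirectionalRules`) is hypothetical, and the rules are illustrative examples rather than the PIP's rule set. They were picked because they are asymmetric: with proto2 `required` fields, adding one breaks BACKWARD (data written with the current schema lacks it), while removing one breaks FORWARD (existing consumers reject data that lacks it). The type rule is deliberately conservative, since protobuf treats some type changes, such as int32 to int64, as wire-compatible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical field model: a type name plus the proto2 "required" flag. */
record FieldSpec(String name, String type, boolean required) {}

/** Hypothetical message model keyed by field number; not a protobuf Descriptor. */
record MessageSpec(Map<Integer, FieldSpec> fieldsByNumber) {}

final class DirectionalRules {
    private DirectionalRules() {}

    /** BACKWARD: consumers on newSchema read data written with currentSchema. */
    static List<String> backwardViolations(MessageSpec currentSchema, MessageSpec newSchema) {
        List<String> violations = new ArrayList<>(typeChanges(currentSchema, newSchema));
        newSchema.fieldsByNumber().forEach((number, field) -> {
            // Old data cannot contain a field the current schema never had.
            if (field.required() && !currentSchema.fieldsByNumber().containsKey(number)) {
                violations.add("added required field #" + number + " (" + field.name() + ")");
            }
        });
        return violations;
    }

    /** FORWARD: consumers still on currentSchema read data written with newSchema. */
    static List<String> forwardViolations(MessageSpec currentSchema, MessageSpec newSchema) {
        List<String> violations = new ArrayList<>(typeChanges(currentSchema, newSchema));
        currentSchema.fieldsByNumber().forEach((number, field) -> {
            // Existing consumers reject messages that lack a field they require.
            if (field.required() && !newSchema.fieldsByNumber().containsKey(number)) {
                violations.add("removed required field #" + number + " (" + field.name() + ")");
            }
        });
        return violations;
    }

    /** Applies to both directions here: a field number kept in both schemas must keep its type. */
    private static List<String> typeChanges(MessageSpec currentSchema, MessageSpec newSchema) {
        List<String> violations = new ArrayList<>();
        currentSchema.fieldsByNumber().forEach((number, field) -> {
            FieldSpec changed = newSchema.fieldsByNumber().get(number);
            if (changed != null && !changed.type().equals(field.type())) {
                violations.add("field #" + number + " changed type " + field.type() + " -> " + changed.type());
            }
        });
        return violations;
    }
}
```

Written this way, each rule states which side reads and which side writes, so the word "writtenSchema" is never needed.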

Thanks,

Asaf





On Sun, Mar 5, 2023 at 3:58 PM SiNan Liu <li...@gmail.com> wrote:

> Hi! I updated the explanation of some things in the PIP issue. I also
> added a new “flag” to the configuration to choose between different
> ProtobufNativeSchemaValidator implementations, and set the default
> ProtobufNativeSchemaValidator to only check whether the name of the
> root message is the same.
>
>
> Thanks,
> sinan
>
>

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by SiNan Liu <li...@gmail.com>.
Hi! I updated the explanation of some things in the PIP issue. I also
added a new “flag” to the configuration to choose between different
ProtobufNativeSchemaValidator implementations, and set the default
ProtobufNativeSchemaValidator to only check whether the name of the
root message is the same.
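For illustration only, a configuration-selected validator with a root-message-name default might look like the sketch below. The interface signature, the `ValidatorSelection` registry, and the value `rootMessageNameOnly` are all hypothetical; the PIP defines the actual names, and a real validator would compare protobuf descriptors rather than two name strings.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical signature: a real validator would receive protobuf descriptors, not names. */
interface ProtobufNativeSchemaValidator {
    boolean isCompatible(String currentRootMessageName, String newRootMessageName);
}

final class ValidatorSelection {
    // Hypothetical config value used when nothing is configured.
    static final String DEFAULT_VALIDATOR = "rootMessageNameOnly";

    // Known implementations by config value; a new validator would be added as another entry.
    private static final Map<String, Supplier<ProtobufNativeSchemaValidator>> VALIDATORS = Map.of(
            DEFAULT_VALIDATOR, () -> (currentName, newName) -> currentName.equals(newName));

    private ValidatorSelection() {}

    static ProtobufNativeSchemaValidator fromConfig(String configuredName) {
        String name = (configuredName == null || configuredName.isBlank())
                ? DEFAULT_VALIDATOR
                : configuredName;
        Supplier<ProtobufNativeSchemaValidator> factory = VALIDATORS.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown ProtobufNativeSchemaValidator: " + name);
        }
        return factory.get();
    }
}
```

With only one registered implementation, the flag does not select anything yet; it only becomes meaningful once a second entry exists.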


Thanks,
sinan


Asaf Mesika <as...@gmail.com> wrote on Sun, Mar 5, 2023 at 20:21:

> On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <li...@gmail.com> wrote:
>
> > >
> > > Can you please explain how a Protobuf Schema descriptor can be validated
> > > for backward compatibility check using Avro based compatibility rules?
> > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > descriptor?
> > > Is there some translation happening?
> >
> >
> > 1. *You can take a quick look at the previous design, the PROTOBUF uses
> > avro struct to store.*
> > https://github.com/apache/pulsar/pull/1954
> >
> >
> > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
> >
> > https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115
>
>
> Ok. So to summarize your code (easier to write it than send links):
> * Pulsar Client, when used with Protobuf Schema, actually converts the
> Protobuf descriptor into an Avro Schema (using code found inside Avro
> library) and saves that Avro schema as the schema. It's not saving the
> protobuf descriptor at all. Very confusing, I have to add; I never expected
> that.
> This explains why ProtobufSchemaCompatibilityCheck simply extends the Avro
> check without doing any translation.
>
> Thanks for that.
>
> Now that I finally understand this, I can say that you *must* explain it
> in the motivation section of your PIP.
>
>
>
> >
> >
> > 2. *On the broker side, protobuf and avro both convert `SchemaData`
> > to `org.apache.avro.Schema`.*
> >
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
> >
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
> >
> >
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70
>
>
> Actually those links don't really help.
> The main link that helps is:
>
> https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122
>
>
> >
> >
> >
> >
> > I'm sorry - I don't understand.
> > > I understand the different compatibility check strategies.
> > > If you just spell them out here, then as you say, just translate the
> > > Protobuf Descriptor into an Avro schema and run the Avro
> > > compatibility validation, no?
> > > I believe the answer is no, since you may want to verify different things
> > > when it comes to Protobuf, which are different than Avro.
> >
> >
> > 1.
> > *ProtobufSchema is different from ProtobufNativeSchema in that it uses
> > avro-protobuf.*
> >
> >
> https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> > *ProtobufNativeSchema needs a native compatibility check, but there is no
> > official or third party implementation. So this PIP does not use
> > avro-protobuf for protobuf compatibility checking.*
> >
> > 2. *By the way, this is implemented in much the same way that Apache Avro
> > does compatibility checking.*
> >
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> > `canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy`
> >
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
> >
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
> >
> >
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> > *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> > writtenSchema and readSchema. We only need to change the order of
> arguments
> > we pass to `canRead()`.*
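> > ```java
> > // toValidate = new schema, fromDescriptor = existing schema;
> > // canRead(writtenSchema, readSchema) checks that data written with the
> > // first argument can be read with the second.
> > private void validateWithStrategy(Descriptors.Descriptor toValidate,
> >                                   Descriptors.Descriptor fromDescriptor)
> >         throws ProtoBufCanReadCheckException {
> >     switch (strategy) {
> >         // BACKWARD: the new schema reads data written with the existing one
> >         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
> >         // FORWARD: the existing schema reads data written with the new one
> >         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
> >         // FULL: both directions must hold
> >         case CanBeReadMutualStrategy -> {
> >             canRead(toValidate, fromDescriptor);
> >             canRead(fromDescriptor, toValidate);
> >         }
> >     }
> > }
> >
> > private void canRead(Descriptors.Descriptor writtenSchema,
> >                      Descriptors.Descriptor readSchema)
> >         throws ProtoBufCanReadCheckException {
> >     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
> > }
> > ```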
> >
> >
> I get that you want to take inspiration from the existing Avro schema
> compatibility check for your code design.
> I also understand you *won't* use any existing Avro code for that.
> I also understand that you have to write the validation check on your own,
> since there is no third-party implementation to rely on.
>
> The only thing I can't understand is the actual rules, as I wrote before,
> since they are written confusingly.
> So, I repeat what I asked before:
>
> I think you should structure the validation rules differently:
>
> * Backward checks
> ** List the rules in terms of newSchema (the schema used by the producer or
> consumer) and existingSchema (the last schema used)
> * Forward
> ** List the rules in terms of newSchema (the schema used by the producer or
> consumer) and existingSchema (the last schema used)
>
> Once that's accomplished I will be able to understand the different
> validation rules for each compatibility check.
>
>
>
>
>
>
> >
> > Thanks,
> > sinan
> >
> >
> >
> > On Wed, Mar 1, 2023 at 9:19 PM Asaf Mesika <as...@gmail.com> wrote:
> > >
> > > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <li...@gmail.com>
> > wrote:
> > >
> > > > >
> > > > > I read it and they look identical. What's the difference between
> > them?
> > > >
> > > > Current Avro, JSON, and Protobuf schemas are all implemented based on
> > > > Avro.
> > > > > What do you mean, they are all implemented based on Avro? You mean
> > the
> > > > > protobuf schema is converted into an Avro Schema, and then you use
> > Avro
> > > > > compatibility validation?
> > > >
> > > >
> > > >
> >
> `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > >
> `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > >
> `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > > They all extend `AvroSchemaBasedCompatibilityCheck`, the
> > > > `checkCompatible()` is the same implementation with `AVRO`.
> > > >
> > >
> > > Can you please explain how a Protobuf Schema descriptor can be
> validated
> > > for backward compatibility check using Avro based compatibility rules?
> > > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > > descriptor?
> > > Is there some translation happening?
> > >
> > >
> > >
> > > >
> > > >
> > > > I think you should structure the validation rules differently:
> > > >
> > > >
> > > > The Compatibility check strategy is described on the website
> > > >
> > > >
> >
> >
> https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > > 1. BACKWARD(CanReadExistingStrategy): Consumers using schema V3 can
> > process
> > > > data written by producers using the last schema version V2. So V2 is
> > > > "writtenSchema" and V3 is "readSchema".
> > > > 2. FORWARD(CanBeReadByExistingStrategy): Consumers using the last
> > schema
> > > > version V2 can process data written by producers using a new schema
> V3,
> > > > even though they may not be able to use the full capabilities of the
> > new
> > > > schema. So V3 is "writtenSchema" and V2 is "readSchema".
> > > > 3. FULL(CanBeReadMutualStrategy): Schemas are both backward and
> forward
> > > > compatible.
> > > > Schema can evolve. The old version schema and the new version schema
> > should
> > > > be well understood.
> > > >
> > > >
> > > I'm sorry - I don't understand.
> > > I understand the different compatibility check strategies.
> > > If you just spell them out here, then as you say, just translate the
> > > Protobuf Descriptor into an Avro schema and run the Avro
> > > compatibility validation, no?
> > > I believe the answer is no, since you may want to verify different things
> > > when it comes to Protobuf, which are different than Avro.
> > >
> > > In its current state, I can't understand your design at all. Please help
> > > clarify that.
> > >
> > >
> > >
> > >
> > >
> > > >
> > > > So each strategy should have its own section.
> > > >
> > > >
> > > > The arguments of `canRead()` are writtenSchema and readSchema. As
> we've
> > > > just described, we only need to change the order of arguments we pass
> > to
> > > > `canRead()`.
> > > >
> > > >
> > > >
> > > > Thanks,
> > > > sinan
> > > >
> > > >
> > > > On Mon, Feb 27, 2023 at 8:49 PM Asaf Mesika <as...@gmail.com> wrote:
> > > >
> > > > > >
> > > > > > And you can see the difference between ProtoBuf and
> ProtoBufNative:
> > > > > >
> > > > > >
> > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > > >
> > > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > > >
> > > > >  I read it and they look identical. What's the difference between
> > them?
> > > > >
> > > > > Current avro,json, and protobuf schemas are all implemented based
> on
> > > > AVRO.
> > > > >
> > > > > What do you mean, they are all implemented based on Avro? You mean
> > the
> > > > > protobuf schema is converted into an Avro Schema, and then you use
> > Avro
> > > > > compatibility validation?
> > > > >
> > > > >
> > > > > > *Here are the basic compatibility rules we've defined:*
> > > > >
> > > > >
> > > > > I think you should structure the validation rules differently:
> > > > >
> > > > > * Backward checks
> > > > > ** List the rules in terms of newSchema (the schema used by the
> > > > > producer or consumer) and existingSchema (the last schema used)
> > > > > * Forward
> > > > > ** List the rules in terms of newSchema (the schema used by the
> > > > > producer or consumer) and existingSchema (the last schema used)
> > > > >
> > > > > So each strategy should have its own section.
> > > > >
> > > > > I'm saying this since you used the word "writtenSchema", but it represents
> > > > > something completely different for a backward or a forward check.
> > > > >
> > > > > Once you have it structured like that, I personally will be able to
> > > > > read and understand it.
> > > > >
> > > > >
> > > > > The motivation and problem statement are now good - thanks for
> > improving
> > > > > it.
> > > > >
> > > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <li...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi! I updated the PIP issue again. This time I've added some
> > background
> > > > > and
> > > > > > some explanations.
> > > > > >
> > > > > > The compatibility check rules are already written in the
> > > > Implementation.
> > > > > > ProtoBufNative implements the same canRead method as Apache Avro.
> > > > > > It does this by checking whether the schema for writing and
> reading
> > is
> > > > > > compatible. I also indicate whether the writtenSchema and
> > readSchema of
> > > > > the
> > > > > > Backward, Forward, and Full strategies are the old or the new
> > version
> > > > of
> > > > > > the schema.
> > > > > >
> > > > > > Thanks,
> > > > > > sinan
> > > > > >
> > > > > > On Sun, Feb 26, 2023 at 11:24 PM Asaf Mesika <as...@gmail.com> wrote:
> > > > > >
> > > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, so
> > you
> > > > > need
> > > > > > to
> > > > > > > add IMO for people to understand it. You don't need to explain
> > the
> > > > > entire
> > > > > > > pulsar in this PIP, but at the very least a few paragraphs
> > detailing
> > > > > all
> > > > > > > you need to know, to put you in context:
> > > > > > >
> > > > > > >
> > > > > > >    - Start by saying Pulsar has a built-in schema registry
> inside
> > > > Pulsar
> > > > > > >    broker.
> > > > > > >       - Every time the client updates the schema, it uploads it
> > to
> > > > the
> > > > > > >       broker. When that happens, it has a feature which
> validates
> > if
> > > > > the
> > > > > > > new
> > > > > > >       schema version is compatible with the previous versions.
> > There
> > > > > > > are 4 types
> > > > > > >       of compatibility: Full, ... (complete and explain each
> one
> > > > > briefly)
> > > > > > >    - Also explain Pulsar Schema registry supports various
> schema
> > > > > > >    protocols:  Avro, protobuf native, ... (complete the rest),
> > each
> > > > > > > protocol
> > > > > > >    has a schema which dictates how to serialize and deserialize
> > the
> > > > > > message
> > > > > > >    content into typed object.
> > > > > > >    - Explain in short what is protobuf native (compare protobuf
> > > > > > non-native)
> > > > > > >    - Please don't paste code instead of explaining.
> > > > > > >       - Explain that protobuf native current validation check
> is
> > only
> > > > > > >       composed of checking the root message name is the same
> > between
> > > > > > > the current
> > > > > > >       schema version and the new version.
> > > > > > >          - Explain briefly what is a root message and its name.
> > > > > > >       - Explain the problem (list scenarios) that we have
> because
> > > > > > protobuf
> > > > > > >       native schema only supports FULL compatibility
> validation.
> > > > > > >
> > > > > > >
> > > > > > > Regarding high level design - as in what you plan to do.
> > > > > > > I suggest you add "High Level Design" and in it detail how you
> > plan
> > > > to
> > > > > > > validate, per protobuf version, per compatibility check
> > (backward,
> > > > > > forward,
> > > > > > > full,...).
> > > > > > > I tried reading the implementation - for me, it's all over the
> > > > place.
> > > > > > Can
> > > > > > > you please list in order what I wrote above, and list the
> > validation
> > > > > > rules
> > > > > > > with a good explanation why you validate it like that?
> > > > > > >
> > > > > > > Lastly, once you have all the validation rules clearly stated,
> you
> > can
> > > > > use
> > > > > > > it to document it properly so users can know what validation to
> > > > expect.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Asaf
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <
> > liusinan1998@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Sorry, my mistake. I removed the code and described the
> design
> > to
> > > > > > improve
> > > > > > > > the PROTOBUF_NATIVE schema compatibility checks. You can
> have a
> > > > look.
> > > > > >
> > > > > > > >
> > > > > > > > On Wed, Feb 22, 2023 at 9:16 PM Asaf Mesika <as...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > I read it but you're almost directly diving into the code -
> > it
> > > > will
> > > > > > > take
> > > > > > > > me
> > > > > > > > > hours just to reverse engineer your design.
> > > > > > > > >
> > > > > > > > > Can you please include a "High Level Design" section in
> which
> > you
> > > > > > > explain
> > > > > > > > > how you plan to tackle any issue?
> > > > > > > > > If I can read that section and explain to someone else how
> > this
> > > > > will
> > > > > > > > work,
> > > > > > > > > it means the section is complete.
> > > > > > > > >
> > > > > > > > > Let's leave the code to the PRs.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <
> > > > liusinan1998@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > I made a PIP to discuss:
> > > > > > > https://github.com/apache/pulsar/issues/19565
> > > > > > > > .
> > > > > > > > > >
> > > > > > > > > > We can talk about the current design here. Especially for
> > the
> > > > > field
> > > > > > > > type
> > > > > > > > > > change check rules, please give your valuable advice.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Sinan
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by Asaf Mesika <as...@gmail.com>.
On Wed, Mar 1, 2023 at 4:33 PM SiNan Liu <li...@gmail.com> wrote:

> >
> > Can you please explain how a Protobuf Schema descriptor can be validated
> > for backward compatibility check using Avro based compatibility rules?
> > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > descriptor?
> > Is there some translation happening?
>
>
> 1. *You can take a quick look at the previous design: PROTOBUF uses an
> Avro structure for storage.*
> https://github.com/apache/pulsar/pull/1954
>
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
>
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115


Ok. So to summarize your code (easier to write it than send links):
* Pulsar Client, when used with Protobuf Schema, actually converts the
Protobuf descriptor into an Avro Schema (using code found inside Avro
library) and saves that Avro schema as the schema. It's not saving the
protobuf descriptor at all. Very confusing, I have to add; I never expected
that.
This explains why ProtobufSchemaCompatibilityCheck simply extends the Avro
check without doing any translation.

Thanks for that.

Now that I finally understand this, I can say that you *must* explain it
in the motivation section of your PIP.



>
>
> 2. *On the broker side, protobuf and avro both convert `SchemaData`
> to `org.apache.avro.Schema`.*
>
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
>
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
>
> https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70


Actually those links don't really help.
The main link that helps is:
https://github.com/apache/pulsar/blob/ec102fb024a6ea2b195826778300f20e330dff06/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L102-L122


>
>
>
>
> I'm sorry - I don't understand.
> > I understand the different compatibility check strategies.
> > If you just spell them out here, then as you say, just translate the
> > Protobuf Descriptor into an Avro schema and run the Avro
> > compatibility validation, no?
> > I believe the answer is no, since you may want to verify different things
> > when it comes to Protobuf, which are different than Avro.
>
>
> 1.
> *ProtobufSchema is different from ProtobufNativeSchema in that it uses
> avro-protobuf.*
>
> https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
> *ProtobufNativeSchema needs a native compatibility check, but there is no
> official or third party implementation. So this PIP does not use
> avro-protobuf for protobuf compatibility checking.*
>
> 2. *By the way, this is implemented in much the same way that Apache Avro
> does compatibility checking.*
>
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
> `canReadStrategy`,`canBeReadStrategy`,`mutualReadStrategy`
>
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
>
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
>
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
> *In `ValidateMutualRead.java`, the arguments of `canRead()` are
> writtenSchema and readSchema. We only need to change the order of arguments
> we pass to `canRead()`.*
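> ```java
> // toValidate = new schema, fromDescriptor = existing schema;
> // canRead(writtenSchema, readSchema) checks that data written with the
> // first argument can be read with the second.
> private void validateWithStrategy(Descriptors.Descriptor toValidate,
>                                   Descriptors.Descriptor fromDescriptor)
>         throws ProtoBufCanReadCheckException {
>     switch (strategy) {
>         // BACKWARD: the new schema reads data written with the existing one
>         case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
>         // FORWARD: the existing schema reads data written with the new one
>         case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
>         // FULL: both directions must hold
>         case CanBeReadMutualStrategy -> {
>             canRead(toValidate, fromDescriptor);
>             canRead(fromDescriptor, toValidate);
>         }
>     }
> }
>
> private void canRead(Descriptors.Descriptor writtenSchema,
>                      Descriptors.Descriptor readSchema)
>         throws ProtoBufCanReadCheckException {
>     ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
> }
> ```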
>
>
I get that you want to take inspiration from the existing Avro schema
compatibility check for your code design.
I also understand you *won't* use any existing Avro code for that.
I also understand that you have to write the validation check on your own,
since there is no third-party implementation to rely on.

The only thing I can't understand is the actual rules, as I wrote before,
since they are written confusingly.
So, I repeat what I asked before:

I think you should structure the validation rules differently:

* Backward checks
** List the rules in terms of newSchema (the schema used by the producer or
consumer) and existingSchema (the last schema used)
* Forward
** List the rules in terms of newSchema (the schema used by the producer or
consumer) and existingSchema (the last schema used)

Once that's accomplished I will be able to understand the different
validation rules for each compatibility check.
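
To make the roles concrete, here is a hedged sketch of how each strategy could map newSchema and existingSchema onto the writer and reader sides of a `canRead(writtenSchema, readSchema)` check. The enum, the `ReadCheck` record and `checksFor` are hypothetical names. Only the BACKWARD/FORWARD/FULL semantics come from this thread.

```java
import java.util.List;

public class StrategyRolesSketch {

    enum Strategy { BACKWARD, FORWARD, FULL }

    // One "can read" check: data written with `writer` must be readable with `reader`.
    record ReadCheck(String writer, String reader) {}

    static final String NEW = "newSchema";           // schema used by the producer or consumer
    static final String EXISTING = "existingSchema"; // last schema stored in the registry

    static List<ReadCheck> checksFor(Strategy strategy) {
        return switch (strategy) {
            // BACKWARD (CanReadExistingStrategy): consumers on the new schema
            // read data written with the existing schema.
            case BACKWARD -> List.of(new ReadCheck(EXISTING, NEW));
            // FORWARD (CanBeReadByExistingStrategy): consumers on the existing
            // schema read data written with the new schema.
            case FORWARD -> List.of(new ReadCheck(NEW, EXISTING));
            // FULL (CanBeReadMutualStrategy): both directions must hold.
            case FULL -> List.of(new ReadCheck(EXISTING, NEW), new ReadCheck(NEW, EXISTING));
        };
    }

    public static void main(String[] args) {
        for (Strategy s : Strategy.values()) {
            System.out.println(s + " -> " + checksFor(s));
        }
    }
}
```

Stated this way, each rule can be written once as a function of (writer, reader). Each strategy then only decides which schema fills each slot.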






>
> Thanks,
> sinan
>
>
>
> On Wed, Mar 1, 2023 at 9:19 PM Asaf Mesika <as...@gmail.com> wrote:
> >
> > On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <li...@gmail.com>
> wrote:
> >
> > > >
> > > > I read it and they look identical. What's the difference between
> them?
> > >
> > > Current Avro, JSON, and Protobuf schemas are all implemented based on
> > > Avro.
> > > > What do you mean, they are all implemented based on Avro? You mean
> the
> > > > protobuf schema is converted into an Avro Schema, and then you use
> Avro
> > > > compatibility validation?
> > >
> > >
> > >
> `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > > They all extend `AvroSchemaBasedCompatibilityCheck`, the
> > > `checkCompatible()` is the same implementation with `AVRO`.
> > >
> >
> > Can you please explain how a Protobuf Schema descriptor can be validated
> > for backward compatibility check using Avro based compatibility rules?
> > Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> > descriptor?
> > Is there some translation happening?
> >
> >
> >
> > >
> > >
> > > I think you should structure the validation rules differently:
> > >
> > >
> > > The Compatibility check strategy is described on the website
> > >
> > >
>
> https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > > 1. BACKWARD(CanReadExistingStrategy): Consumers using schema V3 can
> process
> > > data written by producers using the last schema version V2. So V2 is
> > > "writtenSchema" and V3 is "readSchema".
> > > 2. FORWARD(CanBeReadByExistingStrategy): Consumers using the last
> schema
> > > version V2 can process data written by producers using a new schema V3,
> > > even though they may not be able to use the full capabilities of the
> new
> > > schema. So V3 is "writtenSchema" and V2 is "readSchema".
> > > 3. FULL(CanBeReadMutualStrategy): Schemas are both backward and forward
> > > compatible.
> > > Schema can evolve. The old version schema and the new version schema
> should
> > > be well understood.
> > >
> > >
> > I'm sorry - I don't understand.
> > I understand the different compatibility check strategies.
> > If you just spell them out here, then as you say, just translate the
> > Protobuf Descriptor into an Avro schema and run the Avro
> > compatibility validation, no?
> > I believe the answer is no, since you may want to verify different things
> > when it comes to Protobuf, which are different than Avro.
> >
> > In its current state, I can't understand your design at all. Please help
> > clarify that.
> >
> >
> >
> >
> >
> > >
> > > So each strategy should have its own section.
> > >
> > >
> > > The arguments of `canRead()` are writtenSchema and readSchema. As we've
> > > just described, we only need to change the order of arguments we pass
> to
> > > `canRead()`.
> > >
> > >
> > >
> > > Thanks,
> > > sinan
> > >
> > >
> > > On Mon, Feb 27, 2023 at 8:49 PM Asaf Mesika <as...@gmail.com> wrote:
> > >
> > > > >
> > > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > > >
> > > > >
> https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > > >
> > > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > > >
> > > >  I read it and they look identical. What's the difference between
> them?
> > > >
> > > > Current avro,json, and protobuf schemas are all implemented based on
> > > AVRO.
> > > >
> > > > What do you mean, they are all implemented based on Avro? You mean
> the
> > > > protobuf schema is converted into an Avro Schema, and then you use
> Avro
> > > > compatibility validation?
> > > >
> > > >
> > > > > *Here are the basic compatibility rules we've defined:*
> > > >
> > > >
> > > > I think you should structure the validation rules differently:
> > > >
> > > > * Backward checks
> > > > ** List the rules in terms of newSchema (the schema used by the producer
> > > > or consumer) and existingSchema (the last schema used)
> > > > * Forward
> > > > ** List the rules in terms of newSchema (the schema used by the producer
> > > > or consumer) and existingSchema (the last schema used)
> > > >
> > > > So each strategy should have its own section.
> > > >
> > > > I'm saying this since you used the word "writtenSchema", but it represents
> > > > something completely different for a backward or a forward check.
> > > >
> > > > Once you have it structured like that, I personally will be able to
> > > > read and understand it.
> > > >
> > > >
> > > > The motivation and problem statement are now good - thanks for
> improving
> > > > it.
> > > >
> > > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <li...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi! I updated the PIP issue again. This time I've added some
> background
> > > > and
> > > > > some explanations.
> > > > >
> > > > > The compatibility check rules are already written in the
> > > Implementation.
> > > > > ProtoBufNative implements the same canRead method as Apache Avro.
> > > > > It does this by checking whether the schema for writing and reading
> is
> > > > > compatible. I also indicate whether the writtenSchema and
> readSchema of
> > > > the
> > > > > Backward, Forward, and Full strategies are the old or the new
> version
> > > of
> > > > > the schema.
> > > > >
> > > > > Thanks,
> > > > > sinan
> > > > >
> > > > > On Sun, Feb 26, 2023 at 11:24 PM Asaf Mesika <as...@gmail.com> wrote:
> > > > >
> > > > > > I'm sorry, but this PIP lacks a lot of background knowledge, so
> you
> > > > need
> > > > > to
> > > > > > add IMO for people to understand it. You don't need to explain
> the
> > > > entire
> > > > > > pulsar in this PIP, but at the very least a few paragraphs
> detailing
> > > > all
> > > > > > you need to know, to put you in context:
> > > > > >
> > > > > >
> > > > > >    - Start by saying Pulsar has a built-in schema registry inside
> > > Pulsar
> > > > > >    broker.
> > > > > >       - Every time the client updates the schema, it uploads it
> to
> > > the
> > > > > >       broker. When that happens, it has a feature which validates
> if
> > > > the
> > > > > > new
> > > > > >       schema version is compatible with the previous versions.
> There
> > > > > > are 4 types
> > > > > >       of compatibility: Full, ... (complete and explain each one
> > > > briefly)
> > > > > >    - Also explain Pulsar Schema registry supports various schema
> > > > > >    protocols:  Avro, protobuf native, ... (complete the rest),
> each
> > > > > > protocol
> > > > > >    has a schema which dictates how to serialize and deserialize
> the
> > > > > message
> > > > > >    content into typed object.
> > > > > >    - Explain in short what is protobuf native (compare protobuf
> > > > > non-native)
> > > > > >    - Please don't paste code instead of explaining.
> > > > > >       - Explain that protobuf native current validation check is
> only
> > > > > >       composed of checking the root message name is the same
> between
> > > > > > the current
> > > > > >       schema version and the new version.
> > > > > >          - Explain briefly what is a root message and its name.
> > > > > >       - Explain the problem (list scenarios) that we have because
> > > > > protobuf
> > > > > >       native schema only supports FULL compatibility validation.
> > > > > >
> > > > > >
> > > > > > Regarding high level design - as in what you plan to do.
> > > > > > I suggest you add "High Level Design" and in it detail how you
> plan
> > > to
> > > > > > validate, per protobuf version, per compatibility check
> (backward,
> > > > > forward,
> > > > > > full,...).
> > > > > > I tried reading the implementation - for me, it's all over the
> > > place.
> > > > > Can
> > > > > > you please list in order what I wrote above, and list the
> validation
> > > > > rules
> > > > > > with a good explanation why you validate it like that?
> > > > > >
> > > > > > Lastly, once you have all the validation rules clearly stated, you
> can
> > > > use
> > > > > > it to document it properly so users can know what validation to
> > > expect.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Asaf
> > > > > >
> > > > > >
> > > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <
> liusinan1998@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Sorry, my mistake. I removed the code and described the design
> to
> > > > > improve
> > > > > > > the PROTOBUF_NATIVE schema compatibility checks. You can have a
> > > look.
> > > > >
> > > > > > >
> > > > > > > On Wed, Feb 22, 2023 at 9:16 PM Asaf Mesika <as...@gmail.com> wrote:
> > > > > > >
> > > > > > > > I read it but you're almost directly diving into the code -
> it
> > > will
> > > > > > take
> > > > > > > me
> > > > > > > > hours just to reverse engineer your design.
> > > > > > > >
> > > > > > > > Can you please include a "High Level Design" section in which
> you
> > > > > > explain
> > > > > > > > how you plan to tackle any issue?
> > > > > > > > If I can read that section and explain to someone else how
> this
> > > > will
> > > > > > > work,
> > > > > > > > it means the section is complete.
> > > > > > > >
> > > > > > > > Let's leave the code to the PRs.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <
> > > liusinan1998@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I made a PIP to discuss:
> > > > > > https://github.com/apache/pulsar/issues/19565
> > > > > > > .
> > > > > > > > >
> > > > > > > > > We can talk about the current design here. Especially for
> the
> > > > field
> > > > > > > type
> > > > > > > > > change check rules, please give your valuable advice.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Sinan
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by Enrico Olivelli <eo...@gmail.com>.
(I apologise for top posting)

Would it be possible to add a flag to fall back to the previous behaviour?
I know that adding such flags is a burden, but if the upgrade breaks
some workflows, users won't be able to upgrade.
We can add the flag in the next release and drop it in the next major release.
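
Here is a minimal sketch of how such a fallback flag could gate the new checks. It assumes a boolean setting, named `useLegacyRootNameCheck` here purely for illustration; this is not an existing Pulsar configuration key. For simplicity, both checks are modelled as predicates over root message names. The new rule-based check is injected as a parameter rather than implemented, because its rules are still under discussion.

```java
import java.util.function.BiPredicate;

public class LegacyFallbackSketch {

    // Legacy behaviour: accept any new schema whose root message name is unchanged.
    static final BiPredicate<String, String> LEGACY_ROOT_NAME_CHECK = String::equals;

    // Chooses the check based on a hypothetical boolean flag. The new,
    // rule-based check is passed in rather than implemented here.
    static BiPredicate<String, String> selectCheck(boolean useLegacyRootNameCheck,
                                                   BiPredicate<String, String> newCheck) {
        return useLegacyRootNameCheck ? LEGACY_ROOT_NAME_CHECK : newCheck;
    }

    public static void main(String[] args) {
        // Placeholder for the stricter validation; it rejects everything here.
        BiPredicate<String, String> strictCheck = (existing, candidate) -> false;
        System.out.println(selectCheck(true, strictCheck).test("demo.Msg", "demo.Msg"));  // true
        System.out.println(selectCheck(false, strictCheck).test("demo.Msg", "demo.Msg")); // false
    }
}
```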

Enrico

On Wed, Mar 1, 2023 at 3:33 PM SiNan Liu
<li...@gmail.com> wrote:

Re: [DISCUSS] PIP-246: Improved PROTOBUF_NATIVE schema compatibility checks without using avro-protobuf

Posted by SiNan Liu <li...@gmail.com>.
>
> Can you please explain how a Protobuf Schema descriptor can be validated
> for backward compatibility check using Avro based compatibility rules?
> Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> descriptor?
> Is there some translation happening?


1. *You can take a quick look at the previous design: the PROTOBUF schema uses
an Avro struct for storage.*
https://github.com/apache/pulsar/pull/1954
https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L59-L61
https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/ProtobufSchema.java#L110-L115

2. *On the broker side, both PROTOBUF and AVRO convert `SchemaData` to
`org.apache.avro.Schema`.*
https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java#L1280-L1293
https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/ProtobufSchemaCompatibilityCheck.java#L26-L31
https://github.com/apache/pulsar/blob/579f22c8449be287ee1209a477aeaad346495289/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/AvroSchemaBasedCompatibilityCheck.java#L47-L70



> I'm sorry - I don't understand.
> I understand the different compatibility check strategies.
> If you just spell them out here, then as you say, just translate the
> Protobuf Descriptor into an Avro schema and run the Avro
> compatibility validation, no?
> I believe the answer is no, since you may want to verify different things
> when it comes to Protobuf, which are different from Avro.


1. *ProtobufSchema differs from ProtobufNativeSchema in that it uses
avro-protobuf.*
https://central.sonatype.com/artifact/org.apache.avro/avro-protobuf/1.11.1/overview
*ProtobufNativeSchema needs a native compatibility check, but there is no
official or third-party implementation. So this PIP does not use
avro-protobuf for protobuf compatibility checking.*

2. *By the way, this is implemented in much the same way that Apache Avro
does compatibility checking.*
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/SchemaValidatorBuilder.java
`canReadStrategy`, `canBeReadStrategy`, `mutualReadStrategy`
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanRead.java
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateCanBeRead.java
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/ValidateMutualRead.java
*In `ValidateMutualRead.java`, the arguments of `canRead()` are
writtenSchema and readSchema, so each strategy only needs to change the
order of the arguments it passes to `canRead()`.*
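```java
import com.google.protobuf.Descriptors;

// Excerpt from the proposed PROTOBUF_NATIVE compatibility check. `strategy` is a field of
// the enclosing class; ProtoBufCanReadCheckException and ProtobufNativeSchemaBreakCheckUtils
// come from the PIP's proposed implementation.
private void validateWithStrategy(Descriptors.Descriptor toValidate,
                                  Descriptors.Descriptor fromDescriptor)
        throws ProtoBufCanReadCheckException {
    switch (strategy) {
        // BACKWARD: the new schema (toValidate) must read data written with the existing one.
        case CanReadExistingStrategy -> canRead(fromDescriptor, toValidate);
        // FORWARD: the existing schema must read data written with the new one.
        case CanBeReadByExistingStrategy -> canRead(toValidate, fromDescriptor);
        // FULL: both directions must hold.
        case CanBeReadMutualStrategy -> {
            canRead(toValidate, fromDescriptor);
            canRead(fromDescriptor, toValidate);
        }
    }
}

// Same (writtenSchema, readSchema) argument order as Avro's ValidateMutualRead.canRead().
private void canRead(Descriptors.Descriptor writtenSchema,
                     Descriptors.Descriptor readSchema) throws ProtoBufCanReadCheckException {
    ProtobufNativeSchemaBreakCheckUtils.checkSchemaCompatibility(writtenSchema, readSchema);
}
```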
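To make the argument order concrete, here is a self-contained sketch that models a message as a map from field number to a (type, required) pair and applies a deliberately simplified proto2-style rule. The rule is an assumption for illustration, not the PIP's full rule set: a reader fails if it requires a field the writer never sends or if a shared field number changes type, and it skips fields it does not know. `Field`, `Strategy`, and `isCompatible` are hypothetical names.

```java
import java.util.Map;

public class ProtoCompatSketch {
    // Toy model of a protobuf field: its type name and whether it is proto2 `required`.
    record Field(String type, boolean required) {}

    // Simplified rule (an assumption, not the PIP's full rule set): a reader can decode
    // a writer's data if every required reader field is sent by the writer and every
    // field number present in both schemas keeps the same type.
    static boolean canRead(Map<Integer, Field> writtenSchema, Map<Integer, Field> readSchema) {
        for (Map.Entry<Integer, Field> e : readSchema.entrySet()) {
            Field written = writtenSchema.get(e.getKey());
            if (written == null) {
                if (e.getValue().required()) {
                    return false; // reader needs a field the writer never sends
                }
            } else if (!written.type().equals(e.getValue().type())) {
                return false; // same field number, different type
            }
        }
        return true; // fields only the writer knows are skipped by the reader
    }

    enum Strategy { BACKWARD, FORWARD, FULL }

    // Same argument order as validateWithStrategy(): decide who writes and who reads.
    static boolean isCompatible(Strategy strategy,
                                Map<Integer, Field> newSchema,
                                Map<Integer, Field> existingSchema) {
        return switch (strategy) {
            case BACKWARD -> canRead(existingSchema, newSchema); // new schema reads old data
            case FORWARD -> canRead(newSchema, existingSchema);  // old schema reads new data
            case FULL -> canRead(existingSchema, newSchema) && canRead(newSchema, existingSchema);
        };
    }

    public static void main(String[] args) {
        Map<Integer, Field> v2 = Map.of(1, new Field("string", true));
        // V3 adds a required field 2.
        Map<Integer, Field> v3 = Map.of(1, new Field("string", true), 2, new Field("int32", true));
        for (Strategy s : Strategy.values()) {
            System.out.println(s + ": " + isCompatible(s, v3, v2));
        }
        // Prints BACKWARD: false, FORWARD: true, FULL: false
    }
}
```

Adding a required field 2 in V3 fails BACKWARD (V3 reads V2 data, which never contains field 2) but passes FORWARD (V2 skips the unknown field). This illustrates why "writtenSchema" means the existing schema in a backward check and the new schema in a forward check.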


Thanks,
sinan



On Wed, Mar 1, 2023 at 9:19 PM Asaf Mesika <as...@gmail.com> wrote:
>
> On Mon, Feb 27, 2023 at 3:47 PM SiNan Liu <li...@gmail.com> wrote:
>
> > >
> > > I read it and they look identical. What's the difference between them?
> >
> > Current avro, json, and protobuf schemas are all implemented based on AVRO.
> > > What do you mean, they are all implemented based on Avro? You mean the
> > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > compatibility validation?
> >
> > `org.apache.pulsar.broker.service.schema.ProtobufSchemaCompatibilityCheck`
> > `org.apache.pulsar.broker.service.schema.AvroSchemaCompatibilityCheck`
> > `org.apache.pulsar.broker.service.schema.JsonSchemaCompatibilityCheck`
> > They all extend `AvroSchemaBasedCompatibilityCheck`, so `checkCompatible()`
> > uses the same implementation as `AVRO`.
> >
>
> Can you please explain how a Protobuf Schema descriptor can be validated
> for backward compatibility check using Avro based compatibility rules?
> Doesn't it expect the schema to be Avro, but it is actually a Protobuf
> descriptor?
> Is there some translation happening?
>
>
>
> >
> >
> > I think you should structure the validation rules differently:
> >
> >
> > The compatibility check strategies are described on the website:
> >
> > https://pulsar.apache.org/docs/next/schema-understand/#schema-compatibility-check-strategy
> > 1. BACKWARD (CanReadExistingStrategy): Consumers using schema V3 can process
> > data written by producers using the last schema version V2. So V2 is
> > "writtenSchema" and V3 is "readSchema".
> > 2. FORWARD (CanBeReadByExistingStrategy): Consumers using the last schema
> > version V2 can process data written by producers using a new schema V3,
> > even though they may not be able to use the full capabilities of the new
> > schema. So V3 is "writtenSchema" and V2 is "readSchema".
> > 3. FULL (CanBeReadMutualStrategy): Schemas are both backward and forward
> > compatible. The schema can evolve, and the old and new schema versions must
> > each be able to read data written with the other.
> >
> >
> I'm sorry - I don't understand.
> I understand the different compatibility check strategies.
> If you just spell them out here, then as you say, just translate the
> Protobuf Descriptor into an Avro schema and run the Avro
> compatibility validation, no?
> I believe the answer is no, since you may want to verify different things
> when it comes to Protobuf, which are different from Avro.
>
> At the current state, I can't understand your design at all. Please help
> clarify that.
>
>
>
>
>
> >
> > So each strategy should have its own section.
> >
> >
> > The arguments of `canRead()` are writtenSchema and readSchema. As we've
> > just described, we only need to change the order of arguments we pass to
> > `canRead()`.
> >
> >
> >
> > Thanks,
> > sinan
> >
> >
> > On Mon, Feb 27, 2023 at 8:49 PM Asaf Mesika <as...@gmail.com> wrote:
> >
> > > >
> > > > And you can see the difference between ProtoBuf and ProtoBufNative:
> > > >
> > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobufnative
> > > >
> > > > https://pulsar.apache.org/docs/next/schema-get-started/#protobuf
> > > >
> > > I read it and they look identical. What's the difference between them?
> > >
> > > Current avro, json, and protobuf schemas are all implemented based on AVRO.
> > >
> > > What do you mean, they are all implemented based on Avro? You mean the
> > > protobuf schema is converted into an Avro Schema, and then you use Avro
> > > compatibility validation?
> > >
> > >
> > > > *Here are the basic compatibility rules we've defined:*
> > >
> > >
> > > I think you should structure the validation rules differently:
> > >
> > > * Backward checks
> > > ** List the rules, using newSchema (the schema used by the producer or
> > > consumer) and existingSchema (the last schema used)
> > > * Forward
> > > ** List the rules, using newSchema (the schema used by the producer or
> > > consumer) and existingSchema (the last schema used)
> > >
> > > So each strategy should have its own section.
> > >
> > > I'm saying this since you used the word "writtenSchema", but it represents
> > > something completely different in a backward check than in a forward check.
> > >
> > > Once you have it structured like that, I personally will be able to
> > > read and understand it.
> > >
> > >
> > > The motivation and problem statement are now good - thanks for improving
> > > it.
> > >
> > > On Mon, Feb 27, 2023 at 8:20 AM SiNan Liu <li...@gmail.com> wrote:
> > >
> > > > Hi! I updated the PIP issue again. This time I've added some background and
> > > > some explanations.
> > > >
> > > > The compatibility check rules are already written in the Implementation section.
> > > > ProtoBufNative implements the same canRead method as Apache Avro.
> > > > It does this by checking whether the schemas used for writing and reading are
> > > > compatible. I also indicate whether the writtenSchema and readSchema of the
> > > > Backward, Forward, and Full strategies are the old or the new version of
> > > > the schema.
> > > >
> > > > Thanks,
> > > > sinan
> > > >
> > > > On Sun, Feb 26, 2023 at 11:24 PM Asaf Mesika <as...@gmail.com> wrote:
> > > >
> > > > > I'm sorry, but this PIP lacks a lot of background knowledge, which IMO you
> > > > > need to add for people to understand it. You don't need to explain the
> > > > > entire Pulsar in this PIP, but add at the very least a few paragraphs
> > > > > detailing all you need to know, to put you in context:
> > > > >
> > > > >
> > > > >    - Start by saying Pulsar has a built-in schema registry inside the Pulsar
> > > > >    broker.
> > > > >       - Every time the client updates the schema, it uploads it to the
> > > > >       broker. When that happens, the broker validates whether the new
> > > > >       schema version is compatible with the previous versions. There are
> > > > >       4 types of compatibility: Full, ... (complete and explain each one
> > > > >       briefly)
> > > > >    - Also explain that the Pulsar schema registry supports various schema
> > > > >    protocols: Avro, protobuf native, ... (complete the rest). Each protocol
> > > > >    has a schema which dictates how to serialize and deserialize the message
> > > > >    content into a typed object.
> > > > >    - Explain in short what protobuf native is (compared to protobuf
> > > > >    non-native).
> > > > >    - Please don't paste code instead of explaining.
> > > > >       - Explain that the current protobuf native validation check only
> > > > >       checks that the root message name is the same between the current
> > > > >       schema version and the new version.
> > > > >          - Explain briefly what a root message and its name are.
> > > > >       - Explain the problem (list scenarios) that we have because the
> > > > >       protobuf native schema only supports FULL compatibility validation.
> > > > >
> > > > >
> > > > > Regarding high level design - as in what you plan to do.
> > > > > I suggest you add a "High Level Design" section and in it detail how you plan to
> > > > > validate, per protobuf version, per compatibility check (backward, forward,
> > > > > full, ...).
> > > > > I tried reading the implementation - for me, it's all over the place. Can
> > > > > you please list in order what I wrote above, and list the validation rules
> > > > > with a good explanation of why you validate it like that?
> > > > >
> > > > > Lastly, once you have all the validation rules clearly stated, you can use
> > > > > them to document it properly so users can know what validation to expect.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Asaf
> > > > >
> > > > >
> > > > > On Wed, Feb 22, 2023 at 5:10 PM SiNan Liu <li...@gmail.com> wrote:
> > > > >
> > > > > > Sorry, my mistake. I removed the code and described the design to improve
> > > > > > the PROTOBUF_NATIVE schema compatibility checks. You can have a look.
> > > >
> > > > > >
> > > > > > On Wed, Feb 22, 2023 at 9:16 PM Asaf Mesika <as...@gmail.com> wrote:
> > > > > >
> > > > > > > I read it but you're almost directly diving into the code - it will take me
> > > > > > > hours just to reverse engineer your design.
> > > > > > >
> > > > > > > Can you please include a "High Level Design" section in which you explain
> > > > > > > how you plan to tackle any issue?
> > > > > > > If I can read that section and explain to someone else how this will work,
> > > > > > > it means the section is complete.
> > > > > > >
> > > > > > > Let's leave the code to the PRs.
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Feb 19, 2023 at 2:59 PM SiNan Liu <liusinan1998@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I made a PIP to discuss: https://github.com/apache/pulsar/issues/19565.
> > > > > > > >
> > > > > > > > We can talk about the current design here. Especially for the field type
> > > > > > > > change check rules, please give your valuable advice.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Sinan
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >