Posted to dev@pulsar.apache.org by Yunze Xu <yz...@streamnative.io.INVALID> on 2022/05/17 03:24:46 UTC

Re: [DISCUSS] Byte schema compatibility issue

For case 1, if you use the bytes schema to produce messages, it is the
user's responsibility to ensure schema compatibility. On the consumer side,
`Message#getValue`, which decodes the bytes internally via the schema,
should then throw a `SchemaSerializationException` if the bytes of the value
cannot be decoded.

Unfortunately, there is a bug that prevents the bytes from being decoded: the
call always fails before decoding is attempted. I opened a PR to fix this issue:
https://github.com/apache/pulsar/pull/15622 

If you don’t want to check schema compatibility on the consumer side, you can
set `isSchemaValidationEnforced` to true so that creating a producer
without a schema on a topic that has a schema fails.
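
As a rough illustration of the rule above, here is a minimal Java sketch. It is
a toy model under stated assumptions, not the broker's code: `ProducerAdmission`,
`canCreateProducer`, and the three boolean parameters are hypothetical names
introduced only for this example.

```java
/** Toy model of the rule above; not Pulsar's broker code. All names are hypothetical. */
final class ProducerAdmission {
    private ProducerAdmission() {}

    /**
     * @param schemaValidationEnforced stands in for the isSchemaValidationEnforced setting
     * @param topicHasSchema           whether a schema is already registered on the topic
     * @param producerHasSchema        whether the producer declares a schema
     *                                 (a bytes producer is treated as having none)
     * @return true if the producer may be created under this simplified rule
     */
    static boolean canCreateProducer(boolean schemaValidationEnforced,
                                     boolean topicHasSchema,
                                     boolean producerHasSchema) {
        // Only "enforced + topic has a schema + producer has none" is rejected.
        return !(schemaValidationEnforced && topicHasSchema && !producerHasSchema);
    }

    public static void main(String[] args) {
        // Schemaless producer on a topic with a schema, enforcement on.
        System.out.println(canCreateProducer(true, true, false));
        // Same producer with enforcement off: errors are left to the consumer.
        System.out.println(canCreateProducer(false, true, false));
    }
}
```

With enforcement off, the second call is admitted, which is exactly the
situation where errors can only surface on the consumer side.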

IMO, the bytes schema is treated as “without schema”. The actual issue is:
- Produce messages without schema
- Consume messages with schema

If `isSchemaValidationEnforced` is true, the producer cannot be created.
Otherwise, we cannot guarantee the format of the message on the producer side,
and we cannot try to decode it on the broker side, so the only option is to
handle the error on the consumer side:
1. If the message decodes successfully, return the decoded value.
2. Otherwise, throw a `SchemaSerializationException`.
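
The two outcomes above can be sketched in plain Java. This is a toy model, not
the Pulsar client: `ConsumerSideDecode`, `getValue`, `decodeInt`, and
`DecodeException` are hypothetical names, `DecodeException` is only a local
stand-in for `SchemaSerializationException`, and the "schema" is deliberately
trivial (a UTF-8 decimal integer) so the example needs no dependencies.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Toy model of consumer-side decoding; not the Pulsar client. All names are hypothetical. */
final class ConsumerSideDecode {
    private ConsumerSideDecode() {}

    /** Local stand-in for SchemaSerializationException so the sketch compiles on its own. */
    static final class DecodeException extends RuntimeException {
        DecodeException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Applies the decoder; any decoding failure surfaces as one exception type. */
    static <T> T getValue(byte[] payload, Function<byte[], T> decoder) {
        try {
            return decoder.apply(payload); // outcome 1: decoding succeeds
        } catch (RuntimeException e) {
            // outcome 2: the bytes do not match what the consumer expects
            throw new DecodeException("payload does not match the consumer's schema", e);
        }
    }

    /** A deliberately tiny "schema": the payload must be a UTF-8 decimal integer. */
    static Integer decodeInt(byte[] payload) {
        return Integer.valueOf(new String(payload, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] good = "42".getBytes(StandardCharsets.UTF_8);
        byte[] bad = "not-a-number".getBytes(StandardCharsets.UTF_8);

        Integer value = getValue(good, ConsumerSideDecode::decodeInt);
        System.out.println(value);

        try {
            getValue(bad, ConsumerSideDecode::decodeInt);
        } catch (DecodeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of wrapping every failure in one exception type is that the
application has a single, predictable place to handle messages written by a
producer that did not follow the topic's schema.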

There is no problem with the current implementation except for what I tried to
fix in #15622.


Thanks,
Yunze




> On Mar 8, 2022, at 10:55, guo jiwei <te...@apache.org> wrote:
> 
> Hi,
>   I want to discuss the compatibility issue with the byte schema here.
>   For now, the byte schema is compatible with all other schemas. This may
> introduce more issues.
>   Case 1:
>          1. Consumer1 is created with a JSON schema for topic A.
>          2. But producer1 is created without a schema and sends byte
> messages directly to topic A.
>          This will cause deserialization errors in consumer1. Also,
> producer1 may send unsafe byte data.
> 
>     Case 2:
>           1. Consumer1 is created with the byte schema for topic A.
>           2. But producer1 is created with an AVRO/JSON schema and sends
> messages to topic A.
>           As a result, consumer1 does not know how to deserialize the
> messages.
> 
>    To avoid the above issues, the byte schema should also follow the schema
> compatibility policy. I have opened #13701
> <https://github.com/apache/pulsar/issues/13701> to track this. If the idea
> is accepted, I will start a PIP.
> 
>     Please give some suggestions about this idea.
> 
> 
> Regards
> Jiwei Guo (Tboy)


Re: [DISCUSS] Byte schema compatibility issue

Posted by guo jiwei <te...@apache.org>.
Good idea @Yunze
Since `isSchemaValidationEnforced` can currently only be set at the broker
level, I have decided to support it at the namespace and topic levels as well.


Regards
Jiwei Guo (Tboy)

