Posted to dev@pulsar.apache.org by Yunze Xu <yz...@streamnative.io.INVALID> on 2023/03/29 08:00:32 UTC

[Python] Should we make the schema default compatible with Java client?

Hi all,

Recently I found that the schema definition generated by default in the
Python client differs from the one generated by the Java client, which
leads to some unexpected behavior.

For example, given the following class definition in Python:

```python
from pulsar.schema import Record, Integer

class Data(Record):
    i = Integer()
```

The type of the `i` field is a union: "type": ["null", "int"]

Whereas, given the following class definition in Java:

```java
class Data {
    private final int i;
    /* ... */
}
```

The type of the `i` field is an integer: "type": "int"

This causes a problem: if a Python consumer subscribes to a topic
with the schema defined above, a Java producer will then fail to be
created because of the schema incompatibility.

Currently, the workaround is to change the schema compatibility
strategy to FORWARD.
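
A minimal sketch of why FORWARD lets the Java producer through, assuming
a much simplified compatibility rule: a reader type can read a writer
type only if every branch the writer may emit is also a branch of the
reader. The helper names `branches` and `can_read` are made up for this
illustration. This is not the broker's checker, and it ignores type
promotion, nested records and default values.

```python
def branches(avro_type):
    """Return the branches of a primitive type or of a union of primitives."""
    return set(avro_type) if isinstance(avro_type, list) else {avro_type}


def can_read(reader_type, writer_type):
    """True if every value the writer may emit has a matching reader branch."""
    return branches(writer_type) <= branches(reader_type)


python_type = ["null", "int"]  # existing schema, registered by the Python consumer
java_type = "int"              # new schema, sent by the Java producer

# BACKWARD: the new schema must be able to read data written with the old one.
backward_ok = can_read(reader_type=java_type, writer_type=python_type)
# FORWARD: the old schema must be able to read data written with the new one.
forward_ok = can_read(reader_type=python_type, writer_type=java_type)

print("backward:", backward_ok)
print("forward:", forward_ok)
```

Under this simplified rule the backward check fails, because a null
written with the union cannot be read as a plain int, while the forward
check passes.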

Should we change how the Python client generates the schema definition
so that it is compatible with the Java client? It could be a breaking
change for existing Python clients, but it would guarantee
compatibility with the Java client.

If not, we still have to introduce an extra configuration option to
make the Python schema compatible with the Java schema, and that
requires code changes from users. For example, here is a possible solution:

```python
from pulsar.schema import Record, Integer

class Data(Record):
    # NOTE: Users might have to add this extra field to control how the
    # schema is generated.
    __java_compatible = True
    i = Integer()
```
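
One detail that may be worth keeping in mind if an option like this
were adopted: inside a class body, Python mangles names that start with
two underscores and do not end with two underscores. The sketch below
uses a stand-in `Record` base class (not `pulsar.schema.Record`) and a
hypothetical single-underscore variant `_java_compatible` to show the
difference.

```python
class Record:
    """Stand-in base class for illustration only."""


class Data(Record):
    __java_compatible = True  # stored as _Data__java_compatible
    _java_compatible = True   # stored under the name as written


attrs = vars(Data)
print("__java_compatible" in attrs)       # False: the name was mangled
print("_Data__java_compatible" in attrs)  # True
print("_java_compatible" in attrs)        # True
```

Code in a base class that looks up the option by its plain name would
therefore not find the double-underscore form, which may be a reason to
prefer a single leading underscore, as `_sorted_fields` does.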

Thanks,
Yunze

Re: [Python] Should we make the schema default compatible with Java client?

Posted by 丛搏 <co...@gmail.com>.
Hi, Yunze:

+1

> I just checked this thread and found that I hadn't pasted this issue:
> https://github.com/apache/pulsar-client-python/issues/108. There, the
> schema compatibility strategy is FORWARD, so the sorted schema from
> the Java client overwrote the unsorted schema from the Python client.
> However, the Python consumer that uses the old schema then failed to
> decode messages written with the new schema.

If this changes the default behavior, users who upgrade the Python
client will register one more schema. This is a breaking change, so
existing users will need to update their code that uses Python schemas
when they upgrade the Python client. We need to note this in the release notes.

Thanks
Bo

Re: [Python] Should we make the schema default compatible with Java client?

Posted by Yunze Xu <yz...@streamnative.io.INVALID>.
> Will it register a new schema?

Only if it passes the schema compatibility check. BTW, the existing
schema compatibility checker does not check the order of fields, even
though the order is very important. IMO, it's a bug in the broker.

I just checked this thread and found that I hadn't pasted this issue:
https://github.com/apache/pulsar-client-python/issues/108. There, the
schema compatibility strategy is FORWARD, so the sorted schema from
the Java client overwrote the unsorted schema from the Python client.
However, the Python consumer that uses the old schema then failed to
decode messages written with the new schema.
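
A rough sketch of why field order matters, assuming the reader decodes
with its own schema rather than with the writer's. Avro's binary
encoding writes record fields back to back in schema order, with no
field names in the payload. The toy encoder below covers only required
`int` and `string` fields (zigzag varint, length-prefixed UTF-8); the
helper names are made up for this illustration and none of this is the
client's actual code.

```python
import io


def write_long(out, n):
    """Write an integer using zigzag plus variable-length encoding."""
    z = (n << 1) ^ (n >> 63)
    while z & ~0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    out.write(bytes([z]))


def read_long(inp):
    """Read a zigzag, variable-length encoded integer."""
    shift = result = 0
    while True:
        b = inp.read(1)
        if not b:
            raise EOFError("unexpected end of data")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return (result >> 1) ^ -(result & 1)
        shift += 7


def encode(record, schema):
    """Encode fields in schema order; the payload carries no field names."""
    out = io.BytesIO()
    for name, kind in schema:
        if kind == "int":
            write_long(out, record[name])
        else:  # "string"
            raw = record[name].encode("utf-8")
            write_long(out, len(raw))
            out.write(raw)
    return out.getvalue()


def decode(data, schema):
    """Decode fields in schema order, trusting the schema completely."""
    inp = io.BytesIO(data)
    record = {}
    for name, kind in schema:
        if kind == "int":
            record[name] = read_long(inp)
        else:  # "string"
            size = read_long(inp)
            if size < 0:
                raise ValueError("negative string length")
            raw = inp.read(size)
            if len(raw) != size:
                raise EOFError("string is longer than the remaining data")
            record[name] = raw.decode("utf-8")
    return record


DECLARED_ORDER = [("name", "string"), ("age", "int")]  # unsorted, as declared
SORTED_ORDER = sorted(DECLARED_ORDER)                   # alphabetical by name

payload = encode({"name": "bob", "age": 30}, SORTED_ORDER)
print(decode(payload, SORTED_ORDER))
try:
    decode(payload, DECLARED_ORDER)
except EOFError as error:
    print("decode with the unsorted schema failed:", error)
```

Here the reader with the unsorted schema interprets the encoded `age`
as a string length and runs out of data. With other values the decode
could also succeed and silently return wrong fields.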

My goal is to make the Python client behave the same as the Java client
starting from the next formal release. How the broker handles this is,
I think, a separate issue to be fixed.

Thanks,
Yunze

Re: [Python] Should we make the schema default compatible with Java client?

Posted by 丛搏 <co...@gmail.com>.
Hi, Yunze:

> Regarding the 1st question, yes, that's why I opened this thread for
> discussion. If we change these default values, new Python clients
> will behave like the Java client. In addition, it actually reverts
> the breaking change introduced in #12232.

I also don't quite remember why #12232 changed the default behavior.
Maybe the field ordering rules differ between Python 2 and Python 3.

If we change the default ordering, every topic that uses the Python
client will register a new schema. Will it register a new
schema? Maybe we should add special logic in the broker that
checks the Python client version so that it does not register
a new schema. Otherwise, the impact may still be quite large.

Thanks,
Bo

Re: [Python] Should we make the schema default compatible with Java client?

Posted by Yunze Xu <yz...@streamnative.io.INVALID>.
Hi Bo,

Regarding the 1st question, yes, that's why I opened this thread for
discussion. If we change these default values, new Python clients
will behave like the Java client. In addition, it actually reverts
the breaking change introduced in #12232.

Regarding the 2nd question, yes, they are both sorted in alphabetical
order. I don't know the behavior of the .NET client. The C++, Golang,
and Node.js clients do not support generating a schema definition
from a DTO.

Thanks,
Yunze

Re: [Python] Should we make the schema default compatible with Java client?

Posted by 丛搏 <co...@gmail.com>.
Hi, Yunze :

1. The change may cause some compatibility issues.
How do we solve them? It may be a
breaking change.

2. If sorting is enabled by default, is the sorting rule
the same as in the Java client or other clients?

Putting aside the above two problems, I think it is
good to be consistent with other clients.

Thanks,
Bo

Re: [Python] Should we make the schema default compatible with Java client?

Posted by Eric Hare <er...@datastax.com>.
+1 - I think keeping the `_sorted_fields` and `_required` defaults consistent between the clients is the way to go.

Re: [Python] Should we make the schema default compatible with Java client?

Posted by Yunze Xu <yz...@streamnative.io.INVALID>.
I found that the Python client has two options that control this behavior:
1. `_sorted_fields`. It's false by default in the Python client,
but it's true in the Java client, i.e. the Java client sorts all
fields by default.
2. `_required`. It's false by default for all types in the Python
client, but it's only false for the string type in the Java client.

For example, given the following Java class:

```java
class User {
    String name;
    int age;
    double score;
}
```

We have to write the following definition in Python to get the same schema:

```python
from pulsar.schema import Record, String, Integer, Double

class User(Record):
    _sorted_fields = True
    name = String()
    age = Integer(required=True)
    score = Double(required=True)
```

I see that https://github.com/apache/pulsar/pull/12232 added the
`_sorted_fields` field and disabled field sorting by default, which
breaks compatibility with the Java client.

IMO, we should make `_sorted_fields` true by default and `_required`
true for all types other than `String` by default.
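
The effect of the two defaults can be sketched with a toy model. The
`generate_fields` helper and the `AVRO_NAMES` table below are
hypothetical and only mimic the behavior described above. They are not
the Python client's implementation and they leave out the rest of the
record definition.

```python
AVRO_NAMES = {"String": "string", "Integer": "int", "Double": "double"}


def generate_fields(declared, sorted_fields, required_by_default):
    """Build a simplified Avro field list.

    declared: list of (field name, field kind) in declaration order.
    sorted_fields: sort the fields alphabetically by name when True.
    required_by_default: callable mapping a field kind to True/False.
    """
    items = sorted(declared, key=lambda field: field[0]) if sorted_fields else list(declared)
    fields = []
    for name, kind in items:
        avro_type = AVRO_NAMES[kind]
        if not required_by_default(kind):
            avro_type = ["null", avro_type]  # optional fields become a union with null
        fields.append({"name": name, "type": avro_type})
    return fields


USER = [("name", "String"), ("age", "Integer"), ("score", "Double")]

# Defaults described above for the Python client: unsorted, nothing required.
current = generate_fields(USER, sorted_fields=False,
                          required_by_default=lambda kind: False)

# Proposed defaults: sorted, everything required except strings.
proposed = generate_fields(USER, sorted_fields=True,
                           required_by_default=lambda kind: kind != "String")

for label, fields in (("current", current), ("proposed", proposed)):
    print(label, fields)
```

With the proposed defaults the toy model yields `age`, `name`, `score`
in that order, with only `name` nullable, which is the shape the
explicit `User` definition above asks for.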

Thanks,
Yunze
