You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/09/29 03:12:35 UTC

[GitHub] [pulsar] merlimat opened a new pull request #12232: [Python] Do not sort schema fields by default

merlimat opened a new pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232


   ### Motivation
   
   In Avro schema, the order of fields is used in the validation process, so if we are sorting the fields, that will generate an unexpected schema for a python producer/consumer and it will make it not interoperable with Java and other clients.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui merged pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
codelipenghui merged pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hangc0276 commented on pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232#issuecomment-930256440


   > @hangc0276 Could you please help create a separate issue to track the issue you have mentioned?
   
   @codelipenghui I create an issue to track it https://github.com/apache/pulsar/issues/12241


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232#issuecomment-930042502


   @hangc0276 Could you please help create a separate issue to track the issue you have mentioned?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] gaoran10 commented on a change in pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
gaoran10 commented on a change in pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232#discussion_r718136171



##########
File path: pulsar-client-cpp/python/pulsar/schema/definition.py
##########
@@ -60,6 +60,9 @@ class Record(with_metaclass(RecordMeta, object)):
     # This field is used to set namespace for Avro Record schema.
     _avro_namespace = None
 
+    # Generate a schema where fields are sorted alphabetically
+    _sorted_fields = False

Review comment:
       Do we need to set this value to `True` to avoid compatibility problems?

##########
File path: pulsar-client-cpp/python/pulsar/schema/definition.py
##########
@@ -60,6 +60,9 @@ class Record(with_metaclass(RecordMeta, object)):
     # This field is used to set namespace for Avro Record schema.
     _avro_namespace = None
 
+    # Generate a schema where fields are sorted alphabetically
+    _sorted_fields = False

Review comment:
       Maybe other users will meet new issues?

##########
File path: pulsar-client-cpp/python/pulsar/schema/definition.py
##########
@@ -60,6 +60,9 @@ class Record(with_metaclass(RecordMeta, object)):
     # This field is used to set namespace for Avro Record schema.
     _avro_namespace = None
 
+    # Generate a schema where fields are sorted alphabetically
+    _sorted_fields = False

Review comment:
       Maybe other users will meet new issues? This user is fine, but other users may be stuck.

##########
File path: pulsar-client-cpp/python/pulsar/schema/definition.py
##########
@@ -60,6 +60,9 @@ class Record(with_metaclass(RecordMeta, object)):
     # This field is used to set namespace for Avro Record schema.
     _avro_namespace = None
 
+    # Generate a schema where fields are sorted alphabetically
+    _sorted_fields = False

Review comment:
       Maybe other users will meet new issues? This user is OK, but other users may be stuck.

##########
File path: pulsar-client-cpp/python/pulsar/schema/definition.py
##########
@@ -60,6 +60,9 @@ class Record(with_metaclass(RecordMeta, object)):
     # This field is used to set namespace for Avro Record schema.
     _avro_namespace = None
 
+    # Generate a schema where fields are sorted alphabetically
+    _sorted_fields = False

Review comment:
       Maybe other users will meet new issues? This user is ok, but other users may be stuck.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] gaoran10 commented on a change in pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
gaoran10 commented on a change in pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232#discussion_r718145207



##########
File path: pulsar-client-cpp/python/pulsar/schema/definition.py
##########
@@ -60,6 +60,9 @@ class Record(with_metaclass(RecordMeta, object)):
     # This field is used to set namespace for Avro Record schema.
     _avro_namespace = None
 
+    # Generate a schema where fields are sorted alphabetically
+    _sorted_fields = False

Review comment:
       Maybe other users will meet new issues? This user is OK, but other users may be stuck.

##########
File path: pulsar-client-cpp/python/pulsar/schema/definition.py
##########
@@ -60,6 +60,9 @@ class Record(with_metaclass(RecordMeta, object)):
     # This field is used to set namespace for Avro Record schema.
     _avro_namespace = None
 
+    # Generate a schema where fields are sorted alphabetically
+    _sorted_fields = False

Review comment:
       Maybe other users will meet new issues? This user is ok, but other users may be stuck.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] gaoran10 commented on a change in pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
gaoran10 commented on a change in pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232#discussion_r718145207



##########
File path: pulsar-client-cpp/python/pulsar/schema/definition.py
##########
@@ -60,6 +60,9 @@ class Record(with_metaclass(RecordMeta, object)):
     # This field is used to set namespace for Avro Record schema.
     _avro_namespace = None
 
+    # Generate a schema where fields are sorted alphabetically
+    _sorted_fields = False

Review comment:
       Maybe other users will meet new issues? This user is fine, but other users may be stuck.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hangc0276 edited a comment on pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
hangc0276 edited a comment on pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232#issuecomment-930034245


   I found a case about the schema without sort field.
   If we define the class like:
   ```Python
   class T2(Record):
               b = Integer()
               a = Integer()
               d = String()
               c = Double()
   ```
   But the schema fields of T2 without sort is:
   ```
   [{'type': ['null', 'int'], 'name': 'a'}, {'type': ['null', 'double'], 'name': 'c'}, {'type': ['null', 'int'], 'name': 'b'}, {'type': ['null', 'string'], 'name': 'd'}]
   ```
   
   However we expected order is the field define order of the class (I am not sure), such as `b, a, d, c`, but the output order is `a, c, b, d`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui merged pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
codelipenghui merged pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] merlimat commented on a change in pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
merlimat commented on a change in pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232#discussion_r718136729



##########
File path: pulsar-client-cpp/python/pulsar/schema/definition.py
##########
@@ -60,6 +60,9 @@ class Record(with_metaclass(RecordMeta, object)):
     # This field is used to set namespace for Avro Record schema.
     _avro_namespace = None
 
+    # Generate a schema where fields are sorted alphabetically
+    _sorted_fields = False

Review comment:
       My only concern is that we'll keep getting users stuck with this issue if we don't fix the default behavior.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hangc0276 commented on pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232#issuecomment-930034245


   I found a case about the schema without sort field.
   If we define the class like:
   ```Python
   class T2(Record):
               b = Integer()
               a = Integer()
               d = String()
               c = Double()
   ```
   But the schema fields of T2 without sort is:
   ```
   [{'type': ['null', 'int'], 'name': 'a'}, {'type': ['null', 'double'], 'name': 'c'}, {'type': ['null', 'int'], 'name': 'b'}, {'type': ['null', 'string'], 'name': 'd'}]
   ```
   
   However we expected order is `b, a, d, c`, but the output order is `a, c, b, d`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hangc0276 edited a comment on pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
hangc0276 edited a comment on pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232#issuecomment-930034245


   I found a case about the schema without sort field.
   If we define the class like:
   ```Python
   class T2(Record):
               b = Integer()
               a = Integer()
               d = String()
               c = Double()
   ```
   But the schema fields of T2 without sort is:
   ```
   [{'type': ['null', 'int'], 'name': 'a'}, {'type': ['null', 'double'], 'name': 'c'}, {'type': ['null', 'int'], 'name': 'b'}, {'type': ['null', 'string'], 'name': 'd'}]
   ```
   
   However we expected order is the field define order of the class (I am not sure), such as `b, a, d, c`, but the output order is `a, c, b, d`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hangc0276 commented on pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232#issuecomment-929866753






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] merlimat commented on a change in pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
merlimat commented on a change in pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232#discussion_r718136729



##########
File path: pulsar-client-cpp/python/pulsar/schema/definition.py
##########
@@ -60,6 +60,9 @@ class Record(with_metaclass(RecordMeta, object)):
     # This field is used to set namespace for Avro Record schema.
     _avro_namespace = None
 
+    # Generate a schema where fields are sorted alphabetically
+    _sorted_fields = False

Review comment:
       My only concern is that we'll keep getting users stuck with this issue if we don't fix the default behavior.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] sijie commented on pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
sijie commented on pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232#issuecomment-929867677


   @eolivelli Putting the compatibility conversation aside, the PR here is valuable. Because it provides a flag to control whether to sort schema fields or not. So it will ensure python clients and java clients can generate the schema in the same order, which will improve debuggability and readability. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232#issuecomment-930042502


   @hangc0276 Could you please help create a separate issue to track the issue you have mentioned?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] sijie commented on pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
sijie commented on pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232#issuecomment-929867677


   @eolivelli Putting the compatibility conversation aside, the PR here is valuable. Because it provides a flag to control whether to sort schema fields or not. So it will ensure python clients and java clients can generate the schema in the same order, which will improve debuggability and readability. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] gaoran10 commented on a change in pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
gaoran10 commented on a change in pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232#discussion_r718136171



##########
File path: pulsar-client-cpp/python/pulsar/schema/definition.py
##########
@@ -60,6 +60,9 @@ class Record(with_metaclass(RecordMeta, object)):
     # This field is used to set namespace for Avro Record schema.
     _avro_namespace = None
 
+    # Generate a schema where fields are sorted alphabetically
+    _sorted_fields = False

Review comment:
       Do we need to set this value to `True` to avoid compatibility problems?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] gaoran10 commented on a change in pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
gaoran10 commented on a change in pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232#discussion_r718145207



##########
File path: pulsar-client-cpp/python/pulsar/schema/definition.py
##########
@@ -60,6 +60,9 @@ class Record(with_metaclass(RecordMeta, object)):
     # This field is used to set namespace for Avro Record schema.
     _avro_namespace = None
 
+    # Generate a schema where fields are sorted alphabetically
+    _sorted_fields = False

Review comment:
       Maybe other users will meet new issues?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hangc0276 commented on pull request #12232: [Python] Do not sort schema fields by default

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on pull request #12232:
URL: https://github.com/apache/pulsar/pull/12232#issuecomment-929866753


   @merlimat Some tests run failed, Please take a look, thanks.
   ```
   FAIL: test_avro_required_default (schema_test.SchemaTest)
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "/tmp/schema_test.py", line 856, in test_avro_required_default
       "default": None
   AssertionError: {'fields': [{'default': None, 'type': ['null', 'int'], 'name': 'a'}, {'default': [truncated]... != {'fields': [{'default': None, 'type': ['null', 'int'], 'name': 'a'}, {'default': [truncated]...
   Diff is 2313 characters long. Set self.maxDiff to None to see it.
   
   ======================================================================
   FAIL: test_complex (schema_test.SchemaTest)
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "/tmp/schema_test.py", line 106, in test_complex
       "type": ["null", 'MySubRecord']
   AssertionError: {'fields': [{'type': ['null', 'string'], 'name': 'a'}, {'type': ['null', {'field [truncated]... != {'fields': [{'type': ['null', 'string'], 'name': 'a'}, {'type': ['null', {'field [truncated]...
   Diff is 1195 characters long. Set self.maxDiff to None to see it.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org