Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/04/28 19:09:30 UTC

[GitHub] [pulsar] ta1meng opened a new issue #10426: Pulsar schema validation: schema info should log "default": null if it was part of Avro schema

ta1meng opened a new issue #10426:
URL: https://github.com/apache/pulsar/issues/10426


   **Describe the bug**
   Avro schemas allow a field to declare a default value of null, using the syntax `"default": null`. pulsar-admin accepts this syntax, but the rest of Pulsar does not fully support it, which leads to `IncompatibleSchema` exceptions between schemas that appear identical.
   
   This ticket asks for improved logging for schema info objects that contain the `"default": null` specification.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   1. Using Pulsar 2.7.1, run `bin/pulsar standalone`
   2. Configure schema compatibility policies on a namespace:
   ```
   bin/pulsar-admin namespaces set-is-allow-auto-update-schema --disable climate/field-service
   
   bin/pulsar-admin namespaces set-schema-compatibility-strategy climate/field-service --compatibility FORWARD_TRANSITIVE
   
   bin/pulsar-admin namespaces set-schema-validation-enforce --enable climate/field-service
   
   bin/pulsar-admin namespaces set-schema-autoupdate-strategy climate/field-service --disabled
   ```
   3. Upload the following schemas into a new topic. They differ in only one place, the specification of `"default":null`.
   ```
   // ActionV0.schema 
   {
       "type": "AVRO",
       "schema": "{\"name\":\"Action\",\"type\":\"record\",\"fields\":[{\"name\":\"action\",\"type\":[\"null\",\"string\"],\"default\":null}]}",
       "properties": {}
   }
   
   // ActionV1.schema 
   {
       "type": "AVRO",
       "schema": "{\"name\":\"Action\",\"type\":\"record\",\"fields\":[{\"name\":\"action\",\"type\":[\"null\",\"string\"]}]}",
       "properties": {}
   }
   
   tai.meng@C02Z65UGLVDQ ~/pulsar/apache-pulsar-2.7.1 $ bin/pulsar-admin schemas upload --filename ~/pulsar/pythonSandbox/schemas/ActionV0.schema climate/field-service/actions
   tai.meng@C02Z65UGLVDQ ~/pulsar/apache-pulsar-2.7.1 $ bin/pulsar-admin schemas upload --filename ~/pulsar/pythonSandbox/schemas/ActionV1.schema climate/field-service/actions
   ```
   4. Both schema versions are accepted because they are compatible. `schemas get` prints them identically, so the difference between them is impossible to see once they are uploaded:
   ```
   tai.meng@C02Z65UGLVDQ ~/pulsar/apache-pulsar-2.7.1 $ bin/pulsar-admin schemas get climate/field-service/actions --version 0                                                  
   
   {
     "name": "actions",
     "schema": {
       "name": "Action",
       "type": "record",
       "fields": [
         {
           "name": "action",
           "type": [
             "null",
             "string"
           ]
         }
       ]
     },
     "type": "AVRO",
     "properties": {}
   }
   tai.meng@C02Z65UGLVDQ ~/pulsar/apache-pulsar-2.7.1 $ bin/pulsar-admin schemas get climate/field-service/actions --version 1
   
   {
     "name": "actions",
     "schema": {
       "name": "Action",
       "type": "record",
       "fields": [
         {
           "name": "action",
           "type": [
             "null",
             "string"
           ]
         }
       ]
     },
     "type": "AVRO",
     "properties": {}
   }
   ```
   5. Using the Python client library, I found no way to produce a message using version 0 of the schema. Everything I tried resulted in an `IncompatibleSchema` exception.
   ```
   class Action(Record):
       action = String()
   ```
   6. However, the Action class above works with version 1 of the schema, the one without `"default":null` specified.
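
The difference that `schemas get` hides in step 4 can be made visible offline. The following is a minimal sketch using only the Python standard library; it takes the two Avro definitions from step 3 and reports the fields whose definitions differ. The helper name `field_differences` is hypothetical and is not part of Pulsar or its Python client.

```python
import json

# The two Avro definitions from step 3, i.e. the unescaped "schema" values
# of ActionV0.schema and ActionV1.schema.
ACTION_V0 = '{"name":"Action","type":"record","fields":[{"name":"action","type":["null","string"],"default":null}]}'
ACTION_V1 = '{"name":"Action","type":"record","fields":[{"name":"action","type":["null","string"]}]}'


def field_differences(old_json, new_json):
    """Return {field name: (old definition, new definition)} for fields that differ.

    Hypothetical helper for illustration only. Whole field dicts are compared,
    so an explicit `"default": null` (key present, value None) is told apart
    from a field with no "default" key at all.
    """
    old_fields = {f["name"]: f for f in json.loads(old_json)["fields"]}
    new_fields = {f["name"]: f for f in json.loads(new_json)["fields"]}
    differences = {}
    for name in sorted(old_fields.keys() | new_fields.keys()):
        if old_fields.get(name) != new_fields.get(name):
            differences[name] = (old_fields.get(name), new_fields.get(name))
    return differences


if __name__ == "__main__":
    for name, (old, new) in field_differences(ACTION_V0, ACTION_V1).items():
        print(name)
        print("  v0:", json.dumps(old, sort_keys=True))
        print("  v1:", json.dumps(new, sort_keys=True))
```

Comparing the parsed field dicts, rather than calling `field.get("default")`, matters here: `get` returns `None` both for an explicit null default and for a missing key, which is the same ambiguity this ticket is about.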
   
   **Expected behavior**
   The two schemas are _different_, so they should not be printed as _identical_. In this case, the `"default":null` should be printed when calling `bin/pulsar-admin schemas get climate/field-service/actions --version 0`. 
   
   Further, there should be a way to construct a Record class using the Python client library, so an event can be written to a topic with a schema containing `"default":null`.
   
   **Screenshots**
   N/A.
   
   **Desktop (please complete the following information):**
    - OS: MacOS Catalina Version 10.15.17
   
   **Additional context**
   `"default":null` seems like a common default value to specify in Avro schemas. The `IncompatibleSchema` exception it causes made it harder to triage other mistakes and bugs that also surfaced as `IncompatibleSchema`. Bug tickets whose triage was significantly complicated by the presence of `"default":null`: https://github.com/apache/pulsar/issues/9571, https://github.com/apache/pulsar/issues/8510.
   
   The overall impact is that Avro schema support seems quite broken in Pulsar. There were questions on whether Kafka's Avro schema support is this buggy. If we had still been deciding between Kafka and Pulsar, this may have changed our decision.
   
   Another solution is to create a new doc page for Pulsar's Avro support. On that doc page, known limitations of Pulsar's Avro support should be documented. Sample text for this problem (it might not be correct, but it would help anyone experimenting with Avro support in Pulsar):
   
   ```
   Pulsar supports a subset of Avro schemas.
   
   Pulsar does not support `"default":null` for string fields. 
   
   To specify a default value of null for a string field, simply omit that clause. 
   
   This is because, for string fields without default values, Pulsar defaults these fields to null and auto-converts null into the empty string for consumers. 
   
   ```
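
If the workaround in the sample text above holds (the text itself notes it might not be correct), the clause could be dropped mechanically before upload. The following is a minimal sketch using only the Python standard library; `strip_null_defaults` and `to_upload_payload` are hypothetical helper names, and only top-level record fields are handled. Note that in Avro a field's default is what a reader falls back on when the written data lacks that field, so removing it changes the schema's meaning and is not a neutral rewrite.

```python
import json


def strip_null_defaults(avro_schema_json):
    """Remove explicit `"default": null` from the top-level fields of a record.

    Hypothetical helper for illustration only. Non-null defaults are kept,
    and nested records are not visited.
    """
    schema = json.loads(avro_schema_json)
    for field in schema.get("fields", []):
        if "default" in field and field["default"] is None:
            del field["default"]
    # Compact separators reproduce the single-line style used in step 3.
    return json.dumps(schema, separators=(",", ":"))


def to_upload_payload(avro_schema_json):
    """Wrap an Avro definition in the file layout shown in step 3."""
    return {"type": "AVRO", "schema": avro_schema_json, "properties": {}}


if __name__ == "__main__":
    action_v0 = (
        '{"name":"Action","type":"record","fields":'
        '[{"name":"action","type":["null","string"],"default":null}]}'
    )
    payload = to_upload_payload(strip_null_defaults(action_v0))
    # The printed JSON can be saved to a file and passed to
    # `bin/pulsar-admin schemas upload --filename <file> <topic>`.
    print(json.dumps(payload, indent=4))
```

Applied to the version 0 definition from step 3, this yields the version 1 definition, which is the one the `Action` class in step 5 was reported to work with.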


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on issue #10426: Pulsar schema validation: schema info should log "default": null if it was part of Avro schema

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #10426:
URL: https://github.com/apache/pulsar/issues/10426#issuecomment-1058890892


   The issue has had no activity for 30 days; marking it with the Stale label.

