Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/11/10 15:17:49 UTC

[GitHub] [pulsar] michaelkux opened a new issue #8510: AVRO Schema with modified whitespaces uploaded via API is incompatible with Schema generated from Record (IncompatibleSchema)

michaelkux opened a new issue #8510:
URL: https://github.com/apache/pulsar/issues/8510


   **Describe the bug**
   If an AVRO schema is uploaded via CLI / REST API and a client (in the example Python) tries to create a producer with schema=AvroSchema(...) the producer will be rejected with IncompatibleSchema because the schema contains additional/less whitespaces.
   
   E.g. schema that works:
   ```json
   {
     "type": "AVRO",
     "schema": "{\n \"name\": \"SimpleAvroRecord\",\n \"type\": \"record\",\n \"fields\": [\n  {\n   \"name\": \"data\",\n   \"type\": \"string\"\n  }\n ]\n}",
     "properties": {}
   }
   ```
   
   Schema that causes trouble (the first \n in the schema was removed). Any modification (adding or removing whitespace) results in an incompatible schema when creating a producer:
   ```json
   {
     "type": "AVRO",
     "schema": "{\"name\": \"SimpleAvroRecord\",\n \"type\": \"record\",\n \"fields\": [\n  {\n   \"name\": \"data\",\n   \"type\": \"string\"\n  }\n ]\n}",
     "properties": {}
   }
   ```
   
   It seems that the broker performs a pure string comparison, which is risky when multiple client versions or languages are used.
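To illustrate the suspected cause: the two `schema` values above are different strings, but they become identical once insignificant JSON whitespace is removed. The Java sketch below uses a hypothetical helper, `SchemaWhitespace.strip` (not a Pulsar API), that drops whitespace outside string literals. It only normalizes formatting; payloads that differ in key order would still compare unequal.

```java
// Hypothetical helper, not a Pulsar API: removes JSON whitespace (space, tab,
// CR, LF) outside string literals so that payloads differing only in
// formatting become identical strings.
final class SchemaWhitespace {
    private SchemaWhitespace() {}

    static String strip(String json) {
        StringBuilder out = new StringBuilder(json.length());
        boolean inString = false;
        boolean escaped = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c);                // whitespace inside strings is significant
                if (escaped) {
                    escaped = false;          // the character after a backslash is literal
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;         // closing quote
                }
            } else if (c == '"') {
                inString = true;              // opening quote
                out.append(c);
            } else if (c != ' ' && c != '\t' && c != '\r' && c != '\n') {
                out.append(c);                // structural characters, numbers, literals
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The two "schema" values from the report, written as Java literals
        String works = "{\n \"name\": \"SimpleAvroRecord\",\n \"type\": \"record\",\n \"fields\": [\n  {\n   \"name\": \"data\",\n   \"type\": \"string\"\n  }\n ]\n}";
        String fails = "{\"name\": \"SimpleAvroRecord\",\n \"type\": \"record\",\n \"fields\": [\n  {\n   \"name\": \"data\",\n   \"type\": \"string\"\n  }\n ]\n}";
        System.out.println("raw equal:        " + works.equals(fails));
        System.out.println("normalized equal: " + strip(works).equals(strip(fails)));
    }
}
```

Running `main` prints `false` for the raw comparison and `true` for the normalized one.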
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   1. Upload a schema via the REST API (with modified whitespace)
   2. Create a producer with schema=AvroSchema(...)
   3. Send a message
   
   An example to reproduce it is attached (see schema_check.py):
   
   [SchemaValidationWithWhitespaceFails.zip](https://github.com/apache/pulsar/files/5518227/SchemaValidationWithWhitespaceFails.zip)
   
   It contains both variants and a Python unittest for them:
   
   <img width="1598" alt="unittests" src="https://user-images.githubusercontent.com/14221403/98693030-1163e200-2370-11eb-965b-6e23c741f8a9.png">
   
   **Expected behavior**
   
   Whitespace in the uploaded schema should not make a difference. The comparison should work on a logical level only (e.g. renamed fields, type changes, added or deleted fields, ...).
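A comparison "on a logical level" could start from structural JSON equality. The sketch below is a minimal, hypothetical Java illustration (the `JsonTree` class and its `parse` and `logicallyEqual` methods are invented for this example and are not part of Pulsar or Avro). It parses both schema strings into maps, lists, strings, doubles, booleans and nulls and compares those, so whitespace and object key order no longer matter, while renamed fields, changed types and array order still do. It is only JSON-level equality: it does not know Avro semantics, so for example `"string"` and `{"type": "string"}` would still compare unequal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper (not a Pulsar or Avro API): JSON -> Map/List/String/Double/Boolean/null.
final class JsonTree {
    private final String s;
    private int pos;
    private JsonTree(String s) { this.s = s; }

    static boolean logicallyEqual(String a, String b) {
        return Objects.equals(parse(a), parse(b));
    }

    static Object parse(String json) {
        JsonTree p = new JsonTree(json);
        Object value = p.value();
        p.skipWs();
        if (p.pos != json.length()) throw new IllegalArgumentException("trailing data at " + p.pos);
        return value;
    }

    private void skipWs() {
        while (pos < s.length() && " \t\r\n".indexOf(s.charAt(pos)) >= 0) pos++;
    }
    private char peek() {
        skipWs();
        if (pos >= s.length()) throw new IllegalArgumentException("unexpected end of input");
        return s.charAt(pos);
    }
    private boolean consume(char c) {
        if (peek() != c) return false;
        pos++;
        return true;
    }
    private void expect(char c) {
        if (!consume(c)) throw new IllegalArgumentException("expected '" + c + "' at " + pos);
    }

    private Object value() {
        char c = peek();
        if (c == '{') return object();
        if (c == '[') return array();
        if (c == '"') return string();
        if (s.startsWith("true", pos)) { pos += 4; return Boolean.TRUE; }
        if (s.startsWith("false", pos)) { pos += 5; return Boolean.FALSE; }
        if (s.startsWith("null", pos)) { pos += 4; return null; }
        int start = pos; // anything else must be a number; every number becomes a Double
        while (pos < s.length() && "+-.eE0123456789".indexOf(s.charAt(pos)) >= 0) pos++;
        if (start == pos) throw new IllegalArgumentException("unexpected character at " + pos);
        return Double.valueOf(s.substring(start, pos));
    }

    private Map<String, Object> object() {
        Map<String, Object> map = new HashMap<>(); // Map.equals ignores key order
        expect('{');
        if (consume('}')) return map;
        do {
            String key = string();
            expect(':');
            map.put(key, value());
        } while (consume(','));
        expect('}');
        return map;
    }

    private List<Object> array() {
        List<Object> list = new ArrayList<>(); // List.equals compares elements in order
        expect('[');
        if (consume(']')) return list;
        do {
            list.add(value());
        } while (consume(','));
        expect(']');
        return list;
    }

    private String string() {
        expect('"');
        StringBuilder sb = new StringBuilder();
        while (pos < s.length()) {
            char c = s.charAt(pos++);
            if (c == '"') return sb.toString();
            if (c != '\\') { sb.append(c); continue; }
            char e = s.charAt(pos++);
            if (e == 'u') {
                sb.append((char) Integer.parseInt(s.substring(pos, pos + 4), 16));
                pos += 4;
            } else {
                int k = "nrtbf".indexOf(e); // \" \\ and \/ are not listed and map to themselves
                sb.append(k >= 0 ? "\n\r\t\b\f".charAt(k) : e);
            }
        }
        throw new IllegalArgumentException("unterminated string");
    }
}
```

With it, the compact Python-style record schema and a pretty-printed variant that lists `"type"` before `"name"` compare equal, while renaming the `data` field makes them differ.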
   
   **Desktop:**
   
   - OS: macOS 10.15 / Docker 2.2
   - Pulsar: 2.6.1 Standalone Docker Image
   - Client: Python 3.8, pulsar-client 2.6


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] michaelkux edited a comment on issue #8510: AVRO Schema with modified whitespaces uploaded via API is incompatible with Schema generated from Record (IncompatibleSchema)

Posted by GitBox <gi...@apache.org>.
michaelkux edited a comment on issue #8510:
URL: https://github.com/apache/pulsar/issues/8510#issuecomment-726121702


   I have now tried it with Java as well: same or similar issue. Uploading the generated schema leads to an IncompatibleSchemaException. I hope the way I upload the schema is correct.
   
   SimpleAvroRecord POJO:
   
   ```java
   public class SimpleAvroRecord {

       public String data;

       public SimpleAvroRecord() {
       }

       public SimpleAvroRecord(String data) {
           this.data = data;
       }

       public String getData() {
           return data;
       }

       public void setData(String data) {
           this.data = data;
       }
   }
   ```
   
   It is the same if I use the API (CLI) variant `admin.schemas().createSchema(TOPIC_TEST_AVRO, new PostSchemaPayload...`:
   
   ```java
   @PostConstruct
   public void initPulsar() throws Exception {
       client = PulsarClient.builder()
               .serviceUrl(pulsarServiceUrl)
               .build();

       admin = PulsarAdmin.builder().serviceHttpUrl(pulsarAdminUrl).build();

       // Start from a clean topic without a registered schema
       admin.schemas().deleteSchema(TOPIC_TEST_AVRO);
       admin.topics().delete(TOPIC_TEST_AVRO, true);

       NamespaceName namespaceName = NamespaceName.get("public", "default");

       // Disallow automatic schema updates, enforce validation, require FULL compatibility
       admin.namespaces().setIsAllowAutoUpdateSchema(namespaceName.toString(), false);
       admin.namespaces().setSchemaValidationEnforced(namespaceName.toString(), true);
       admin.namespaces().setSchemaCompatibilityStrategy(namespaceName.toString(), SchemaCompatibilityStrategy.FULL);
       admin.topics().createNonPartitionedTopic(TOPIC_TEST_AVRO);

       // admin.schemas().createSchema(TOPIC_TEST_AVRO, Schema.AVRO(SimpleAvroRecord.class).getSchemaInfo());

       // Upload the schema manually as a pretty-printed string (note the leading "\n")
       Map<String, String> prop = new HashMap<>();
       prop.put("__alwaysAllowNull", "true");
       prop.put("__jsr310ConversionEnabled", "false");
       admin.schemas().createSchema(TOPIC_TEST_AVRO,
               new PostSchemaPayload("AVRO", "\n{\n  \"type\" : \"record\",\n  \"name\" : \"SimpleAvroRecord\",\n  " +
                       "\"fields\" : [ {\n    \"name\" : \"data\",\n    \"type\" : [ \"null\", \"string\" ],\n    \"default\" : null\n  } ]\n}", prop));

       // create() fails with the IncompatibleSchemaException shown below
       Producer<SimpleAvroRecord> producer = client.newProducer(AvroSchema.of(SimpleAvroRecord.class))
               .topic(TOPIC_TEST_AVRO)
               .create();

       producer.send(new SimpleAvroRecord("test"));

       producer.close();

       System.out.println("Done");

       // testUploadedSchema();
   }
   ```
   
   This throws the following exception:
   
   Caused by: org.apache.pulsar.client.api.PulsarClientException$IncompatibleSchemaException: org.apache.pulsar.broker.service.schema.exceptions.IncompatibleSchemaException: Schema not found and schema auto updating is disabled.
   	at org.apache.pulsar.client.api.PulsarClientException.unwrap(PulsarClientException.java:862) ~[pulsar-client-api-2.6.1.jar:2.6.1]
   	at org.apache.pulsar.client.impl.ProducerBuilderImpl.create(ProducerBuilderImpl.java:93) ~[pulsar-client-admin-2.6.1.jar:2.6.1]
   
   The uploaded schema is equal to the one generated by `admin.schemas().createSchema(TOPIC_TEST_AVRO, Schema.AVRO(SimpleAvroRecord.class).getSchemaInfo())`.
   
   Here is the output from pulsar-admin:
   
   ```
   ./bin/pulsar-admin schemas get persistent://public/default/test_avro
   {
     "version": 102,
     "schemaInfo": {
       "name": "test_avro",
       "schema": {
         "type": "record",
         "name": "SimpleAvroRecord",
         "fields": [
           {
             "name": "data",
             "type": [
               "null",
               "string"
             ]
           }
         ]
       },
       "type": "AVRO",
       "properties": {
         "__alwaysAllowNull": "true",
         "__jsr310ConversionEnabled": "false"
       }
     }
   }
   ```
   


[GitHub] [pulsar] codelipenghui closed issue #8510: AVRO Schema with modified whitespaces uploaded via API is incompatible with Schema generated from Record (IncompatibleSchema)

codelipenghui closed issue #8510:
URL: https://github.com/apache/pulsar/issues/8510


   


[GitHub] [pulsar] codelipenghui commented on issue #8510: AVRO Schema with modified whitespaces uploaded via API is incompatible with Schema generated from Record (IncompatibleSchema)

codelipenghui commented on issue #8510:
URL: https://github.com/apache/pulsar/issues/8510#issuecomment-725391232


   @congbobo184 Could you please confirm whether the Java client also has this problem?


[GitHub] [pulsar] sijie commented on issue #8510: AVRO Schema with modified whitespaces uploaded via API is incompatible with Schema generated from Record (IncompatibleSchema)

sijie commented on issue #8510:
URL: https://github.com/apache/pulsar/issues/8510#issuecomment-781043128


   Thanks, @ta1meng and @michaelkux!
   
   @congbobo184 @codelipenghui I think we should improve the comparison logic on the broker side.


[GitHub] [pulsar] codelipenghui commented on issue #8510: AVRO Schema with modified whitespaces uploaded via API is incompatible with Schema generated from Record (IncompatibleSchema)

codelipenghui commented on issue #8510:
URL: https://github.com/apache/pulsar/issues/8510#issuecomment-788984520


   Closed via https://github.com/apache/pulsar/pull/9612


[GitHub] [pulsar] michaelkux commented on issue #8510: AVRO Schema with modified whitespaces uploaded via API is incompatible with Schema generated from Record (IncompatibleSchema)

michaelkux commented on issue #8510:
URL: https://github.com/apache/pulsar/issues/8510#issuecomment-726579708


   Thank you for clarifying that. In this case, the uploaded schema and the schemas used in Python and Java are equal (apart from whitespace differences), so a schema update (via AllowAutoUpdateSchema) should not be necessary.
   
   Maybe normalising the JSON schema string on the broker side when updating or comparing a schema (whether it was uploaded manually or generated from a class) would solve this.
   
   Otherwise, different clients or versions (of Pulsar, Avro, Jackson, ...) that produce different representations of the JSON string could cause the schema to be updated every time such a client connects.
   
   If this is not a bug, please consider it a feature request.


[GitHub] [pulsar] ta1meng commented on issue #8510: AVRO Schema with modified whitespaces uploaded via API is incompatible with Schema generated from Record (IncompatibleSchema)

ta1meng commented on issue #8510:
URL: https://github.com/apache/pulsar/issues/8510#issuecomment-781607245


   Potential duplicate ticket, https://github.com/apache/pulsar/issues/9571
   
   I've asked the reporter to triage.


[GitHub] [pulsar] ta1meng commented on issue #8510: AVRO Schema with modified whitespaces uploaded via API is incompatible with Schema generated from Record (IncompatibleSchema)

ta1meng commented on issue #8510:
URL: https://github.com/apache/pulsar/issues/8510#issuecomment-780971707


   Thank you @michaelkux for this detailed ticket! I ran into exactly the same issue, and I use the Java client. The error message was misleading. Thank you so much for pointing out that it was a whitespace problem! I removed the trailing whitespace from the schema file and that solved the problem.
   
   Pulsar could log a better message than `Schema not found and schema auto updating is disabled`, because the schema was found. Even better would be to ignore whitespace differences during schema comparisons.


[GitHub] [pulsar] congbobo184 commented on issue #8510: AVRO Schema with modified whitespaces uploaded via API is incompatible with Schema generated from Record (IncompatibleSchema)

congbobo184 commented on issue #8510:
URL: https://github.com/apache/pulsar/issues/8510#issuecomment-726538565


   `admin.namespaces().setIsAllowAutoUpdateSchema(namespaceName.toString(), false);` prevents the schema from being updated. If you call `admin.namespaces().setIsAllowAutoUpdateSchema(namespaceName.toString(), true);` instead, you can produce the message, but the schema version will be incremented, because the broker decides whether to create a new schema version by comparing the SchemaInfo JSON strings.
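The mechanism described in this comment can be modelled in a few lines. The Java sketch below is a hypothetical toy registry (`ToySchemaRegistry`, `upload` and `connectProducer` are invented names, not Pulsar APIs) that, as described here, reuses an existing version only when the schema string matches verbatim. It deliberately leaves out the compatibility checks the real broker performs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified model of the behaviour described in this
// thread; it is NOT the broker's implementation. Compatibility checks are
// omitted: the only question is whether the exact schema string is known.
final class ToySchemaRegistry {
    private final List<String> versions = new ArrayList<>(); // index = version
    private final boolean allowAutoUpdate;

    ToySchemaRegistry(boolean allowAutoUpdate) {
        this.allowAutoUpdate = allowAutoUpdate;
    }

    /** Admin upload (e.g. createSchema): stores the string verbatim as a new version. */
    int upload(String schemaJson) {
        versions.add(schemaJson);
        return versions.size() - 1;
    }

    /** Returns the schema version a new producer would be attached to. */
    int connectProducer(String schemaJson) {
        int existing = versions.indexOf(schemaJson); // plain String.equals
        if (existing >= 0) {
            return existing;
        }
        if (!allowAutoUpdate) {
            // Mirrors the error reported in this thread, although a schema that
            // differs only in whitespace is registered.
            throw new IllegalStateException("Schema not found and schema auto updating is disabled.");
        }
        return upload(schemaJson); // a whitespace-only difference yields a new version
    }

    public static void main(String[] args) {
        String uploaded = "\n{\"type\":\"record\",\"name\":\"SimpleAvroRecord\",\"fields\":[]}";
        String generated = uploaded.trim(); // same schema without the leading newline

        ToySchemaRegistry strict = new ToySchemaRegistry(false);
        strict.upload(uploaded);
        try {
            strict.connectProducer(generated);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        ToySchemaRegistry lenient = new ToySchemaRegistry(true);
        lenient.upload(uploaded);
        System.out.println("producer attached to version " + lenient.connectProducer(generated));
    }
}
```

With auto-update disabled, the whitespace-only difference surfaces as "Schema not found" even though an equivalent schema exists; with auto-update enabled, the producer succeeds but is attached to a new version (1 instead of 0).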


[GitHub] [pulsar] congbobo184 commented on issue #8510: AVRO Schema with modified whitespaces uploaded via API is incompatible with Schema generated from Record (IncompatibleSchema)

congbobo184 commented on issue #8510:
URL: https://github.com/apache/pulsar/issues/8510#issuecomment-725789034


   @codelipenghui The Java client does not have this problem.

