Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/05/16 15:59:22 UTC

[GitHub] [pulsar] BewareMyPower opened a new pull request, #15622: [Java Client] Fix wrong schema version of messages without schema

BewareMyPower opened a new pull request, #15622:
URL: https://github.com/apache/pulsar/pull/15622

   ### Motivation
   
   When I tried to consume a topic via a consumer with an Avro schema
   while the messages on the topic were produced by a producer without
   a schema, the consumption failed. This is because
   `MultiVersionSchemaInfoProvider#getSchemaByVersion` doesn't check
   whether `schemaVersion` is an empty byte array. If it is, a
   `BytesSchemaVersion` wrapping the empty array is passed to `cache.get`
   and then to `loadSchema`.
   
   https://github.com/apache/pulsar/blob/f90ef9c6ad88c4f94ce1fcc682bbf3f3189cbf2a/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/generic/MultiVersionSchemaInfoProvider.java#L94-L98
   
   However, `LookupService#getSchema` cannot accept an empty byte array as
   the version, so `loadSchema` failed.
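
A minimal sketch of the missing guard, assuming a simplified provider. `VersionedSchemaCache`, the `String` used as a schema placeholder, and the `loader` function are hypothetical names for illustration, not Pulsar's actual classes. It shows an empty (or null) version short-circuiting before the cache and the remote lookup are touched.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a version-keyed schema provider (not Pulsar's class).
final class VersionedSchemaCache {
    // byte[] has identity-based equals/hashCode, so a String key is used here.
    private final Map<String, CompletableFuture<String>> cache = new ConcurrentHashMap<>();
    // Assumed remote lookup; it is never called with an empty version.
    private final Function<byte[], CompletableFuture<String>> loader;

    VersionedSchemaCache(Function<byte[], CompletableFuture<String>> loader) {
        this.loader = loader;
    }

    // Completes with null when the message carries no schema version.
    CompletableFuture<String> getSchemaByVersion(byte[] schemaVersion) {
        if (schemaVersion == null || schemaVersion.length == 0) {
            return CompletableFuture.completedFuture(null);
        }
        String key = Arrays.toString(schemaVersion);
        return cache.computeIfAbsent(key, k -> loader.apply(schemaVersion));
    }
}
```

Returning a completed future with `null` is one possible convention for "no schema"; the caller then has to fall back to its own schema instead of a looked-up one.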
   
   The root cause is that the schema version was set unexpectedly when
   messages were sent by a producer without a schema. On the broker side,
   the returned schema version is never null; an empty array means that
   the message doesn't have a schema. However, on the Java client side,
   the empty byte array is treated as an existing schema and the schema
   version field is set. When a consumer receives the message, it tries
   to load a schema whose version is an empty array.
   
   ### Modifications
   
   - When a producer receives a response whose schema version is an empty
     byte array, just ignore it.
   - Fix the existing tests.
   - Add `testConsumeAvroMessagesWithoutSchema` to cover the case that
     messages without schema are compatible with the schema.
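
The producer-side change in the first bullet can be sketched as follows. This is an illustration only: `ProducerSchemaState` and `onBrokerSchemaVersion` are hypothetical names, and the real producer code is organized differently. The point is that an empty array from the broker leaves the version unset rather than being stored as if it identified a schema.

```java
import java.util.Optional;

// Hypothetical holder for the schema version a producer attaches to messages.
final class ProducerSchemaState {
    private byte[] schemaVersion; // null means "no schema"

    // Applies the version returned by the broker. An empty array is ignored,
    // because it signals that messages on the topic have no schema.
    void onBrokerSchemaVersion(byte[] returned) {
        if (returned != null && returned.length > 0) {
            this.schemaVersion = returned.clone();
        }
    }

    // Empty when no version was ever set, so nothing is attached to messages.
    Optional<byte[]> schemaVersion() {
        return Optional.ofNullable(schemaVersion).map(v -> v.clone());
    }
}
```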
   
   This patch also modifies the existing behavior when
   `schemaValidationEnforced` is false and messages are produced by a
   producer without schema and consumed by a consumer with schema.
   
   1. If the message is incompatible with the schema
      - Before: `getSchemaVersion` returns an empty array and `getValue`
        fails with `UncheckedExecutionException`:
   
        > com.google.common.util.concurrent.UncheckedExecutionException: org.apache.commons.lang3.SerializationException: Failed at fetching schema info for EMPTY
   
      - After: `getSchemaVersion` returns `null` and `getValue` fails with
        `SchemaSerializationException`.
   
   2. Otherwise (the message is compatible with the schema)
      - Before: `getSchemaVersion` returns an empty array and `getValue`
        fails with `UncheckedExecutionException`.
      - After: `getSchemaVersion` returns `null` and `getValue` returns the
        correctly decoded object.
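
The "after" behavior in both cases can be sketched with a toy message type. Everything here is an assumption for illustration: `ReceivedMessage` and `DecodeException` are hypothetical, and a 4-byte big-endian `int` stands in for the consumer's Avro schema. With a `null` schema version the payload is decoded directly with the consumer's own schema, so a compatible payload decodes and an incompatible one fails with a decoding error rather than a schema lookup error.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a schema decoding failure.
final class DecodeException extends RuntimeException {
    DecodeException(String message) {
        super(message);
    }
}

// Hypothetical received message; the "schema" is a 4-byte big-endian int.
final class ReceivedMessage {
    private final byte[] payload;
    private final byte[] schemaVersion; // null: produced without a schema

    ReceivedMessage(byte[] payload, byte[] schemaVersion) {
        this.payload = payload;
        this.schemaVersion = schemaVersion;
    }

    byte[] getSchemaVersion() {
        return schemaVersion;
    }

    int getValue() {
        if (schemaVersion != null) {
            // Versioned schema lookup is out of scope for this sketch.
            throw new UnsupportedOperationException("versioned decode omitted");
        }
        // No version: decode directly with the consumer's own schema.
        if (payload.length != Integer.BYTES) {
            throw new DecodeException("payload does not match the consumer schema");
        }
        return ByteBuffer.wrap(payload).getInt();
    }
}
```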
   
   ### Documentation
   
   Check the box below or label this PR directly.
   
   Need to update docs? 
   
   - [ ] `doc-required` 
   (Your PR needs to update docs and you will update later)
     
   - [x] `no-need-doc` 
   (Please explain why)
     
   - [ ] `doc` 
   (Your PR contains doc changes)
   
   - [ ] `doc-added`
   (Docs have been already added)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] shibd commented on a diff in pull request #15622: [Java Client] Fix messages sent by producers without schema cannot be decoded

Posted by "shibd (via GitHub)" <gi...@apache.org>.
shibd commented on code in PR #15622:
URL: https://github.com/apache/pulsar/pull/15622#discussion_r1156018535


##########
pulsar-client/src/main/java/org/apache/pulsar/client/impl/MessageImpl.java:
##########
@@ -382,23 +384,28 @@ public Optional<Schema<?>> getReaderSchema() {
         if (schema == null) {
             return Optional.empty();
         }
+        byte[] schemaVersion = getSchemaVersion();
+        if (schemaVersion == null) {
+            return Optional.of(schema);

Review Comment:
   This is a breaking change.
   
   This happens when the producer uses the `BYTES` schema (it should happen for all `Primitive` types) and the consumer uses the `AUTO_CONSUME` schema.
   
   ```
   Schema<?> schema = msg.getReaderSchema().get();
   ```
   
   
   - Before 2.9.3, the schema is a `BytesSchema` object.
   - After 2.9.3, the schema is an `AutoConsumeSchema` object.
   
   Please let me know if I missed something, thanks.



[GitHub] [pulsar] codelipenghui commented on pull request #15622: [Java Client] Fix messages sent by producers without schema cannot be decoded

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on PR #15622:
URL: https://github.com/apache/pulsar/pull/15622#issuecomment-1128311771

   @BewareMyPower @Technoboy- I think this one is related to the [discussion](https://lists.apache.org/thread/3js51tq2p3c3oldfrhprn4kcohx7h1wv) in the mailing list? 



[GitHub] [pulsar] BewareMyPower commented on pull request #15622: [Java Client] Fix messages sent by producers without schema cannot be decoded

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on PR #15622:
URL: https://github.com/apache/pulsar/pull/15622#issuecomment-1128367083

   There are still other tests failing because of this change. I'll fix them soon.



[GitHub] [pulsar] BewareMyPower commented on pull request #15622: [Java Client] Fix messages sent by producers without schema cannot be decoded

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on PR #15622:
URL: https://github.com/apache/pulsar/pull/15622#issuecomment-1128350393

   @codelipenghui Thanks, I will reply in the email.
   
   @Technoboy- Because it still tried to perform a topic lookup with **an empty byte array schema version**, which always throws an exception before the message value is decoded. #14626 only modifies the error message caused by the unexpected lookup. See more details in the PR description and my previous explanation.



[GitHub] [pulsar] BewareMyPower commented on pull request #15622: [Java Client] Fix wrong schema version of messages without schema

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on PR #15622:
URL: https://github.com/apache/pulsar/pull/15622#issuecomment-1127928783

   ```java
           if (version != null && version.length == 0) {
               schemaFuture.completeExceptionally(new SchemaSerializationException("Empty schema version"));
               return schemaFuture;
           }
   ```
   
   This error message added in #14626 could confuse users even more. A `SchemaSerializationException` should represent an error that occurs when a byte array is serialized by an Avro schema. "Empty schema version" doesn't indicate anything because the schema version is not a part of the message itself.




[GitHub] [pulsar] BewareMyPower commented on pull request #15622: [Java Client] Fix wrong schema version of messages without schema

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on PR #15622:
URL: https://github.com/apache/pulsar/pull/15622#issuecomment-1127899255

   I found that https://github.com/apache/pulsar/pull/14626 also tried to fix a similar issue, but it didn't fix it in a correct way.



[GitHub] [pulsar] BewareMyPower commented on pull request #15622: [Java Client] Fix wrong schema version of messages without schema

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on PR #15622:
URL: https://github.com/apache/pulsar/pull/15622#issuecomment-1127858468

   I'm not sure whether this fix should be cherry-picked to older branches since it changes the current behavior (even though I think the new behavior is correct).
   
   Since it's a fix on the producer side, with the producer change alone, messages produced by an older version of the producer still could not be consumed, even if they are compatible with the schema. For that reason, this PR also checks `schema.length == 0` in `MultiVersionSchemaInfoProvider#getSchemaByVersion`. However, that means the tests could still pass even if the producer changes were reverted.



[GitHub] [pulsar] BewareMyPower merged pull request #15622: [Java Client] Fix messages sent by producers without schema cannot be decoded

Posted by GitBox <gi...@apache.org>.
BewareMyPower merged PR #15622:
URL: https://github.com/apache/pulsar/pull/15622



[GitHub] [pulsar] Technoboy- commented on pull request #15622: [Java Client] Fix messages sent by producers without schema cannot be decoded

Posted by GitBox <gi...@apache.org>.
Technoboy- commented on PR #15622:
URL: https://github.com/apache/pulsar/pull/15622#issuecomment-1128303319

   > I found #14626 also tried to fix the similar issue but it didn't fix in a correct way. Since #14261 has already been cherry-picked to branch-2.9 and branch-2.10, I also added the same labels.
   
   Why didn't it fix the issue in a correct way?



[GitHub] [pulsar] BewareMyPower commented on pull request #15622: [Java Client] Fix messages sent by producers without schema cannot be decoded

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on PR #15622:
URL: https://github.com/apache/pulsar/pull/15622#issuecomment-1129896801

   @codelipenghui @congbobo184 @Technoboy-  @mattisonchao All required tests have passed now, PTAL.

