You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/05/13 23:34:05 UTC

[GitHub] [pulsar] dlg99 opened a new pull request, #15598: [fix][connector] KCA Sink: org.apache.kafka.connect.errors.DataException: Invalid Java object for schema with type ..

dlg99 opened a new pull request, #15598:
URL: https://github.com/apache/pulsar/pull/15598

   ### Motivation
   
   Kafka Connect Adaptor Sink may fail with 
   
   ```
   ERROR org.apache.pulsar.io.kafka.connect.KafkaConnectSink - Error sending the record SinkRecord(sourceRecord=PulsarRecord(topicName=Optional[persistent://pexpipelinetest/connectors/cbart-pex-text.public.links], partition=0, message=Optional[org.apache.pulsar.client.impl.MessageImpl@528e9a52], schema=AUTO_CONSUME({schemaVersion=,schemaType=BYTES}{schemaVersion=\x00\x00\x00\x00\x00\x00\x00\x06,schemaType=KEY_VALUE}{schemaVersion=org.apache.pulsar.common.protocol.schema.LatestVersion@1ec80826,schemaType=KEY_VALUE}), failFunction=org.apache.pulsar.functions.source.PulsarSource$$Lambda$303/0x0000000100728040@4bcce14b, ackFunction=org.apache.pulsar.functions.source.PulsarSource$$Lambda$302/0x0000000100709c40@24de2714), value=(key = "org.apache.pulsar.client.impl.schema.generic.GenericJsonRecord@6ffabd48", value = "org.apache.pulsar.client.impl.schema.generic.GenericJsonRecord@4900f6ef"))
   org.apache.kafka.connect.errors.DataException: Invalid Java object for schema with type INT64: class java.lang.Integer for field: "txId"
   	at org.apache.kafka.connect.data.ConnectSchema.validateValue(ConnectSchema.java:246) ~[connect-api-2.7.2.jar:?]
   	at org.apache.kafka.connect.data.Struct.put(Struct.java:216) ~[connect-api-2.7.2.jar:?]
   	at org.apache.pulsar.io.kafka.connect.schema.KafkaConnectData.pulsarGenericRecordAsConnectData(KafkaConnectData.java:88) ~[pulsar-io-kafka-connect-adaptor-2.8.0.1.1.35.jar:2.8.0.1.1.35]
   	at org.apache.pulsar.io.kafka.connect.schema.KafkaConnectData.getKafkaConnectData(KafkaConnectData.java:57) ~[pulsar-io-kafka-connect-adaptor-2.8.0.1.1.35.jar:2.8.0.1.1.35]
   	at org.apache.pulsar.io.kafka.connect.schema.KafkaConnectData.pulsarGenericRecordAsConnectData(KafkaConnectData.java:88) ~[pulsar-io-kafka-connect-adaptor-2.8.0.1.1.35.jar:2.8.0.1.1.35]
   	at org.apache.pulsar.io.kafka.connect.schema.KafkaConnectData.getKafkaConnectData(KafkaConnectData.java:57) ~[pulsar-io-kafka-connect-adaptor-2.8.0.1.1.35.jar:2.8.0.1.1.35]
   	at org.apache.pulsar.io.kafka.connect.KafkaConnectSink.toSinkRecord(KafkaConnectSink.java:267) ~[pulsar-io-kafka-connect-adaptor-2.8.0.1.1.35.jar:2.8.0.1.1.35]
   	at org.apache.pulsar.io.kafka.connect.KafkaConnectSink.write(KafkaConnectSink.java:112) [pulsar-io-kafka-connect-adaptor-2.8.0.1.1.35.jar:2.8.0.1.1.35]
   	at org.apache.pulsar.functions.instance.JavaInstanceRunnable.sendOutputMessage(JavaInstanceRunnable.java:363) [com.datastax.oss-pulsar-functions-instance-2.8.0.1.1.42.jar:2.8.0.1.1.42]
   	at org.apache.pulsar.functions.instance.JavaInstanceRunnable.handleResult(JavaInstanceRunnable.java:346) [com.datastax.oss-pulsar-functions-instance-2.8.0.1.1.42.jar:2.8.0.1.1.42]
   	at org.apache.pulsar.functions.instance.JavaInstanceRunnable.run(JavaInstanceRunnable.java:295) [com.datastax.oss-pulsar-functions-instance-2.8.0.1.1.42.jar:2.8.0.1.1.42]
   	at java.lang.Thread.run(Thread.java:829) [?:?]
   ```
   
   The rootcause:
   * json is written with schema of int64 (Long) for field
   * when read, the schema is int64 but actual value we get (genericRecord.getField()) is an Integer
   * Kafka's [ConnectSchema validation](https://github.com/apache/kafka/blob/6ab4d047d563e0fe42a7c0ed6f10ddecda135595/connect/api/src/main/java/org/apache/kafka/connect/data/ConnectSchema.java#L47-L71) wants exact type (Long) and won't accept Integer/won't cast
   
   Fixing this in KCA in case there are similar quirks in the GenericAvroRecord etc + to avoid redeployment of Pulsar when only the connector can be rebuilt + to avoid potentially breaking changes.
   
   It is discussable whether fixing it in GenericJsonRecord/Generic<Whatever>Record is needed (not in this change). 
   I haven't seen it affecting anything outside of KCA yet.
   https://github.com/apache/pulsar/blob/b2678be0a97580d69da0b543a499efb3d9adbd5e/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema/generic/GenericJsonRecord.java#L61-L102
   
   As one can see the type returned is not related to SchemaInfo, e.g. binary can be returned as String, a number (BigInteger) as String even if the schema type is DOUBLE and so on.
   These cases are out of the scope of his PR, here I want to address situations when e.g. a Long/INT64 is written as json but post-read it becomes Integer and fails kafka's data validation. 
   
   ### Modifications
   
   *Describe the modifications you've done.*
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
     -  Added unit tests, modified existing tests (new test reproed the error prior to the fix)
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
   NO
   
     - Dependencies (does it add or upgrade a dependency): (yes / no)
     - The public API: (yes / no)
     - The schema: (yes / no / don't know)
     - The default values of configurations: (yes / no)
     - The wire protocol: (yes / no)
     - The rest endpoints: (yes / no)
     - The admin cli options: (yes / no)
     - Anything that affects deployment: (yes / no / don't know)
   
   ### Documentation
   
   Check the box below or label this PR directly.
   
   Need to update docs? 
   
   - [ ] `doc-required` 
   (Your PR needs to update docs and you will update later)
     
   - [x] `no-need-doc` 
   (Please explain why)
     
   - [ ] `doc` 
   (Your PR contains doc changes)
   
   - [ ] `doc-added`
   (Docs have been already added)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codelipenghui merged pull request #15598: [fix][connector] KCA Sink: org.apache.kafka.connect.errors.DataException: Invalid Java object for schema with type ..

Posted by GitBox <gi...@apache.org>.
codelipenghui merged PR #15598:
URL: https://github.com/apache/pulsar/pull/15598


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org