Posted to commits@hudi.apache.org by "wolf8334 (via GitHub)" <gi...@apache.org> on 2023/03/22 09:43:08 UTC

[GitHub] [hudi] wolf8334 opened a new issue, #8268: [SUPPORT]Got an NPE when Using HoodieDeltaStreamer with the delete command

wolf8334 opened a new issue, #8268:
URL: https://github.com/apache/hudi/issues/8268

   **Describe the problem you faced**
   
   I have set up a pipeline that syncs data from MySQL to Hudi on HDFS.
   The software I use is listed below:
   MySQL, Debezium, Kafka, HoodieDeltaStreamer
   
   When I insert or update existing data in the table, it works fine,
   but when I run `delete from xx`, HoodieDeltaStreamer throws an NPE and exits.
   I wrote a Kafka client and found that the delete command is divided into many single delete messages, plus some Kafka messages whose key and value are both null.
   
   When I run the delete command in MySQL, I first get a single delete message, and the next message's key and value are both null.
   
   My code is:
   `System.out.printf("offset = %d, key = %s, value = %s \n", record.offset(), record.key(), record.value());`
   
   And the result is:
   ```
   offset = 538, key = null, value = {"before":xxx omit here}
   offset = 539, key = null, value = null
   ```
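   
   For context, a minimal standalone consumer producing this kind of output could look like the sketch below (broker address, group id, and topic name are placeholders, not taken from the actual setup):
   
   ```
   import java.time.Duration;
   import java.util.Collections;
   import java.util.Properties;
   
   import org.apache.kafka.clients.consumer.ConsumerRecord;
   import org.apache.kafka.clients.consumer.ConsumerRecords;
   import org.apache.kafka.clients.consumer.KafkaConsumer;
   
   public class TombstoneInspector {
       public static void main(String[] args) {
           Properties props = new Properties();
           props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
           props.put("group.id", "tombstone-inspector");       // placeholder group id
           props.put("auto.offset.reset", "earliest");
           props.put("key.deserializer",
                   "org.apache.kafka.common.serialization.StringDeserializer");
           props.put("value.deserializer",
                   "org.apache.kafka.common.serialization.StringDeserializer");
   
           try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
               // Placeholder topic name following Debezium's <server>.<db>.<table> pattern.
               consumer.subscribe(Collections.singletonList("dbserver1.mydb.mytable"));
               while (true) {
                   ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                   for (ConsumerRecord<String, String> record : records) {
                       // A delete shows up as an event with op = "d", followed by a
                       // tombstone whose value (and, in this setup, key) is null.
                       System.out.printf("offset = %d, key = %s, value = %s%n",
                               record.offset(), record.key(), record.value());
                   }
               }
           }
       }
   }
   ```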
   
   HoodieDeltaStreamer then shows the log below.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Set up Debezium, Kafka, and the rest of the pipeline.
   2. Insert some data into the MySQL table so the table has rows in it.
   3. Run the delete command (an illustration of the resulting Kafka events is shown after this list).
   4. HoodieDeltaStreamer shows the NPE.
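   
   For illustration: with Debezium's default settings (`tombstones.on.delete` = `true`), a single row delete produces two Kafka events on the table's topic. The shape below is a sketch of the standard Debezium envelope, not output captured from this setup:
   
   ```
   event 1 (delete):    key = <record key>, value = {"before": {...}, "after": null, "op": "d", ...}
   event 2 (tombstone): key = <record key>, value = null
   ```
   
   (In the setup reported here, the keys were null as well.)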
   
   **Expected behavior**
   
   It should run correctly, i.e., delete the corresponding data in my Hudi table.
   
   **Environment Description**
   
   * Hudi version : 0.13.0
   
   * Spark version : 3.3.2
   
   * Hive version : none
   
   * Hadoop version : 3.3.4
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   **Stacktrace**
   
   ```
   23/03/22 17:16:32 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1) (hudi executor 1): java.lang.NullPointerException
           at org.apache.hudi.org.apache.spark.sql.avro.AvroDeserializer.$anonfun$getRecordWriter$3(AvroDeserializer.scala:389)
           at org.apache.hudi.org.apache.spark.sql.avro.AvroDeserializer.$anonfun$getRecordWriter$3$adapted(AvroDeserializer.scala:385)
           at org.apache.hudi.org.apache.spark.sql.avro.AvroDeserializer.$anonfun$converter$4(AvroDeserializer.scala:87)
           at org.apache.hudi.org.apache.spark.sql.avro.AvroDeserializer.deserialize(AvroDeserializer.scala:105)
           at org.apache.hudi.org.apache.spark.sql.avro.HoodieSpark3_3AvroDeserializer.deserialize(HoodieSpark3_3AvroDeserializer.scala:30)
           at org.apache.hudi.AvroConversionUtils$.$anonfun$createAvroToInternalRowConverter$1(AvroConversionUtils.scala:67)
           at org.apache.hudi.AvroConversionUtils$.$anonfun$createConverterToRow$1(AvroConversionUtils.scala:95)
           at org.apache.hudi.AvroConversionUtils$.$anonfun$createDataFrame$2(AvroConversionUtils.scala:129)
           at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
           at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
           at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
           at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:514)
           at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
           at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
           at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
           at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
           at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
           at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
           at scala.collection.Iterator.isEmpty(Iterator.scala:387)
           at scala.collection.Iterator.isEmpty$(Iterator.scala:387)
           at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.isEmpty(WholeStageCodegenExec.scala:758)
           at org.apache.hudi.HoodieSparkUtils$.$anonfun$createRdd$2(HoodieSparkUtils.scala:99)
           at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:855)
           at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:855)
           at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
           at org.apache.spark.sql.execution.SQLConfInjectingRDD.$anonfun$compute$1(SQLConfInjectingRDD.scala:55)
           at org.apache.spark.sql.internal.SQLConf$.withExistingConf(SQLConf.scala:158)
           at org.apache.spark.sql.execution.SQLConfInjectingRDD.compute(SQLConfInjectingRDD.scala:55)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
           at org.apache.spark.scheduler.Task.run(Task.scala:136)
           at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:750)
   
   23/03/22 17:16:33 ERROR TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job
   23/03/22 17:16:33 ERROR HoodieDeltaStreamer: Shutting down delta-sync due to exception
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4) (hudi executor 1): java.lang.NullPointerException
        [... identical NullPointerException stack frames as in the first trace above ...]
   
   Driver stacktrace:
           at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2672)
           at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2608)
           at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2607)
           at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
           at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
           at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
           at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2607)
           at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1182)
           at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1182)
           at scala.Option.foreach(Option.scala:407)
           at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1182)
           at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2860)
           at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802)
           at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791)
           at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
           at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2238)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2259)
           at org.apache.spark.SparkContext.runJob(SparkContext.scala:2278)
           at org.apache.spark.rdd.RDD.$anonfun$take$1(RDD.scala:1470)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
           at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
           at org.apache.spark.rdd.RDD.take(RDD.scala:1443)
           at org.apache.spark.rdd.RDD.$anonfun$isEmpty$1(RDD.scala:1578)
           at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
           at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
           at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
           at org.apache.spark.rdd.RDD.isEmpty(RDD.scala:1578)
           at org.apache.spark.api.java.JavaRDDLike.isEmpty(JavaRDDLike.scala:558)
           at org.apache.spark.api.java.JavaRDDLike.isEmpty$(JavaRDDLike.scala:558)
           at org.apache.spark.api.java.AbstractJavaRDDLike.isEmpty(JavaRDDLike.scala:45)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.fetchFromSource(DeltaSync.java:545)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.readFromSource(DeltaSync.java:460)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:364)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$1(HoodieDeltaStreamer.java:716)
           at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:750)
   Caused by: java.lang.NullPointerException
        [... identical NullPointerException stack frames as above ...]
           ... 3 more
   23/03/22 17:16:33 ERROR HoodieAsyncService: Service shutdown with error
   java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4) (hudi executor 1): java.lang.NullPointerException
        [... identical NullPointerException stack frames as above ...]
   
   Driver stacktrace:
           at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
           at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
           at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:195)
           at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:192)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:573)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
           at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
           at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: org.apache.hudi.exception.HoodieException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4) (hudi executor 1): java.lang.NullPointerException
        [... identical NullPointerException stack frames as above ...]
   
   Driver stacktrace:
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$1(HoodieDeltaStreamer.java:758)
           at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:750)
   Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4) (hudi executor 1): java.lang.NullPointerException
        [... identical NullPointerException stack frames as above ...]
   ```
   
   



[GitHub] [hudi] danny0405 commented on issue #8268: [SUPPORT]Got an NPE when Using HoodieDeltaStreamer with the delete command

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #8268:
URL: https://github.com/apache/hudi/issues/8268#issuecomment-1480559124

   The key should never be expected to be null, I guess.



[GitHub] [hudi] wolf8334 commented on issue #8268: [SUPPORT]Got an NPE when Using HoodieDeltaStreamer with the delete command

Posted by "wolf8334 (via GitHub)" <gi...@apache.org>.
wolf8334 commented on issue #8268:
URL: https://github.com/apache/hudi/issues/8268#issuecomment-1482198270

   But the fact is, in the first message I received there is a field `op` with the value `d`.
   So in my opinion, the first message tells Hudi to delete one row, but the second makes no sense.
   I can sync the data when running insert or update, and their keys are null too.



[GitHub] [hudi] danny0405 commented on issue #8268: [SUPPORT]Got an NPE when Using HoodieDeltaStreamer with the delete command

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #8268:
URL: https://github.com/apache/hudi/issues/8268#issuecomment-1482170915

   A message with a null primary key means nothing: the storage engine cannot handle a message whose key is missing. A delete message usually indicates a retraction of a previous payload, and without the key the storage has no idea which payload to retract.



Re: [I] [SUPPORT]Got an NPE when Using HoodieDeltaStreamer with the delete command [hudi]

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8268:
URL: https://github.com/apache/hudi/issues/8268#issuecomment-1997238899

   For everybody: in case you can't disable tombstones at the source, you can refer to the code below to skip the null messages -
   https://github.com/sydneyhoran/hudi/commit/b864a69e27d50424b6984f28a31c3bd99a025762
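   
   The idea is to drop records with a null value before they reach the Avro deserializer, which is where the NPE is thrown. A minimal sketch of that filtering step (an illustration of the approach, not the linked commit; the class and method names are hypothetical):
   
   ```
   import org.apache.kafka.clients.consumer.ConsumerRecord;
   import org.apache.spark.api.java.JavaRDD;
   
   public final class TombstoneFilter {
       // Drop Kafka tombstones (records with a null value) before handing the
       // batch to Avro deserialization, which otherwise NPEs on the null payload.
       public static <K, V> JavaRDD<ConsumerRecord<K, V>> skipTombstones(
               JavaRDD<ConsumerRecord<K, V>> records) {
           return records.filter(record -> record.value() != null);
       }
   }
   ```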
   



[GitHub] [hudi] wolf8334 commented on issue #8268: [SUPPORT]Got an NPE when Using HoodieDeltaStreamer with the delete command

Posted by "wolf8334 (via GitHub)" <gi...@apache.org>.
wolf8334 commented on issue #8268:
URL: https://github.com/apache/hudi/issues/8268#issuecomment-1480631164

   Yes. When I got the message with the null key, I was confused.
   But all of the keys in my POC environment are null, and I only get a null value when I run the `delete from xx` command.



[GitHub] [hudi] ad1happy2go commented on issue #8268: [SUPPORT]Got an NPE when Using HoodieDeltaStreamer with the delete command

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8268:
URL: https://github.com/apache/hudi/issues/8268#issuecomment-1578486469

   @wolf8334 This is actually a known issue. It happens when `tombstones.on.delete` is set to `true`. You can set it to `false` in the Debezium connector to fix it (see the configuration sketch below).
   
   When this setting is true, Debezium produces a NULL record to the Kafka topic, which fails the pipeline.
   
   JIRA created - https://issues.apache.org/jira/browse/HUDI-6321
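   
   For reference, the property goes into the Debezium connector configuration. A minimal Kafka Connect registration sketch (connector name and connection details are placeholders, and other required connector settings are omitted):
   
   ```
   {
     "name": "mysql-connector",
     "config": {
       "connector.class": "io.debezium.connector.mysql.MySqlConnector",
       "database.hostname": "mysql",
       "database.port": "3306",
       "database.user": "debezium",
       "database.password": "********",
       "tombstones.on.delete": "false"
     }
   }
   ```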
   

