Posted to commits@hudi.apache.org by "Vinoth Chandar (Jira)" <ji...@apache.org> on 2021/10/08 01:01:28 UTC

[jira] [Updated] (HUDI-2424) Error checking bloom filter index (NPE)

     [ https://issues.apache.org/jira/browse/HUDI-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-2424:
---------------------------------
    Labels: user-support-issues  (was: )

> Error checking bloom filter index (NPE)
> ---------------------------------------
>
>                 Key: HUDI-2424
>                 URL: https://issues.apache.org/jira/browse/HUDI-2424
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Jakub Kubala
>            Priority: Major
>              Labels: user-support-issues
>
> Hi,
> Recently we have encountered an issue with Hudi where an NPE is thrown, seemingly at random, while processing content.
> As we have over 100k content items to process, I cannot easily narrow down which one is the troublesome piece.
> We are using the configurations that come with AWS EMR v5.30 (Hudi 0.5.2) and v5.33 (Hudi 0.7.0).
>  
> {code:java}
> 21/09/10 18:31:14 WARN TaskSetManager: Lost task 1.0 in stage 38.0 (TID 23804, ip-10-208-160-140.eu-central-1.compute.internal, executor 2): java.lang.RuntimeException: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
> 	at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
> 	at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
> 	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
> 	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
> 	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
> 	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> 	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:154)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:123)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
> 	at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:110)
> 	at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:60)
> 	at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
> 	... 15 more
> Caused by: java.lang.NullPointerException
> 	at org.apache.hudi.io.HoodieKeyLookupHandle.addKey(HoodieKeyLookupHandle.java:99)
> 	at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:97)
> 	... 17 more
> 21/09/10 18:31:14 INFO TaskSetManager: Starting task 1.1 in stage 38.0 (TID 23805, ip-10-208-160-140.eu-central-1.compute.internal, executor 1, partition 1, NODE_LOCAL, 7662 bytes)
> 21/09/10 18:31:18 INFO TaskSetManager: Lost task 1.1 in stage 38.0 (TID 23805) on ip-10-208-160-140.eu-central-1.compute.internal, executor 1: java.lang.RuntimeException (org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index. ) [duplicate 1]
> 21/09/10 18:31:18 INFO TaskSetManager: Starting task 1.2 in stage 38.0 (TID 23806, ip-10-208-160-140.eu-central-1.compute.internal, executor 1, partition 1, NODE_LOCAL, 7662 bytes)
> 21/09/10 18:31:21 INFO TaskSetManager: Lost task 1.2 in stage 38.0 (TID 23806) on ip-10-208-160-140.eu-central-1.compute.internal, executor 1: java.lang.RuntimeException (org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index. ) [duplicate 2]
> 21/09/10 18:31:21 INFO TaskSetManager: Starting task 1.3 in stage 38.0 (TID 23807, ip-10-208-160-140.eu-central-1.compute.internal, executor 2, partition 1, NODE_LOCAL, 7662 bytes)
> 21/09/10 18:31:25 WARN TaskSetManager: Lost task 1.3 in stage 38.0 (TID 23807, ip-10-208-160-140.eu-central-1.compute.internal, executor 2): java.lang.RuntimeException: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
> 	at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
> 	at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
> 	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
> 	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
> 	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
> 	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> 	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:154)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:123)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
> 	at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:110)
> 	at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:60)
> 	at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
> 	... 15 more
> Caused by: java.lang.NullPointerException
> 	at org.apache.hudi.io.HoodieKeyLookupHandle.addKey(HoodieKeyLookupHandle.java:99)
> 	at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:97)
> 	... 17 more
> 21/09/10 18:31:25 ERROR TaskSetManager: Task 1 in stage 38.0 failed 4 times; aborting job
> 21/09/10 18:31:25 INFO YarnScheduler: Cancelling stage 38
> 21/09/10 18:31:25 INFO YarnScheduler: Killing all running tasks in stage 38: Stage cancelled
> 21/09/10 18:31:25 INFO YarnScheduler: Stage 38 was cancelled
> 21/09/10 18:31:25 INFO DAGScheduler: ShuffleMapStage 38 (flatMapToPair at HoodieBloomIndex.java:308) failed in 15.973 s due to Job aborted due to stage failure: Task 1 in stage 38.0 failed 4 times, most recent failure: Lost task 1.3 in stage 38.0 (TID 23807, ip-10-208-160-140.eu-central-1.compute.internal, executor 2): java.lang.RuntimeException: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
> 	at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
> 	at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
> 	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
> 	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
> 	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
> 	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> 	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:154)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:123)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieIndexException: Error checking bloom filter index.
> 	at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:110)
> 	at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:60)
> 	at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:119)
> 	... 15 more
> Caused by: java.lang.NullPointerException
> 	at org.apache.hudi.io.HoodieKeyLookupHandle.addKey(HoodieKeyLookupHandle.java:99)
> 	at org.apache.hudi.index.bloom.HoodieBloomIndexCheckFunction$LazyKeyCheckIterator.computeNext(HoodieBloomIndexCheckFunction.java:97)
> 	... 17 more
> {code}
> Can you help me with this?
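Because the reporter cannot tell which of the 100k+ input items triggers the NPE, one generic way to narrow it down is to bisect the input: rerun the failing write on half of the batch, keep whichever half still fails, and repeat. The sketch below is hypothetical. `FailingRecordBisector` and `findFailingRecord` are not part of Hudi, and the `fails` predicate is a placeholder for whatever reruns the upsert on a sublist (for example against a scratch copy of the table, since the bloom index check compares incoming keys against existing data files and each trial should see the same table state). It assumes a single record reproduces the failure on its own; if neither half fails, it stops with an exception instead of returning a misleading answer.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical debugging helper, not part of Hudi.
final class FailingRecordBisector {

    private FailingRecordBisector() {}

    // Narrows a failing batch down to one record by repeated halving.
    // Assumption: one record triggers the failure by itself, so any
    // sublist containing it also fails.
    static <T> T findFailingRecord(List<T> records, Predicate<List<T>> fails) {
        if (records.isEmpty() || !fails.test(records)) {
            throw new IllegalArgumentException("the full batch does not reproduce the failure");
        }
        List<T> candidates = records;
        while (candidates.size() > 1) {
            int mid = candidates.size() / 2;
            List<T> left = candidates.subList(0, mid);
            List<T> right = candidates.subList(mid, candidates.size());
            // Keep whichever half still reproduces the failure.
            if (fails.test(left)) {
                candidates = left;
            } else if (fails.test(right)) {
                candidates = right;
            } else {
                // The failure needs records from both halves, so the
                // single-record assumption does not hold for this input.
                throw new IllegalStateException("failure is not caused by a single record");
            }
        }
        return candidates.get(0);
    }
}
```

Each step costs at most two trial runs, so a batch of n records needs roughly 2 log2(n) reruns, about 34 for 100k items. With a stand-in predicate such as `batch -> batch.contains("bad")`, calling `findFailingRecord` on `["a", "b", "bad", "d", "e"]` returns `"bad"`.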



--
This message was sent by Atlassian Jira
(v8.3.4#803005)