Posted to commits@hudi.apache.org by "18511327133 (via GitHub)" <gi...@apache.org> on 2023/03/08 11:31:07 UTC

[GitHub] [hudi] 18511327133 opened a new issue, #8130: Spark write meet NoSuchElementException: FileID does not exist

18511327133 opened a new issue, #8130:
URL: https://github.com/apache/hudi/issues/8130

   
   
   Describe the problem you faced
   
   A Spark job that upserts into a Hudi table (state index) fails with the exception below.
   
   Stacktrace
    org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
           at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:329)
           at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:244)
           at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
           at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
           at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
           at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
           at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
           at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
           at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:359)
           at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:357)
           at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
           at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
           at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
           at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
           at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
           at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:357)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:308)
           at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
           at org.apache.spark.scheduler.Task.run(Task.scala:123)
           at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:750)
   Caused by: java.util.NoSuchElementException: FileID 6942d6dd-22f7-4c37-ba3e-8c7aada81807-0 of partition path p_c=CN_1 does not exist.
           at org.apache.hudi.io.HoodieMergeHandle.getLatestBaseFile(HoodieMergeHandle.java:155)
           at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:121)
           at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.getUpdateHandle(BaseSparkCommitActionExecutor.java:377)
           at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:348)
           at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:80)
           at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
           ... 30 more
   
   To Reproduce
   
   Steps to reproduce the behavior:
   
   1. Run a Spark job that upserts into a Hudi table with the state index.
   2. The job encounters the exception described above.
   3. The CN_1 partition can no longer be updated with this batch of data, and the new data does not affect
   
   Hudi version: 0.12.1
   Spark version: 2.4.6
   
   tblproperties (
     type = 'mor',
     primaryKey = 'pn',
     preCombineField = 'update_date',
     hoodie.cleaner.policy = 'KEEP_LATEST_COMMITS',
     hoodie.cleaner.commits.retained = 2,
     hoodie.keep.min.commits = 5,
     hoodie.keep.max.commits = 10,
     hoodie.datasource.write.hive_style_partitioning = 'true',
     hoodie.compact.inline.trigger.strategy = 'NUM_OR_TIME',
     hoodie.compact.inline.max.delta.commits = 5,
     hoodie.parquet.max.file.size = 230686720,
     hoodie.parquet.small.file.limit = 188743680,
     hoodie.compact.inline.max.delta.seconds = 43200
   ) partitioned by (p_c)
   
   
   I tried deleting the metadata, but it didn't work.
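
One way to narrow this down is to check whether any file in the partition directory still belongs to the file group named in the exception. The sketch below is only an illustration, not part of Hudi: the helper name `find_file_group` is hypothetical, it assumes the partition directory is reachable as a local path, and it assumes the usual Hudi naming in which base files start with the file ID and end in `.parquet`, while log files start with a dot followed by the file ID.

```python
from pathlib import Path


def find_file_group(partition_dir, file_id):
    """Return (base_files, log_files) whose names belong to file_id.

    Assumption: base files look like '<fileId>_<...>.parquet' and
    log files look like '.<fileId>_<...>.log.<...>'.
    """
    base_files, log_files = [], []
    for entry in sorted(Path(partition_dir).iterdir()):
        if not entry.is_file():
            continue
        name = entry.name
        if name.startswith(file_id + "_") and name.endswith(".parquet"):
            base_files.append(name)
        elif name.startswith("." + file_id + "_") and ".log." in name:
            log_files.append(name)
    return base_files, log_files
```

For example, `find_file_group("/path/to/table/p_c=CN_1", "6942d6dd-22f7-4c37-ba3e-8c7aada81807-0")` (the table path is a placeholder) returns two lists. An empty first list would be consistent with the message thrown from `HoodieMergeHandle.getLatestBaseFile` in the stack trace, although it does not by itself explain why the base file is missing.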
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] nsivabalan commented on issue #8130: Spark java.util.NoSuchElementException: FileID * partition path p_c=CN_1 does not exist.

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.

nsivabalan commented on issue #8130:
URL: https://github.com/apache/hudi/issues/8130#issuecomment-1465085889

   How did you delete the metadata table, by the way? Can you post the contents of ".hoodie/hoodie.properties"?
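
For anyone gathering the requested file, a small sketch like the following can print its key/value pairs in a form that is easy to paste into the issue. It is a simplified reader written for illustration: the function name `read_properties` is hypothetical, it handles only plain `key=value` lines plus `#` or `!` comments, and it ignores the escaping and line-continuation rules of the full Java properties format.

```python
def read_properties(text):
    """Parse simple 'key=value' lines into a dict.

    Blank lines, comment lines ('#' or '!'), and lines without '='
    are skipped. This is a simplification of Java .properties parsing.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        props[key.strip()] = value.strip()
    return props
```

A possible use, assuming the file has been copied locally: `read_properties(open("hoodie.properties").read())`, then print each pair in sorted key order.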



[GitHub] [hudi] ad1happy2go commented on issue #8130: Spark java.util.NoSuchElementException: FileID * partition path p_c=CN_1 does not exist.

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.

ad1happy2go commented on issue #8130:
URL: https://github.com/apache/hudi/issues/8130#issuecomment-1503826578

   @18511327133 
   
   I wasn't able to reproduce the issue. Can you provide an exact reproducible script with your datasets?
   



[GitHub] [hudi] nsivabalan commented on issue #8130: Spark java.util.NoSuchElementException: FileID * partition path p_c=CN_1 does not exist.

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.

nsivabalan commented on issue #8130:
URL: https://github.com/apache/hudi/issues/8130#issuecomment-1465085778

   What index type are you using? This seems straightforward, so I'm not sure what the issue is here. I don't see any issues with the write configs as such.
   Can you give us reproducible code with some sample data (anonymized)?
   

