Posted to commits@hudi.apache.org by "18511327133 (via GitHub)" <gi...@apache.org> on 2023/03/08 11:31:07 UTC
[GitHub] [hudi] 18511327133 opened a new issue, #8130: Spark write meet NoSuchElementException: FileID does not exist
18511327133 opened a new issue, #8130:
URL: https://github.com/apache/hudi/issues/8130
Describe the problem you faced
A Spark job that upserts into a Hudi table (state index) fails with the exception below.
Stacktrace
org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:329)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:244)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:359)
at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:357)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:357)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:308)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.util.NoSuchElementException: FileID 6942d6dd-22f7-4c37-ba3e-8c7aada81807-0 of partition path p_c=CN_1 does not exist.
at org.apache.hudi.io.HoodieMergeHandle.getLatestBaseFile(HoodieMergeHandle.java:155)
at org.apache.hudi.io.HoodieMergeHandle.<init>(HoodieMergeHandle.java:121)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.getUpdateHandle(BaseSparkCommitActionExecutor.java:377)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:348)
at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:80)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
... 30 more
To Reproduce
Steps to reproduce the behavior:
1. Run a Spark job that upserts into a Hudi table with the state index.
2. The job encounters the exception described above.
3. The p_c=CN_1 partition can no longer be updated with this batch of data, and the new data does not affect
Hudi version: 0.12.1
Spark version: 2.4.6
tblproperties (
  type = 'mor',
  primaryKey = 'pn',
  preCombineField = 'update_date',
  hoodie.cleaner.policy = 'KEEP_LATEST_COMMITS',
  hoodie.cleaner.commits.retained = 2,
  hoodie.keep.min.commits = 5,
  hoodie.keep.max.commits = 10,
  hoodie.datasource.write.hive_style_partitioning = 'true',
  hoodie.compact.inline.trigger.strategy = 'NUM_OR_TIME',
  hoodie.compact.inline.max.delta.commits = 5,
  hoodie.parquet.max.file.size = 230686720,
  hoodie.parquet.small.file.limit = 188743680,
  hoodie.compact.inline.max.delta.seconds = 43200
) partitioned by (p_c)
I tried deleting the metadata, but it didn't work.
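
For readers who want to try the same settings through the DataFrame writer instead of Spark SQL, the table properties above could be expressed roughly as follows. This is only a sketch under assumptions: the table name and base path are hypothetical, the mapping of type, primaryKey and preCombineField to datasource write options follows the usual Hudi convention, and no index type is set because the issue does not give an exact value for it.

```python
def build_hudi_options(table_name):
    """Return writer options mirroring the tblproperties reported in the issue.

    table_name is a hypothetical placeholder; the issue does not state it.
    """
    return {
        "hoodie.table.name": table_name,
        # type = 'mor'
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": "upsert",
        # primaryKey = 'pn'
        "hoodie.datasource.write.recordkey.field": "pn",
        # preCombineField = 'update_date'
        "hoodie.datasource.write.precombine.field": "update_date",
        # partitioned by (p_c)
        "hoodie.datasource.write.partitionpath.field": "p_c",
        "hoodie.datasource.write.hive_style_partitioning": "true",
        "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
        "hoodie.cleaner.commits.retained": "2",
        "hoodie.keep.min.commits": "5",
        "hoodie.keep.max.commits": "10",
        "hoodie.compact.inline.trigger.strategy": "NUM_OR_TIME",
        "hoodie.compact.inline.max.delta.commits": "5",
        "hoodie.compact.inline.max.delta.seconds": "43200",
        "hoodie.parquet.max.file.size": "230686720",
        "hoodie.parquet.small.file.limit": "188743680",
    }


def upsert(df, base_path, table_name):
    """Upsert a Spark DataFrame into the Hudi table at base_path.

    df is an existing pyspark DataFrame with pn, update_date and p_c columns;
    base_path is a hypothetical storage location for the table.
    """
    (df.write.format("hudi")
        .options(**build_hudi_options(table_name))
        .mode("append")
        .save(base_path))
```

Calling upsert(df, base_path, table_name) twice with overlapping pn values in the same p_c partition would exercise the update path that appears in the stack trace, though that alone is not known to reproduce the error.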
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #8130: Spark java.util.NoSuchElementException: FileID * partition path p_c=CN_1 does not exist.
Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on issue #8130:
URL: https://github.com/apache/hudi/issues/8130#issuecomment-1465085889
How did you delete the metadata table, by the way? Can you post the contents of ".hoodie/hoodie.properties"?
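
One possible way to collect the requested contents is to copy the file locally and print its key/value pairs. The sketch below assumes the file uses plain key=value lines; the sample text and its values are made up for illustration and are not taken from the reporter's table.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


def load_properties(path):
    """Read a locally copied hoodie.properties file and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_properties(handle.read())


# Hypothetical sample contents, for illustration only.
SAMPLE = """\
#Properties saved on (timestamp elided)
hoodie.table.name=example_table
hoodie.table.type=MERGE_ON_READ
"""

if __name__ == "__main__":
    for key, value in sorted(parse_properties(SAMPLE).items()):
        print(f"{key}={value}")
```

For a real table, load_properties would be pointed at a local copy of .hoodie/hoodie.properties fetched from the table's base path.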
[GitHub] [hudi] ad1happy2go commented on issue #8130: Spark java.util.NoSuchElementException: FileID * partition path p_c=CN_1 does not exist.
Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8130:
URL: https://github.com/apache/hudi/issues/8130#issuecomment-1503826578
@18511327133
I wasn't able to reproduce the issue. Can you provide an exact reproducible script with your datasets?
[GitHub] [hudi] nsivabalan commented on issue #8130: Spark java.util.NoSuchElementException: FileID * partition path p_c=CN_1 does not exist.
Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on issue #8130:
URL: https://github.com/apache/hudi/issues/8130#issuecomment-1465085778
What index type are you using? This seems straightforward, so I'm not sure what the issue is here. I don't see any issues with the write configs as such.
Can you give us reproducible code with some anonymized sample data?