Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/27 02:57:32 UTC

[GitHub] [hudi] tommy810pp opened a new issue, #6804: [SUPPORT] Repairing the hudi table from No such file or directory of parquet file.

tommy810pp opened a new issue, #6804:
URL: https://github.com/apache/hudi/issues/6804

   **Describe the problem you faced**
   We are running a Spark job on AWS Glue 3.0.
   
   Sometimes the job fails with this error:
   ```
   ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
   ```
   
   After the failure, when upserting records into that partition, Hudi tries to read an already cleaned-up parquet file and throws an exception:
   ```
   java.io.FileNotFoundException: No such file or directory 's3://datalake/datasets/table/daas_date=2022-09/726c988b-4ebd-4b35-9889-15cb1363d867-0_1-23-16379_20220921161214958.parquet'
   ```
   
   Is there any way to remove the reference to the already deleted parquet file from the Hudi table?
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. A Spark job fails with ExecutorLostFailure while upserting records to the table.
   2. Upsert records into the same partition.
   
   **Expected behavior**
   After deleting the broken reference, Hudi does not read the deleted parquet file and successfully ingests the data.
   
   **Environment Description**
   
   * Hudi version : 0.11.1
   
   * Spark version : 3.1.1
   
   * Hive version : Glue Data Catalog
   
   * Hadoop version : 3.0.0
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : AWS Glue 3.0
   
   **Additional context**
   
   
   **Stacktrace**
   
   ```
   Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 78 in stage 9.0 failed 4 times, most recent failure: Lost task 78.3 in stage 9.0 (TID 6088) (10.12.32.42 executor 16): org.apache.hudi.exception.HoodieIOException: Failed to read from Parquet file s3://datalake/datasets/table/daas_date=2022-09/726c988b-4ebd-4b35-9889-15cb1363d867-0_1-23-16379_20220921161214958.parquet
   	at org.apache.hudi.common.util.ParquetUtils.getHoodieKeyIterator(ParquetUtils.java:181)
   	at org.apache.hudi.common.util.ParquetUtils.fetchHoodieKeys(ParquetUtils.java:196)
   	at org.apache.hudi.common.util.ParquetUtils.fetchHoodieKeys(ParquetUtils.java:147)
   	at org.apache.hudi.io.HoodieKeyLocationFetchHandle.locations(HoodieKeyLocationFetchHandle.java:62)
   	at org.apache.hudi.index.simple.HoodieSimpleIndex.lambda$fetchRecordLocations$33972fb4$1(HoodieSimpleIndex.java:155)
   	at org.apache.hudi.data.HoodieJavaRDD.lambda$flatMap$a6598fcb$1(HoodieJavaRDD.java:117)
   	at org.apache.spark.api.java.JavaRDDLike.$anonfun$flatMap$1(JavaRDDLike.scala:125)
   	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:480)
   	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:486)
   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:454)
   	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:179)
   	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
   	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
   	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
   	at org.apache.spark.scheduler.Task.run(Task.scala:131)
   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:750)
   Caused by: java.io.FileNotFoundException: No such file or directory 's3://datalake/datasets/table/daas_date=2022-09/726c988b-4ebd-4b35-9889-15cb1363d867-0_1-23-16379_20220921161214958.parquet'
   	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:532)
   	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:694)
   	at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
   	at org.apache.hudi.common.util.ParquetUtils.getHoodieKeyIterator(ParquetUtils.java:178)
   	... 20 more
   
   Driver stacktrace:
   	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2465)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2414)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2413)
   	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:58)
   	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:51)
   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2413)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1124)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1124)
   	at scala.Option.foreach(Option.scala:257)
   	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1124)
   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2679)
   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2621)
   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2610)
   	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
   	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:914)
   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2238)
   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2259)
   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2278)
   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2303)
   	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
   	at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
   	at org.apache.spark.rdd.RDD.collect(RDD.scala:1029)
   	at org.apache.spark.rdd.PairRDDFunctions.$anonfun$countByKey$1(PairRDDFunctions.scala:366)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
   	at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
   	at org.apache.spark.rdd.PairRDDFunctions.countByKey(PairRDDFunctions.scala:366)
   	at org.apache.spark.api.java.JavaPairRDD.countByKey(JavaPairRDD.scala:314)
   	at org.apache.hudi.data.HoodieJavaPairRDD.countByKey(HoodieJavaPairRDD.java:104)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.buildProfile(BaseSparkCommitActionExecutor.java:187)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:156)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:85)
   	at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:57)
   	... 58 more
   Caused by: org.apache.hudi.exception.HoodieIOException: Failed to read from Parquet file s3://datalake/datasets/table/daas_date=2022-09/726c988b-4ebd-4b35-9889-15cb1363d867-0_1-23-16379_20220921161214958.parquet
   	at org.apache.hudi.common.util.ParquetUtils.getHoodieKeyIterator(ParquetUtils.java:181)
   	at org.apache.hudi.common.util.ParquetUtils.fetchHoodieKeys(ParquetUtils.java:196)
   	at org.apache.hudi.common.util.ParquetUtils.fetchHoodieKeys(ParquetUtils.java:147)
   	at org.apache.hudi.io.HoodieKeyLocationFetchHandle.locations(HoodieKeyLocationFetchHandle.java:62)
   	at org.apache.hudi.index.simple.HoodieSimpleIndex.lambda$fetchRecordLocations$33972fb4$1(HoodieSimpleIndex.java:155)
   	at org.apache.hudi.data.HoodieJavaRDD.lambda$flatMap$a6598fcb$1(HoodieJavaRDD.java:117)
   	at org.apache.spark.api.java.JavaRDDLike.$anonfun$flatMap$1(JavaRDDLike.scala:125)
   	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:480)
   	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:486)
   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:454)
   	at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:179)
   	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
   	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
   	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
   	at org.apache.spark.scheduler.Task.run(Task.scala:131)
   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:750)
   Caused by: java.io.FileNotFoundException: No such file or directory 's3://datalake/datasets/table/daas_date=2022-09/726c988b-4ebd-4b35-9889-15cb1363d867-0_1-23-16379_20220921161214958.parquet'
   	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:532)
   	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:694)
   	at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
   	at org.apache.hudi.common.util.ParquetUtils.getHoodieKeyIterator(ParquetUtils.java:178)
   	... 20 more
   ```
   
   




[GitHub] [hudi] tommy810pp commented on issue #6804: [SUPPORT] Repairing the hudi table from No such file or directory of parquet file.

Posted by GitBox <gi...@apache.org>.
tommy810pp commented on issue #6804:
URL: https://github.com/apache/hudi/issues/6804#issuecomment-1274200857

   Thanks for the suggestion.
   I'm not sure about the root cause, but it could be fixed by manually deleting the dangling data with the CLI, deleting the metadata table with the CLI, and running the job again.
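   
   A hedged sketch of those CLI steps, assuming the Hudi CLI bundled with 0.11.x (the table path and instant time are placeholders, not values confirmed in this thread):
   ```
   hudi-cli                                    # launch the CLI from the Hudi bundle
   connect --path s3://datalake/datasets/table
   commits show                                # identify the failed/dangling instant
   commit rollback --commit <failed_instant>   # roll back the failed write
   metadata delete                             # drop the metadata table; it can be rebuilt later
   ```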




[GitHub] [hudi] chenbodeng719 commented on issue #6804: [SUPPORT] Repairing the hudi table from No such file or directory of parquet file.

Posted by "chenbodeng719 (via GitHub)" <gi...@apache.org>.
chenbodeng719 commented on issue #6804:
URL: https://github.com/apache/hudi/issues/6804#issuecomment-1515982058

   > Thanks for the suggestion. I'm not sure about the root cause, but it could be fixed by manually deleting the dangling data with the CLI, deleting the metadata table with the CLI, and running the job again.
   
   Can you share the details of how you solved it?




[GitHub] [hudi] nsivabalan commented on issue #6804: [SUPPORT] Repairing the hudi table from No such file or directory of parquet file.

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6804:
URL: https://github.com/apache/hudi/issues/6804#issuecomment-1263121682

   If not for the metadata table, I can't think of an easier way to go about this. Essentially, the cleaner has removed a data file that is still required by the query. If you have very aggressive cleaner configs, you may try to relax them based on the maximum time any query can take against the table of interest.
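   
   As an illustration only (not taken from this thread), relaxing cleaner retention on the write path could look like the PySpark sketch below; the table name, key fields, and retention value are hypothetical and should be sized to the longest-running query against the table.
   ```
   # Hypothetical PySpark upsert with relaxed cleaner retention.
   # hoodie.cleaner.commits.retained (default 10) controls how many commits'
   # worth of older file versions the cleaner keeps for in-flight readers.
   hudi_options = {
       "hoodie.table.name": "table",                        # placeholder
       "hoodie.datasource.write.operation": "upsert",
       "hoodie.datasource.write.recordkey.field": "id",     # placeholder key field
       "hoodie.datasource.write.partitionpath.field": "daas_date",
       "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",      # the default policy
       "hoodie.cleaner.commits.retained": "20",             # relaxed from the default 10
   }
   
   # df is assumed to be an existing Spark DataFrame of incoming records.
   (df.write.format("hudi")
      .options(**hudi_options)
      .mode("append")
      .save("s3://datalake/datasets/table"))
   ```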
   




[GitHub] [hudi] chenbodeng719 commented on issue #6804: [SUPPORT] Repairing the hudi table from No such file or directory of parquet file.

Posted by "chenbodeng719 (via GitHub)" <gi...@apache.org>.
chenbodeng719 commented on issue #6804:
URL: https://github.com/apache/hudi/issues/6804#issuecomment-1523182272

   @nsivabalan I faced the same issue. It happens about once a week. What can I do to avoid it?




[GitHub] [hudi] nsivabalan commented on issue #6804: [SUPPORT] Repairing the hudi table from No such file or directory of parquet file.

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6804:
URL: https://github.com/apache/hudi/issues/6804#issuecomment-1261796062

   If you have enabled the metadata table, can you disable it ("hoodie.metadata.enable=false")? If not for the metadata table, you should not be hitting this issue, at least on the write path.
   After a few commits, you can re-enable the metadata table. The metadata table may be corrupt.
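   
   A minimal sketch of that toggle on a plain PySpark upsert (the table name, key fields, and path below are placeholders, not values from this thread):
   ```
   # Hypothetical upsert with the metadata table disabled; re-enable it
   # (hoodie.metadata.enable=true, the 0.11.x default) after a few clean commits.
   # df is assumed to be an existing Spark DataFrame of incoming records.
   (df.write.format("hudi")
      .option("hoodie.table.name", "table")                      # placeholder
      .option("hoodie.datasource.write.operation", "upsert")
      .option("hoodie.datasource.write.recordkey.field", "id")   # placeholder key field
      .option("hoodie.datasource.write.partitionpath.field", "daas_date")
      .option("hoodie.metadata.enable", "false")                 # bypass the possibly corrupt metadata table
      .mode("append")
      .save("s3://datalake/datasets/table"))
   ```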
   




[GitHub] [hudi] tommy810pp closed issue #6804: [SUPPORT] Repairing the hudi table from No such file or directory of parquet file.

Posted by GitBox <gi...@apache.org>.
tommy810pp closed issue #6804: [SUPPORT] Repairing the hudi table from No such file or directory of parquet file.
URL: https://github.com/apache/hudi/issues/6804

