Posted to issues@iceberg.apache.org by "cgpoh (via GitHub)" <gi...@apache.org> on 2023/05/13 04:35:01 UTC

[GitHub] [iceberg] cgpoh opened a new issue, #7599: Empty data directory in MinIO fail Delete Orphan Files action

cgpoh opened a new issue, #7599:
URL: https://github.com/apache/iceberg/issues/7599

   ### Query engine
   
   Spark 3.1.1
   
   ### Question
   
   We accidentally deleted the data files in one of the data directories in MinIO, and this causes the Delete Orphan Files action to throw a "No such file or directory" exception, so the action cannot complete successfully. What can be done to fix this issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cgpoh commented on issue #7599: Empty data directory in MinIO fail Delete Orphan Files action

Posted by "cgpoh (via GitHub)" <gi...@apache.org>.
cgpoh commented on issue #7599:
URL: https://github.com/apache/iceberg/issues/7599#issuecomment-1546676083

   Hi @RussellSpitzer, this is my full trace:
   
   ```
   23/05/13 14:45:52 ERROR PurgeOrphanFiles: Job fail: Job aborted due to stage failure: Task 5 in stage 3.0 failed 4 times, most recent failure: Lost task 5.3 in stage 3.0 (TID 443) (192.168.0.165 executor 1): org.apache.iceberg.exceptions.RuntimeIOException: java.io.FileNotFoundException: No such file or directory: s3a://cdo/raw/fpltrkjoin/data/timestamp_hour=2023-03-07-09/partition=0
   	at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.listDirRecursively(BaseDeleteOrphanFilesSparkAction.java:269)
   	at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.listDirRecursively(BaseDeleteOrphanFilesSparkAction.java:259)
   	at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.lambda$null$3(BaseDeleteOrphanFilesSparkAction.java:287)
   	at java.base/java.util.Iterator.forEachRemaining(Unknown Source)
   	at scala.collection.convert.Wrappers$IteratorWrapper.forEachRemaining(Wrappers.scala:30)
   	at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.lambda$listDirsRecursively$478c51c0$1(BaseDeleteOrphanFilesSparkAction.java:285)
   	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitions$1(JavaRDDLike.scala:153)
   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
   	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
   	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
   	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
   	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
   	at org.apache.spark.scheduler.Task.run(Task.scala:131)
   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   	at java.base/java.lang.Thread.run(Unknown Source)
   Caused by: java.io.FileNotFoundException: No such file or directory: s3a://sales/raw/store/data/timestamp_hour=2023-03-07-09/partition=0
   	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2344)
   	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2226)
   	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2160)
   	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:1961)
   	at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listStatus$9(S3AFileSystem.java:1940)
   	at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)
   	at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:1940)
   	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868)
   	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1910)
   	at org.apache.iceberg.spark.actions.BaseDeleteOrphanFilesSparkAction.listDirRecursively(BaseDeleteOrphanFilesSparkAction.java:244)
           ... 31 more
   ```
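The trace shows `listDirRecursively` calling `FileSystem.listStatus` on the `partition=0` directory and getting a `FileNotFoundException`; the task fails four times and the whole stage is aborted. As an illustration only, here is a minimal Java sketch of a recursive listing that treats such a vanished directory as empty instead of failing. It is not Iceberg's code: it uses `java.nio.file` on a local filesystem as a stand-in for S3A on MinIO, and the names `TolerantLister` and `listFilesRecursively` are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Iceberg): recursively lists regular files,
 * treating a directory that no longer exists as empty instead of failing the
 * whole traversal.
 */
public final class TolerantLister {

  private TolerantLister() {}

  public static List<Path> listFilesRecursively(Path dir) {
    List<Path> files = new ArrayList<>();
    collect(dir, files);
    return files;
  }

  private static void collect(Path dir, List<Path> files) {
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      for (Path entry : entries) {
        if (Files.isDirectory(entry)) {
          // Recurse into subdirectories, one listing per directory level.
          collect(entry, files);
        } else {
          files.add(entry);
        }
      }
    } catch (NoSuchFileException e) {
      // The directory is gone: nothing under it can be an orphan, so skip it.
    } catch (IOException e) {
      // Any other I/O problem is still surfaced to the caller.
      throw new UncheckedIOException(e);
    }
  }
}
```

Skipping is a reasonable choice for an orphan-file scan, since a directory that no longer exists cannot contain orphan files; other I/O errors are still rethrown.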


[GitHub] [iceberg] cgpoh commented on issue #7599: Empty data directory in MinIO fail Delete Orphan Files action

Posted by "cgpoh (via GitHub)" <gi...@apache.org>.
cgpoh commented on issue #7599:
URL: https://github.com/apache/iceberg/issues/7599#issuecomment-1546592183

   Thanks @RussellSpitzer, I need to run the job again to capture the exception. Will run it again later.


[GitHub] [iceberg] RussellSpitzer commented on issue #7599: Empty data directory in MinIO fail Delete Orphan Files action

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on issue #7599:
URL: https://github.com/apache/iceberg/issues/7599#issuecomment-1546529790

   What is the exception?


[GitHub] [iceberg] cgpoh commented on issue #7599: Empty data directory in MinIO fail Delete Orphan Files action

Posted by "cgpoh (via GitHub)" <gi...@apache.org>.
cgpoh commented on issue #7599:
URL: https://github.com/apache/iceberg/issues/7599#issuecomment-1553760112

   @RussellSpitzer I managed to find the solution, and it is quite trivial: delete the whole offending directory, and the exception is gone when re-running the Delete Orphan Files job. Closing this issue for now.
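The workaround can be sketched for a local filesystem: remove the directories that no longer hold any files, then re-run the Delete Orphan Files action. On MinIO the equivalent step is deleting the offending prefix with your S3/MinIO client. The names `EmptyDirCleaner` and `deleteEmptyDirs` are hypothetical and are not an Iceberg or Hadoop API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical local-filesystem analogue of the workaround: delete every
 * directory below root that holds no files, deepest first, so parents that
 * become empty during the cleanup are removed as well.
 */
public final class EmptyDirCleaner {

  private EmptyDirCleaner() {}

  public static List<Path> deleteEmptyDirs(Path root) throws IOException {
    List<Path> dirs;
    try (Stream<Path> walk = Files.walk(root)) {
      dirs = walk
          .filter(Files::isDirectory)
          .filter(p -> !p.equals(root)) // never delete the root itself
          // Deepest paths first, so children are handled before their parents.
          .sorted(Comparator.comparingInt((Path p) -> p.getNameCount()).reversed())
          .collect(Collectors.toList());
    }
    List<Path> deleted = new ArrayList<>();
    for (Path dir : dirs) {
      boolean empty;
      try (Stream<Path> children = Files.list(dir)) {
        empty = !children.findAny().isPresent();
      }
      if (empty) {
        Files.delete(dir);
        deleted.add(dir);
      }
    }
    return deleted;
  }
}
```

Because only directories without any files are deleted, no data file that the table metadata might reference is touched.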


[GitHub] [iceberg] cgpoh closed issue #7599: Empty data directory in MinIO fail Delete Orphan Files action

Posted by "cgpoh (via GitHub)" <gi...@apache.org>.
cgpoh closed issue #7599: Empty data directory in MinIO fail Delete Orphan Files action
URL: https://github.com/apache/iceberg/issues/7599

