Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/24 10:57:10 UTC

[GitHub] [hudi] dm-tran opened a new issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

dm-tran opened a new issue #2020:
URL: https://github.com/apache/hudi/issues/2020


   **Describe the problem you faced**
   
   We are using Hudi 0.5.3 patched with https://github.com/apache/hudi/pull/1765, so that a compaction that previously failed is retried before new compactions are run.
   
   When the compaction is retried, it fails with "java.io.FileNotFoundException".
   
   **To Reproduce**
   
   I'm sorry, but I currently don't have a simple way to reproduce this problem.
   
   Here is how I got this error:
   1. Initialize a Hudi table using Spark and "bulk insert"
   2. Launch a Spark structured streaming application that consumes messages from Kafka and saves them to Hudi, using "upsert"
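
   As a rough sketch of step 2 (placeholder names such as `kafkaBootstrapServers`, `topic` and `saveToHudiTable`; the real job uses `foreachBatch`, as the stack traces below show):
   ```
   import org.apache.spark.sql.DataFrame
   import org.apache.spark.sql.streaming.Trigger

   // Sketch only: consume from Kafka and upsert each micro-batch into the Hudi table.
   // saveToHudiTable stands in for the Hudi write configured under "Additional context".
   spark.readStream
     .format("kafka")
     .option("kafka.bootstrap.servers", kafkaBootstrapServers)
     .option("subscribe", topic)
     .load()                                       // message parsing/transformation omitted
     .writeStream
     .trigger(Trigger.ProcessingTime("5 minutes")) // the job runs every 5 minutes
     .foreachBatch { (batch: DataFrame, _: Long) =>
       saveToHudiTable(batch)
     }
     .start()
   ```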
   
   **Expected behavior**
   
   Compaction should not fail.
   
   **Environment Description**
   
   * Hudi version : 0.5.3 patched with https://github.com/apache/hudi/pull/1765
   
   * Spark version : 2.4.4 (EMR 6.0.0)
   
   * Hive version : 3.1.2
   
   * Hadoop version : 3.2.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   - the throughput is around 15 messages per second.
   - the Hudi table has around 20 partitions.
   - there are no external processes that delete files from s3.
   - the structured streaming job is run every 5 minutes with the following properties:
   ```
   Map(
     "hoodie.upsert.shuffle.parallelism" -> "200",
     "hoodie.compact.inline" -> "true",
     "hoodie.compact.inline.max.delta.commits" -> "1",
     "hoodie.filesystem.view.incr.timeline.sync.enable" -> "true",
     HIVE_SYNC_ENABLED_OPT_KEY -> "true",
     HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName,
     HIVE_STYLE_PARTITIONING_OPT_KEY -> "true",
     TABLE_TYPE_OPT_KEY -> MOR_TABLE_TYPE_OPT_VAL,
     OPERATION_OPT_KEY -> UPSERT_OPERATION_OPT_VAL,
     CLEANER_INCREMENTAL_MODE -> "true",
     CLEANER_POLICY_PROP -> HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS.name(),
     CLEANER_FILE_VERSIONS_RETAINED_PROP -> "12"
   )
   ```
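
   For context, a hedged sketch of how such a Map would typically be applied (roughly what a helper like the `saveToHudiTable` seen in the stack trace does; `tablePath` is a placeholder, and record key / precombine / table name options are omitted here):
   ```
   import org.apache.spark.sql.{DataFrame, SaveMode}

   // Sketch only: write one micro-batch to the Hudi table with the options shown above.
   def saveToHudiTable(batch: DataFrame, hudiOptions: Map[String, String], tablePath: String): Unit =
     batch.write
       .format("org.apache.hudi")
       .options(hudiOptions)   // includes OPERATION_OPT_KEY -> UPSERT_OPERATION_OPT_VAL
       .mode(SaveMode.Append)
       .save(tablePath)
   ```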
   
   Output of `compactions show all` with Hudi CLI:
   ```
   ╔═════════════════════════╤═══════════╤═══════════════════════════════╗
   ║ Compaction Instant Time │ State     │ Total FileIds to be Compacted ║
   ╠═════════════════════════╪═══════════╪═══════════════════════════════╣
   ║ 20200821154520          │ INFLIGHT  │ 57                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200821153748          │ COMPLETED │ 56                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200821152906          │ COMPLETED │ 50                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200821152207          │ COMPLETED │ 52                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200821151547          │ COMPLETED │ 57                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200821151014          │ COMPLETED │ 48                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200821150425          │ COMPLETED │ 54                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200821145904          │ COMPLETED │ 49                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200821145253          │ COMPLETED │ 60                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200821144717          │ COMPLETED │ 55                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200821144125          │ COMPLETED │ 59                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200821143533          │ COMPLETED │ 56                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200821142949          │ COMPLETED │ 55                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200821142335          │ COMPLETED │ 59                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200821141741          │ COMPLETED │ 63                            ║
   ╚═════════════════════════╧═══════════╧═══════════════════════════════╝
   ```
   
   Output of `cleans show` with Hudi CLI:
   ```
   ╔════════════════╤═════════════════════════╤═════════════════════╤══════════════════╗
   ║ CleanTime      │ EarliestCommandRetained │ Total Files Deleted │ Total Time Taken ║
   ╠════════════════╪═════════════════════════╪═════════════════════╪══════════════════╣
   ║ 20200821152814 │                         │ 619                 │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821152115 │                         │ 24                  │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821151459 │                         │ 4                   │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821150921 │                         │ 6                   │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821150334 │                         │ 97                  │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821145815 │                         │ 192                 │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821145201 │                         │ 128                 │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821144630 │                         │ 24                  │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821144033 │                         │ 14                  │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821143441 │                         │ 28                  │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821142858 │                         │ 114                 │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821142242 │                         │ 614                 │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821141650 │                         │ 79                  │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821141111 │                         │ 12                  │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821140501 │                         │ 38                  │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821135933 │                         │ 8                   │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821135412 │                         │ 147                 │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821134904 │                         │ 99                  │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821134339 │                         │ 77                  │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821133821 │                         │ 41                  │ -1               ║
   ╟────────────────┼─────────────────────────┼─────────────────────┼──────────────────╢
   ║ 20200821133227 │                         │ 1                   │ -1               ║
   ╚════════════════╧═════════════════════════╧═════════════════════╧══════════════════╝
   ```
   
   **Stacktrace**
   
   ```
   20/08/24 03:55:31 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
   20/08/24 03:57:48 ERROR HoodieMergeOnReadTable: Rolling back instant [==>20200821154520__compaction__INFLIGHT]
   20/08/24 03:58:03 WARN HoodieCopyOnWriteTable: Rollback finished without deleting inflight instant file. Instant=[==>20200821154520__compaction__INFLIGHT]
   20/08/24 03:58:33 WARN TaskSetManager: Lost task 7.0 in stage 39.0 (TID 2576, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-3401-224110_20200821153748.parquet'
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:190)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.compact(HoodieMergeOnReadTableCompactor.java:139)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.lambda$compact$644ebad7$1(HoodieMergeOnReadTableCompactor.java:98)
       at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1040)
       at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
       at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
       at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
       at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
       at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1182)
       at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
       at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
       at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:123)
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-3401-224110_20200821153748.parquet'
       at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:617)
       at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
       at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
       ... 26 more
   
   20/08/24 03:58:49 WARN TaskSetManager: Lost task 7.3 in stage 39.0 (TID 2582, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-3401-224110_20200821153748.parquet'
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:190)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.compact(HoodieMergeOnReadTableCompactor.java:139)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.lambda$compact$644ebad7$1(HoodieMergeOnReadTableCompactor.java:98)
       at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1040)
       at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
       at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
       at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
       at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
       at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1182)
       at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
       at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
       at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:123)
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-3401-224110_20200821153748.parquet'
       at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:617)
       at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
       at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
       ... 26 more
   
   20/08/24 03:58:49 ERROR TaskSetManager: Task 7 in stage 39.0 failed 4 times; aborting job
   20/08/24 03:58:49 ERROR MicroBatchExecution: Query [id = 418bbb3a-3def-4a20-987b-2ac7a0ca7004, runId = ff16cb78-6247-413f-bd94-afd1c3ef48ed] terminated with error
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 39.0 failed 4 times, most recent failure: Lost task 7.3 in stage 39.0 (TID 2582, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-3401-224110_20200821153748.parquet'
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:190)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.compact(HoodieMergeOnReadTableCompactor.java:139)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.lambda$compact$644ebad7$1(HoodieMergeOnReadTableCompactor.java:98)
       at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1040)
       at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
       at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
       at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
       at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
       at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1182)
       at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
       at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
       at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:123)
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-3401-224110_20200821153748.parquet'
       at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:617)
       at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
       at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
       ... 26 more
   
   Driver stacktrace:
       at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2041)
       at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2029)
       at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2028)
       at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
       at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
       at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
       at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2028)
       at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:966)
       at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:966)
       at scala.Option.foreach(Option.scala:407)
       at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:966)
       at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2262)
       at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2211)
       at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2200)
       at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
       at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:777)
       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
       at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:945)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
       at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
       at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
       at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:361)
       at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:360)
       at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
       at org.apache.hudi.client.HoodieWriteClient.doCompactionCommit(HoodieWriteClient.java:1134)
       at org.apache.hudi.client.HoodieWriteClient.commitCompaction(HoodieWriteClient.java:1102)
       at org.apache.hudi.client.HoodieWriteClient.runCompaction(HoodieWriteClient.java:1085)
       at org.apache.hudi.client.HoodieWriteClient.compact(HoodieWriteClient.java:1056)
       at org.apache.hudi.client.HoodieWriteClient.lambda$runEarlierInflightCompactions$3(HoodieWriteClient.java:524)
       at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
       at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
       at org.apache.hudi.client.HoodieWriteClient.runEarlierInflightCompactions(HoodieWriteClient.java:521)
       at org.apache.hudi.client.HoodieWriteClient.postCommit(HoodieWriteClient.java:501)
       at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:157)
       at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:101)
       at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:92)
       at org.apache.hudi.HoodieSparkSqlWriter$.checkWriteStatus(HoodieSparkSqlWriter.scala:268)
       at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:188)
       at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
       at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
       at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
       at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:156)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
       at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
       at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
       at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
       at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
       at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
       at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
       at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
       at jp.ne.paypay.daas.dataprocessor.writer.EventsWriter$.saveToHudiTable(EventsWriter.scala:145)
       at jp.ne.paypay.daas.dataprocessor.MainProcessor$.processBatch(MainProcessor.scala:162)
       at jp.ne.paypay.daas.dataprocessor.MainProcessor$.$anonfun$main$4(MainProcessor.scala:90)
       at jp.ne.paypay.daas.dataprocessor.MainProcessor$.$anonfun$main$4$adapted(MainProcessor.scala:82)
       at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:35)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$15(MicroBatchExecution.scala:537)
       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$14(MicroBatchExecution.scala:536)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:351)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:349)
       at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:535)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:198)
       at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:351)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:349)
       at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:166)
       at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
       at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
       at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
   Caused by: org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-3401-224110_20200821153748.parquet'
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:190)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.compact(HoodieMergeOnReadTableCompactor.java:139)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.lambda$compact$644ebad7$1(HoodieMergeOnReadTableCompactor.java:98)
       at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1040)
       at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
       at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
       at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
       at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
       at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1182)
       at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
       at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
       at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:123)
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-3401-224110_20200821153748.parquet'
       at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:617)
       at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
       at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
       ... 26 more
   ```



[GitHub] [hudi] dm-tran edited a comment on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
dm-tran edited a comment on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-679875396


   > can you re-bootstrap and then start ingesting the data but this time enable consistency guard right from the beginning.
   
   @bvaradar Actually, this is what I did. I deleted the Hudi table in S3, added the consistency check property, and started ingesting the data from the beginning.




[GitHub] [hudi] dm-tran commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
dm-tran commented on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-679911709


   Thank you @bvaradar 
   
   > Can you check if more than 1 writers are concurrently happening.
   
   Only the structured streaming application writes to the Hudi table, so there is only one writer.
   
   Tasks that failed are automatically retried by Spark. Could the retries lead to this kind of error?




[GitHub] [hudi] dm-tran commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
dm-tran commented on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-680789158


   @bvaradar 
   
   The workflow of my applications is the following one:
   1. Initialize a Hudi table using a Spark batch job and "bulk insert".
   2. Launch a Spark structured streaming application that consumes messages from Kafka and saves them to Hudi, using "upsert".
   
   Yesterday, I set the property "hoodie.consistency.check.enabled=true" for step 2, but I forgot to set it for step 1. Sorry about that. Today, I used this property for both steps.
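
   To avoid this kind of drift between the two steps, the option map can be built once and reused by both writers (just a sketch; `baseHudiOptions` is a placeholder for the Map shown earlier):
   ```
   // Shared by the bulk-insert batch job (step 1) and the streaming upsert job (step 2),
   // so "hoodie.consistency.check.enabled" cannot be forgotten in one of them.
   val hudiOptions: Map[String, String] = baseHudiOptions ++ Map(
     "hoodie.consistency.check.enabled" -> "true"
   )
   ```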
   
   
   > Given that you are able to reproduce very easily and I have not seen this issue reported by anyone, would you be able to provide us a self-contained code to reproduce this.
   
   Actually, I have been successfully running a dozen structured streaming applications for several weeks.
   
   I got this "java.io.FileNotFoundException" for the first time a few days ago, when launching a structured streaming application for a new data source. Providing self-contained code to reproduce this error isn't easy. It might be related to the input data or workload.
   
   > If not, can you turn on INFO level logging and catch the logs till you hit the exception and attach them. 
   
   Sure, I have been running the structured streaming application from the start for several hours, with INFO level logging. So far, it works fine. I will attach the logs if the exception is raised.
   




[GitHub] [hudi] dm-tran edited a comment on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
dm-tran edited a comment on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-682314989


   The file that isn't found is `'s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4957-299294_20200827155539.parquet'`.
   
   The available files in s3 that start with "9dee1248-c972-4ed3-80f5-15545ac4c534-0_2" are: 
   ```
   2020-08-27 10:26 33525767 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-3850-231917_20200827102526.parquet
   2020-08-27 10:33 33526574 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-3891-234401_20200827103318.parquet
   2020-08-27 16:17 33545224 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-39-2458_20200827155539.parquet
   2020-08-27 11:13 33530132 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4096-246791_20200827111254.parquet
   2020-08-27 11:22 33530880 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4137-249295_20200827112139.parquet
   2020-08-27 12:00 33533333 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4301-259277_20200827115949.parquet
   2020-08-27 12:20 33534377 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4383-264271_20200827121947.parquet
   2020-08-27 12:42 33535631 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4465-269277_20200827124204.parquet
   2020-08-27 12:54 33536084 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4506-271786_20200827125338.parquet
   2020-08-27 13:07 33536635 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4547-274289_20200827130640.parquet
   2020-08-27 13:20 33537444 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4588-276783_20200827131919.parquet
   2020-08-27 13:32 33538151 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4629-279284_20200827133143.parquet
   2020-08-27 13:46 33539531 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4670-281782_20200827134536.parquet
   2020-08-27 14:14 33541130 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4752-286756_20200827141258.parquet
   2020-08-27 14:30 33541913 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4793-289269_20200827142922.parquet
   2020-08-27 14:49 33542820 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4834-291776_20200827144807.parquet
   2020-08-27 15:08 33543459 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4875-294286_20200827150653.parquet
   2020-08-27 15:30 33544369 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4916-296786_20200827152840.parquet
   ```
   
   Contents of s3://my-bucket/my-table/.hoodie/20200827155539.commit
   
   ```
    "9dee1248-c972-4ed3-80f5-15545ac4c534-0" : "daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-39-2458_20200827155539.parquet",
   ```
   
   Contents of s3://my-bucket/my-table/.hoodie/20200827155539.compaction.requested
   
   ```
   [20200827152840, [.9dee1248-c972-4ed3-80f5-15545ac4c534-0_20200827152840.log.1_32-4949-299212], 9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4916-296786_20200827152840.parquet, 9dee1248-c972-4ed3-80f5-15545ac4c534-0, daas_date=2020, [TOTAL_LOG_FILES -> 1.0, TOTAL_IO_READ_MB -> 32.0, TOTAL_LOG_FILES_SIZE -> 121966.0, TOTAL_IO_WRITE_MB -> 31.0, TOTAL_IO_MB -> 63.0, TOTAL_LOG_FILE_SIZE -> 121966.0]],
   ```
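
   (For reference, a hedged sketch of reproducing the listing above programmatically with the Hadoop FileSystem API; the bucket, partition path and file-id prefix are the ones shown above:)
   ```
   import org.apache.hadoop.conf.Configuration
   import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

   // List every base file written for this file group (file-id prefix), across commits.
   val partition = new Path("s3://my-bucket/my-table/daas_date=2020")
   val fs: FileSystem = partition.getFileSystem(new Configuration())
   val pattern = new Path(partition, "9dee1248-c972-4ed3-80f5-15545ac4c534-0_*.parquet")

   Option(fs.globStatus(pattern)).getOrElse(Array.empty[FileStatus])
     .sortBy(_.getModificationTime)
     .foreach(s => println(s"${s.getModificationTime} ${s.getLen} ${s.getPath}"))
   ```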
   




[GitHub] [hudi] dm-tran edited a comment on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
dm-tran edited a comment on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-679866696


   @bvaradar I ran the structured streaming job with `hoodie.consistency.check.enabled = true`, starting from the earliest offsets in Kafka, and got the same error: a `java.io.FileNotFoundException` when the compaction is retried.
   
   **Summary**
   
   The structured streaming job ran for 3 hours:
   - at some point, some executors were lost because of an OutOfMemoryError.
   - then the Spark driver failed because the consistency check failed.
   
   The Spark application was then retried by YARN, and the 2nd attempt failed with `Caused by: java.io.FileNotFoundException: No such file or directory` when the compaction was retried.
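
   For reference, the consistency guard's retry behaviour can also be tuned. A hedged sketch, assuming the standard ConsistencyGuardConfig keys (the values here are illustrative, not the defaults):
   ```
   // Retry/back-off knobs for Hudi's consistency check against S3 listings.
   val consistencyCheckOptions = Map(
     "hoodie.consistency.check.enabled"             -> "true",
     "hoodie.consistency.check.initial_interval_ms" -> "2000",   // first wait before re-listing
     "hoodie.consistency.check.max_interval_ms"     -> "300000", // cap on the back-off interval
     "hoodie.consistency.check.max_checks"          -> "7"       // how many re-lists before giving up
   )
   ```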
   
   **Stacktraces**
   
   Stacktrace of the first attempt:
   ```
   20/08/25 06:51:39 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
   20/08/25 06:51:40 WARN ProcessingTimeExecutor: Current batch is falling behind. The trigger interval is 300000 milliseconds, but spent 800229 milliseconds
   20/08/25 06:56:24 WARN BlockManagerMasterEndpoint: No more replicas available for rdd_1775_40 !
   20/08/25 06:56:24 WARN BlockManagerMasterEndpoint: No more replicas available for rdd_1785_53 !
   [...]
   20/08/25 06:56:24 WARN BlockManagerMasterEndpoint: No more replicas available for rdd_1785_35 !
   20/08/25 06:56:24 WARN BlockManagerMasterEndpoint: No more replicas available for rdd_1785_50 !
   20/08/25 06:56:24 WARN YarnAllocator: Container from a bad node: container_1594796531644_1833_01_000002 on host: ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal. Exit status: 143. Diagnostics: [2020-08-25 06:56:24.636]Container killed on request. Exit code is 143
   [2020-08-25 06:56:24.636]Container exited with a non-zero exit code 143. 
   [2020-08-25 06:56:24.637]Killed by external signal
   .
   20/08/25 06:56:24 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver to remove executor 1 for reason Container from a bad node: container_1594796531644_1833_01_000002 on host: ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal. Exit status: 143. Diagnostics: [2020-08-25 06:56:24.636]Container killed on request. Exit code is 143
   [2020-08-25 06:56:24.636]Container exited with a non-zero exit code 143. 
   [2020-08-25 06:56:24.637]Killed by external signal
   .
   20/08/25 06:56:24 ERROR YarnClusterScheduler: Lost executor 1 on ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal: Container from a bad node: container_1594796531644_1833_01_000002 on host: ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal. Exit status: 143. Diagnostics: [2020-08-25 06:56:24.636]Container killed on request. Exit code is 143
   [2020-08-25 06:56:24.636]Container exited with a non-zero exit code 143. 
   [2020-08-25 06:56:24.637]Killed by external signal
   .
   20/08/25 06:56:24 WARN TaskSetManager: Lost task 1.0 in stage 816.0 (TID 50626, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container from a bad node: container_1594796531644_1833_01_000002 on host: ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal. Exit status: 143. Diagnostics: [2020-08-25 06:56:24.636]Container killed on request. Exit code is 143
   [2020-08-25 06:56:24.636]Container exited with a non-zero exit code 143. 
   [2020-08-25 06:56:24.637]Killed by external signal
   .
   20/08/25 06:56:24 WARN TaskSetManager: Lost task 0.0 in stage 816.0 (TID 50625, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container from a bad node: container_1594796531644_1833_01_000002 on host: ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal. Exit status: 143. Diagnostics: [2020-08-25 06:56:24.636]Container killed on request. Exit code is 143
   [2020-08-25 06:56:24.636]Container exited with a non-zero exit code 143. 
   [2020-08-25 06:56:24.637]Killed by external signal
   .
   20/08/25 06:56:24 WARN ExecutorAllocationManager: Attempted to mark unknown executor 1 idle
   20/08/25 07:07:51 ERROR MicroBatchExecution: Query [id = 6ea738ee-0886-4014-a2b2-f51efd693c45, runId = 97c16ef4-d610-4d44-a0e9-a9d24ed5e0cf] terminated with error
   org.apache.hudi.exception.HoodieCommitException: Failed to complete commit 20200825065331 due to finalize errors.
       at org.apache.hudi.client.AbstractHoodieWriteClient.finalizeWrite(AbstractHoodieWriteClient.java:204)
       at org.apache.hudi.client.HoodieWriteClient.doCompactionCommit(HoodieWriteClient.java:1142)
       at org.apache.hudi.client.HoodieWriteClient.commitCompaction(HoodieWriteClient.java:1102)
       at org.apache.hudi.client.HoodieWriteClient.runCompaction(HoodieWriteClient.java:1085)
       at org.apache.hudi.client.HoodieWriteClient.compact(HoodieWriteClient.java:1056)
       at org.apache.hudi.client.HoodieWriteClient.lambda$forceCompact$13(HoodieWriteClient.java:1171)
       at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
       at org.apache.hudi.client.HoodieWriteClient.forceCompact(HoodieWriteClient.java:1168)
       at org.apache.hudi.client.HoodieWriteClient.postCommit(HoodieWriteClient.java:503)
       at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:157)
       at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:101)
       at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:92)
       at org.apache.hudi.HoodieSparkSqlWriter$.checkWriteStatus(HoodieSparkSqlWriter.scala:268)
       at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:188)
       at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
       at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
       at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
       at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:156)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
       at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
       at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
       at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
       at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
       at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
       at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
       at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
       at aaa.dataprocessor.writer.EventsWriter$.saveToHudiTable(EventsWriter.scala:145)
       at aaa.dataprocessor.MainProcessor$.processBatch(MainProcessor.scala:162)
       at aaa.dataprocessor.MainProcessor$.$anonfun$main$4(MainProcessor.scala:90)
       at aaa.dataprocessor.MainProcessor$.$anonfun$main$4$adapted(MainProcessor.scala:82)
       at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:35)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$15(MicroBatchExecution.scala:537)
       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$14(MicroBatchExecution.scala:536)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:351)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:349)
       at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:535)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:198)
       at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:351)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:349)
       at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:166)
       at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
       at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
       at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
   Caused by: org.apache.hudi.exception.HoodieIOException: Consistency check failed to ensure all files APPEAR
       at org.apache.hudi.table.HoodieTable.waitForAllFiles(HoodieTable.java:431)
       at org.apache.hudi.table.HoodieTable.cleanFailedWrites(HoodieTable.java:379)
       at org.apache.hudi.table.HoodieTable.finalizeWrite(HoodieTable.java:315)
       at org.apache.hudi.table.HoodieMergeOnReadTable.finalizeWrite(HoodieMergeOnReadTable.java:319)
       at org.apache.hudi.client.AbstractHoodieWriteClient.finalizeWrite(AbstractHoodieWriteClient.java:195)
       ... 57 more
   ```
   
   Stacktrace of the second attempt:
   ```
   SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
   20/08/25 07:07:56 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
   20/08/25 07:10:02 ERROR HoodieMergeOnReadTable: Rolling back instant [==>20200825065331__compaction__INFLIGHT]
   20/08/25 07:10:07 WARN HoodieCopyOnWriteTable: Rollback finished without deleting inflight instant file. Instant=[==>20200825065331__compaction__INFLIGHT]
   20/08/25 07:17:12 WARN TaskSetManager: Lost task 2.0 in stage 41.0 (TID 2539, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/ff707f6d-0e41-405e-9623-f7302600765b-0_2-816-50629_20200825065331.parquet'
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:190)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.compact(HoodieMergeOnReadTableCompactor.java:139)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.lambda$compact$644ebad7$1(HoodieMergeOnReadTableCompactor.java:98)
       at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1040)
       at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
       at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
       at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
       at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
       at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1182)
       at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
       at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
       at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:123)
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/ff707f6d-0e41-405e-9623-f7302600765b-0_2-816-50629_20200825065331.parquet'
       at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:617)
       at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
       at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
       ... 26 more
   
   20/08/25 07:17:13 WARN TaskSetManager: Lost task 3.0 in stage 41.0 (TID 2540, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/56be5da5-f5f3-4675-8dec-433f3656f839-0_3-816-50630_20200825065331.parquet'
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:190)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.compact(HoodieMergeOnReadTableCompactor.java:139)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.lambda$compact$644ebad7$1(HoodieMergeOnReadTableCompactor.java:98)
       at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1040)
       at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
       at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
       at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
       at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
       at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1182)
       at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
       at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
       at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:123)
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/56be5da5-f5f3-4675-8dec-433f3656f839-0_3-816-50630_20200825065331.parquet'
       at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:617)
       at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
       at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
       ... 26 more
   
   20/08/25 07:17:18 ERROR TaskSetManager: Task 2 in stage 41.0 failed 4 times; aborting job
   20/08/25 07:17:18 ERROR MicroBatchExecution: Query [id = 6ea738ee-0886-4014-a2b2-f51efd693c45, runId = 9afd92cc-2ced-47e9-a34b-9574dd82c229] terminated with error
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 41.0 failed 4 times, most recent failure: Lost task 2.3 in stage 41.0 (TID 2546, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/ff707f6d-0e41-405e-9623-f7302600765b-0_2-816-50629_20200825065331.parquet'
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:190)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.compact(HoodieMergeOnReadTableCompactor.java:139)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.lambda$compact$644ebad7$1(HoodieMergeOnReadTableCompactor.java:98)
       at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1040)
       at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
       at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
       at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
       at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
       at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1182)
       at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
       at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
       at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:123)
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/ff707f6d-0e41-405e-9623-f7302600765b-0_2-816-50629_20200825065331.parquet'
       at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:617)
       at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
       at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
       ... 26 more
   
   Driver stacktrace:
       at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2041)
       at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2029)
       at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2028)
       at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
       at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
       at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
       at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2028)
       at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:966)
       at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:966)
       at scala.Option.foreach(Option.scala:407)
       at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:966)
       at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2262)
       at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2211)
       at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2200)
       at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
       at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:777)
       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
       at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:945)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
       at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
       at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
       at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:361)
       at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:360)
       at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
       at org.apache.hudi.client.HoodieWriteClient.doCompactionCommit(HoodieWriteClient.java:1134)
       at org.apache.hudi.client.HoodieWriteClient.commitCompaction(HoodieWriteClient.java:1102)
       at org.apache.hudi.client.HoodieWriteClient.runCompaction(HoodieWriteClient.java:1085)
       at org.apache.hudi.client.HoodieWriteClient.compact(HoodieWriteClient.java:1056)
       at org.apache.hudi.client.HoodieWriteClient.lambda$forceCompact$13(HoodieWriteClient.java:1171)
       at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
       at org.apache.hudi.client.HoodieWriteClient.forceCompact(HoodieWriteClient.java:1168)
       at org.apache.hudi.client.HoodieWriteClient.postCommit(HoodieWriteClient.java:503)
       at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:157)
       at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:101)
       at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:92)
       at org.apache.hudi.HoodieSparkSqlWriter$.checkWriteStatus(HoodieSparkSqlWriter.scala:268)
       at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:188)
       at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
       at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
       at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
       at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:156)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
       at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
       at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
       at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
       at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
       at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
       at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
       at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
       at aaa.dataprocessor.writer.EventsWriter$.saveToHudiTable(EventsWriter.scala:145)
       at aaa.dataprocessor.MainProcessor$.processBatch(MainProcessor.scala:162)
       at aaa.dataprocessor.MainProcessor$.$anonfun$main$4(MainProcessor.scala:90)
       at aaa.dataprocessor.MainProcessor$.$anonfun$main$4$adapted(MainProcessor.scala:82)
       at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:35)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$15(MicroBatchExecution.scala:537)
       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$14(MicroBatchExecution.scala:536)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:351)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:349)
       at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:535)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:198)
       at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:351)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:349)
       at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:166)
       at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
       at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
       at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
   Caused by: org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/ff707f6d-0e41-405e-9623-f7302600765b-0_2-816-50629_20200825065331.parquet'
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:190)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.compact(HoodieMergeOnReadTableCompactor.java:139)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.lambda$compact$644ebad7$1(HoodieMergeOnReadTableCompactor.java:98)
       at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1040)
       at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
       at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
       at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
       at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
       at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1182)
       at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
       at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
       at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:123)
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/ff707f6d-0e41-405e-9623-f7302600765b-0_2-816-50629_20200825065331.parquet'
       at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:617)
       at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
       at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
       ... 26 more
   20/08/25 07:17:18 WARN TaskSetManager: Lost task 3.3 in stage 41.0 (TID 2547, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): TaskKilled (Stage cancelled)
   20/08/25 07:17:18 ERROR ApplicationMaster: User class threw exception: org.apache.spark.sql.streaming.StreamingQueryException: Job aborted due to stage failure: Task 2 in stage 41.0 failed 4 times, most recent failure: Lost task 2.3 in stage 41.0 (TID 2546, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/ff707f6d-0e41-405e-9623-f7302600765b-0_2-816-50629_20200825065331.parquet'
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:190)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.compact(HoodieMergeOnReadTableCompactor.java:139)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.lambda$compact$644ebad7$1(HoodieMergeOnReadTableCompactor.java:98)
       at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1040)
       at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
       at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
       at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
       at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
       at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1182)
       at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
       at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
       at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:123)
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/ff707f6d-0e41-405e-9623-f7302600765b-0_2-816-50629_20200825065331.parquet'
       at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:617)
       at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
       at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
       ... 26 more
   
   ```



[GitHub] [hudi] bvaradar commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-683236267


   @dm-tran : This is likely due to two different compactions running on the same instant. From your stderr_01.log, the compaction job failed, yet we do see s3://my-bucket/my-table/.hoodie/20200827155539.commit. Also, from both logs it is clear that the FileNotFoundException happened right after the compaction failed (based on the timestamps in the logs). This is not possible unless another concurrent compaction was running and rolled back this compaction run. A failure during finalizeWrite (in stderr_01.log) will not result in the compaction succeeding.



[GitHub] [hudi] n3nash closed issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
n3nash closed issue #2020:
URL: https://github.com/apache/hudi/issues/2020


   



[GitHub] [hudi] dm-tran commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
dm-tran commented on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-682314989


   The file that isn't found is `'s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4957-299294_20200827155539.parquet'`.
   
   The available files in s3 that start with "9dee1248-c972-4ed3-80f5-15545ac4c534-0_2" are: 
   ```
   2020-08-27 10:26 33525767 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-3850-231917_20200827102526.parquet
   2020-08-27 10:33 33526574 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-3891-234401_20200827103318.parquet
   2020-08-27 16:17 33545224 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-39-2458_20200827155539.parquet
   2020-08-27 11:13 33530132 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4096-246791_20200827111254.parquet
   2020-08-27 11:22 33530880 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4137-249295_20200827112139.parquet
   2020-08-27 12:00 33533333 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4301-259277_20200827115949.parquet
   2020-08-27 12:20 33534377 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4383-264271_20200827121947.parquet
   2020-08-27 12:42 33535631 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4465-269277_20200827124204.parquet
   2020-08-27 12:54 33536084 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4506-271786_20200827125338.parquet
   2020-08-27 13:07 33536635 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4547-274289_20200827130640.parquet
   2020-08-27 13:20 33537444 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4588-276783_20200827131919.parquet
   2020-08-27 13:32 33538151 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4629-279284_20200827133143.parquet
   2020-08-27 13:46 33539531 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4670-281782_20200827134536.parquet
   2020-08-27 14:14 33541130 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4752-286756_20200827141258.parquet
   2020-08-27 14:30 33541913 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4793-289269_20200827142922.parquet
   2020-08-27 14:49 33542820 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4834-291776_20200827144807.parquet
   2020-08-27 15:08 33543459 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4875-294286_20200827150653.parquet
   2020-08-27 15:30 33544369 s3://my-bucket/my-table/daas_date=2020/9dee1248-c972-4ed3-80f5-15545ac4c534-0_2-4916-296786_20200827152840.parquet
   ```



[GitHub] [hudi] bvaradar commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-679873398


   @dm-tran : Compaction would retry compacting the same file until it succeeds. Since that file is no longer there, retrying will not help. Can you re-bootstrap and then start ingesting the data again, but this time enable the consistency guard right from the beginning?
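
   For illustration, a minimal sketch of what enabling the consistency guard could look like in the writer options (the `hoodie.consistency.check.enabled` key is the one referenced later in this thread; the DataFrame, table name and base path below are placeholders):
   ```
   // Hedged sketch, not the exact job from this issue: enable Hudi's
   // consistency check before re-bootstrapping and re-ingesting on S3.
   df.write
     .format("org.apache.hudi")
     .option("hoodie.table.name", "my_table")                // placeholder table name
     .option("hoodie.datasource.write.operation", "upsert")
     .option("hoodie.consistency.check.enabled", "true")     // consistency guard on from the start
     .mode("append")
     .save("s3://my-bucket/my-table")                        // placeholder base path
   ```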



[GitHub] [hudi] zherenyu831 commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
zherenyu831 commented on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-691815280


   @bvaradar 
   Thank you so much, we will keep using hoodie.filesystem.view.incr.timeline.sync.enable=false.



[GitHub] [hudi] bvaradar commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-689173851


   @zherenyu831 @dm-tran : Good catch about incremental timeline syncing. This is still an experimental feature and is disabled by default. There could be a bug here; I will investigate further and have raised a blocker for the next release: https://issues.apache.org/jira/browse/HUDI-1275
   
   Please set this property to false for now. Also, please use the "compaction unschedule" CLI command to revert compactions; deleting inflight/requested compaction files is not safe.
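
   As a rough sketch of this workaround (the property key is the one quoted above; the DataFrame, options map and base path are illustrative), disabling incremental timeline syncing in the writer options could look like:
   ```
   // Hedged sketch: turn off the experimental incremental timeline sync
   // while keeping inline compaction; other writer options omitted.
   val hudiOptions = Map(
     "hoodie.compact.inline" -> "true",
     "hoodie.filesystem.view.incr.timeline.sync.enable" -> "false"  // workaround until HUDI-1275 is resolved
   )

   df.write
     .format("org.apache.hudi")
     .options(hudiOptions)
     .mode("append")
     .save("s3://my-bucket/my-table")  // placeholder base path
   ```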



[GitHub] [hudi] dm-tran commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
dm-tran commented on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-679875396


   > Can you re-bootstrap and then start ingesting the data, but this time enable the consistency guard right from the beginning?
   
   @bvaradar Actually, this is what I did. I deleted the Hudi table in S3, added the consistency check property, and started ingesting the data from the beginning.



[GitHub] [hudi] dm-tran edited a comment on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
dm-tran edited a comment on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-687993540


   @bvaradar FYI, I have reproduced this error using Hudi 0.6.0, after running a structured streaming job for several days. Please find the logs attached. (I haven't identified any concurrent process that runs compactions.)
   
   [withHudi060_stderr_01.log](https://github.com/apache/hudi/files/5180850/withHudi060_stderr_01.log)
   [withHudi060_stderr_02.log](https://github.com/apache/hudi/files/5180851/withHudi060_stderr_02.log)
   
   Is there a workaround to fix this error? Would it be possible to roll back some commits and resume ingestion/compaction?
   



[GitHub] [hudi] bvaradar commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-679253153


   Can you please add the details of
   "commit showfiles --commit 20200821153748"?
   
   Are you running with the consistency check enabled?
   
   Can you also check whether the file is actually absent by listing the folder s3://myBucket/absolute_path_to/daas_date=2020-05/
   
   Also, please paste the output of that listing in this issue.
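
   For illustration, a rough Spark/Scala sketch of such a listing through the Hadoop FileSystem API (the folder is the one mentioned above; `spark` is assumed to be an existing SparkSession):
   ```
   import org.apache.hadoop.fs.{FileSystem, Path}

   // Hedged sketch: list the partition folder to verify whether the parquet
   // file the compaction expects is actually present on S3.
   val folder = new Path("s3://myBucket/absolute_path_to/daas_date=2020-05/")
   val fs: FileSystem = folder.getFileSystem(spark.sparkContext.hadoopConfiguration)
   fs.listStatus(folder).foreach(status => println(s"${status.getLen}\t${status.getPath}"))
   ```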



[GitHub] [hudi] dm-tran edited a comment on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
dm-tran edited a comment on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-682311268


   @bvaradar The exception was raised after running the structured streaming job for a while.
   
   Please find attached the driver logs with INFO level logging.
   
   [stderr_01.log](https://github.com/apache/hudi/files/5139921/stderr_01.log) : the structured streaming job fails with error `org.apache.hudi.exception.HoodieIOException: Consistency check failed to ensure all files APPEAR`
   [stderr_02.log](https://github.com/apache/hudi/files/5139922/stderr_02.log) : the structured streaming job is retried by YARN and compaction fails with a `java.io.FileNotFoundException`
   





[GitHub] [hudi] n3nash commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-693555013


   @zherenyu831 It seems the issue is resolved by setting the config to false. We will debug the underlying issue in the JIRA opened by @bvaradar. Closing this ticket.



[GitHub] [hudi] dm-tran commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
dm-tran commented on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-679866696


   @bvaradar I ran the structured streaming job with `hoodie.consistency.check.enabled = true`, starting from the earliest offsets in Kafka, and got the same error: a `java.io.FileNotFoundException` when the compaction is retried.
   
   **Summary**
   
   The structured streaming job ran for 3 hours:
   - at some point, some executors were lost because of an OutOfMemory error
   - then the Spark driver failed because the consistency check failed
   
   The Spark application was then retried by YARN, and the second attempt failed with `Caused by: java.io.FileNotFoundException: No such file or directory` when the compaction was retried.
   
   **Stacktraces**
   
   Stacktrace of the first attempt:
   ```
   20/08/25 06:51:39 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
   20/08/25 06:51:40 WARN ProcessingTimeExecutor: Current batch is falling behind. The trigger interval is 300000 milliseconds, but spent 800229 milliseconds
   20/08/25 06:56:24 WARN BlockManagerMasterEndpoint: No more replicas available for rdd_1775_40 !
   20/08/25 06:56:24 WARN BlockManagerMasterEndpoint: No more replicas available for rdd_1785_53 !
   [...]
   20/08/25 06:56:24 WARN BlockManagerMasterEndpoint: No more replicas available for rdd_1785_35 !
   20/08/25 06:56:24 WARN BlockManagerMasterEndpoint: No more replicas available for rdd_1785_50 !
   20/08/25 06:56:24 WARN YarnAllocator: Container from a bad node: container_1594796531644_1833_01_000002 on host: ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal. Exit status: 143. Diagnostics: [2020-08-25 06:56:24.636]Container killed on request. Exit code is 143
   [2020-08-25 06:56:24.636]Container exited with a non-zero exit code 143. 
   [2020-08-25 06:56:24.637]Killed by external signal
   .
   20/08/25 06:56:24 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver to remove executor 1 for reason Container from a bad node: container_1594796531644_1833_01_000002 on host: ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal. Exit status: 143. Diagnostics: [2020-08-25 06:56:24.636]Container killed on request. Exit code is 143
   [2020-08-25 06:56:24.636]Container exited with a non-zero exit code 143. 
   [2020-08-25 06:56:24.637]Killed by external signal
   .
   20/08/25 06:56:24 ERROR YarnClusterScheduler: Lost executor 1 on ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal: Container from a bad node: container_1594796531644_1833_01_000002 on host: ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal. Exit status: 143. Diagnostics: [2020-08-25 06:56:24.636]Container killed on request. Exit code is 143
   [2020-08-25 06:56:24.636]Container exited with a non-zero exit code 143. 
   [2020-08-25 06:56:24.637]Killed by external signal
   .
   20/08/25 06:56:24 WARN TaskSetManager: Lost task 1.0 in stage 816.0 (TID 50626, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container from a bad node: container_1594796531644_1833_01_000002 on host: ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal. Exit status: 143. Diagnostics: [2020-08-25 06:56:24.636]Container killed on request. Exit code is 143
   [2020-08-25 06:56:24.636]Container exited with a non-zero exit code 143. 
   [2020-08-25 06:56:24.637]Killed by external signal
   .
   20/08/25 06:56:24 WARN TaskSetManager: Lost task 0.0 in stage 816.0 (TID 50625, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container from a bad node: container_1594796531644_1833_01_000002 on host: ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal. Exit status: 143. Diagnostics: [2020-08-25 06:56:24.636]Container killed on request. Exit code is 143
   [2020-08-25 06:56:24.636]Container exited with a non-zero exit code 143. 
   [2020-08-25 06:56:24.637]Killed by external signal
   .
   20/08/25 06:56:24 WARN ExecutorAllocationManager: Attempted to mark unknown executor 1 idle
   20/08/25 07:07:51 ERROR MicroBatchExecution: Query [id = 6ea738ee-0886-4014-a2b2-f51efd693c45, runId = 97c16ef4-d610-4d44-a0e9-a9d24ed5e0cf] terminated with error
   org.apache.hudi.exception.HoodieCommitException: Failed to complete commit 20200825065331 due to finalize errors.
       at org.apache.hudi.client.AbstractHoodieWriteClient.finalizeWrite(AbstractHoodieWriteClient.java:204)
       at org.apache.hudi.client.HoodieWriteClient.doCompactionCommit(HoodieWriteClient.java:1142)
       at org.apache.hudi.client.HoodieWriteClient.commitCompaction(HoodieWriteClient.java:1102)
       at org.apache.hudi.client.HoodieWriteClient.runCompaction(HoodieWriteClient.java:1085)
       at org.apache.hudi.client.HoodieWriteClient.compact(HoodieWriteClient.java:1056)
       at org.apache.hudi.client.HoodieWriteClient.lambda$forceCompact$13(HoodieWriteClient.java:1171)
       at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
       at org.apache.hudi.client.HoodieWriteClient.forceCompact(HoodieWriteClient.java:1168)
       at org.apache.hudi.client.HoodieWriteClient.postCommit(HoodieWriteClient.java:503)
       at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:157)
       at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:101)
       at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:92)
       at org.apache.hudi.HoodieSparkSqlWriter$.checkWriteStatus(HoodieSparkSqlWriter.scala:268)
       at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:188)
       at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
       at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
       at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
       at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:156)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
       at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
       at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
       at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
       at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
       at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
       at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
       at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
       at aaa.dataprocessor.writer.EventsWriter$.saveToHudiTable(EventsWriter.scala:145)
       at aaa.dataprocessor.MainProcessor$.processBatch(MainProcessor.scala:162)
       at aaa.dataprocessor.MainProcessor$.$anonfun$main$4(MainProcessor.scala:90)
       at aaa.dataprocessor.MainProcessor$.$anonfun$main$4$adapted(MainProcessor.scala:82)
       at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:35)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$15(MicroBatchExecution.scala:537)
       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$14(MicroBatchExecution.scala:536)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:351)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:349)
       at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:535)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:198)
       at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:351)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:349)
       at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:166)
       at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
       at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
       at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
   Caused by: org.apache.hudi.exception.HoodieIOException: Consistency check failed to ensure all files APPEAR
       at org.apache.hudi.table.HoodieTable.waitForAllFiles(HoodieTable.java:431)
       at org.apache.hudi.table.HoodieTable.cleanFailedWrites(HoodieTable.java:379)
       at org.apache.hudi.table.HoodieTable.finalizeWrite(HoodieTable.java:315)
       at org.apache.hudi.table.HoodieMergeOnReadTable.finalizeWrite(HoodieMergeOnReadTable.java:319)
       at org.apache.hudi.client.AbstractHoodieWriteClient.finalizeWrite(AbstractHoodieWriteClient.java:195)
       ... 57 more
   ```
   
   Stacktrace of the second attempt:
   ```
   SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
   20/08/25 07:07:56 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
   20/08/25 07:10:02 ERROR HoodieMergeOnReadTable: Rolling back instant [==>20200825065331__compaction__INFLIGHT]
   20/08/25 07:10:07 WARN HoodieCopyOnWriteTable: Rollback finished without deleting inflight instant file. Instant=[==>20200825065331__compaction__INFLIGHT]
   20/08/25 07:17:12 WARN TaskSetManager: Lost task 2.0 in stage 41.0 (TID 2539, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/ff707f6d-0e41-405e-9623-f7302600765b-0_2-816-50629_20200825065331.parquet'
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:190)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.compact(HoodieMergeOnReadTableCompactor.java:139)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.lambda$compact$644ebad7$1(HoodieMergeOnReadTableCompactor.java:98)
       at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1040)
       at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
       at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
       at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
       at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
       at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1182)
       at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
       at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
       at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:123)
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/ff707f6d-0e41-405e-9623-f7302600765b-0_2-816-50629_20200825065331.parquet'
       at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:617)
       at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
       at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
       ... 26 more
   
   20/08/25 07:17:13 WARN TaskSetManager: Lost task 3.0 in stage 41.0 (TID 2540, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/56be5da5-f5f3-4675-8dec-433f3656f839-0_3-816-50630_20200825065331.parquet'
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:190)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.compact(HoodieMergeOnReadTableCompactor.java:139)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.lambda$compact$644ebad7$1(HoodieMergeOnReadTableCompactor.java:98)
       at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1040)
       at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
       at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
       at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
       at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
       at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1182)
       at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
       at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
       at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:123)
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/56be5da5-f5f3-4675-8dec-433f3656f839-0_3-816-50630_20200825065331.parquet'
       at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:617)
       at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
       at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
       ... 26 more
   
   20/08/25 07:17:18 ERROR TaskSetManager: Task 2 in stage 41.0 failed 4 times; aborting job
   20/08/25 07:17:18 ERROR MicroBatchExecution: Query [id = 6ea738ee-0886-4014-a2b2-f51efd693c45, runId = 9afd92cc-2ced-47e9-a34b-9574dd82c229] terminated with error
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 41.0 failed 4 times, most recent failure: Lost task 2.3 in stage 41.0 (TID 2546, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/ff707f6d-0e41-405e-9623-f7302600765b-0_2-816-50629_20200825065331.parquet'
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:190)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.compact(HoodieMergeOnReadTableCompactor.java:139)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.lambda$compact$644ebad7$1(HoodieMergeOnReadTableCompactor.java:98)
       at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1040)
       at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
       at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
       at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
       at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
       at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1182)
       at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
       at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
       at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:123)
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/ff707f6d-0e41-405e-9623-f7302600765b-0_2-816-50629_20200825065331.parquet'
       at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:617)
       at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
       at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
       ... 26 more
   
   Driver stacktrace:
       at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2041)
       at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2029)
       at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2028)
       at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
       at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
       at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
       at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2028)
       at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:966)
       at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:966)
       at scala.Option.foreach(Option.scala:407)
       at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:966)
       at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2262)
       at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2211)
       at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2200)
       at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
       at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:777)
       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
       at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
       at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:945)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
       at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
       at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
       at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:361)
       at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:360)
       at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45)
       at org.apache.hudi.client.HoodieWriteClient.doCompactionCommit(HoodieWriteClient.java:1134)
       at org.apache.hudi.client.HoodieWriteClient.commitCompaction(HoodieWriteClient.java:1102)
       at org.apache.hudi.client.HoodieWriteClient.runCompaction(HoodieWriteClient.java:1085)
       at org.apache.hudi.client.HoodieWriteClient.compact(HoodieWriteClient.java:1056)
       at org.apache.hudi.client.HoodieWriteClient.lambda$forceCompact$13(HoodieWriteClient.java:1171)
       at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
       at org.apache.hudi.client.HoodieWriteClient.forceCompact(HoodieWriteClient.java:1168)
       at org.apache.hudi.client.HoodieWriteClient.postCommit(HoodieWriteClient.java:503)
       at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:157)
       at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:101)
       at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:92)
       at org.apache.hudi.HoodieSparkSqlWriter$.checkWriteStatus(HoodieSparkSqlWriter.scala:268)
       at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:188)
       at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:108)
       at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
       at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
       at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:156)
       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
       at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
       at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
       at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
       at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
       at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
       at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
       at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
       at aaa.dataprocessor.writer.EventsWriter$.saveToHudiTable(EventsWriter.scala:145)
       at aaa.dataprocessor.MainProcessor$.processBatch(MainProcessor.scala:162)
       at aaa.dataprocessor.MainProcessor$.$anonfun$main$4(MainProcessor.scala:90)
       at aaa.dataprocessor.MainProcessor$.$anonfun$main$4$adapted(MainProcessor.scala:82)
       at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:35)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$15(MicroBatchExecution.scala:537)
       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$14(MicroBatchExecution.scala:536)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:351)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:349)
       at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:535)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:198)
       at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:351)
       at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:349)
       at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:166)
       at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
       at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
       at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
       at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
   Caused by: org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/ff707f6d-0e41-405e-9623-f7302600765b-0_2-816-50629_20200825065331.parquet'
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:190)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.compact(HoodieMergeOnReadTableCompactor.java:139)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.lambda$compact$644ebad7$1(HoodieMergeOnReadTableCompactor.java:98)
       at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1040)
       at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
       at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
       at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
       at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
       at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1182)
       at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
       at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
       at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:123)
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/ff707f6d-0e41-405e-9623-f7302600765b-0_2-816-50629_20200825065331.parquet'
       at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:617)
       at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
       at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
       ... 26 more
   20/08/25 07:17:18 WARN TaskSetManager: Lost task 3.3 in stage 41.0 (TID 2547, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): TaskKilled (Stage cancelled)
   20/08/25 07:17:18 ERROR ApplicationMaster: User class threw exception: org.apache.spark.sql.streaming.StreamingQueryException: Job aborted due to stage failure: Task 2 in stage 41.0 failed 4 times, most recent failure: Lost task 2.3 in stage 41.0 (TID 2546, ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal, executor 1): org.apache.hudi.exception.HoodieException: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/ff707f6d-0e41-405e-9623-f7302600765b-0_2-816-50629_20200825065331.parquet'
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:207)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdate(HoodieCopyOnWriteTable.java:190)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.compact(HoodieMergeOnReadTableCompactor.java:139)
       at org.apache.hudi.table.compact.HoodieMergeOnReadTableCompactor.lambda$compact$644ebad7$1(HoodieMergeOnReadTableCompactor.java:98)
       at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1040)
       at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
       at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
       at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
       at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
       at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1182)
       at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
       at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
       at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
       at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
       at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
       at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
       at org.apache.spark.scheduler.Task.run(Task.scala:123)
       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:411)
       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.FileNotFoundException: No such file or directory 's3://myBucket/absolute_path_to/daas_date=2020/ff707f6d-0e41-405e-9623-f7302600765b-0_2-816-50629_20200825065331.parquet'
       at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:617)
       at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
       at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:300)
       at org.apache.hudi.table.HoodieCopyOnWriteTable.handleUpdateInternal(HoodieCopyOnWriteTable.java:202)
       ... 26 more
   
   ```





[GitHub] [hudi] zherenyu831 commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
zherenyu831 commented on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-688017532


   @bvaradar I am on the same team as @dm-tran.
   This problem seems related to the compaction process (possibly because of `hoodie.filesystem.view.incr.timeline.sync.enable`):
   once a compaction request is created, the file name recorded in it differs from the file that actually exists in S3.
   Our workaround is to delete the failed compaction inflight file and compaction request file in the `.hoodie` folder,
   after which the job works again.
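   
   For reference, a minimal sketch of that workaround using the Hadoop FileSystem API (the instant time, base path, and pending-compaction file names below are illustrative, based on the description above; double-check them against the actual contents of your `.hoodie` folder before deleting anything):
   
   ```
   import org.apache.hadoop.conf.Configuration
   import org.apache.hadoop.fs.{FileSystem, Path}

   // Illustrative values: replace with the failed compaction instant and the table's base path.
   val basePath      = "s3://myBucket/absolute_path_to"
   val failedInstant = "20200821154520"

   // Delete the pending compaction plan files so the timeline no longer references them.
   val fs = FileSystem.get(new java.net.URI(basePath), new Configuration())
   Seq(s"$failedInstant.compaction.requested", s"$failedInstant.compaction.inflight")
     .map(name => new Path(s"$basePath/.hoodie/$name"))
     .foreach(p => if (fs.exists(p)) fs.delete(p, false))
   ```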





[GitHub] [hudi] bvaradar commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-679905218


   @dm-tran : Thanks for the details. The only possible explanation I can think of is that more than one writer is running concurrently, which can cause this. Can you check whether more than one writer is running at the same time?





[GitHub] [hudi] bvaradar commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-680450091


   @dm-tran : No, that should be fine. Hudi's logic takes care of Spark retries, so that should not be the issue. Given that you are able to reproduce this easily and I have not seen it reported by anyone else, would you be able to provide self-contained code to reproduce it? I can set up S3 and try. If not, can you turn on INFO-level logging, capture the logs up to the point where you hit the exception, and attach them? I am not sure how else to debug this.
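   
   (One simple way to raise the driver-side log level from the job itself is sketched below, assuming a `SparkSession` named `spark`; executor logs may still need a log4j configuration change.)
   
   ```
   // Sketch: raise the root log level on the driver before triggering the write.
   spark.sparkContext.setLogLevel("INFO")
   ```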





[GitHub] [hudi] dm-tran commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
dm-tran commented on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-679481632


   Thank you for your answer, @bvaradar!
   
   > Can you please add the details of "commit showfiles --commit 20200821153748"
   
   ```
   ╔═══════════════════╤════════════════════════════════════════╤═════════════════╤═══════════════════════╤═══════════════════════╤═════════════════════╤══════════════╤═══════════╗
   ║ Partition Path    │ File ID                                │ Previous Commit │ Total Records Updated │ Total Records Written │ Total Bytes Written │ Total Errors │ File Size ║
   ╠═══════════════════╪════════════════════════════════════════╪═════════════════╪═══════════════════════╪═══════════════════════╪═════════════════════╪══════════════╪═══════════╣
   ║ daas_date=2020-04 │ 63bacea1-d6af-4ce0-8dc8-6ce9db8df332-0 │ 20200821152906  │ 212                   │ 534115                │ 22998619            │ 0            │ 22998619  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-04 │ 9c5e022c-feda-4059-84f6-752344cea4a9-0 │ 20200821152906  │ 89                    │ 460341                │ 18755115            │ 0            │ 18755115  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-04 │ 80be527b-eda7-42f3-8565-c15e9447d731-0 │ 20200821152906  │ 39                    │ 192455                │ 9112346             │ 0            │ 9112346   ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-04 │ 569b2555-5cd6-416a-b7d7-11897603a1e3-0 │ 20200821152906  │ 3                     │ 483483                │ 19114286            │ 0            │ 19114286  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-05 │ 80be527b-eda7-42f3-8565-c15e9447d731-1 │ 20200821152906  │ 106                   │ 302728                │ 13385764            │ 0            │ 13385764  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-05 │ 27da8cb6-e4b7-4c29-904b-25d3ba321d0a-0 │ 20200821152906  │ 84                    │ 482538                │ 19568311            │ 0            │ 19568311  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-05 │ 0c376059-0279-4967-8002-70c3cd9c6b8e-0 │ 20200821152906  │ 84                    │ 498131                │ 21751990            │ 0            │ 21751990  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-05 │ 9730fe61-5584-4156-b25c-8c8ef41583f4-0 │ 20200821152906  │ 76                    │ 500352                │ 19812831            │ 0            │ 19812831  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-05 │ c2c3fb95-3e58-4021-80c4-7e48aace8dda-0 │ 20200821152906  │ 72                    │ 484533                │ 21001957            │ 0            │ 21001957  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-05 │ ec2ba5dc-7dd7-4cc7-93cd-1358476a124f-0 │ 20200821152906  │ 61                    │ 509569                │ 21960018            │ 0            │ 21960018  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-05 │ bd54d7bb-2fb7-475f-8ca2-47594a1c3206-0 │ 20200821152906  │ 46                    │ 342451                │ 14678548            │ 0            │ 14678548  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2019-05 │ a0d89a7a-0621-469a-8359-c4c4b8948ff5-1 │ 20200821152906  │ 3                     │ 445248                │ 16992382            │ 0            │ 16992382  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2019-05 │ bc6f9f87-f16d-410b-b6fa-57abfb666920-0 │ 20200821152207  │ 1                     │ 456187                │ 17399230            │ 0            │ 17399230  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-02 │ 8f6cdcc9-a0e6-4cb5-91b2-510b5498728f-0 │ 20200821145253  │ 3                     │ 500228                │ 20060642            │ 0            │ 20060642  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-02 │ b14d49d0-5a5a-4f39-826c-24492428798a-0 │ 20200821145904  │ 2                     │ 318078                │ 12939981            │ 0            │ 12939981  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-02 │ ddc1d386-4362-4b05-af7e-8bb4de0eecd2-0 │ 20200821152906  │ 2                     │ 485278                │ 19727682            │ 0            │ 19727682  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-02 │ 48bce3e8-07b1-4122-ba68-7850a63bffaa-0 │ 20200821150425  │ 1                     │ 499951                │ 20217825            │ 0            │ 20217825  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2019-06 │ 508cb5db-343d-4469-a563-b1718f5c6573-0 │ 20200821152207  │ 1                     │ 472946                │ 17554600            │ 0            │ 17554600  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2019-06 │ 8ee8be98-3ffc-42de-8256-cf27d721f42f-1 │ 20200821143533  │ 1                     │ 218185                │ 8371612             │ 0            │ 8371612   ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-03 │ aa8afa91-df8a-4ffb-8631-bb2a89d02f08-0 │ 20200821144125  │ 2                     │ 457587                │ 18147009            │ 0            │ 18147009  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-03 │ 97d32e96-fb5c-440f-9852-a1575079215c-0 │ 20200821152906  │ 1                     │ 498822                │ 19592479            │ 0            │ 19592479  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-03 │ c558aa0c-a124-4bf7-b9dd-d567e4ee8113-0 │ 20200821152207  │ 1                     │ 520347                │ 20337019            │ 0            │ 20337019  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2019-11 │ 5b286fd4-9ff2-4153-89f8-4fb7fc7ef02d-0 │ 20200821151547  │ 1                     │ 520080                │ 20243264            │ 0            │ 20243264  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2019-11 │ 20256a1b-958d-449c-b3a1-f0ab0c453bde-0 │ 20200821152207  │ 1                     │ 467601                │ 18393947            │ 0            │ 18393947  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2019-12 │ 762a64e7-21e3-4c8a-8e97-8cfb442e70a2-0 │ 20200821144717  │ 1                     │ 494207                │ 19713725            │ 0            │ 19713725  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2019-12 │ 61561c4a-59b3-4f34-9cab-4c9aeb6f8bdb-0 │ 20200821152207  │ 1                     │ 496330                │ 20137687            │ 0            │ 20137687  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2019-12 │ e3946c85-4ac3-4b4b-b43a-b2371a4b552a-0 │ 20200821151014  │ 1                     │ 469533                │ 18865890            │ 0            │ 18865890  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2019-12 │ aaea130f-affc-40dc-b25d-bd0e6b269401-0 │ 20200821145904  │ 1                     │ 498338                │ 20196834            │ 0            │ 20196834  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2019-10 │ 9eadc3a2-37bd-4f57-90c8-fcd33350c121-0 │ 20200821151014  │ 1                     │ 487381                │ 19804823            │ 0            │ 19804823  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2019-10 │ 12c2581f-17b8-478f-82d0-d66f042a2846-0 │ 20200821152906  │ 1                     │ 497778                │ 19899349            │ 0            │ 19899349  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2019-10 │ cf786b7b-863d-4bb5-b36b-9a7459d5da3e-0 │ 20200821152906  │ 1                     │ 482072                │ 19234653            │ 0            │ 19234653  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-08 │ c50c0657-50bd-4d3e-9c4e-14f6b83f4a47-0 │ 20200821152906  │ 89                    │ 587055                │ 24543135            │ 0            │ 24543135  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-08 │ 1aeadbe6-c52b-4e96-ade6-c5b692c7b6be-0 │ 20200821152906  │ 78                    │ 575952                │ 24546899            │ 0            │ 24546899  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-08 │ 62d133ed-0231-44c3-966b-eb30b39a4dee-1 │ 20200821152906  │ 62                    │ 585495                │ 24545575            │ 0            │ 24545575  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-08 │ 19c055e0-6601-4be2-abaa-f1c937cd4fa8-0 │ 20200821152814  │ 57                    │ 588315                │ 24562628            │ 0            │ 24562628  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-06 │ 9db9dd32-0b88-49d3-9620-a13d25d1a7a6-0 │ 20200821152906  │ 93                    │ 500121                │ 20943670            │ 0            │ 20943670  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-06 │ c48e8b31-5e78-4314-9ad6-74b38b471912-0 │ 20200821152906  │ 78                    │ 483713                │ 20575447            │ 0            │ 20575447  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-06 │ cb058aae-0e88-4bbc-adf3-1a481e876200-0 │ 20200821152906  │ 74                    │ 511153                │ 21077515            │ 0            │ 21077515  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-06 │ 5e3f4be1-e7d6-4608-8622-7a9284d2dd0e-0 │ 20200821152906  │ 62                    │ 472906                │ 19983800            │ 0            │ 19983800  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-06 │ 26ace4dc-4e9c-4d3d-94be-d01964462fca-0 │ 20200821152906  │ 53                    │ 500369                │ 20743099            │ 0            │ 20743099  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-06 │ 06046bb8-dc24-44a5-95cd-3a5c0aa9a904-0 │ 20200821152906  │ 43                    │ 237650                │ 10979197            │ 0            │ 10979197  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-06 │ bd54d7bb-2fb7-475f-8ca2-47594a1c3206-1 │ 20200821152906  │ 13                    │ 133180                │ 6523322             │ 0            │ 6523322   ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-07 │ b1d0acc3-0e72-4798-8b26-1c93d9a8a3a9-0 │ 20200821152906  │ 60                    │ 499450                │ 21280641            │ 0            │ 21280641  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-07 │ 96e33f65-8ed9-4198-8141-fd6b4211c58e-0 │ 20200821152906  │ 58                    │ 505448                │ 21342125            │ 0            │ 21342125  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-07 │ 541ef666-271d-4b34-ac84-a023fae33338-0 │ 20200821152906  │ 56                    │ 500421                │ 21352889            │ 0            │ 21352889  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-07 │ 372c42b0-612c-4644-8ebe-baec3ce18192-0 │ 20200821152906  │ 49                    │ 487464                │ 20588883            │ 0            │ 20588883  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-07 │ f1f2c008-4d29-48e9-a6af-ecbe42f1753e-0 │ 20200821152906  │ 48                    │ 500438                │ 21573335            │ 0            │ 21573335  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-07 │ f1d99d5c-6f2f-446b-9389-5be5987896c8-0 │ 20200821152906  │ 47                    │ 494778                │ 20824372            │ 0            │ 20824372  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-07 │ 6e89f5b2-e3c6-4bdf-9bee-fc2f08f38624-0 │ 20200821152906  │ 47                    │ 493837                │ 20219635            │ 0            │ 20219635  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-07 │ 37b67db6-d444-45b4-948d-ffc3d96a122f-0 │ 20200821152906  │ 46                    │ 487084                │ 20394802            │ 0            │ 20394802  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-07 │ afb52cc2-5a34-4db0-854f-292fda6fc8da-0 │ 20200821152906  │ 37                    │ 495374                │ 21544700            │ 0            │ 21544700  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-07 │ e22586c6-0890-4f01-9862-81b0f59d1195-0 │ 20200821152906  │ 34                    │ 476130                │ 20039658            │ 0            │ 20039658  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-07 │ 06046bb8-dc24-44a5-95cd-3a5c0aa9a904-1 │ 20200821152906  │ 12                    │ 252056                │ 11086635            │ 0            │ 11086635  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2020-07 │ 62d133ed-0231-44c3-966b-eb30b39a4dee-0 │ 20200821152906  │ 5                     │ 33545                 │ 1932342             │ 0            │ 1932342   ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2019-09 │ 8721bc6c-1eef-497b-ad3f-a6f9b1a1656c-0 │ 20200821142335  │ 2                     │ 529694                │ 20268183            │ 0            │ 20268183  ║
   ╟───────────────────┼────────────────────────────────────────┼─────────────────┼───────────────────────┼───────────────────────┼─────────────────────┼──────────────┼───────────╢
   ║ daas_date=2019-09 │ d4f2c861-6493-4185-9c1c-5f41b60abf15-1 │ 20200821142335  │ 1                     │ 256409                │ 10268763            │ 0            │ 10268763  ║
   ╚═══════════════════╧════════════════════════════════════════╧═════════════════╧═══════════════════════╧═══════════════════════╧═════════════════════╧══════════════╧═══════════╝
   ```
   
   > Are you running with consistency check enabled ?
   
   No, `hoodie.consistency.check.enabled` wasn't set. I will try to run the structured streaming job with `hoodie.consistency.check.enabled = true`.
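   
   A minimal sketch of what that change could look like on the writer side (the options map, dataframe name, and output path below are placeholders, not our actual job code):
   
   ```
   // Sketch only: add the S3 consistency check on top of the existing Hudi writer options.
   val hudiOptions: Map[String, String] = existingHudiOptions ++ Map(
     "hoodie.consistency.check.enabled" -> "true"
   )

   microBatchDf.write
     .format("org.apache.hudi")
     .options(hudiOptions)
     .mode("append")
     .save("s3://myBucket/absolute_path_to")
   ```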
   
   > Can you also check if the file is actually absent by listing the folder s3://myBucket/absolute_path_to/daas_date=2020-05/
   
   Yes, the file is actually absent.
   
   > Also, paste the output of listing in this issue.
   
   Parquet files with fileId "0c376059-0279-4967-8002-70c3cd9c6b8e-0":
   ```
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_10-3360-221478_20200821152906.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_11-2909-192474_20200821142335.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_13-3032-200435_20200821144125.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_2-3073-203081_20200821144717.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_3-2581-171166_20200821133908.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_3-3114-205741_20200821145253.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_4-2950-195120_20200821142949.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_4-3155-208347_20200821145904.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_4-3237-213604_20200821151014.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_5-2786-184435_20200821140554.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_5-2827-187104_20200821141202.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_5-2991-197774_20200821143533.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-3196-210983_20200821150425.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-3278-216229_20200821151547.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_6-39-2575_20200821153748.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_7-2745-181775_20200821140025.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_7-39-2576_20200821154520.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_7-39-2578_20200821154520.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_7-39-2580_20200821154520.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_7-39-2582_20200821154520.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_8-3319-218872_20200821152207.parquet
   s3://myBucket/absolute_path_to/daas_date=2020-05/0c376059-0279-4967-8002-70c3cd9c6b8e-0_9-2540-168511_20200821133319.parquet
   ```
   
   There are around 5,000 files, so I attached a text file containing the result of `s4cmd ls s3://myBucket/absolute_path_to/daas_date=2020-05` (this folder is actually a copy of the original S3 folder, so the date and time of each file are not the original ones).
   
   [2020-05_files.txt](https://github.com/apache/hudi/files/5121126/2020-05_files.txt)
   
   There are lots of files because of the following process:
   1. the structured streaming job reads messages from Kafka and saves log files to S3
   2. the compaction that previously failed is retried but fails again, and the structured streaming job fails
   3. the structured streaming job is re-launched by an external process: steps 1 and 2 are repeated (step 1 keeps adding log files)





[GitHub] [hudi] dm-tran commented on issue #2020: [SUPPORT] Compaction fails with "java.io.FileNotFoundException"

Posted by GitBox <gi...@apache.org>.
dm-tran commented on issue #2020:
URL: https://github.com/apache/hudi/issues/2020#issuecomment-683301240


   @bvaradar Thanks for your answer. I will look into this, but I am not aware of any concurrent process that would run compactions. I am only running a Spark structured streaming job with `"hoodie.compact.inline" -> "true"` and `"hoodie.compact.inline.max.delta.commits" -> "1"`. (I do run several structured streaming jobs, but each of them processes a different source and writes to a different folder.)
   
   

