Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/01 17:46:16 UTC

[GitHub] [hudi] zuyanton opened a new issue #1780: [SUPPORT]IllegalStateException: Hudi File Id has more than 1 pending compactions. MoR. Compaction inline.

zuyanton opened a new issue #1780:
URL: https://github.com/apache/hudi/issues/1780


   
   We are having an issue when running a simple count query on our Hudi table via Hive. The error is "Hudi File Id has more than 1 pending compactions". The table is MoR, compaction is executed inline, the table is persisted to S3, and the consistency check is turned on.
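   
   For context, a minimal sketch of a writer configured this way (a sketch under assumptions: the table name, record key, precombine field, and S3 path are hypothetical, and `df` stands for the incoming Spark DataFrame; the option keys are the standard Hudi 0.5.x configs):
   ```scala
   // Hypothetical Spark (Scala) writer matching the setup described above:
   // MoR table, inline compaction, S3 storage, consistency check enabled.
   df.write.format("hudi")
     .option("hoodie.table.name", "my_table")                          // hypothetical table name
     .option("hoodie.datasource.write.storage.type", "MERGE_ON_READ")  // 0.5.x key for the table type
     .option("hoodie.datasource.write.recordkey.field", "id")          // hypothetical record key
     .option("hoodie.datasource.write.precombine.field", "ts")         // hypothetical precombine field
     .option("hoodie.compact.inline", "true")                          // compaction runs inline with writes
     .option("hoodie.consistency.check.enabled", "true")               // S3 consistency check
     .mode("append")
     .save("s3://bucket/path/my_table")                                // hypothetical S3 path
   ```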
   
   The error does not make sense to me, as it suggests that there are two pending compactions, 20200701015658 and 20200630235744 (see stack trace below). However, all compactions run inline, so there should never be two compactions pending at once. Moreover, the logs note that both compactions finished successfully
   ```20/07/01 00:13:14 INFO HoodieWriteClient: Compacted successfully on commit 20200630235744```    
   and ```hudi-cli compactions show all``` suggests the same:
   ```
   ╔═════════════════════════╤═══════════╤═══════════════════════════════╗
   ║ Compaction Instant Time │ State     │ Total FileIds to be Compacted ║
   ╠═════════════════════════╪═══════════╪═══════════════════════════════╣
   ║ 20200701015658          │ COMPLETED │ 38                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200701015658          │ COMPLETED │ 38                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200701015658          │ COMPLETED │ 38                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200630235744          │ COMPLETED │ 30                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200630235744          │ COMPLETED │ 30                            ║
   ╟─────────────────────────┼───────────┼───────────────────────────────╢
   ║ 20200630235744          │ COMPLETED │ 30                            ║
   ╚═════════════════════════╧═══════════╧═══════════════════════════════╝
   
   ```
   When checking the contents of the .hoodie folder, I can see that all three files for each compaction (*.compaction.requested, *.compaction.inflight, *.commit) are present. It seems like CompactionUtils.getAllPendingCompactionOperations possibly misidentifies these compactions as "pending".
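   
   A hedged sketch of how one might check which compaction instants the timeline API itself treats as pending (the base path is hypothetical, and the two-argument HoodieTableMetaClient constructor is assumed per the 0.5.x code the stack trace comes from):
   ```scala
   import scala.collection.JavaConverters._
   import org.apache.hadoop.conf.Configuration
   import org.apache.hudi.common.table.HoodieTableMetaClient
   
   // Hypothetical base path; point at the table root containing .hoodie/.
   val metaClient = new HoodieTableMetaClient(new Configuration(), "s3://bucket/path/my_table")
   
   // Print compaction instants still considered pending
   // (REQUESTED or INFLIGHT, i.e. not yet COMPLETED).
   metaClient.getActiveTimeline
     .filterPendingCompactionTimeline()
     .getInstants.iterator().asScala
     .foreach(i => println(s"${i.getTimestamp} -> ${i.getState}"))
   ```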
   **Environment Description**
   
   * Hudi version : 0.5.3 
   
   * Spark version : 2.4.4 
   
   * Hive version : 2.3.6
   
   * Hadoop version : 2.8.5
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no 
   
   
   **Stacktrace**
   
   ```Status: Failed
   Vertex failed, vertexName=Map 1, vertexId=vertex_1592430479775_0691_2_00, diagnostics=[Vertex vertex_1592430479775_0691_2_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ofa_gl_je_lines_100_ro initializer failed, vertex=vertex_1592430479775_0691_2_00 [Map 1], java.lang.IllegalStateException: Hudi File Id (HoodieFileGroupId{partitionPath='61', fileId='f071cf58-8601-4ecd-b2da-80e5b0a92d47-3'}) has more than 1 pending compactions. Instants: (20200701015658,{"baseInstantTime": "20200630235744", "deltaFilePaths": [".f071cf58-8601-4ecd-b2da-80e5b0a92d47-3_20200630235744.log.1_1-22-9631"], "dataFilePath": "f071cf58-8601-4ecd-b2da-80e5b0a92d47-3_14-30-9419_20200630235744.parquet", "fileId": "f071cf58-8601-4ecd-b2da-80e5b0a92d47-3", "partitionPath": "61", "metrics": {"TOTAL_LOG_FILES": 1.0, "TOTAL_IO_READ_MB": 111.0, "TOTAL_LOG_FILES_SIZE": 7.3045072E7, "TOTAL_IO_WRITE_MB": 42.0, "TOTAL_IO_MB": 153.0, "TOTAL_LOG_FILE_SIZE": 7.3045072E7}}), (20200630235744,{"baseInstantTime": "20200630115655", "deltaFilePaths": [".f071cf58-8601-4ecd-b2da-80e5b0a92d47-3_20200630115655.log.1_1-22-9569"], "dataFilePath": "f071cf58-8601-4ecd-b2da-80e5b0a92d47-3_26-30-9435_20200630115655.parquet", "fileId": "f071cf58-8601-4ecd-b2da-80e5b0a92d47-3", "partitionPath": "61", "metrics": {"TOTAL_LOG_FILES": 1.0, "TOTAL_IO_READ_MB": 44.0, "TOTAL_LOG_FILES_SIZE": 2116823.0, "TOTAL_IO_WRITE_MB": 42.0, "TOTAL_IO_MB": 86.0, "TOTAL_LOG_FILE_SIZE": 2116823.0}})
   	at org.apache.hudi.common.util.CompactionUtils.lambda$getAllPendingCompactionOperations$5(CompactionUtils.java:161)
   	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
   	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
   	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
   	at java.util.Iterator.forEachRemaining(Iterator.java:116)
   	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
   	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
   	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
   	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
   	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
   	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
   	at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:272)
   	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
   	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
   	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
   	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
   	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
   	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
   	at org.apache.hudi.common.util.CompactionUtils.getAllPendingCompactionOperations(CompactionUtils.java:149)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:95)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:87)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:81)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:72)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:110)
   	at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:89)
   	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:288)
   	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:442)
   	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:561)
   	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
   	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
   	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at javax.security.auth.Subject.doAs(Subject.java:422)
   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
   	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
   	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)```
   
   





[GitHub] [hudi] vinothchandar commented on issue #1780: [SUPPORT]IllegalStateException: Hudi File Id has more than 1 pending compactions. MoR. Compaction inline.

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1780:
URL: https://github.com/apache/hudi/issues/1780#issuecomment-655194540


   @a-uddhav are you facing the same issue as well? 
   
   @zuyanton by any chance did you upgrade the writers before dropping in the new jars for queries? I think this is the issue:
   
   https://hudi.apache.org/releases.html#release-051-incubating-docs
   ```
   With 0.5.1, we added functionality to stop using renames for Hudi timeline metadata operations. This feature is automatically enabled for newly created Hudi tables. For existing tables, this feature is turned off by default. Please read this section, before enabling this feature for existing hudi tables. To enable the new hudi timeline layout which avoids renames, use the write config “hoodie.timeline.layout.version=1”. Alternatively, you can use “repair overwrite-hoodie-props” to append the line “hoodie.timeline.layout.version=1” to hoodie.properties. Note that in any case, upgrade hudi readers (query engines) first with 0.5.1-incubating release before upgrading writer.
   ```
   
   This is what seems relevant to me. If you are spinning up an EMR cluster, these jars should be the same; or maybe you are getting the 0.5.0 jar from EMR's installation?
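   
   In case it helps anyone landing on this thread, a hedged sketch of the write-config route from that release note (the path is hypothetical and `df` is a placeholder DataFrame; `hoodie.timeline.layout.version` is the key quoted above, and the hudi-cli alternative is the quoted `repair overwrite-hoodie-props` route):
   ```scala
   // Hypothetical writer snippet: opt an existing table into the rename-free
   // timeline layout by setting the config quoted in the release note.
   df.write.format("hudi")
     .option("hoodie.timeline.layout.version", "1")
     // ... remaining table/write options as usual ...
     .mode("append")
     .save("s3://bucket/path/my_table")  // hypothetical path
   ```
   As the note stresses, readers (query engines) must be upgraded to 0.5.1+ before the writers.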





[GitHub] [hudi] a-uddhav commented on issue #1780: [SUPPORT]IllegalStateException: Hudi File Id has more than 1 pending compactions. MoR. Compaction inline.

Posted by GitBox <gi...@apache.org>.
a-uddhav commented on issue #1780:
URL: https://github.com/apache/hudi/issues/1780#issuecomment-653364426


   @zuyanton 
   Any update on this?





[GitHub] [hudi] zuyanton closed issue #1780: [SUPPORT]IllegalStateException: Hudi File Id has more than 1 pending compactions. MoR. Compaction inline.

Posted by GitBox <gi...@apache.org>.
zuyanton closed issue #1780:
URL: https://github.com/apache/hudi/issues/1780


   





[GitHub] [hudi] zuyanton commented on issue #1780: [SUPPORT]IllegalStateException: Hudi File Id has more than 1 pending compactions. MoR. Compaction inline.

Posted by GitBox <gi...@apache.org>.
zuyanton commented on issue #1780:
URL: https://github.com/apache/hudi/issues/1780#issuecomment-656807404


   @vinothchandar Thank you for the update. You are correct: writes were done with 0.5.3 while Hive was still pointing to 0.5.0.

