Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/18 09:56:07 UTC

[GitHub] [hudi] Guanpx opened a new issue #4510: [SUPPORT] Impala query error

Guanpx opened a new issue #4510:
URL: https://github.com/apache/hudi/issues/4510


   
   **Describe the problem you faced**
   
   Querying a Hudi MOR table from Impala fails with an `IllegalStateException` ("has more than 1 pending compactions") and returns no data.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Sync the Hudi table to Hive.
   2. Create an external Impala table over it (https://hudi.apache.org/docs/querying_data/#impala-34-or-later).
   3. Run a SELECT against the Impala table, or REFRESH the table.
   4. Impala logs the error below and the query returns no data.
   
   **Expected behavior**
   The Impala table should be queryable, but it cannot be queried.
   
   **Environment Description**
   
   * Hudi version : 0.10.0, MOR
   
   * Hive version : 2.1
   
   * Hadoop version : 3.0
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   * Impala version : 3.4.0
   
   **Stacktrace**
   
   ```
   I0104 18:06:19.961302 1557231 HoodieTableMetaClient.java:93] Loading HoodieTableMetaClient from hdfs://pre-cdh01:8020/hudi/rd/app_columns
   I0104 18:06:19.964633 1557231 FSUtils.java:100] Hadoop Configuration: fs.defaultFS: [hdfs://pre-cdh01:8020], Config:[Configuration: core-default.xml, core-site.xml, hdfs-default.xml, hdfs-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-533850282_1, ugi=impala (auth:SIMPLE)]]]
   I0104 18:06:19.969547 1557231 HoodieTableConfig.java:68] Loading dataset properties from hdfs://pre-cdh01:8020/hudi/rd/app_columns/.hoodie/hoodie.properties
   I0104 18:06:19.974251 1557231 HoodieTableMetaClient.java:104] Finished Loading Table of type MERGE_ON_READ from hdfs://pre-cdh01:8020/hudi/rd/app_columns
   I0104 18:06:19.978808 1557231 HoodieActiveTimeline.java:82] Loaded instants java.util.stream.ReferencePipeline$Head@5d12f34a
   E0104 18:06:20.005887 1557231 HoodieROTablePathFilter.java:176] Error checking path :hdfs://pre-cdh01:8020/hudi/rd/app_columns/.1adb0953-af23-48d6-9bf2-acb72716060b_20220104164400776.log.1_0-2-0, under folder: hdfs://pre-cdh01:8020/hudi/rd/app_columns
   Java exception follows:
   java.lang.IllegalStateException: Hudi File Id (HoodieFileGroupId{partitionPath='', fileId='1adb0953-af23-48d6-9bf2-acb72716060b'}) has more than 1 pending compactions. Instants: (20220104170836577,{"baseInstantTime": "20220104165637271", "deltaFilePaths": [".1adb0953-af23-48d6-9bf2-acb72716060b_20220104165637271.log.1_0-2-0"], "dataFilePath": "1adb0953-af23-48d6-9bf2-acb72716060b_1-2-0_20220104165637271.parquet", "fileId": "1adb0953-af23-48d6-9bf2-acb72716060b", "partitionPath": "", "metrics": {"TOTAL_LOG_FILES": 1.0, "TOTAL_IO_READ_MB": 1.0, "TOTAL_LOG_FILES_SIZE": 729214.0, "TOTAL_IO_WRITE_MB": 0.0, "TOTAL_IO_MB": 1.0}}), (20220104165637271,{"baseInstantTime": "20220104164400776", "deltaFilePaths": [".1adb0953-af23-48d6-9bf2-acb72716060b_20220104164400776.log.1_0-2-0"], "dataFilePath": null, "fileId": "1adb0953-af23-48d6-9bf2-acb72716060b", "partitionPath": "", "metrics": {"TOTAL_LOG_FILES": 1.0, "TOTAL_IO_READ_MB": 0.0, "TOTAL_LOG_FILES_SIZE": 8143.0, "TOTAL_IO_WRITE_MB": 120.0, "TOTAL_IO_MB": 120.0}})
   	at org.apache.hudi.common.util.CompactionUtils.lambda$getAllPendingCompactionOperations$5(CompactionUtils.java:161)
   	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
   	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
   	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
   	at java.util.Iterator.forEachRemaining(Iterator.java:116)
   	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
   	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
   	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
   	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
   	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
   	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
   	at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:270)
   	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
   	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
   	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
   	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
   	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
   	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
   	at org.apache.hudi.common.util.CompactionUtils.getAllPendingCompactionOperations(CompactionUtils.java:149)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:95)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:87)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:81)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:72)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:110)
   	at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:140)
   	at org.apache.impala.util.HudiUtil.lambda$filterFilesForHudiROPath$0(HudiUtil.java:35)
   	at java.util.ArrayList.removeIf(ArrayList.java:1413)
   	at org.apache.impala.util.HudiUtil.filterFilesForHudiROPath(HudiUtil.java:35)
   	at org.apache.impala.catalog.FileMetadataLoader.load(FileMetadataLoader.java:198)
   	at org.apache.impala.catalog.ParallelFileMetadataLoader.lambda$load$0(ParallelFileMetadataLoader.java:93)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
   	at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
   	at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:45)
   	at org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:93)
   	at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:652)
   	at org.apache.impala.catalog.HdfsTable.loadAllPartitions(HdfsTable.java:573)
   	at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1021)
   	at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:942)
   	at org.apache.impala.catalog.TableLoader.load(TableLoader.java:86)
   	at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:244)
   	at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:241)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   E0104 18:06:20.007413 1557231 ParallelFileMetadataLoader.java:102] Loading file and block metadata for 1 paths for table default.hudi_app_columns encountered an error loading data for path hdfs://pre-cdh01:8020/hudi/rd/app_columns
   Java exception follows:
   java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error checking path :hdfs://pre-cdh01:8020/hudi/rd/app_columns/.1adb0953-af23-48d6-9bf2-acb72716060b_20220104164400776.log.1_0-2-0, under folder: hdfs://pre-cdh01:8020/hudi/rd/app_columns
   	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
   	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
   	at org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:99)
   	at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:652)
   	at org.apache.impala.catalog.HdfsTable.loadAllPartitions(HdfsTable.java:573)
   	at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1021)
   	at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:942)
   	at org.apache.impala.catalog.TableLoader.load(TableLoader.java:86)
   	at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:244)
   	at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:241)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hudi.exception.HoodieException: Error checking path :hdfs://pre-cdh01:8020/hudi/rd/app_columns/.1adb0953-af23-48d6-9bf2-acb72716060b_20220104164400776.log.1_0-2-0, under folder: hdfs://pre-cdh01:8020/hudi/rd/app_columns
   	at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:177)
   	at org.apache.impala.util.HudiUtil.lambda$filterFilesForHudiROPath$0(HudiUtil.java:35)
   	at java.util.ArrayList.removeIf(ArrayList.java:1413)
   	at org.apache.impala.util.HudiUtil.filterFilesForHudiROPath(HudiUtil.java:35)
   	at org.apache.impala.catalog.FileMetadataLoader.load(FileMetadataLoader.java:198)
   	at org.apache.impala.catalog.ParallelFileMetadataLoader.lambda$load$0(ParallelFileMetadataLoader.java:93)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
   	at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
   	at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:45)
   	at org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:93)
   	... 11 more
   Caused by: java.lang.IllegalStateException: Hudi File Id (HoodieFileGroupId{partitionPath='', fileId='1adb0953-af23-48d6-9bf2-acb72716060b'}) has more than 1 pending compactions. Instants: (20220104170836577,{"baseInstantTime": "20220104165637271", "deltaFilePaths": [".1adb0953-af23-48d6-9bf2-acb72716060b_20220104165637271.log.1_0-2-0"], "dataFilePath": "1adb0953-af23-48d6-9bf2-acb72716060b_1-2-0_20220104165637271.parquet", "fileId": "1adb0953-af23-48d6-9bf2-acb72716060b", "partitionPath": "", "metrics": {"TOTAL_LOG_FILES": 1.0, "TOTAL_IO_READ_MB": 1.0, "TOTAL_LOG_FILES_SIZE": 729214.0, "TOTAL_IO_WRITE_MB": 0.0, "TOTAL_IO_MB": 1.0}}), (20220104165637271,{"baseInstantTime": "20220104164400776", "deltaFilePaths": [".1adb0953-af23-48d6-9bf2-acb72716060b_20220104164400776.log.1_0-2-0"], "dataFilePath": null, "fileId": "1adb0953-af23-48d6-9bf2-acb72716060b", "partitionPath": "", "metrics": {"TOTAL_LOG_FILES": 1.0, "TOTAL_IO_READ_MB": 0.0, "TOTAL_LOG_FILES_SIZE": 8143.0, "TOTAL_IO_WRITE_MB": 120.0, "TOTAL_IO_MB": 120.0}})
   	at org.apache.hudi.common.util.CompactionUtils.lambda$getAllPendingCompactionOperations$5(CompactionUtils.java:161)
   	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
   	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
   	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
   	at java.util.Iterator.forEachRemaining(Iterator.java:116)
   	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
   	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
   	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
   	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
   	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
   	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
   	at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:270)
   	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
   	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
   	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
   	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
   	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
   	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
   	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
   	at org.apache.hudi.common.util.CompactionUtils.getAllPendingCompactionOperations(CompactionUtils.java:149)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:95)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:87)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:81)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:72)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:110)
   	at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:140)
   	... 21 more
   ```
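
The exception message reports that one file ID appears in two pending compaction instants. As a rough illustration of that kind of invariant check (a sketch only, not Hudi's actual `CompactionUtils` code), the snippet below groups pending plans by file ID and flags any ID covered by more than one instant. The `pending_plans` dictionary and the `find_conflicting_file_ids` helper are hypothetical; the instant times and file ID are copied from the log above.

```python
from collections import defaultdict

# Hypothetical structure: pending compaction instant -> file IDs in its plan.
# The values are taken from the exception message in the stack trace.
pending_plans = {
    "20220104165637271": ["1adb0953-af23-48d6-9bf2-acb72716060b"],
    "20220104170836577": ["1adb0953-af23-48d6-9bf2-acb72716060b"],
}


def find_conflicting_file_ids(plans):
    """Return file IDs that appear in more than one pending compaction instant."""
    instants_by_file_id = defaultdict(list)
    # Sort by instant time so the reported instants are in timeline order.
    for instant, file_ids in sorted(plans.items()):
        for file_id in file_ids:
            instants_by_file_id[file_id].append(instant)
    return {
        file_id: instants
        for file_id, instants in instants_by_file_id.items()
        if len(instants) > 1
    }


conflicts = find_conflicting_file_ids(pending_plans)
for file_id, instants in conflicts.items():
    print(f"{file_id} has {len(instants)} pending compactions: {instants}")
```

Whether the timeline really contains two pending compactions for this file group, or the reader is simply misinterpreting a newer timeline, cannot be determined from the log alone.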
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #4510: [SUPPORT] Impala query error

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4510:
URL: https://github.com/apache/hudi/issues/4510#issuecomment-1015327025


   We already have a tracking JIRA to support the MOR table type in Impala. If you are interested in working on it, feel free to grab the JIRA and we can help with reviews if need be. Closing the GitHub issue for now.
   





[GitHub] [hudi] Guanpx closed issue #4510: [SUPPORT] Impala query error

Posted by GitBox <gi...@apache.org>.
Guanpx closed issue #4510:
URL: https://github.com/apache/hudi/issues/4510


   





[GitHub] [hudi] Guanpx commented on issue #4510: [SUPPORT] Impala query error

Posted by GitBox <gi...@apache.org>.
Guanpx commented on issue #4510:
URL: https://github.com/apache/hudi/issues/4510#issuecomment-1015177881


   > @Guanpx : I don't have experience with Impala. But was MOR querying from Impala working with older versions of Hudi and failing only with 0.10.0?
   
   I don't think MOR works with any older version either. The Hudi version used in Impala is 0.5.0-incubating; this is the commit: https://github.com/apache/impala/commit/ea0e1def6160d596082b01365fcbbb6e24afb21d , cc @garyli1019 
   The version is set in Impala here: https://github.com/apache/impala/blob/master/bin/impala-config.sh#L204
   





[GitHub] [hudi] nsivabalan closed issue #4510: [SUPPORT] Impala query error

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #4510:
URL: https://github.com/apache/hudi/issues/4510


   





[GitHub] [hudi] Guanpx commented on issue #4510: [SUPPORT] Impala query error

Posted by GitBox <gi...@apache.org>.
Guanpx commented on issue #4510:
URL: https://github.com/apache/hudi/issues/4510#issuecomment-1016039422


   Thanks. By the way, I found that JIRA here: https://issues.apache.org/jira/browse/HUDI-610





[GitHub] [hudi] nsivabalan commented on issue #4510: [SUPPORT] Impala query error

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4510:
URL: https://github.com/apache/hudi/issues/4510#issuecomment-1014917684


   @Guanpx : I don't have experience with Impala. But was MOR querying from Impala working with older versions of Hudi and failing only with 0.10.0?
   





[GitHub] [hudi] Guanpx edited a comment on issue #4510: [SUPPORT] Impala query error

Posted by GitBox <gi...@apache.org>.
Guanpx edited a comment on issue #4510:
URL: https://github.com/apache/hudi/issues/4510#issuecomment-1005333590


   **HDFS files and Compaction status**
   
   ![image](https://user-images.githubusercontent.com/29246713/148152279-9eaad5fb-b45a-4c73-ab9b-4982d1b2beb4.png)
   ![image](https://user-images.githubusercontent.com/29246713/148152295-db4acd42-5405-4f5f-ab02-9591abac2797.png)










[GitHub] [hudi] Guanpx commented on issue #4510: [SUPPORT] Impala query error

Posted by GitBox <gi...@apache.org>.
Guanpx commented on issue #4510:
URL: https://github.com/apache/hudi/issues/4510#issuecomment-1005520954


   With the Hudi MOR table type, the HDFS path contains log files, which leads to the Impala read error.
   
   **Using COW works fine** because the HDFS path then contains only parquet files.
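
To make the distinction concrete, here is a minimal sketch that sorts file names from a table directory into base (parquet) files and log files. The `classify_hudi_file` helper is hypothetical, and the naming rule it applies is an assumption inferred only from the two file names that appear in the stack trace earlier in this thread, not from Hudi's own file-name parsing.

```python
def classify_hudi_file(name):
    """Roughly classify a file name as a log file, a base file, or other.

    Assumption: log files are hidden files containing ".log." and base
    files end in ".parquet", matching the names seen in the stack trace.
    """
    if name.startswith(".") and ".log." in name:
        return "log"
    if name.endswith(".parquet"):
        return "base"
    return "other"


# File names copied from the exception message in this issue.
names = [
    ".1adb0953-af23-48d6-9bf2-acb72716060b_20220104164400776.log.1_0-2-0",
    "1adb0953-af23-48d6-9bf2-acb72716060b_1-2-0_20220104165637271.parquet",
]
kinds = {name: classify_hudi_file(name) for name in names}
for name, kind in kinds.items():
    print(f"{kind:5s} {name}")
```

A COW table directory would yield only "base" entries under this rule, which matches the observation that COW tables can be read.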
   

