Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/10/05 13:53:43 UTC

[GitHub] [hudi] sassai opened a new issue #2145: [SUPPORT] IOException when querying Hudi data with Hive using LIMIT clause

sassai opened a new issue #2145:
URL: https://github.com/apache/hudi/issues/2145


   **Describe the problem you faced**
   
   Running a query in Hive on Hudi data using a LIMIT clause results in an IOException.
   
   ```console
   java.io.IOException: Input path does not exist: abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=5/.hoodie_partition_metadata
   ```
   
   The `.hoodie_partition_metadata` file does exist and can be listed with `hdfs dfs -ls` using the path above.
   
   Example query used:
   
   ```sql
   select * from nyc_taxi.address limit 100;
   ```
   
   Running the same query without the LIMIT clause works fine.
   
   The `HIVE_AUX_JAR` variable holds `hudi-utilities-bundle_2.11-0.6.0.jar` and `hudi-hadoop-mr-bundle-0.6.0.jar`.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Create a COPY_ON_WRITE table.
   2. Insert records into the table (the table has 11 million records).
   3. Run `set hive.fetch.task.conversion=none;`.
   4. Query the table using the statement above.
   5. An IOException is thrown.
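
   The Hive-side portion of the steps above can be condensed into a short session. This is a sketch, not part of the original report: it assumes the Hudi COW table is already synced to the metastore under the database and table names used in the query.

   ```sql
   -- Step 3: disable the fetch-task shortcut so the query is planned as a
   -- Tez/MR job (this is what routes it through HoodieParquetInputFormat
   -- split computation, where the failure surfaces)
   set hive.fetch.task.conversion=none;

   -- Step 4: the LIMIT query that fails with the IOException
   select * from nyc_taxi.address limit 100;

   -- The same scan without LIMIT reportedly completes fine
   select * from nyc_taxi.address;
   ```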
   
   **Expected behavior**
   
   A result set containing 100 records is returned.
   
   **Environment Description**
   
   * Hudi version : 0.6.0
   
   * Spark version : 2.4.0
   
   * Hive version : 3.1
   
   * Hadoop version : 3
   
   * Storage (HDFS/S3/GCS..) : ADLS Gen2
   
   * Running on Docker? (yes/no) : no
   
   **Stacktrace**
   
   ```console
    Error while compiling statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
    Vertex failed, vertexName=Map 1, vertexId=vertex_1601881880788_0031_6_00,
    diagnostics=[Vertex vertex_1601881880788_0031_6_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: address initializer failed, vertex=vertex_1601881880788_0031_6_00 [Map 1],
    org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=5/.hoodie_partition_metadata
        at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:300)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:240)
        at org.apache.hudi.hadoop.HoodieParquetInputFormat.listStatus(HoodieParquetInputFormat.java:105)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:328)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:541)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:830)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:249)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:280)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:271)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:271)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:255)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.io.IOException: Input path does not exist: abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=5/.hoodie_partition_metadata
        at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:274)
        ... 19 more
    ]
    DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
   ```
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] bvaradar commented on issue #2145: [SUPPORT] IOException when querying Hudi data with Hive using LIMIT clause

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2145:
URL: https://github.com/apache/hudi/issues/2145#issuecomment-703736061


   This could be similar to https://github.com/apache/hudi/issues/1962.
   
   The location setting of the Hive table needs to be checked.
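
   A minimal sketch of that check in Hive, using the database, table, and partition values that appear in the stack trace above (the `ALTER TABLE` form is only needed if the registered location turns out not to match the Hudi base path):

   ```sql
   -- Show the table's registered location, input format, and serde
   describe formatted nyc_taxi.address;

   -- The partition-level location is what split computation actually lists;
   -- partition spec taken from the failing path (year=2020/month=10/day=5)
   describe formatted nyc_taxi.address partition (`year`='2020', `month`='10', `day`='5');

   -- If the location is stale or wrong, it can be repointed at the base path
   -- implied by the stack trace
   alter table nyc_taxi.address set location
     'abfs://xxx@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address';
   ```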





[GitHub] [hudi] bvaradar closed issue #2145: [SUPPORT] IOException when querying Hudi data with Hive using LIMIT clause

Posted by GitBox <gi...@apache.org>.
bvaradar closed issue #2145:
URL: https://github.com/apache/hudi/issues/2145


   

