Posted to issues@hive.apache.org by "Taraka Rama Rao Lethavadla (Jira)" <ji...@apache.org> on 2024/03/05 11:51:00 UTC

[jira] [Commented] (HIVE-28106) Parallel select queries are failing on external tables with FNF due to staging directory

    [ https://issues.apache.org/jira/browse/HIVE-28106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823578#comment-17823578 ] 

Taraka Rama Rao Lethavadla commented on HIVE-28106:
---------------------------------------------------

It seems that code refactoring done as part of https://issues.apache.org/jira/browse/HIVE-24581 may have caused this behaviour. However, I have not been able to reproduce the problem, either on a cluster or with JUnit test cases.
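One way to make the suspected interleaving reproducible in a unit test is to force the step order instead of relying on timing. The following is a hypothetical illustration, not Hive code, and it rests on several assumptions:

- It uses the local file system through java.nio in place of HDFS.
- The class name `StagingDirRace`, the data file name `000000_0` and the directory name `.hive-staging_hive_example` are made up for the example.

The sketch runs three steps in a fixed order:

1. It lists a table directory, as Query 2's split generation does.
2. It deletes the staging directory, as Query 1's cleanup does.
3. It descends into the now-stale directory entry.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical deterministic replay of the HIVE-28106 interleaving (not Hive code). */
public class StagingDirRace {

    /** One directory listing: the local analogue of a single getListing call. */
    static List<Path> listChildren(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.collect(Collectors.toList());
        }
    }

    /** Returns true if descending into the stale entry fails as in the Tez AM log. */
    public static boolean reproduce() {
        try {
            Path table = Files.createTempDirectory("tbl");
            Files.createFile(table.resolve("000000_0"));
            Path staging = Files.createDirectory(table.resolve(".hive-staging_hive_example"));

            // Query 2: lists the table directory and records which entries are directories.
            List<Path> dirsSeen = new ArrayList<>();
            for (Path child : listChildren(table)) {
                if (Files.isDirectory(child)) {
                    dirsSeen.add(child);
                }
            }

            // Query 1: completes and deletes its (empty) staging directory.
            Files.delete(staging);

            // Query 2: descends into every directory it saw, before any name filtering.
            for (Path dir : dirsSeen) {
                try {
                    listChildren(dir);
                } catch (NoSuchFileException e) {
                    return true; // local counterpart of the FileNotFoundException
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("reproduced: " + reproduce());
    }
}
```

`reproduce()` returns `true` when listing the stale entry throws `NoSuchFileException`, the local-file-system counterpart of the `FileNotFoundException` in the Tez AM log. Forcing the same ordering inside Hive would require some way to pause split generation between the parent listing and the recursive listing. That mechanism is not shown here.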

> Parallel select queries are failing on external tables with FNF due to staging directory
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-28106
>                 URL: https://issues.apache.org/jira/browse/HIVE-28106
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Taraka Rama Rao Lethavadla
>            Priority: Major
>
> The issue reported here is similar to HIVE-26481, but here it happens between simultaneous queries on external tables.
> Query 1:
>  
> {noformat}
> 2024-02-27 09:41:59,349 INFO org.apache.hadoop.hive.common.FileUtils: [d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-395]: Creating directory if it doesn't exist: hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20
> ..
> ..
> 2024-02-27 09:42:42,859 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-416]: Executing command(queryId=sdphive_20240227094159_75903d85-5c0b-4e80-8292-1e7943e85ea8): SELECT COUNT(*) FROM database.tbl WHERE XXXX IS NULL OR YYYY=''
> ..
> ..
> 2024-02-27 09:42:54,407 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-416]: Completed executing command(queryId=sdphive_20240227094159_75903d85-5c0b-4e80-8292-1e7943e85ea8); Time taken: 11.548 seconds
> {noformat}
> This query completed and deleted its staging directory.
> {noformat}
> 2024-02-27 09:42:54,565 DEBUG hive.ql.Context: [d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-436]: Deleting result dir: hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20/-mr-10001 
> ..   
> ..
> 2024-02-27 09:42:54,566 DEBUG hive.ql.Context: [d77d1ce2-c574-48f3-b536-d8c9431a07ae etp519425508-436]: Deleting scratch dir: hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20  {noformat}
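> Query 2 started executing on the same table at about the same time. Its DAG was submitted at 09:42:53,989, less than a second before Query 1 deleted its staging directory: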
> {noformat}
> 2024-02-27 09:42:53,989 INFO org.apache.tez.client.TezClient: [HiveServer2-Background-Pool: Thread-457]: Submitting dag to TezSession, sessionName=HIVE-08b22263-8e80-470f-81b7-f70bb5561487, applicationId=application_1708662665640_1222, dagName=SELECT ABS(((XXXX - YYYY... (Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, callerId=sdphive_20240227094206_21193765-6a9d-42ab-bc82-3229150fc334_User:UUUU }  {noformat}
> Tez AM logs (syslog_dag_1708662665640_1222_1)
>  
> {noformat}
> 2024-02-27 09:42:54,053 [INFO] [IPC Server handler 1 on 46229] |app.DAGAppMaster|: Running DAG: SELECT ABS(((XXXX - YYYY...  (Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, callerId=sdphive_20240227094206_21193765-6a9d-42ab-bc82-3229150fc334_User:UUUU } 
> .. 
> ..
> 2024-02-27 09:42:54,443 [INFO] [App Shared Pool - #1] |exec.Utilities|: Adding 1 inputs; the first input is hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl
> ..
> ..
> 2024-02-27 09:42:54,445 [INFO] [App Shared Pool - #1] |io.HiveInputFormat|: Generating splits for dirs: hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl
> ..
> ..
> 2024-02-27 09:42:54,487 [INFO] [App Shared Pool - #2] |tez.HiveSplitGenerator|: The preferred split size is 33554432
> ..
> ..
> 2024-02-27 09:42:54,488 [INFO] [App Shared Pool - #2] |exec.Utilities|: Adding 1 inputs; the first input is hdfs://namespace/data/eisds/apps/qlys/final/history/tbl/partition_year=2023/partition_month=12/partition_date=2023-12-30
> ..
> ..
> 2024-02-27 09:42:54,631 [TRACE] [ORC_GET_SPLITS #0] |ipc.ProtobufRpcEngine|: 111: Call -> xx-yy-zz.net/170.42.154.76:8020: getListing {src: "/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20" startAfter: "" needLocation: true}  {noformat}
> The query failed because Query 1 had removed that staging directory in the meantime:
> {noformat}
> 2024-02-27 09:42:54,634 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Vertex Input: tbl initializer failed, vertex=vertex_1708662665640_1222_1_00 [Map 1]
> org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.lang.RuntimeException: ORC split generation failed with exception: java.io.FileNotFoundException: File hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20 does not exist.
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:188)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda(RootInputInitializerManager.java:171)
>     at java.util.concurrent.Executors.call(Executors.java:511)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask.runInterruptibly(TrustedListenableFutureTask.java:125)
>     at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
>     at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.RuntimeException: ORC split generation failed with exception: java.io.FileNotFoundException: File hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20 does not exist.
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1853)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1940)
>     at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:543)
>     at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:851)
>     at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:289)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager.lambda(RootInputInitializerManager.java:203)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializer(RootInputInitializerManager.java:196)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInitializerAndProcessResult(RootInputInitializerManager.java:177)
>     ... 8 more
> Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20 does not exist.
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1785)
>     ... 18 more
> Caused by: java.io.FileNotFoundException: File hdfs://namespace/warehouse/tablespace/external/hive/database.db/tbl/.hive-staging_hive_2024-02-27_09-41-59_167_572918081912359322-20 does not exist.
>     at org.apache.hadoop.hdfs.DistributedFileSystem.<init>(DistributedFileSystem.java:1280)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.<init>(DistributedFileSystem.java:1254)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.doCall(DistributedFileSystem.java:1199)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.doCall(DistributedFileSystem.java:1195)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1213)
>     at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2144)
>     at org.apache.hadoop.fs.FileSystem.handleFileStat(FileSystem.java:2332)
>     at org.apache.hadoop.fs.FileSystem.hasNext(FileSystem.java:2309)
>     at org.apache.hadoop.hive.ql.io.HdfsUtils.listLocatedFileStatus(HdfsUtils.java:104)
>     at org.apache.hadoop.hive.ql.io.HdfsUtils.listFileStatusWithId(HdfsUtils.java:215)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.listOriginalFiles(OrcInputFormat.java:1281)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.callInternal(OrcInputFormat.java:1271)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.lambda(OrcInputFormat.java:1245)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.call(OrcInputFormat.java:1245)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.call(OrcInputFormat.java:1210){noformat}
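> The table directory is traversed recursively, and unwanted files (here, the staging directory) are filtered out of the listing before the query executes. In this case the staging directory existed when the traversal listed the table directory, but it was deleted before it could be filtered out, which caused the exception.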
>  
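The failure described in the last paragraph of the issue is a check-then-act race. The staging directory is visible when the table directory is listed, but it is gone by the time the traversal descends into it, and it would only have been filtered out afterwards.

The following is a minimal sketch of a traversal that avoids this. It rests on three assumptions:

- Local java.nio stands in for HDFS.
- The hidden-name rule (skip names starting with `.` or `_`) is assumed to match the filter that should exclude staging directories.
- `TolerantLister` and `listVisibleFiles` are hypothetical names, not Hive APIs.

The sketch applies the filter before descending and treats a directory that disappears mid-traversal as empty.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical race-tolerant recursive listing (not a Hive API). */
public class TolerantLister {

    /** Assumed hidden-name rule: entries whose names start with "." or "_" are skipped. */
    static boolean isHidden(Path p) {
        String name = p.getFileName().toString();
        return name.startsWith(".") || name.startsWith("_");
    }

    /** Lists dir; a directory deleted concurrently is treated as empty. */
    static List<Path> listOrEmpty(Path dir) {
        try (Stream<Path> s = Files.list(dir)) {
            return s.collect(Collectors.toList());
        } catch (NoSuchFileException e) {
            return List.of(); // e.g. another query's staging dir removed after the parent listing
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Recursively collects visible regular files under dir. */
    public static List<Path> listVisibleFiles(Path dir) {
        List<Path> out = new ArrayList<>();
        collect(dir, out);
        return out;
    }

    private static void collect(Path dir, List<Path> out) {
        for (Path child : listOrEmpty(dir)) {
            if (isHidden(child)) {
                continue; // filter before descending, so staging dirs are never listed
            }
            if (Files.isDirectory(child)) {
                collect(child, out);
            } else if (Files.isRegularFile(child)) {
                out.add(child);
            }
        }
    }
}
```

The two measures cover different cases:

- Filtering before descending removes the known-unwanted staging directory from the traversal entirely.
- Catching `NoSuchFileException` also covers any other directory removed concurrently, at the cost of silently skipping it. Whether that is acceptable depends on the table.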



--
This message was sent by Atlassian Jira
(v8.20.10#820010)