Posted to issues@hive.apache.org by "mahesh kumar behera (Jira)" <ji...@apache.org> on 2022/01/19 11:14:00 UTC

[jira] [Assigned] (HIVE-25877) Load table from concurrent thread causes FileNotFoundException

     [ https://issues.apache.org/jira/browse/HIVE-25877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mahesh kumar behera reassigned HIVE-25877:
------------------------------------------


> Load table from concurrent thread causes FileNotFoundException
> --------------------------------------------------------------
>
>                 Key: HIVE-25877
>                 URL: https://issues.apache.org/jira/browse/HIVE-25877
>             Project: Hive
>          Issue Type: Bug
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>
> As part of the direct insert optimisation, the files from Tez jobs are moved to the table directory for ACID tables, and duplicate removal is done afterwards. The same issue also exists for MM tables without the direct insert optimisation. Each session scans through the table directory and cleans up the files related to its own session, but the iterator is created over all the files in the directory. As a result, a FileNotFoundException is thrown when multiple sessions act on the same table: the first session cleans up its data while the second session is still reading it.
> {code:java}
> Caused by: java.io.FileNotFoundException: File hdfs://mbehera-1.mbehera.root.hwx.site:8020/warehouse/tablespace/managed/hive/tbl4/_tmp.delta_0000981_0000981_0000 does not exist.
>         at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1275) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
>         at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1249) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
>         at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1194) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
>         at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1190) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
>         at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1208) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
>         at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2144) ~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
>         at org.apache.hadoop.fs.FileSystem$5.handleFileStat(FileSystem.java:2332) ~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
>         at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:2309) ~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
>         at org.apache.hadoop.hive.ql.exec.Utilities.getDirectInsertDirectoryCandidatesRecursive(Utilities.java:4447) ~[hive-exec-3.1.3000.7.2.14.0-117.jar:3.1.3000.7.2.14.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.Utilities.getDirectInsertDirectoryCandidates(Utilities.java:4413) ~[hive-exec-3.1.3000.7.2.14.0-117.jar:3.1.3000.7.2.14.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.metadata.Hive.getValidPartitionsInPath(Hive.java:2816) ~[hive-exec-3.1.3000.7.2.14.0-117.jar:3.1.3000.7.2.14.0-SNAPSHOT] {code}
>  
> {code:java}
> Caused by: java.io.FileNotFoundException: File hdfs://mbehera-1.mbehera.root.hwx.site:8020/warehouse/tablespace/managed/hive/tbl4/.hive-staging_hive_2022-01-19_05-18-38_933_1683918321120508074-54 does not exist.
>         at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1275) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
>         at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1249) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
>         at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1194) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
>         at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1190) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
>         at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1208) ~[hadoop-hdfs-client-3.1.1.7.2.14.0-117.jar:?]
>         at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2144) ~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
>         at org.apache.hadoop.fs.FileSystem$5.handleFileStat(FileSystem.java:2332) ~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
>         at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:2309) ~[hadoop-common-3.1.1.7.2.14.0-117.jar:?]
>         at org.apache.hadoop.hive.ql.exec.Utilities.getDirectInsertDirectoryCandidatesRecursive(Utilities.java:4447) ~[hive-exec-3.1.3000.7.2.14.0-117.jar:3.1.3000.7.2.14.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.Utilities.getDirectInsertDirectoryCandidates(Utilities.java:4413) ~[hive-exec-3.1.3000.7.2.14.0-117.jar:3.1.3000.7.2.14.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.Utilities.getFullDPSpecs(Utilities.java:2971) ~[hive-exec-3.1.3000.7.2.14.0-117.jar:3.1.3000.7.2.14.0-SNAPSHOT] {code}
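
Both traces fail inside a recursive directory walk (getDirectInsertDirectoryCandidatesRecursive). The walk assumes that every directory it discovers still exists when it lists that directory's contents. The sketch below illustrates the race on the read side. It is not the Hive fix. It uses java.nio.file as a local stand-in for Hadoop's FileSystem.listLocatedStatus, and the class TolerantLister is hypothetical. The walk treats a directory that disappears between discovery and listing (NoSuchFileException) as empty instead of failing:

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Hive code. java.nio.file stands in for the
// Hadoop FileSystem listing API so the example runs without a cluster.
public class TolerantLister {

    /** Recursively collects regular files under dir, skipping any directory
     *  that another session deletes before it can be listed. */
    public static List<Path> listFilesRecursive(Path dir) throws IOException {
        List<Path> out = new ArrayList<>();
        collect(dir, out);
        return out;
    }

    private static void collect(Path dir, List<Path> out) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                // An entry deleted after it was returned by the stream is neither
                // a directory nor a regular file any more, so it is skipped.
                if (Files.isDirectory(entry)) {
                    collect(entry, out);
                } else if (Files.isRegularFile(entry)) {
                    out.add(entry);
                }
            }
        } catch (NoSuchFileException e) {
            // The directory vanished (e.g. another session removed its staging
            // dir); treat it as empty rather than failing the whole query.
        } catch (DirectoryIteratorException e) {
            // Iteration errors arrive wrapped; tolerate only "does not exist".
            if (!(e.getCause() instanceof NoSuchFileException)) {
                throw e.getCause();
            }
        }
    }
}
```

Skipping vanished directories only hides the race on the reader's side. The description also points to the other option: create the iterator over only the current session's own files rather than all files in the table directory. This notification does not say which approach the eventual patch takes.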



--
This message was sent by Atlassian Jira
(v8.20.1#820001)