Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/03/23 16:19:00 UTC
[jira] [Work logged] (HIVE-27135) AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in HDFS
[ https://issues.apache.org/jira/browse/HIVE-27135?focusedWorklogId=852624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-852624 ]
ASF GitHub Bot logged work on HIVE-27135:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 23/Mar/23 16:18
Start Date: 23/Mar/23 16:18
Worklog Time Spent: 10m
Work Description: mdayakar commented on code in PR #4114:
URL: https://github.com/apache/hive/pull/4114#discussion_r1146445992
##########
common/src/java/org/apache/hadoop/hive/common/FileUtils.java:
##########
@@ -1376,6 +1376,12 @@ public static RemoteIterator<FileStatus> listStatusIterator(FileSystem fs, Path
status -> filter.accept(status.getPath()));
}
+ public static RemoteIterator<LocatedFileStatus> listLocatedStatusIterator(FileSystem fs, Path path, PathFilter filter)
Review Comment:
The existing code in org.apache.hadoop.hive.ql.io.AcidUtils#getHdfsDirSnapshots() works with a RemoteIterator of LocatedFileStatus objects, so I kept the same return type and added a new API for it.
Issue Time Tracking
-------------------
Worklog Id: (was: 852624)
Time Spent: 4h 20m (was: 4h 10m)
> AcidUtils#getHdfsDirSnapshots() throws FNFE when a directory is removed in HDFS
> -------------------------------------------------------------------------------
>
> Key: HIVE-27135
> URL: https://issues.apache.org/jira/browse/HIVE-27135
> Project: Hive
> Issue Type: Bug
> Reporter: Dayakar M
> Assignee: Dayakar M
> Priority: Major
> Labels: pull-request-available
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> AcidUtils#getHdfsDirSnapshots() throws FileNotFoundException when a directory is removed in HDFS while fetching HDFS Snapshots.
> The test code below can be used to reproduce this issue.
> {code:java}
> @Test
> public void testShouldNotThrowFNFEWhenHiveStagingDirectoryIsRemovedWhileFetchingHDFSSnapshots() throws Exception {
>   MockFileSystem fs = new MockFileSystem(new HiveConf(),
>       new MockFile("mock:/tbl/part1/.hive-staging_dir/-ext-10002", 500, new byte[0]),
>       new MockFile("mock:/tbl/part2/.hive-staging_dir", 500, new byte[0]),
>       new MockFile("mock:/tbl/part1/_tmp_space.db", 500, new byte[0]),
>       new MockFile("mock:/tbl/part1/delta_1_1/bucket-0000-0000", 500, new byte[0]));
>   Path path = new MockPath(fs, "/tbl");
>   Path stageDir = new MockPath(fs, "mock:/tbl/part1/.hive-staging_dir");
>   FileSystem mockFs = spy(fs);
>   Mockito.doThrow(new FileNotFoundException("")).when(mockFs).listLocatedStatus(eq(stageDir));
>   try {
>     Map<Path, AcidUtils.HdfsDirSnapshot> hdfsDirSnapshots = AcidUtils.getHdfsDirSnapshots(mockFs, path);
>     Assert.assertEquals(1, hdfsDirSnapshots.size());
>   } catch (FileNotFoundException fnf) {
>     fail("Should not throw FileNotFoundException when a directory is removed while fetching HDFSSnapshots");
>   }
> }{code}
> This issue was addressed as part of HIVE-26481, but the fix is not complete. [Here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1541] the FileUtils.listFiles() API is used, which returns a RemoteIterator<LocatedFileStatus>. While iterating, if an entry is a directory and the listing is recursive, the iterator tries to list the files in that directory; if another thread or task has removed the directory in the meantime, it throws FileNotFoundException. In this case the removed directory is the .staging directory, which should be excluded by applying the passed filter.
>
> So here we can reuse the logic already written in the _org.apache.hadoop.hive.ql.io.AcidUtils#getHdfsDirSnapshotsForCleaner()_ API to avoid the FileNotFoundException.
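The idea described above (apply the filter before descending into a directory, and tolerate a directory that disappears mid-listing) can be sketched as follows. This is only an illustration on the local file system using java.nio.file, not the Hive or Hadoop code: the class FilteredListing, the method listFiles and the filter HIDDEN_FILTER are hypothetical names, and NoSuchFileException stands in here for the FileNotFoundException that HDFS would raise.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

public class FilteredListing {

    // Hypothetical filter (an assumption for this sketch): reject entries
    // whose name starts with '.' or '_', such as staging or temp directories.
    public static final Predicate<Path> HIDDEN_FILTER = p -> {
        String name = p.getFileName().toString();
        return !name.startsWith(".") && !name.startsWith("_");
    };

    // Recursively collects regular files under root. The filter is applied to
    // every entry before it is recorded or descended into, so excluded
    // directories are never listed at all.
    public static List<Path> listFiles(Path root, Predicate<Path> filter) throws IOException {
        List<Path> files = new ArrayList<>();
        Deque<Path> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Path dir = pending.pop();
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    if (!filter.test(entry)) {
                        continue; // excluded entries are skipped without being opened
                    }
                    if (Files.isDirectory(entry)) {
                        pending.push(entry);
                    } else {
                        files.add(entry);
                    }
                }
            } catch (NoSuchFileException e) {
                // A subdirectory was removed after its parent was listed:
                // skip it. A missing root is still reported to the caller.
                if (dir.equals(root)) {
                    throw e;
                }
            }
        }
        return files;
    }
}
```

With a layout like the one in the test above, the staging directory and _tmp_space.db are rejected by the filter before any attempt to list them, so only the bucket file under delta_1_1 is returned.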