Posted to common-issues@hadoop.apache.org by "K S (JIRA)" <ji...@apache.org> on 2019/06/17 18:07:00 UTC
[jira] [Updated] (HADOOP-16378) RawLocalFileStatus throws exception
if a file is created and deleted quickly
[ https://issues.apache.org/jira/browse/HADOOP-16378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
K S updated HADOOP-16378:
-------------------------
Affects Version/s: 3.3.0
Environment: Ubuntu 18.04; Hadoop 2.7.3 (though the problem exists on later versions of Hadoop as well); Java 8 and Java 11
Description:
The bug occurs when Hadoop creates temporary ".nfs*" files as part of file moves and accesses. If such a file is deleted very quickly after being created, a RuntimeException is thrown. The root cause is in the loadPermissionInfo method of org.apache.hadoop.fs.RawLocalFileSystem. To get the permission info, it first runs
{code:java}
ls -ld
{code}
and then attempts to get permission info for each file. If a file disappears between these two steps, an exception is thrown.
*Reproduction Steps:*
An isolated way to reproduce the bug is to run FileInputFormat.listStatus over and over on the same directory in which those temporary files are being created. On Ubuntu, or any other Linux-based system, this should fail intermittently.
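The race can also be shown deterministically with plain JDK file APIs, without Hadoop: capture a directory listing, delete an entry, then stat each listed entry, mirroring the window between the directory enumeration and the per-file {{ls -ld}}. All file and class names below are illustrative, not taken from the Hadoop code base.

```java
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Standalone sketch of the list-then-stat race: an entry captured in the
// listing can vanish before it is stat-ed, just like the ".nfs*" files.
public class ListStatRace {
    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("race-demo");
        Path f = Files.createFile(dir.resolve(".nfs-demo"));

        // 1. Listing step (analogous to enumerating the directory).
        List<Path> listing;
        try (Stream<Path> s = Files.list(dir)) {
            listing = s.collect(Collectors.toList());
        }

        // 2. The file is deleted inside the window between the two steps.
        Files.delete(f);

        // 3. Per-entry stat (analogous to running "ls -ld <entry>"):
        //    throws NoSuchFileException for the deleted entry.
        for (Path p : listing) {
            try {
                Files.readAttributes(p, BasicFileAttributes.class);
            } catch (NoSuchFileException e) {
                System.out.println("vanished between list and stat: " + p.getFileName());
            }
        }
        Files.delete(dir);
    }
}
```

Because the deletion here happens deterministically between the two steps, the failure reproduces every run instead of intermittently.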
*Fix:*
One way in which we managed to fix this was to ignore the exception thrown in loadPermissionInfo() when the exit code is 1 or 2. Alternatively, turning off "useDeprecatedFileStatus" in RawLocalFileSystem might also fix the issue, though we never tested this; that flag was introduced to fix HADOOP-9652.
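A minimal sketch of the first workaround, assuming GNU/BSD {{ls}} semantics where exit codes 1 and 2 indicate a missing operand. The class and method names are hypothetical; this is not the actual Hadoop patch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hedged sketch of a tolerant "ls -ld" wrapper: treat exit codes 1 and 2
// as "the file disappeared" and return null, instead of throwing.
public class TolerantLs {
    /** Returns the "ls -ld" output line, or null if the file vanished. */
    static String lsOrNull(String path) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("ls", "-ld", path).start();
        // Read stdout before waitFor() so the child cannot block on a full pipe.
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line);
            }
        }
        int exit = p.waitFor();
        if (exit == 0) {
            return out.toString();
        }
        if (exit == 1 || exit == 2) {
            return null; // entry deleted between listing and stat
        }
        throw new IOException("ls -ld failed with exit code " + exit);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lsOrNull("/tmp"));              // a permissions line
        System.out.println(lsOrNull("/no/such/file/xyz")); // null, not an exception
    }
}
```

The same null-instead-of-throw shape could be applied inside loadPermissionInfo, with callers treating a null result as "file no longer exists".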
was:
The bug occurs when Hadoop creates temporary ".nfs*" files as part of file moves and accesses. If such a file is deleted very quickly after being created, a RuntimeException is thrown. The root cause is in the loadPermissionInfo method of org.apache.hadoop.fs.RawLocalFileSystem. To get the permission info, it first runs
{code:java}
ls -ld
{code}
and then attempts to get permission info for each file. If a file disappears between these two steps, an exception is thrown.
*Reproduction Steps:*
An isolated way to reproduce the bug is to run FileInputFormat.listStatus over and over on the same directory in which those temporary files are being created. On Ubuntu, or any other Linux-based system, this should fail intermittently. On macOS it should not fail, due to differences in how `ls` reports status codes.
*Fix:*
One way in which we managed to fix this was to ignore the exception thrown in loadPermissionInfo() when the exit code is 1 or 2.
> RawLocalFileStatus throws exception if a file is created and deleted quickly
> ----------------------------------------------------------------------------
>
> Key: HADOOP-16378
> URL: https://issues.apache.org/jira/browse/HADOOP-16378
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 3.3.0
> Environment: Ubuntu 18.04; Hadoop 2.7.3 (though the problem exists on later versions of Hadoop as well); Java 8 and Java 11
> Reporter: K S
> Priority: Major
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org