You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Matt Foley (JIRA)" <ji...@apache.org> on 2011/06/02 08:18:47 UTC

[jira] [Commented] (HADOOP-7352) Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error

    [ https://issues.apache.org/jira/browse/HADOOP-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042622#comment-13042622 ] 

Matt Foley commented on HADOOP-7352:
------------------------------------

This behavior, of returning null on access error, is following the pattern of File::list and File::listFiles, which are defined by the JDK.  However, those cause NPE in a lot of Hadoop code too, which is why they've recently been fixed by HADOOP-7342, HADOOP-7322, HDFS-1934, and HDFS-2019.  Those patches provided FileUtil alternatives to the JDK methods.  That approach isn't applicable to FileSystem::listStatus because it is an abstract method in an important contract class defined by Hadoop.  It has a massive number of callers in Common and Mapred (although I found none in HDFS).

I believe this change in semantics, to throw IOException instead of returning null, would have no negative impact.  I've examined the 36 non-trivial callers in Common, and only 2 checked for null result.  All the others would definitely get NPE on null result, and the two that did check had faulty logic!  In Mapred there are far more callers, but almost all of them also will throw NPE on null result.  In going through about half of them, I found one correct and one incorrect null check in about 20 callers.

Therefore, I believe there is no downside to changing the semantics of the base method in this way, and we'll get rid of some mystery NPEs.  We will need to scan for any callers that check for null and use it as a non-error condition; there won't be many but there will be a couple.

Please give feedback.  If this is acceptable I will provide a patch for each of the underlying FileSystem.listStatus implementations, and do the careful scan for callers that actually expect null as an allowed result.

> Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-7352
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7352
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, fs/s3
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>
> In HADOOP-6201 and HDFS-538 it was agreed that FileSystem::listStatus should throw FileNotFoundException instead of returning null, when the target directory did not exist.
> However, in LocalFileSystem implementation today, FileSystem::listStatus still may return null, when the target directory exists but does not grant read permission.  This causes NPE in many callers, for all the reasons cited in HADOOP-6201 and HDFS-538.  See HADOOP-7327 and its linked issues for examples.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira