You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Matt Foley (JIRA)" <ji...@apache.org> on 2011/06/02 08:16:47 UTC

[jira] [Created] (HADOOP-7352) Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error

Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error
------------------------------------------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-7352
                 URL: https://issues.apache.org/jira/browse/HADOOP-7352
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs, fs/s3
            Reporter: Matt Foley
            Assignee: Matt Foley


In HADOOP-6201 and HDFS-538 it was agreed that FileSystem::listStatus should throw FileNotFoundException instead of returning null, when the target directory did not exist.

However, in LocalFileSystem implementation today, FileSystem::listStatus still may return null, when the target directory exists but does not grant read permission.  This causes NPE in many callers, for all the reasons cited in HADOOP-6201 and HDFS-538.  See HADOOP-7327 and its linked issues for examples.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7352) Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error

Posted by "Matt Foley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042884#comment-13042884 ] 

Matt Foley commented on HADOOP-7352:
------------------------------------

In a comment in another bug (<a href="https://issues.apache.org/jira/browse/HADOOP-7327?focusedCommentId=13042494&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13042494">here</a>), Daryn points out that if the exceptions are thrown from a lower level with knowledge of the cause of the problem, it would be good to throw something informative such as AccessControlException, rather than just a generic IOException.

I agree in principle, but AccessControlException is a subclass of SecurityException, which is a RuntimeException and therefore no better than NPE.  All of the callers of listStatus() are prepared to get an IOException, but none of the standard sub-classes of IOException include access failures.  Is there a Hadoop or Apache defined sub-class of IOException that would be appropriate?

> Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-7352
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7352
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, fs/s3
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>
> In HADOOP-6201 and HDFS-538 it was agreed that FileSystem::listStatus should throw FileNotFoundException instead of returning null, when the target directory did not exist.
> However, in LocalFileSystem implementation today, FileSystem::listStatus still may return null, when the target directory exists but does not grant read permission.  This causes NPE in many callers, for all the reasons cited in HADOOP-6201 and HDFS-538.  See HADOOP-7327 and its linked issues for examples.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7352) Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error

Posted by "Matt Foley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043163#comment-13043163 ] 

Matt Foley commented on HADOOP-7352:
------------------------------------

@Daryn - thanks, I'll use that where appropriate.

@Bharath - agreed, the exception will always be IOException or one of its subclasses; IOException where there is no detailed info available or no better subclass defined.

> Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-7352
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7352
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, fs/s3
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>
> In HADOOP-6201 and HDFS-538 it was agreed that FileSystem::listStatus should throw FileNotFoundException instead of returning null, when the target directory did not exist.
> However, in LocalFileSystem implementation today, FileSystem::listStatus still may return null, when the target directory exists but does not grant read permission.  This causes NPE in many callers, for all the reasons cited in HADOOP-6201 and HDFS-538.  See HADOOP-7327 and its linked issues for examples.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7352) Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error

Posted by "Matt Foley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042622#comment-13042622 ] 

Matt Foley commented on HADOOP-7352:
------------------------------------

This behavior, of returning null on access error, is following the pattern of File::list and File::listFiles, which are defined by the JDK.  However, those cause NPE in a lot of Hadoop code too, which is why they've recently been fixed by HADOOP-7342, HADOOP-7322, HDFS-1934, and HDFS-2019.  Those patches provided FileUtil alternatives to the JDK methods.  That approach isn't applicable to FileSystem::listStatus because it is an abstract method in an important contract class defined by Hadoop.  It has a massive number of callers in Common and Mapred (although I found none in HDFS).

I believe this change in semantics, to throw IOException instead of returning null, would have no negative impact.  I've examined the 36 non-trivial callers in Common, and only 2 checked for null result.  All the others would definitely get NPE on null result, and the two that did check had faulty logic!  In Mapred there are far more callers, but almost all of them also will throw NPE on null result.  In going through about half of them, I found one correct and one incorrect null check in about 20 callers.

Therefore, I believe there is no downside to changing the semantics of the base method in this way, and we'll get rid of some mystery NPEs.  We will need to scan for any callers that check for null and use it as a non-error condition; there won't be many but there will be a couple.

Please give feedback.  If this is acceptable I will provide a patch for each of the underlying FileSystem.listStatus implementations, and do the careful scan for callers that actually expect null as an allowed result.

> Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-7352
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7352
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, fs/s3
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>
> In HADOOP-6201 and HDFS-538 it was agreed that FileSystem::listStatus should throw FileNotFoundException instead of returning null, when the target directory did not exist.
> However, in LocalFileSystem implementation today, FileSystem::listStatus still may return null, when the target directory exists but does not grant read permission.  This causes NPE in many callers, for all the reasons cited in HADOOP-6201 and HDFS-538.  See HADOOP-7327 and its linked issues for examples.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7352) Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043014#comment-13043014 ] 

Jakob Homan commented on HADOOP-7352:
-------------------------------------

It's correct to make the contract consistent.  Trying to make IOExceptions useful at this point is probably a lost cause.  Otherwise, the changes proposed sound reasonable.

> Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-7352
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7352
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, fs/s3
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>
> In HADOOP-6201 and HDFS-538 it was agreed that FileSystem::listStatus should throw FileNotFoundException instead of returning null, when the target directory did not exist.
> However, in LocalFileSystem implementation today, FileSystem::listStatus still may return null, when the target directory exists but does not grant read permission.  This causes NPE in many callers, for all the reasons cited in HADOOP-6201 and HDFS-538.  See HADOOP-7327 and its linked issues for examples.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7352) Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error

Posted by "Bharath Mundlapudi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043037#comment-13043037 ] 

Bharath Mundlapudi commented on HADOOP-7352:
--------------------------------------------

Throwing an IOException is reasonable when listStatus returns null. The reason for this null case can be multiple things like file may not be present, permission issue, disk is bad, OS decided to make file system readonly etc etc. Since at FileSystem level we may not know all the causes for this so we should throw an IOException. Do you agree?

> Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-7352
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7352
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, fs/s3
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>
> In HADOOP-6201 and HDFS-538 it was agreed that FileSystem::listStatus should throw FileNotFoundException instead of returning null, when the target directory did not exist.
> However, in LocalFileSystem implementation today, FileSystem::listStatus still may return null, when the target directory exists but does not grant read permission.  This causes NPE in many callers, for all the reasons cited in HADOOP-6201 and HDFS-538.  See HADOOP-7327 and its linked issues for examples.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7352) Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error

Posted by "Daryn Sharp (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042932#comment-13042932 ] 

Daryn Sharp commented on HADOOP-7352:
-------------------------------------

Hadoop common has a {{org.apache.hadoop.fs.permission.AccessControlException}} derives from {{IOException}}.

> Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-7352
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7352
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, fs/s3
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>
> In HADOOP-6201 and HDFS-538 it was agreed that FileSystem::listStatus should throw FileNotFoundException instead of returning null, when the target directory did not exist.
> However, in LocalFileSystem implementation today, FileSystem::listStatus still may return null, when the target directory exists but does not grant read permission.  This causes NPE in many callers, for all the reasons cited in HADOOP-6201 and HDFS-538.  See HADOOP-7327 and its linked issues for examples.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7352) Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error

Posted by "Matt Foley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042625#comment-13042625 ] 

Matt Foley commented on HADOOP-7352:
------------------------------------

Correction:  there were calls to FileSystem::listStatus in HDFS, but they were all in Test classes, except for two pass-through calls in HftpFileSystem.listStatus() and HftpFileSystem.LsParser.listStatus().

> Contracts of LocalFileSystem and DistributedFileSystem should require FileSystem::listStatus throw IOException not return null upon access error
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-7352
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7352
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, fs/s3
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>
> In HADOOP-6201 and HDFS-538 it was agreed that FileSystem::listStatus should throw FileNotFoundException instead of returning null, when the target directory did not exist.
> However, in LocalFileSystem implementation today, FileSystem::listStatus still may return null, when the target directory exists but does not grant read permission.  This causes NPE in many callers, for all the reasons cited in HADOOP-6201 and HDFS-538.  See HADOOP-7327 and its linked issues for examples.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira