Posted to common-issues@hadoop.apache.org by "Andrew Wang (JIRA)" <ji...@apache.org> on 2013/09/25 20:18:05 UTC

[jira] [Commented] (HADOOP-9984) FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default

    [ https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777852#comment-13777852 ] 

Andrew Wang commented on HADOOP-9984:
-------------------------------------

Cross-posting some of my review feedback from HADOOP-9981 that Colin plans to address in this JIRA instead:

{quote}
* I think we have an existing bug in the paths of the returned FileStatus objects. When going through a glob, the globber sets the path to the built-up path, which can include symlinks. For a non-glob it uses getFileStatus, which returns a resolved path. I'm pretty sure FileStatus objects are supposed to carry a resolved path. This is complicated by the fact that PathFilter still needs to match against the complete built-up path; maybe we could do something like:
{code}
if (filter.accept(new Path(prefix, status.getPath().getName()))) {
{code}
* Our symlink resolution right now is inconsistent: listStatus does not resolve its results, but getFileStatus does. Shouldn't this be getFileLinkStatus? Or are we waiting to fix this again in HDFS-9877 when it gets recommitted? I know HADOOP-9972 with the new APIs is coming down the pipe, so I just wanted to bring this up.
* I'd like to see tests that would have caught these correctness problems. They should check that resolved paths are returned correctly (with and without a wildcard), that PathFilters match against built-up paths as expected (with and without wildcards), and the looping /a/b -> .. symlink case you mentioned in a comment. Whether the wildcard is terminal or intermediate also matters here. Unfortunately, there are a lot of edge cases.
{quote}
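Here is a minimal, self-contained Java sketch of the first suggestion: test the PathFilter against the built-up path (prefix plus child name), but return the status that carries the resolved path. Status, filterChildren, and the /link -> /real layout are hypothetical stand-ins. They are not Hadoop's FileStatus, PathFilter, or Globber API, and the sketch assumes the children were already listed from the resolved directory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public final class GlobPathSketch {

    /** Hypothetical stand-in for FileStatus: holds only the (resolved) path. */
    public record Status(String path) {
        /** Last path component, analogous to Path#getName(). */
        public String name() {
            return path.substring(path.lastIndexOf('/') + 1);
        }
    }

    /**
     * Filters the children of one glob component.
     *
     * @param prefix   the path as built up through the glob, possibly via a symlink
     * @param children statuses listed from the resolved directory (resolved paths)
     * @param filter   hypothetical stand-in for PathFilter#accept
     */
    public static List<Status> filterChildren(String prefix, List<Status> children,
                                              Predicate<String> filter) {
        List<Status> accepted = new ArrayList<>();
        for (Status status : children) {
            // Mirrors: filter.accept(new Path(prefix, status.getPath().getName()))
            String builtUp = prefix + "/" + status.name();
            if (filter.test(builtUp)) {
                // Return the status unchanged, so the caller sees the resolved path.
                accepted.add(status);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Assumed layout: /link is a symlink to /real, which holds a.txt and b.log.
        List<Status> children = List.of(new Status("/real/a.txt"), new Status("/real/b.log"));
        // A filter written against the path the user globbed through (/link/*.txt).
        Predicate<String> txtUnderLink = p -> p.startsWith("/link/") && p.endsWith(".txt");
        System.out.println(filterChildren("/link", children, txtUnderLink));
        // prints [Status[path=/real/a.txt]]
    }
}
```

If the filter were tested against the resolved path instead, it would reject both entries, because neither starts with /link/. That is the PathFilter breakage a plain "return resolved paths" change would risk.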
                
> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by default
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-9984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9984
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.1.0-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>         Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch
>
>
> During the process of adding symlink support to FileSystem, we realized that many existing HDFS clients would break if listStatus and globStatus returned symlinks. For example, some applications assume that !FileStatus#isFile implies that the inode is a directory. As we discussed in HADOOP-9972 and HADOOP-9912, these APIs should return resolved paths by default.
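The local filesystem shows the same client assumption failing. The sketch below is an analogy that uses java.nio.file, not Hadoop's FileSystem API. Inspecting a symlink without following it (roughly what an unresolved status gives you) means it is not a regular file, so "!isFile means directory" misclassifies a link to a file. Following the link first gives the expected answer. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

public final class SymlinkStatusDemo {

    /**
     * Fragile client logic: anything that is not a regular file is treated as a directory.
     * resolveLinks=true follows symlinks (like a resolved status);
     * resolveLinks=false inspects the link itself (like an unresolved status).
     */
    public static String naiveKind(Path p, boolean resolveLinks) {
        boolean isFile = resolveLinks
                ? Files.isRegularFile(p)
                : Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS);
        return isFile ? "file" : "directory";
    }

    /** Creates tmp/data.txt and tmp/link -> data.txt, then classifies the link both ways. */
    public static String[] classifyLinkToFile() {
        try {
            Path dir = Files.createTempDirectory("symlink-demo");
            Path target = Files.createFile(dir.resolve("data.txt"));
            Path link = Files.createSymbolicLink(dir.resolve("link"), target);
            String[] kinds = { naiveKind(link, false), naiveKind(link, true) };
            // Clean up the temporary files.
            Files.delete(link);
            Files.delete(target);
            Files.delete(dir);
            return kinds;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] kinds = classifyLinkToFile();
        System.out.println("unresolved: " + kinds[0]); // "directory", wrong: it links to a file
        System.out.println("resolved:   " + kinds[1]); // "file"
    }
}
```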
