Posted to common-issues@hadoop.apache.org by "Eli Collins (JIRA)" <ji...@apache.org> on 2009/12/10 01:43:18 UTC

[jira] Created: (HADOOP-6427) Add Path isQualified

Add Path isQualified
--------------------

                 Key: HADOOP-6427
                 URL: https://issues.apache.org/jira/browse/HADOOP-6427
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Eli Collins
            Assignee: Eli Collins


The Path class has a method to make a path qualified but not to query if the path is qualified. This is needed for HADOOP-6421. In addition this patch adds tests to TestPath that cover the file scheme. Note that "fully qualified" applies to domain names, not URIs, so this function and its tests also serve to define what we mean by a fully qualified path.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795674#action_12795674 ] 

Doug Cutting commented on HADOOP-6427:
--------------------------------------

I think this is getting too hairy.  Let's just use URI#resolve(src, dest).  If dest is hdfs:///foo, then resolve returns hdfs:///foo and attempts to dereference the link will fail.  We don't implement an authority-defaulting mechanism when a scheme is provided.  Note that one could link to //host/foo, and if accessed over hftp this would resolve one way, and if over hdfs another.
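
A minimal sketch of the resolution behavior described above, using plain java.net.URI rather than Hadoop's Path class. The class name, the helper resolveLink and the host names nn1 and host are hypothetical, chosen only for illustration.

```java
import java.net.URI;

// Illustration only: standard java.net.URI resolution, not Hadoop code.
class ResolveLinkSketch {
    // Resolve a raw link target (dest) against the URI of the link itself (src).
    static URI resolveLink(String src, String dest) {
        return URI.create(src).resolve(URI.create(dest));
    }

    public static void main(String[] args) {
        // A target with a scheme is already absolute, so it comes back unchanged
        // and no authority is filled in from the source.
        System.out.println(resolveLink("hdfs://nn1/dir/link", "hdfs:///foo"));
        // A target with only an authority inherits the scheme of the source,
        // so the same link resolves differently over hftp and over hdfs.
        System.out.println(resolveLink("hftp://nn1/dir/link", "//host/foo"));
        System.out.println(resolveLink("hdfs://nn1/dir/link", "//host/foo"));
    }
}
```

The first call shows why dereferencing hdfs:///foo would fail: the result still has no authority to connect to.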

Sorry for introducing the idea of this feature.


> Add Path isQualified
> --------------------
>
>                 Key: HADOOP-6427
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6427
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>         Attachments: hadoop-6427-1.patch
>
>
> The Path class has a method to make a path qualified but not to query if the path is qualified. This is needed for HADOOP-6421. In addition this patch adds tests to TestPath that cover the file scheme. Note that "fully qualified" applies to domain names, not URIs, so this function and its tests also serve to define what we mean by a fully qualified path.



[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789434#action_12789434 ] 

Eli Collins commented on HADOOP-6427:
-------------------------------------

The namenode does not validate links, just accepts and returns link targets. See {{createSymlink}} in FileContext in HADOOP-6421 and {{testCreateLinkToNonExistantFile}} in HDFS-245. The question is whether when a file context creates a symlink should it make it fully qualified, ie 
{code}
hdfsFC.createSymlink("/foo", link);   //  Create a link to /foo or hdfs://nn:port/foo?
...
localFC.getFileStatus(link); // ie should this now access /foo in this current file context or hdfs://nn:port/foo?  
{code}
The code currently assumes the latter, that creating a link to /foo from an hdfs file context should link to a file on that hdfs. Do you agree, or do you prefer the other behavior? Either way I'll add another cross-file system test to HDFS-245 that illustrates the behavior (whichever we choose).



[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789123#action_12789123 ] 

Eli Collins commented on HADOOP-6427:
-------------------------------------

bq. Can you further describe the use-case for this method? When will it be called?

When creating a symlink in FileContext we both want to give the NN a fully qualified path, and make sure getLinkTarget in the NN returns a fully qualified path. If the NN returned an absolute path FileContext would interpret it to be relative to the FileContext's slash not the NN's.

bq. Perhaps we should have a filesystem-specific method that indicates whether a path is fully-qualified?

I'm assuming the isQualified implementation for the "file" scheme would then live in LocalFs, ie is just special cased in a different place. Delegating to a file system seems a little weird since a Path can construct a fully qualified path without a file system. However from RFC 2396 it looks like the authority is part of the scheme specific part of the URI and therefore should be delegated to the scheme, which in our case is represented by an AbstractFileSystem. So yea it does seem more logical to delegate this method to the file system, I'll do that.



[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788867#action_12788867 ] 

Eli Collins commented on HADOOP-6427:
-------------------------------------

The motivation for this is that symlink targets need to be fully qualified, eg file:///f is a valid link target because it's fully qualified but hdfs:///f is not. I ran into this when writing a test that links from Hdfs to LocalFs. 



[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791103#action_12791103 ] 

Eli Collins commented on HADOOP-6427:
-------------------------------------

Doug suggested that after {{fc.createSymlink("hdfs:///dir/file", "/dir/link")}}, accessing "/dir/link" should use the default fs of the file context rather than throw an exception. This is useful for creating eg a link to "hdfs:///temp". Sounds good to me. This means getLinkTarget is allowed to return a non-qualified path (we just need to check for the presence of a scheme and that the path component is absolute). Since we don't need an isQualified method anymore I'll close this out.  Thanks for all the feedback Doug!



[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793442#action_12793442 ] 

Eli Collins commented on HADOOP-6427:
-------------------------------------

See the symlink behavior I posted to HDFS-245. https://issues.apache.org/jira/browse/HDFS-245?focusedCommentId=12791197&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12791197

Absolute links are resolved using the file system of the link's parent. So hdfs://host1/foo -> /bar is resolved using the hdfs://host1 file system because that's foo's parent. hdfs://host1/dir/foo -> /bar would also use the hdfs://host1 file system because that's where dir resides, *unless* dir is a symlink to eg hdfs://host2/dir, in which case it would resolve /bar on host2, ie you fully resolve symlinks in the path leading up to the link to determine what the parent is. This is what I understand "symlinks are resolved relative to their source" to mean. I think resolving according to the parent is most intuitive since it means the link always resolves to the same location regardless of the URI used to access the link. This is similar to the behavior you described for links that are "volume root relative" (not clear what "NN's root" refers to in your description in the presence of links across NNs).

The partially qualified syntax (a scheme but no host, eg hdfs:///foo) indicates that the link is resolved using the client's default file system. This is the same behavior you described for links "relative to the client root". The current syntax is a little goofy since the scheme is ignored, eg using hdfs:///foo to indicate to use the client's default file system has nothing to do with "hdfs" since the client's default file system may be s3 or file etc. Replacing <some scheme>:/// with a special character (eg %) so you'd have %/dir/foo is perhaps less confusing. Another nice thing about this syntax (as opposed to using a partially qualified path) to indicate resolution using the client's default file system is that it side steps the fact that Hadoop doesn't support fully qualified paths with the "file" scheme (eg file://localhost/foo is an error and so partially qualified paths with a file scheme are currently special cased not to be considered relative to the client's default file system).
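
A rough sketch of the interpretation proposed in this comment, using only java.net.URI. The class and method names, and the default file system URIs, are hypothetical; this is not code from the patch.

```java
import java.net.URI;

// Illustration only: a scheme with no authority means "use the client's
// default file system", and the scheme itself is ignored.
class DefaultFsLinkSketch {
    // True for targets such as hdfs:///foo. The "file" scheme is excluded
    // because file:///foo is treated as a complete local path.
    static boolean isPartiallyQualified(URI target) {
        return target.getScheme() != null
            && target.getAuthority() == null
            && target.getPath() != null
            && !"file".equals(target.getScheme());
    }

    // Resolve a link target against the client's default file system URI.
    static URI resolveAgainstDefault(String defaultFs, String linkTarget) {
        URI target = URI.create(linkTarget);
        if (!isPartiallyQualified(target)) {
            return target; // left for the other resolution rules
        }
        // Only the path survives; scheme and authority come from the default.
        return URI.create(defaultFs).resolve(target.getPath());
    }

    public static void main(String[] args) {
        System.out.println(resolveAgainstDefault("s3://bucket", "hdfs:///foo"));
        System.out.println(resolveAgainstDefault("hdfs://nn2:8020", "hdfs:///foo"));
    }
}
```

With an s3 default file system the "hdfs" in the target plays no role, which is the goofiness noted above.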

In NFS symlinks are opaque to (not interpreted by) the server and are always resolved on the client. Similarly, in the current patch symlinks are stored raw (ie the target is not modified) on the namenode and interpreted on the client. However this doesn't preclude resolving relative and absolute ("volume relative") links on the namenode when the links don't span file systems in the future as an optimization since (in this case) the final resolution is the same. The path resolution *is different* from NFS since not all paths are resolved using the client's slash. I think HDFS semantics should differ here since HDFS uses URIs instead of Unix paths. For example if a user with an HDFS file context creates the link /data/latest -> /2009/10 on host1 then another user with a *local* file context accesses hdfs://host1/data/latest it would seem confusing if they got a FileNotFoundException because the directory /2009/10 does not exist on their local file system.

Does this make sense? I think the above mostly jibes with what you've posted in the jira earlier.



[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791050#action_12791050 ] 

Eli Collins commented on HADOOP-6427:
-------------------------------------

The isQualified test is an error check. Ideally the namenode resolves all links within the file system so all the UnresolvedLinkExceptions that FileContext sees have fully qualified targets. In the current patch the namenode throws an exception to the client even if the link is within the same file system, but even then we want to check that we got back a fully qualified path, to be clear that the interpretation of the link target is done by the AbstractFileSystem and not FileContext. So I think we need a function to determine if a path is fully qualified in any case, right? It also serves to define what we mean by "fully qualified" in Path (this is currently defined implicitly in the makeQualified method).

I took your (excellent) suggestion of storing the link target verbatim but have the client return a fully qualified path. That's definitely the way to go. The current semantics follow, will update the design doc. All of these cases are now covered in individual tests in TestLink and TestLocalFsLink.

* Relative link targets

{{fc.createSymlink("file", "/dir/link")}} creates a link named "link" in /dir (assuming the current working directory is "/dir") that points to "file", eg resolves to "/dir/file", and if "dir" is renamed "dir2" the link resolves to "/dir2/file" because the target is stored relative. The file context is determined by the parent of the link, eg the fully qualified path of the link target of "/dir/link" is determined by "/dir" not the file context that is used to access the link.

* Absolute link targets

{{fc.createSymlink("/dir/file", "/dir/link")}} creates a link named "link" in /dir that points to "/dir/file", if "dir" is renamed "dir2" the link becomes dangling because the link target is stored absolute. The file system is determined by the source not the client, eg {{fc.open("hdfs://host1/dir/link")}} opens "/dir/file" on host1 even if accessed using a local file context.

* Fully qualified link targets

{{fc.createSymlink("hdfs://host/dir/file", "/dir/link")}} creates a link named "link" in /dir that always points to the fully qualified path specified, regardless of the file context or path used to access the link.
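
The three cases above can be sketched with java.net.URI as follows. This is only an illustration under the assumption that the link's own location is known as a fully qualified URI; the class, enum and method names are hypothetical and do not come from HADOOP-6421.

```java
import java.net.URI;

// Illustration only: classify a stored link target and resolve it against
// the fully qualified location of the link itself.
class LinkTargetSketch {
    enum Kind { RELATIVE, ABSOLUTE, FULLY_QUALIFIED }

    static Kind classify(String target) {
        URI u = URI.create(target);
        if (u.getScheme() != null) {
            if (u.getAuthority() == null) {
                // eg hdfs:///foo, which is not one of the three cases listed
                throw new IllegalArgumentException("partially qualified: " + target);
            }
            return Kind.FULLY_QUALIFIED;
        }
        return u.getPath().startsWith("/") ? Kind.ABSOLUTE : Kind.RELATIVE;
    }

    // linkLocation is where the link lives, eg hdfs://host1/dir/link.
    static URI resolve(String linkLocation, String target) {
        return URI.create(linkLocation).resolve(target);
    }

    public static void main(String[] args) {
        // Relative: follows the link's directory, even after a rename.
        System.out.println(resolve("hdfs://host1/dir2/link", "file"));
        // Absolute: same file system as the link, path taken verbatim.
        System.out.println(resolve("hdfs://host1/dir2/link", "/dir/file"));
        // Fully qualified: independent of where the link lives.
        System.out.println(resolve("hdfs://host1/dir2/link", "hdfs://host/dir/file"));
    }
}
```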

I updated the code in HADOOP-6421.



[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793833#action_12793833 ] 

Eli Collins commented on HADOOP-6427:
-------------------------------------

Forgot to mention, a benefit of using hdfs:///foo (or replace "hdfs" with any scheme except "file") is that the implementation is easy, less involved than ///foo. 



[jira] Resolved: (HADOOP-6427) Add Path isQualified

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins resolved HADOOP-6427.
---------------------------------

    Resolution: Won't Fix



[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793731#action_12793731 ] 

Sanjay Radia commented on HADOOP-6427:
--------------------------------------

Absolute paths:
Agree with Eli's comment on absolute links. The "symlinks are resolved relative to their source" rule is consistent with
"volume root relative". I was thrown off by Doug's comment in HDFS-245 where he said "Links should be resolved against the client's context."; this seems to be in conflict with his "symlinks are resolved relative to their source".
Also thanks for the "hdfs://host1/dir/foo" example; I had missed that case. I agree with your analysis.

Partially qualified syntax (a scheme but no host, eg hdfs:///foo):
I agree that the syntax is goofy if the scheme is ignored. Not sure if everyone will buy into the "%" syntax (it was inspired by the "~" syntax but such
special characters get messy after a while). Can we use a special scheme such as "clientContext:///foo"?
So is your proposal to mark partially qualified syntax as invalid, or to allow it with the somewhat "goofy" interpretation?




[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792767#action_12792767 ] 

Sanjay Radia commented on HADOOP-6427:
--------------------------------------

Doug> Symbolic links should be resolved relative to their source location. So hdfs://x/foo -> /bar should be interpreted as a link to hdfs://x/bar.

Is this because it is being interpreted relative to the volume root (ie hdfs://x) or because it is being interpreted relative to the client context, ie the client's root (/)?
The client's root might be file:///.

I agree with your statement "Symbolic links should be resolved relative to their source location" if by "source" you mean  the place where the 
symlink occurs. In the HDFS-245 jira I called links like /bar to be volume-root-relative sym links.

As I pointed out in my previous comment, there are use cases for both.



[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788886#action_12788886 ] 

Doug Cutting commented on HADOOP-6427:
--------------------------------------

Perhaps we should have a filesystem-specific method that indicates whether a path is fully-qualified?



[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788851#action_12788851 ] 

Eli Collins commented on HADOOP-6427:
-------------------------------------

I special cased the file scheme because it is special cased in RFC 1738:
{quote}
As a special case, <host> can be the string "localhost" or the empty string; this is interpreted as `the machine from which the URL is being interpreted'.
{quote}
ie the non-file schemes in the RFC require an authority because they require a fully qualified host.  My rationale is that since other file systems use a different scheme (eg the RFC reserves the "afs" scheme) and non-local files in Hadoop have their own scheme (hdfs, s3, s3n etc) they would not be considered "fully qualified" without an authority, so the "file" scheme is a special case. Ie even if accessing remote files via the file scheme from Hadoop's perspective the file is local in that LocalFs is still used (even if the underlying host provides this file via a network file system). Reasonable?
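
The rule argued for here could be sketched as below. This is a hypothetical illustration built on java.net.URI, not the isQualified method from hadoop-6427-1.patch; the class name and the example host nn:8020 are assumptions.

```java
import java.net.URI;

// Illustration only: "fully qualified" means a scheme, an absolute path and
// an authority, with the "file" scheme special cased to need no authority.
class QualifiedSketch {
    static boolean isQualified(URI uri) {
        String scheme = uri.getScheme();
        String path = uri.getPath();
        if (scheme == null || path == null || !path.startsWith("/")) {
            return false;
        }
        if ("file".equals(scheme)) {
            return true; // an empty host means the local machine
        }
        return uri.getAuthority() != null;
    }

    public static void main(String[] args) {
        System.out.println(isQualified(URI.create("file:///f")));
        System.out.println(isQualified(URI.create("hdfs:///f")));
        System.out.println(isQualified(URI.create("hdfs://nn:8020/f")));
        System.out.println(isQualified(URI.create("/f")));
    }
}
```

Hard-coding "file" is exactly the special case questioned elsewhere in this thread; a file system with no authority in its URIs would need the same treatment.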



[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789507#action_12789507 ] 

Doug Cutting commented on HADOOP-6427:
--------------------------------------

Symbolic links should be resolved relative to their source location.  So hdfs://x/foo -> /bar should be interpreted as a link to hdfs://x/bar.  The client should generally rewrite a symlink with new Path(src, dest) before de-referencing it, assuming src is fully-qualified.  Should we change the accessor on FileStatus to do this, and perhaps add a getRawLink() method?  I think the primary path field of a FileStatus (src) is always absolute, no?




[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798060#action_12798060 ] 

Sanjay Radia commented on HADOOP-6427:
--------------------------------------

+1 seems fine.
Agree that we drop the FileContext relative symlinks for now; these can be added later.



[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789410#action_12789410 ] 

Doug Cutting commented on HADOOP-6427:
--------------------------------------

The namenode should not validate links, right? It should just store the URI.  (We don't want the namenode creating foreign FileSystem instances.)  

Also, in unix it's quite possible to make a link to a non-existent file with no complaint whatsoever.  The manual page for the Linux symlink system call says, "No checking of oldpath is done."  So arguably we should support links to any URI and interpret it on the client when traversed.  The namenode can still safely optimize the case where the URI is relative.

So I question that we need to ensure that the namenode is always given a fully-qualified path.  A link to hdfs:///users/foo, linking to the user's home directory in the default filesystem, however that's configured by the user, might be reasonable.




[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788815#action_12788815 ] 

Doug Cutting commented on HADOOP-6427:
--------------------------------------

You special-case file: URIs here.  I think it's reasonable to permit other filesystem implementations besides local whose URIs do not include an authority.  For example, one might layer a FilterFileSystem on a local filesystem.  Or one might use a filesystem with a global namespace, like AFS.
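
One way to avoid hard-coding the file scheme is to let the set of authority-free schemes be supplied from outside. The following is only a sketch under that assumption: QualifiedCheck and its constructor argument are hypothetical names, and nothing here reflects what the attached patch actually does.

```java
import java.net.URI;
import java.util.Set;

// Hypothetical sketch: the schemes that legitimately have no authority are
// passed in instead of being hard-coded as "file".
final class QualifiedCheck {
    private final Set<String> authorityFreeSchemes;

    QualifiedCheck(Set<String> authorityFreeSchemes) {
        this.authorityFreeSchemes = authorityFreeSchemes;
    }

    boolean isQualified(URI uri) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            return false;                 // no scheme: never qualified
        }
        if (authorityFreeSchemes.contains(scheme)) {
            return true;                  // scheme alone is enough here
        }
        return uri.getAuthority() != null; // other schemes need an authority
    }
}
```

With a set containing file and, say, a layered scheme of one's own, both kinds of URI are treated alike, while hdfs:///foo is still reported as not qualified.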



[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792761#action_12792761 ] 

Sanjay Radia commented on HADOOP-6427:
--------------------------------------

If you look at my comments on the symlink jira, I pointed out that the /dir/link could be resolved relative to the root of the volume (NN) or  resolved relative to the client's context.
(https://issues.apache.org/jira/browse/HDFS-245?focusedCommentId=12629524&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12629524)
I list 4 kinds of symlinks in that comment. 

There are use cases for both volume-root-relative and client-root-relative.
I had assumed that /dir/link was a volume-relative symlink (but I don't think that was called out explicitly in the rather long jira).
I think NFS resolves a symlink like "/dir/foo" relative to the root of the volume (i.e. the NN) rather than the root of the client. (Need to verify this.)

The only way to support both use cases is to have a different character(s) to distinguish the two. 
There was an earlier system I worked on that used / and % to distinguish between the two (yes ugly).

I would use NFS to guide us here. But regardless of the answer, we need a way to represent the other use case. 






[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795718#action_12795718 ] 

Eli Collins commented on HADOOP-6427:
-------------------------------------

If I understand correctly we:
# Drop support for links that are relative to the file context's default file system from the current patch, i.e. the client's default file system will never be used when resolving links. Defer this to a future change, if needed.
# Leave the behavior of partially qualified link targets up to the file system of the path that refers to the link. Examples:
** file:///link -> file:///file is ok because LocalFs only supports partially qualified URIs (does not support URIs with an authority)
** file:///link -> file://localhost/file is an error, same rationale
** hdfs://hostx/link -> hdfs:///file is an error because Hdfs URIs must be fully qualified
** hdfs://hostx/link -> //hosty/file is an error, same rationale
** foo://hostx/link -> //hosty/file is ok if file system foo supports paths of form //hosty/file

This sounds good to me. Sanjay, and others, does this sound reasonable to you?
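
The examples above can be written down as a small table of per-scheme rules. This is a sketch only: LinkFsRules and the three rules are hypothetical stand-ins for whatever each file system would decide for itself, "foo" is the made-up scheme from the list, and letting plain paths through unconditionally is an extra assumption not covered by the examples.

```java
import java.net.URI;

// Hypothetical sketch: the file system of the link decides whether a
// target that carries a scheme and/or an authority is acceptable.
final class LinkFsRules {
    private LinkFsRules() {}

    static boolean isAllowed(String link, String target) {
        URI linkUri = URI.create(link);
        URI targetUri = URI.create(target);
        if (targetUri.getScheme() == null && targetUri.getAuthority() == null) {
            return true; // assumption: plain paths are always accepted
        }
        switch (linkUri.getScheme()) {
            case "file": // LocalFs: URIs with an authority are not supported
                return targetUri.getAuthority() == null;
            case "hdfs": // Hdfs: target must be fully qualified
                return targetUri.getScheme() != null
                        && targetUri.getAuthority() != null;
            case "foo":  // made-up file system that accepts //host/path
                return targetUri.getAuthority() != null;
            default:
                throw new IllegalArgumentException(
                        "no rule for scheme: " + linkUri.getScheme());
        }
    }
}
```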




[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795722#action_12795722 ] 

Eli Collins commented on HADOOP-6427:
-------------------------------------

Hm, seems like we should first check if the target can be resolved with the file system of the target, if one is specified, before trying to resolve using the file system of the link. I.e.:
* hdfs://hostx/link -> file:///file is ok even though Hdfs doesn't support URIs w/o authorities
* foo://host/link -> //host/file is only ok if file system foo supports URIs with just an authority 

I should also mention that even in the /link -> /file case we know the file system of the link since FileContext uses the default file system to make the paths it creates fully qualified, ie the default file system is never used when resolving a link target, but _is_ still used when creating paths.
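
The ordering described above, target's file system first and the link's file system as the fallback, comes down to which scheme gets to decide. A minimal sketch, assuming the decision is keyed on the URI scheme; TargetFirst and decidingScheme are hypothetical names used only for illustration.

```java
import java.net.URI;

// Hypothetical sketch: pick the scheme whose file system should judge
// (and resolve) the link target.
final class TargetFirst {
    private TargetFirst() {}

    static String decidingScheme(URI link, URI target) {
        // A target that names its own scheme is handled by that file system.
        if (target.getScheme() != null) {
            return target.getScheme();
        }
        // Otherwise fall back to the file system the link lives in.
        return link.getScheme();
    }
}
```

For hdfs://hostx/link -> file:///file this yields file, so the local file system's rules apply; for foo://host/link -> //host/file it yields foo.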




[jira] Updated: (HADOOP-6427) Add Path isQualified

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HADOOP-6427:
--------------------------------

    Attachment: hadoop-6427-1.patch



[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788870#action_12788870 ] 

Doug Cutting commented on HADOOP-6427:
--------------------------------------

RFC 1738 concerns URLs, a subset of URIs.  Hadoop Paths are a subset of URIs too, but not the same subset.  It might be reasonable for someone to layer an encrypted local filesystem implementation using sfile: URIs with no authority.  So I still think we shouldn't special-case file: URIs here.

Can you further describe the use-case for this method?  When will it be called?



[jira] Commented: (HADOOP-6427) Add Path isQualified

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793827#action_12793827 ] 

Eli Collins commented on HADOOP-6427:
-------------------------------------

I think using a new syntax is preferable to using a partially qualified path. Some rationale: 

Using a partially qualified path:
* Using an existing scheme, e.g. hdfs:///foo, is confusing because the "hdfs" is not relevant: the client's default file system might not be hdfs. And the file scheme is always a partially qualified URI (i.e. file:///foo *should not* mean use the default file system of the client, it means use the local file system). Therefore file needs to be special-cased, so it's inconsistent.
* Using a new scheme, e.g. default:///foo, is a little odd since the scheme is special-cased ("default" doesn't map to a class that extends AbstractFileSystem). The behavior when a partially qualified path with the "default" scheme is made fully qualified would be quirky. Would it be an error? Or ignore the authority?
* We could use an authority but no scheme, e.g. //default/, but that's bad because it could be combined with another path (made qualified), in which case "default" now has a different meaning (it's an authority).

Using new syntax:
* Use a prefix consisting of reserved URI characters that do not normalize away, e.g. "%", ";" or "!". However these are valid POSIX directory names so you could not tell whether "%/foo" was a relative link or one that should be resolved using the default file system of the file context.
* Use a prefix that normalizes away. E.g. /// normalizes to / so ///foo normally means /foo, and still would, except when used as a link target, where it would now mean /foo resolved using the default file system of the client. There's no ambiguity since users can still create an absolute link to foo using "/foo". This would not change any existing semantics because symlinks are new and symlink target resolution is the only place the new meaning would take effect. Using a "///" prefix without a scheme is also uncommon. There are some details to consider though: since /// normalizes away in the Path constructor, we'd have to add a flag to Path indicating that the Path was created with ///, which would be checked when getting the link target in FileContext. We'd also need to make sure that scheme:///foo did not have the flag set, and that the flag wasn't propagated when creating a new fully qualified path.

What do people think of using a "///" prefix?  We could also shelve this link type until users request it. It is useful, but probably the least common link type.
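
To make the flag idea concrete, here is a minimal sketch using java.net.URI as a stand-in for Path. java.net.URI also parses ///foo to the path /foo with no authority, so the prefix has to be noticed on the raw string. LinkTarget, clientDefaultRelative and qualify are hypothetical names for this example; none of them exist in Hadoop.

```java
import java.net.URI;

// Hypothetical sketch of a link target that remembers whether it was
// written with a scheme-less "///" prefix.
final class LinkTarget {
    final URI uri;
    final boolean clientDefaultRelative;

    private LinkTarget(URI uri, boolean clientDefaultRelative) {
        this.uri = uri;
        this.clientDefaultRelative = clientDefaultRelative;
    }

    static LinkTarget parse(String raw) {
        URI uri = URI.create(raw);
        // Only a bare "///" prefix carries the proposed meaning;
        // "scheme:///foo" must not set the flag.
        boolean flag = uri.getScheme() == null && raw.startsWith("///");
        return new LinkTarget(uri, flag);
    }

    // Makes the target fully qualified against a file system URI.
    // The flag is deliberately not propagated to the result.
    LinkTarget qualify(URI fileSystemUri) {
        return new LinkTarget(fileSystemUri.resolve(uri.getPath()), false);
    }
}
```

Both /foo and ///foo end up with the same path here; the flag is the only thing that distinguishes the two spellings after parsing.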
