Posted to common-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2008/08/29 08:28:44 UTC

[jira] Created: (HADOOP-4044) Create symbolic links in HDFS

Create symbolic links in HDFS
-----------------------------

                 Key: HADOOP-4044
                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
             Project: Hadoop Core
          Issue Type: New Feature
          Components: dfs
            Reporter: dhruba borthakur
            Assignee: dhruba borthakur


HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638421#action_12638421 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4044:
------------------------------------------------

Doug> Links are clearly not programming errors or client code errors.

It depends on how the API is defined.  For example, we could define the API open(Path f) such that the parameter f is required to be a non-linked path.  An exception would then be thrown if client code calls it with a linked path.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630421#action_12630421 ] 

dhruba borthakur commented on HADOOP-4044:
------------------------------------------

So folks, 

Proposal 1: Do you think it is a good idea to put in a subset of the full symlink functionality (without support for symlinks to directories) as a first cut? Then, if a use case arises, we can implement the full symlink functionality at a later date.

Proposal 2: Proposal 1 + if the symlink is for a pathname in the same namenode, then the namenode will resolve the symlink transparently. This will allow symbolic links to directories that point to paths inside the same namenode.

Please respond with +1/-1/0 for each of the two proposals. The reason I am not implementing the full set of functionality right now is that the 0.19 cutoff date is only a few days away!

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637331#action_12637331 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

Doug says:
>> [minor]  the method next() suggests that one follows the path to the next symbolic link. It does not strictly.

>Can you elaborate? Each time this is called a link is resolved, returning a path that may contain more links, following a chain of links, no? Also, 'doOperation()' is next to meaningless. So if 'next' is indeed inappropriate we should replace it with something more specific, not something generic.

First, I had qualified this as _minor_ (meaning it is just a minor suggestion for the author).
When I read the code, I first thought that dhruba was processing all symlinks and then doing the actual operation at the end.
It turns out next() actually performs the operation. If the operation succeeds (and it will for most pathnames, including all dot-relative symlinks), all is well and done. Likewise, if there is another NN in a chain of links that is willing to process remote symlinks internally, all is fine and done.
If the operation fails, then the recovery is performed.
Now, the design is quite elegant. Later we can add new reasons why the NN could not complete the operation, and the structure could be modified to recover from the new conditions. I like that. The name next() suggests that one is following symlinks, so I was suggesting something like doOperation() or
performOperation(), which I find more accurate than the generic "next()".
Perhaps adding a comment to the abstract method may be sufficient. _I am okay with whatever the author decides here - it is a very minor comment_.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638593#action_12638593 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

Owen> I would probably use a thread local variable to measure the depth of the traversal. It isn't great, but it would be encapsulated in LinkResult.

I see.  Yuck.  It wouldn't in fact be entirely encapsulated, since it would assume that all the resolution of a path happens in the same thread.  Thus we'd want to mark each of the FileSystem methods that uses a LinkResult final, add some comments noting that the LinkResult API has this restriction, etc.  (It'd be odd to ever try to run these in different threads, but folks have tried odder things!)  In general, thread locals seem like a poor design choice for this.  They're best for things like caches, to avoid thread contention, rather than for storing program state, no?

We could instead perhaps add something like a LinkResolution object that keeps a counter internally that is incremented as the link is resolved.  The SPI would pass this around rather than a Path?  Is that better than Dhruba's patch?
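
For illustration only, a minimal sketch of what such a LinkResolution object might look like; the follow()/getPath() methods and the MAX_HOPS limit are assumptions for this sketch, not part of any attached patch:

{code}
import java.io.IOException;

import org.apache.hadoop.fs.Path;

// Hypothetical object passed through the SPI instead of a bare Path:
// the hop counter travels with the resolution, so no thread-local state is needed.
public class LinkResolution {
  private static final int MAX_HOPS = 32; // assumed limit for the sketch

  private Path path;
  private int hops = 0;

  public LinkResolution(Path path) {
    this.path = path;
  }

  public Path getPath() {
    return path;
  }

  // Called each time a link is resolved; fails once the chain is too deep.
  public void follow(Path target) throws IOException {
    if (++hops > MAX_HOPS) {
      throw new IOException("Too many levels of symbolic links: " + path);
    }
    path = target;
  }
}
{code}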


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629266#action_12629266 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

> If the extra RPC is a problem, then a FileSystem can implement open() directly itself as an optimization. 

That imposes a restriction on what a symlink can point to.

> Exceptions should not be used for normal program control. 

Agreed. We could, of course, propose another FS.open() that returns something more than an FSInputStream.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4044:
-------------------------------------

    Attachment: symLink6.patch

Hi Doug, thanks for your comments. I made all the changes you suggested except one. Please let me know if these changes look good. I am working on another unit test for this feature.

You had suggested that fs/spi/VfsStatusBase be renamed to FSLink. However, my thinking is that VfsStatusBase can be used not only for links but for other items too. For example, if we want to return other types of NameNode exceptions to the client, we could use VfsStatusBase. That is why I am hesitant to rename it to FSLink.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638430#action_12638430 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

Owen, do you think this logic is unique to HDFS?  If we add support for links to another FileSystem implementation, would we want to share this logic with HDFS?  If so, could we do it through a common base class, like LinkableFileSystem?  Then FileSystems that never intend to support links could continue to directly implement the end-user methods, while FileSystems that support links would extend the new class and implement the LinkResult<T> SPI methods.  Is that consistent with your proposal?

Also, the pseudo code above will cause a stack overflow for a circular link.  Shouldn't we catch loops earlier than that?  And if the loop-detection logic is non-trivial, won't we end up with either lots of duplicated code or something like the anonymous classes in Dhruba's patch?


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631218#action_12631218 ] 

Robert Chansler commented on HADOOP-4044:
-----------------------------------------

-1
Sometimes just doing the easy part is not good enough. In this case, the reasonable expectation of users is that if HDFS has links, then 1) you can link to anything, and 2) links can be path components. (Those two issues aren't well separated in the previous discussion.) It's not reasonable to have to keep explaining to each user that links only sort of work. And even the easy case is not completely thought out: Is it an _error_ to make a link to a directory, or is that something that _just won't work_ when using the link? What about modifications to the shell commands to support links?

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630242#action_12630242 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

bq. @Raghu: The FSDataInputStream will implement another interface StreamType that will store the information on whether this is a symbolc link. No extra RPC is needed. 

Ok, this is one of the options mentioned earlier that works just for open(). As the earlier discussion in this Jira shows, symlinks are a fairly fundamental change to the filesystem and affect various parts of it. Just as with open(), once the various options for general handling of links are discussed here, I don't think the patch will be much more complicated than what you have attached.

Symlinks have been used for many decades, and I don't think there is any lack of use cases. I don't think HADOOP-4058 alone should be the criterion for this Jira.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638577#action_12638577 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> clients are responsible to resolve links before using it 

That might be a reasonable contract, but correct use of the API would then involve extra RPCs to check links before trying to open them, no?  But if it is expected that folks will regularly pass unresolved links, then we shouldn't use an exception, since callers of this method would be expected to immediately handle the case of unresolved links.  There's no need for a throw, as there might be for, e.g., FileNotFound, which must be handled at some unknown layer above.  Exceptions are a long-distance goto, a control mechanism that should be reserved for cases that require them, where the catcher is unknown.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637384#action_12637384 ] 

dhruba borthakur commented on HADOOP-4044:
------------------------------------------

1. "symlink" vs "link":
   I think it makes sense to call the current implementation "links" instead of "symlinks". None of the existing file system implementations have any support for any kind of links. It is ok for the first implementation to refer to this new construct as a generic "link". HDFS implements it as a symbolic link, but some other file system may implement "links" as hard links.

  +1 for calling this construct a "link" instead of a "symlink".

2. Exceptions vs objects-as-return-status in a public API (FileSystem or ClientProtocol API)
   Exceptions and object-as-return-value approaches are two ways of communicating a certain piece of information to the user of the API.
   (a) One goal is to discuss how we can attempt to make that API somewhat future-proof. If we consider our current Hadoop RPC, the only way to serialize/deserialize an exception object is to serialize its message string. The client side can deserialize this string and reconstruct an exception object. If the return status needs to contain several different pieces of information, then serializing/deserializing them as a string inside the exception object is not very elegant. Many other RPC systems (e.g. Thrift) allow versioning objects (adding new fields), but many might not allow adding new exceptions to a pre-existing method call. Thus, making API calls return objects-as-return-types seems more future-proof than adding exceptions.
   (b) The rule of thumb we have been following is that exceptions are generated when an abnormal situation occurs. If an exception is thrown by the ClientProtocol, it is logged by the RPC subsystem into an error log. This is a good characteristic to have in a distributed system; it makes debugging easy because a scan of the error logs pinpoints the exceptions raised by the API. Access control checks or disk-full conditions raise exceptions, and they are logged by the RPC subsystem. We do not want every call that traverses a symbolic link to log an exception message in the error logs, do we? (Of course, we could special-case it and say that we will not log UnresolvedPathException; but by special-casing it, we are acknowledging that this exception is not abnormal behaviour.)

 +1 for RenameResult rename(Path src, Path dst) throws IOException; (a minimal sketch of such a return object follows after point 3 below)

3. Exceptions vs object-as-return-status inside the NameNode
   (a) Different filesystem implementations can have very different internals and very different sets of developers. For example, HDFS might implement its code in such a way that traversing a link returns an object status, whereas S3 or KFS throws an exception (internal to the implementation). If we write a file system implementation for Ceph, we are unlikely to rewrite the Ceph code to avoid exceptions (or vice versa). I would like to draw the distinction that this issue is separate from what is decided in case (2) above.
   (b) The primary focus in designing the internal methods of the NameNode is not future-proofing for backward compatibility. Also, there isn't any requirement to serialize/deserialize any exception objects as long as those objects are used only inside the NameNode. Thus, exceptions could be used here. This keeps most of the HDFS code clean and elegant.

  +1 for using exceptions inside the NameNode internal methods.
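
A minimal sketch of the object-as-return-status shape being +1'd in point 2; only the signature RenameResult rename(Path src, Path dst) comes from the discussion above, while the fields and accessors below are assumptions for illustration:

{code}
import org.apache.hadoop.fs.Path;

// Hypothetical return object for: RenameResult rename(Path src, Path dst) throws IOException;
// New fields can be added later without changing the method signature or
// adding new exceptions to a pre-existing RPC method.
public class RenameResult {
  private final boolean done;        // true if the rename completed
  private final Path unresolvedLink; // non-null if a link must be resolved first

  public RenameResult(boolean done, Path unresolvedLink) {
    this.done = done;
    this.unresolvedLink = unresolvedLink;
  }

  public boolean isDone() { return done; }

  public boolean isLink() { return unresolvedLink != null; }

  public Path getLink() { return unresolvedLink; }
}
{code}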

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638152#action_12638152 ] 

Konstantin Shvachko commented on HADOOP-4044:
---------------------------------------------

> All of these methods are now conditional. They now either return the declared value or a boolean

To make sure we are on the same page: open() does not literally return a boolean; you mean that throwing an exception should be treated as a (boolean) indicator of whether the link should be retrieved, right?
But the methods have always been conditional in *that sense*, because they already throw AccessControlException, QuotaExceededException, and FileNotFoundException. XMountPointException is just another exception of the same nature.
I do not see it as a fundamental change to the FileSystem API.

> When I referred to doubling the number of RPCs I was contrasting with the naive approach

I think everybody agrees that the naive approach is unacceptable.

> the link value would not be returned directly with the call to open() but would require a second RPC, right?

Correct. The point is that there is no degradation of performance in the regular case, which is when there are no links or the link is local, but there is an extra RPC when the link points to another cluster.
And I do not think we should optimize it, precisely because doing so leads to the fundamental change of the API, just as you describe.
Also, an application can avoid this inefficiency by directly resolving the link before calling open() if it knows about the mount point.
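
For illustration, a sketch of the resolve-before-open pattern described here; getFileStatus() and open() are existing FileSystem calls, but the isSymlink()/getSymlink() accessors on the returned status are assumptions borrowed from the patch discussion rather than current API:

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ResolveThenOpen {
  // An application that knows a path may be a mount point can pay the extra
  // RPC itself; open() on ordinary, link-free paths stays a single RPC.
  public static FSDataInputStream openResolved(FileSystem fs, Path p)
      throws IOException {
    FileStatus status = fs.getFileStatus(p); // the extra RPC, paid only by this caller
    if (status.isSymlink()) {                // assumed accessor, see note above
      p = status.getSymlink();               // assumed accessor returning the target path
    }
    return fs.open(p);
  }
}
{code}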


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638586#action_12638586 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> the local file system will want to support the normal dot-relative symlinks but perhaps not the remote one (ie the mounts ).

Internal symlinks are supported automatically by the local FS, at least on unix.  And support for remote symlinks on the local FS might be *very* cool.  Unix will gladly let me create a symbolic link to an HDFS URI (try 'ln -s hdfs://foo/bar foo; ls -l foo'). These cannot be resolved by a unix shell, but they could be by 'bin/hadoop fs -ls', by MapReduce programs, etc.  One could, e.g., keep links in one's unix home directory to the HDFS directories that one commonly uses.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628072#action_12628072 ] 

Joydeep Sen Sarma commented on HADOOP-4044:
-------------------------------------------

+1. good primitive to have.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640224#action_12640224 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> I am almost certain that it won't affect any benchmark other than NNBench.

If that's really the case, what are we worried about here?

> What happens if a link or directory is changed between these two operations? open() fails though it should not.

The same thing that happens if that change is made just after a file is opened.  If you open a file, then someone else deletes it, subsequent accesses to that file will fail.  The namenode doesn't keep any state for files open for read, so a short-lived cache of block locations doesn't change things fundamentally.

That said, the cache idea only works for open, and doesn't work for rename, delete, etc.  In these cases we don't want to pre-fetch a list of block locations.  So never mind the cache idea anyway.

The current options on the table seem to be:
 - Dhruba's patch modified to use Nicholas's idea of LinkResult<T> style, to avoid defining new return type classes for the SPI methods.
 - A less-invasive approach that requires two RPCs.  We may later optimize this by converting FileSystem's API to use the above style, but we may not need to.  We do need to be careful not to incompatibly change FileSystem's public API, but the SPI's not so constrained, since all FileSystem implementations are currently in trunk and can be easily maintained in a coordinated manner.  In the meantime, we can start using symbolic links in archives, etc. while we work out if and how to better optimize them.

Does that sound right?

I don't have a strong preference.  If I were implementing it myself I'd probably go for the simple approach first, early in a release cycle, then benchmark things and optimize it subsequently if needed.  The risk is not that great, since we already have good ideas of how to optimize it.  But the optimization will clearly help scalability, so it wouldn't hurt to have it from the outset either.

FYI, I tried implementing my patch above as a LinkedFileSystem subclass, to better contain the changes.  This turned out to be messy, since a LinkedFileSystem can link to an unlinked FileSystem.  With the subclass approach this must be explicitly handled by casts and 'instanceof', while when FileSystem itself supports links this can be handled by default method implementations.  So I am not convinced that a LinkedFileSystem subclass is a net win.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637667#action_12637667 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

> Do you agree with this assumption, or do you feel we should examine it more?

I agree with the assumption. I was only wondering why we are ok with a compromise in one aspect but not in the other.

> I thought the point of contention is not the name of the method or return value but whether to use an exception or a data structure to pass the return value. (That pseudo code also referred to FileHandle, which is also not in the patch.)

For me at least, the issue was returning something that does not suit the API. If we have a member called 'getNextLinkOrOpen()' that returns the next link or opens a file, it makes perfect sense. My objection has been calling it open() while returning a link some of the time.

Though I don't like the new names, +1 for the current patch if the names of the members are changed to reflect what they do. This also allows HDFS to use exceptions, since we don't want to change the names.



> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-4044:
---------------------------------

    Attachment: HADOOP-4044-strawman.patch

> I guess we need an openImpl(...) method. Otherwise, it would be an infinite loop.

No, they have different signatures.  I've attached a patch for FileSystem that implements this.  For FileSystems that do not support links it is completely back-compatible.  HDFS could implement this by adding an RPC per file before it is opened.  If that proves too slow, it could perhaps optimize things by returning and caching the block list along with the link information, so that a second RPC could be avoided.

The patch is not complete, but it works.  I've only implemented open.  It's meant to be an illustration of a style that we might use.
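
This is not the attached strawman itself, but a rough sketch of the delegation style it describes, combined with the LinkResult<T> idea mentioned elsewhere in this thread; the class and field names below are assumptions:

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;

public abstract class LinkAwareFileSystem /* would extend FileSystem */ {

  // Hypothetical SPI result: either the opened stream or the next link to follow.
  public static class LinkResult<T> {
    final T value;   // non-null when the operation completed
    final Path link; // non-null when a link still needs resolving
    LinkResult(T value, Path link) { this.value = value; this.link = link; }
    boolean isLink() { return link != null; }
  }

  // SPI method: the default simply opens the file, so file systems that never
  // support links are completely back-compatible and need not change.
  protected LinkResult<FSDataInputStream> openImpl(Path f, int bufferSize)
      throws IOException {
    return new LinkResult<FSDataInputStream>(doOpen(f, bufferSize), null);
  }

  // Public method: keeps its original signature and loops until links are resolved.
  public FSDataInputStream open(Path f, int bufferSize) throws IOException {
    LinkResult<FSDataInputStream> r = openImpl(f, bufferSize);
    int hops = 0;
    while (r.isLink()) {
      if (++hops > 32) { // assumed hop limit
        throw new IOException("Too many levels of symbolic links: " + f);
      }
      r = openImpl(r.link, bufferSize);
    }
    return r.value;
  }

  // Stand-in for the per-filesystem open; an assumption for this sketch.
  protected abstract FSDataInputStream doOpen(Path f, int bufferSize)
      throws IOException;
}
{code}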


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4044:
-------------------------------------

    Status: Open  (was: Patch Available)

Incorporate Konstantin's comments.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.20.0
>
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink13.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638581#action_12638581 ] 

Owen O'Malley commented on HADOOP-4044:
---------------------------------------

Ok, I realized after I posted that in fact the LinkResult is not abstract.

Doug> do you think this logic is unique to HDFS?

Well, currently yes, but in the long term hopefully not. In my proposal, most of the work would happen in LinkResult.

Doug> Also, the pseudo code above will cause a stack overflow for a circular link.

I was assuming (and included in the comments) that getNewPath would throw if it was too deeply nested. I would probably use a thread local variable to measure the depth of the traversal. It isn't great, but it would be encapsulated in LinkResult.
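
For what it's worth, a minimal sketch of the thread-local depth counter described here, with getNewPath() throwing once the chain is too deep; everything except the LinkResult and getNewPath names is an assumption, and (as noted elsewhere in the thread) it only works if the whole resolution of a path happens on one thread:

{code}
import java.io.IOException;

import org.apache.hadoop.fs.Path;

public class LinkResult {
  // Per-thread depth counter: valid only while a single thread resolves the path.
  private static final ThreadLocal<Integer> DEPTH = new ThreadLocal<Integer>() {
    @Override
    protected Integer initialValue() {
      return 0;
    }
  };
  private static final int MAX_DEPTH = 32; // assumed limit for the sketch

  private final Path link;

  public LinkResult(Path link) {
    this.link = link;
  }

  // Throws if the traversal is too deeply nested, as assumed above.
  public Path getNewPath() throws IOException {
    int depth = DEPTH.get() + 1;
    if (depth > MAX_DEPTH) {
      DEPTH.remove();
      throw new IOException("Too many levels of symbolic links: " + link);
    }
    DEPTH.set(depth);
    return link;
  }

  // The caller must reset the counter once resolution finishes.
  public static void done() {
    DEPTH.remove();
  }
}
{code}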

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637317#action_12637317 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

Dhruba, a question about your patch. Why do the data types LocatedBlocks, FSDataInputStream, and FSDataOutputStream contain the methods isSymlink() and getSymlink()?
Or is it for overloading the return type to process the remote symbolic links?
To be consistent with the rest of your patch, wouldn't it have been better to declare a corresponding FSLinkXXX type for each of the XXX types above?



> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4044:
-------------------------------------

    Fix Version/s: 0.20.0
           Status: Patch Available  (was: Open)

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.20.0
>
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink13.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634998#action_12634998 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> Many return types are changed to return this path information.

Note that only the return types in the impl methods are changed.  The user methods retain their original signatures.

> I think throwing the exception is enough [ ... ]

But a link is not an exceptional situation, it is a normal situation and should thus be handled with normal control flow.  It does not require the non-local control flow that exceptions provide.  The callee needs to pass specific information directly back to the caller.  Using a common base class is appropriate here.

UnresolvedPathException is only used within HDFS, not in the FileSystem API.  My review has focused on the changes to the public FileSystem API, not on the HDFS implementation.  Someone more familiar with HDFS should review the HDFS portions of this patch.  On a cursory examination, it does appear that this patch uses exceptions for control flow within HDFS, which looks like a bad idea to me.  Rather, HDFS's internal APIs should be modified to explicitly handle links as return values.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637663#action_12637663 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> so.. this is a performance compromise. But the arguments till now have been about "perfect" ...

This issue has long been performance-related.  The assumption is that it is unacceptable for the addition of symlinks to double or more the number of RPCs for common HDFS operations when no links are present.  I have noted this assumption on several occasions and no one has disputed it.  Do you agree with this assumption, or do you feel we should examine it more?

> in the patch it is not called 'nextLinkOrOpenImpl()'.. it is just called 'openImpl()'. 

I thought the point of contention is not the name of the method or return value but whether to use an exception or a data structure to pass the return value.  (That pseudo code also referred to FileHandle, which is also not in the patch.)  Would you like to propose better names for the methods and objects in the patch?  I am not wedded to the names used in the patch, but perhaps we should resolve the exception-related issue first?

> This is obviously an exaggerated example but hopefully makes a point.

I'm not sure what the point is.  If the RPC system wished to return some generic data for all protocols then it could not force a different return type, since return types are protocol-specific, so we'd probably need to add a method to the RPC runtime.  If the situation is unusual, then an exception would be more appropriate.

HTTP is perhaps an analogy.  It uses the return code to indicate a redirect.  URLConnection#getInputStream() doesn't throw an exception when it encounters a redirect.  It either follows the redirect or returns the text of the redirect page, depending on whether you've set HttpURLConnection#setFollowRedirects.  Redirects are part of the protocol.  The HTTP return type is complex.  That's the model we must move towards if we wish to handle links without adding lots more RPCs.
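
To make the analogy concrete, here is the standard java.net behaviour being referred to (nothing Hadoop-specific; the URL is a placeholder):

{code}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class RedirectExample {
  public static void main(String[] args) throws IOException {
    URL url = new URL("http://example.org/"); // placeholder URL
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setInstanceFollowRedirects(false);   // do not follow redirects automatically

    int code = conn.getResponseCode();        // e.g. 301 or 302 for a redirect
    if (code >= 300 && code < 400) {
      // The redirect comes back as part of the normal response, not as an exception:
      // the target is in the Location header.
      System.out.println("Redirected to: " + conn.getHeaderField("Location"));
    } else {
      System.out.println("Response code: " + code);
    }
  }
}
{code}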

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642970#action_12642970 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

The src/core parts of this patch look good to me. 

One nit: since maxPathLinks is not settable, it should be a private constant MAX_PATH_LINKS for now.  And 1000 seems way too big.  Linux limited links to 5 for a long time, but Posix specifies a minimum of 8.  So we should probably choose somewhere between 8 and 32.

If folks find this to be a problem, we could make the limit configurable, or we could simply detect loops:

{code}
  T resolve(FileSystem fs, Path p) throws IOException {
    T in = next(fs, p);
    Path first = p;
    HashSet<Path> seen = new HashSet<Path>();

    // continue looping till there are no more symbolic links in the path
    while (in.isLink()) {
      seen.add(p);

      // construct the new path from the link target
      p = in.getLink();

      if (seen.contains(p)) {
        throw new IOException("Loop in symbolic links " + first);
      }

      // resolve the next component of the chain
      in = next(fs, p);
    }
    ...
  }
{code}

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638584#action_12638584 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

Owen, my first instinct was to implement it the way you suggested. However, once you add loop detection you may want to consider using a structure similar to the one that Dhruba used, in order to share that code. The addition of a FOOImpl for each method does clutter the FileSystem code.
Doug's suggestion of LinkableFileSystem is interesting. No, I don't think this is unique to HDFS; KFS may want to support this too.
BTW, the local file system will want to support the normal dot-relative symlinks but perhaps not the remote ones (i.e. the mounts). So "Linkable" is not quite the distinction. (But that is just a question of naming.)


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636764#action_12636764 ] 

dhruba borthakur commented on HADOOP-4044:
------------------------------------------

Hi Raghu, I was saying that "using exceptions inside the namenode keeps NameNode internal interfaces clean and elegant". So, please go ahead and review this patch if you have some time. Thanks.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639609#action_12639609 ] 

Allen Wittenauer commented on HADOOP-4044:
------------------------------------------

>(I guess unix thinks the : is simply part of the name.)
 
Pretty much. ln doesn't validate the link at all, so you can put all sorts of interesting things there.  It is up to the calling program or filesystem to verify the validity in its context.  [For example, I believe it is rsh that, if you create a link named after a hostname pointing to rsh, will then rsh to that host when invoked through the link.]

It should also be noted that : is a valid directory separator under HFS+ (Mac OS X) as well.  You're likely to see very bizarro results on operating systems that use : as a metachar.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637323#action_12637323 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

Dhruba says:
> I vote for using exceptions internal to the NameNode to keep internal interfaces clean and elegant. 
I agree with you that using exceptions keeps the NN interfaces clean and elegant.
I further agree with Raghu that we need to be consistent here and extend your taste for "clean and elegant" to the RPC interface.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4044:
-------------------------------------

    Attachment: symLink13.patch

Fixed findbugs warnings. Also fixed the reason for the unit test failure: FileSystem.exists() was not implemented correctly in the earlier version of this patch.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink13.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638608#action_12638608 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

Perhaps we're prematurely optimizing here.  Perhaps the extra RPCs to resolve paths won't hurt things that much.  The namenode benchmarks I've seen show that it handles 50k opens/second.  How much slower would that be if we always first resolved the path?  In this naive approach we'd add a method which must always be called before a file is opened (or renamed, etc.) that would return an 'isResolved' boolean and a path.  If the path is resolved then it can be used, otherwise we must resolve it further.  The resolution loop could then be independent of the FileSystem method, so no ThreadLocal or other mechanism would be required to maintain this state.

{code}
public class ResolvedPath extends Path {}

public class LinkResolution {
  public boolean isResolved() { ... }
  public Path getUnresolvedPath() { ... }
  public ResolvedPath getResolvedPath() { ... }
}

public class FileSystem {
  ...
  public abstract LinkResolution resolveLinks(Path p);
  public abstract FSInputStream open(ResolvedPath p);

  private ResolvedPath getResolvedPath(Path p) throws IOException {
    LinkResolution resolution = resolveLinks(p);
    int count = 0;
    while (!resolution.isResolved() && count++ < MAX_LINKS) {
      p = resolution.getUnresolvedPath();
      resolution = p.getFileSystem(conf).resolveLinks(p);
    }
    return resolution.getResolvedPath();
  }

  public FSInputStream open(Path p, ...) throws IOException {
    return open(getResolvedPath(p), ...);
  }
  ...
}
{code}
No exceptions, no thread locals, no anonymous classes, etc.  Should we benchmark this before we rule it out?

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637651#action_12637651 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

> I don't think anyone would argue that the values of isLink() and getLinkTarget()

Yes. It is of course proper data for 'stat' or FileStatus.

> But we don't want to make this many calls to our FS implementations, since they're RPCs. So, for performance,
So, this is a performance compromise. But the arguments till now have been about a "perfect" api/implementation for a clearly imperfect/compromise (but well-performing) solution.

> Why does this suddenly make the that same data non-normal?

In the patch it is not called 'nextLinkOrOpenImpl()'; it is just called 'openImpl()'.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629524#action_12629524 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

There are the following 4 kinds of symbolic links:
 
* dot relative - relative to the directory in which the symbolic link exists
    ** these symbolic links can be processed in the NN without kicking them back to the client
* volume root relative - relative to the root of the NN's namespace
    ** these symbolic links can be processed in the NN without kicking them back to the client
* relative to another file system
    ** these are essentially a symbolic mount; good use cases are remote NNs or Hadoop Archives
    ** these symbolic links need to be kicked back to the client to be processed on the client side
* relative to the root of the client's namespace (i.e. to wherever the client's / points)
    ** the main use case for these is where I have a symbolic link to a generic location (say /tmp) that is best picked from the client's environment
    ** these symbolic links need to be kicked back to the client to be processed on the client side
    ** I believe we can avoid this last one for the first implementation since I don't think the use case is strong
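
To make the four cases concrete, here is a minimal sketch of possible link targets, assuming a hypothetical createSymlink(target, link) helper (the method name and the client-root scheme are purely illustrative, not part of any patch here):
{code}
// 1. dot relative: resolved against the directory containing the link
fs.createSymlink(new Path("data/part-00000"), new Path("/user/dhruba/latest"));

// 2. volume root relative: resolved against the root of this NN's namespace
fs.createSymlink(new Path("/warehouse/current"), new Path("/user/dhruba/warehouse"));

// 3. relative to another file system: essentially a symbolic mount
fs.createSymlink(new Path("hdfs://other-nn/archive/2008"), new Path("/user/dhruba/archive"));

// 4. relative to the client's root: resolved against whatever the client's "/" points to
fs.createSymlink(new Path("client:///tmp"), new Path("/user/dhruba/tmp"));
{code}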


   

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637756#action_12637756 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

Oops a few of my comments got deleted by mistake as I was doing cut/paste.

-------------

>My argument is that we should model normal functionality with normal programming: methods, parameters, return values and data structures. Do you disagree with this? Or do you think that links are not normal functionality?
>You keep arguing that "exceptions can be used for recoverable conditions". Indeed they can. But is a link really best modeled as a "recoverable condition"? This sounds to me like a rationalization for using exceptions, not a reason to use exceptions over data structures.


The way you have worded your questions makes it very hard for me to convey the alternative.
So I respond in the best way I can (and you will not find the answer acceptable, since I have said some of this before).
I see following a symbolic link as an exceptional condition but not an error. The NN prefers not to follow the remote symlinks.
It is, however, a condition that can be recovered from on the client side.
Either the use of exceptions, or data types and parameters, are ways to deal with this.
I believe the use of exceptions in this situation is acceptable and, further, the better of the two alternatives.
Further, I believe it leaves the method signatures intact: they continue to convey the function they previously did by their name and by the types of the in and out parameters. I like that property of the solution. The alternate solution forces me to create a whole bunch of new data types or overload existing ones. If exceptions were not available I would use the alternate solution.

I don't like your use of the term "normal control flow". That term was used in one of your references for an example where an out-of-bounds exception was used to get out of a for loop at the array index boundary. Clearly, using exceptions to exit a for loop on an out-of-bounds index is blasphemous.
So the issue is that if I am forced to argue using the words normal or not normal, it makes my point of view look idiotic - the use of those words in that example does not apply in this context.


----------------

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637147#action_12637147 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> I prefer the name symlink instead of link.

Hmm.  'Symbolic link' is the marked case in unix, the exception, where 'link' alone refers to a hard link.  The reason is historic, symbolic links were not added to unix until 4.2 BSD.  The term 'link' was already used to describe hard links and so had to be qualified, even though symbolic links are now more commonly used than hard links.  In Hadoop I argue we should make hard links the marked case, and that the default understanding of 'link' in Hadoop, as on the web, should be like unix's symbolic link.  If we ever add support for hard links, then they would become the marked case in Hadoop, requiring qualification, like server-side redirects.  However I would not veto this if the majority prefer a longer name here.

> the method next() suggests that one follows the path to the next symbolic link. It does not strictly.

Can you elaborate?  Each time this is called a link is resolved, returning a path that may contain more links, following a chain of links, no?  Also, 'doOperation()' is next to meaningless.  So if 'next' is indeed inappropriate we should replace it with something more specific, not something generic.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649189#action_12649189 ] 

Konstantin Shvachko commented on HADOOP-4044:
---------------------------------------------

> 1. can re-run the test and post them here.
Yes, that's what I mean: the results should be posted.

> 2. every operation in DFSClient will have to convert a FsResultWritbale object into a FsResult
No, they don't, because FsResultWritable is a subclass of FsResult. So the RPC call returns FsResultWritable and DFSClient can pass it further up as FsResult; no conversion is needed (see the sketch at the end of this comment).
Please look at the implementation of {{DFSClient.setReplication()}} above.

> 4. Sure you can remove the method in a separate jira prior this one.
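
A minimal sketch of why no conversion is needed, using the class names from this thread (the constructors and the setReplication signature are illustrative assumptions):
{code}
// FsResultWritable is-a FsResult, so an RPC result can flow straight through
public class FsResult<T> {
  private final T value;
  public FsResult(T value) { this.value = value; }
  public T getValue() { return value; }
}

// Adds RPC serialization on top of FsResult (Writable plumbing elided)
public class FsResultWritable<T> extends FsResult<T> {
  public FsResultWritable(T value) { super(value); }
}

// In DFSClient the FsResultWritable returned over RPC is handed up as-is, typed as FsResult
FsResult<LocatedBlocks> result = namenode.setReplication(src, replication);
{code}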

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.20.0
>
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink13.patch, symLink14.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636790#action_12636790 ] 

rangadi edited comment on HADOOP-4044 at 10/3/08 11:11 PM:
----------------------------------------------------------------

I deal every day with, and care about, _both_ FileSystem and HDFS. We have two very different approaches for the same problem in these two places (changing return types vs. throwing exceptions). It does not feel quite right for me to +1, as I prefer only one of the approaches.  This does not affect the external API, but for developers internal APIs are just as important.

A two-step API is of course not good (as I pointed out in the early proposals), but I don't think that is related to the 'return type' vs. exception issue.

My preference is to take the same approach as in HDFS. Any API that can throw IOException can also throw UnresolvedLinkException. Upper layers that can handle it, handle it. No change in return types is required and we don't need (rather strange) classes like FSLinkBoolean, FSInputStreamLink etc.  As a minor side benefit, we also remove all the extra objects created (even when no links are involved).
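
For illustration, a minimal sketch of what such an exception could look like (the field and constructor are assumptions, not code from the patch):
{code}
// Just another IOException; existing method signatures and return types stay unchanged
public class UnresolvedLinkException extends IOException {
  private final Path link;
  public UnresolvedLinkException(Path link) {
    super("Unresolved symbolic link: " + link);
    this.link = link;
  }
  public Path getLink() { return link; }
}
{code}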

Finally, I don't think exceptions are meant only for hard errors (FileNotFound, for e.g.). I doubt an exception is always an error, but my experience with Java is less than 2 years :).

edit: minor. my first edit since Sept 16th.. 

      was (Author: rangadi):
    
I deal with every day and care about _both_ FileSystem and HDFS. We have two very different approaches for the same problem in these two places (changing return types vs throwing exception) . I does not feel quite right for me to +1 though I prefer only one of the approaches.  This does not affect external API, but for developers internal APIs are just important. 

Two-step API is of course not good (as I pointed out in the early proposals), I don't think it is related to 'return type' vs exception issue.

My preference is to have the same approach as in HDFS. Any API that can throw IOException, can throw UnresolvedLinkException. Upper layers that can handle it handle it. No change return types is required and we don't need extra classes like FSLinkBoolean, FSInputStreamLink etc.  As minor side benifit, we also remove all the extra objects created, even when no links are involved.

Finally, I don't think Exceptions are meant only for hard errors (FileNotFound, for e.g.). I doubt exception is always an error, but my experience with java is less than 2 years :).
  
> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637133#action_12637133 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

Exceptions are intended for cases where you're not sure who will catch them.  They're a non-local control-flow mechanism.  Using them for normal control flow is an abuse and leads to spaghetti logic.  In this case we are changing the FileSystem service-provider interface so that many methods can return either a link or a normal value.  The appropriate implementation is a union result type, not exceptions.  In Java we implement this union by extending a 'Linkable' base class.

http://192.220.96.201/essays/java-style/exceptions.html
http://www.javapractices.com/topic/TopicAction.do?Id=19
http://leepoint.net/notes-java/flow/exceptions/03exceptions.html
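
For illustration, a minimal sketch of the union-result idea (the 'Linkable' base class is named above and FSLinkBoolean is mentioned elsewhere in this thread; the field layout is an assumption):
{code}
// A result is either a normal value or an unresolved link, never both
public abstract class Linkable {
  private Path link;                              // non-null when the result is a link
  protected Linkable() {}
  protected Linkable(Path link) { this.link = link; }
  public boolean isLink() { return link != null; }
  public Path getLink() { return link; }
}

// Union of a boolean result with a possible link
public class FSLinkBoolean extends Linkable {
  private boolean value;
  public FSLinkBoolean(boolean value) { this.value = value; }
  public FSLinkBoolean(Path link) { super(link); }
  public boolean getValue() { return value; }
}
{code}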


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649135#action_12649135 ] 

Konstantin Shvachko commented on HADOOP-4044:
---------------------------------------------

# We should have some *performance testing*. There is a lot of wrapping/unwrapping going on. So we need to make sure that old operations like {{getBlockLocations(), create(), list()}} etc. on the {{FileSystem}} level perform the same as the old code.
# > I suggest we use Dhruba's implementation and do the improvement later. 
I don't understand what the problem is here.
Nicholas proposes to have 2 classes, {{FsResult<T>}} and {{FsResultWritable<T>}}, where the latter extends {{FsResult<T>}}.
This is exactly what should be done here. {{ClientProtocol}} methods should return {{FsResultWritable<T>}}, while {{DFSClient}} and all other {{\*FileSystem}} classes should return {{FsResult<T>}}. This is because Writable is needed only at the RPC level; after that, Writable is not required. E.g.
{code}
interface ClientProtocol {
  FsResultWritable<LocatedBlocks>  setReplication(...);
  FsResultWritable<NullWritable> create(...);
}

class DFSClient {
  FsResult<LocatedBlocks>  setReplication(...) {
    return namenode.setReplication(...); // can do this because FsResultWritable is a subclass of FsResult
  }

  FsResult<OutputStream> create(...) {
    return new FsResult<OutputStream>(new DFSOutputStream(...));
  }
}
{code}
# I did not review the code, just looked in a few places.
- Looks like FsResult constructors do not need to throw any exceptions.
- {{DFSClient.getHints()}} is deprecated. It is a good time to remove it.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.20.0
>
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink13.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628691#action_12628691 ] 

dhruba borthakur commented on HADOOP-4044:
------------------------------------------

After a discussion with Joydeep, we came up with the idea that FileSystem.open() will transparently resolve a symbolic link if the symbolic link is relative or if it is of the form hdfs://bar/foo.txt. This means that an HDFS symbolic link can point only to other hdfs pathnames.

In future, if the symbolic link is of the form file://dir/foo.txt, then the FileSystem API can return it to higher layers for interpretation. This will allow an HDFS symbolic link to refer to a pathname that is not an HDFS pathname. I do not plan to implement this feature in the short term.
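
A rough sketch of that scheme-based decision, purely illustrative (the getLinkTarget() accessor and the surrounding open() wiring are assumptions):
{code}
// Inside open(), after discovering that the path is a symbolic link:
Path target = status.getLinkTarget();             // 'status' is the link's FileStatus (assumed accessor)
String scheme = target.toUri().getScheme();
if (scheme == null || "hdfs".equals(scheme)) {
  return open(target, bufferSize);                // relative or hdfs:// -> resolve transparently
} else {
  throw new IOException("link target " + target + " must be interpreted by a higher layer");
}
{code}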

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648812#action_12648812 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4044:
------------------------------------------------

Talked to Dhruba offline.  He mentioned that the idea in my previous [comment|https://issues.apache.org/jira/browse/HADOOP-4044?focusedCommentId=12644069#action_12644069] is not easy to implement.  I did not have much time to take a closer look.  So, I suggest we use Dhruba's implementation and do the improvement later.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.20.0
>
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink13.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635017#action_12635017 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> Are you saying that getSymlink() can return file:///etc/dir/file.txt?

Yes.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629255#action_12629255 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

Both have an extra RPC (or system call in the case of LocalFS) for the open of a normal file (the common case). How about something like:

{code}
public FSDataInputStream open(Path f, int bufferSize) throws IOException {
  FileSystem fs = this;
  for (int r = 0; r < maxRecursion; r++) {
    try {
      return fs.open(f, bufferSize, ...);
    } catch (SymLinkNotFollowedException e) {
      f = e.getSymLink();
      fs = f.getFileSystem(getConf());
    }
  }
  throw new IOException("Too many levels of symbolic links: " + f);
}
{code}

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637156#action_12637156 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4044:
------------------------------------------------

I think we don't need Path.createNewPath(...) as a public API.  This method may be confusing.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630028#action_12630028 ] 

Allen Wittenauer commented on HADOOP-4044:
------------------------------------------

Symbolic links to directories allow admins to move content around without disrupting user processes.  Without symbolic links to directories, we'd end up creating a symbolic link for each file.

That seems extremely counter-productive and more likely to cause the name node a significant amount of grief.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635006#action_12635006 ] 

dhruba borthakur commented on HADOOP-4044:
------------------------------------------

@Doug: Ok, I will post a new patch very soon. 
 
getRemainingPath() returns the trailing path components that could not be resolved by a file system implementation. 

Suppose one is trying to resolve hdfs://home/dhruba/link/dir/file.txt and link is a symbolic link pointing to file:///etc, then getSymlink() will return file:///etc and getRemainingPath() will return dir/file.txt. Now, FileSystem.java will use these two components to create a new path. Are you saying that getSymlink() can return file:///etc/dir/file.txt?
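
A minimal sketch of the combining step described above (the accessor names follow the comment; the exact combination code is an assumption):
{code}
// Combine the link target with the unresolved trailing components
Path symlink   = link.getSymlink();              // e.g. file:///etc
Path remaining = link.getRemainingPath();        // e.g. dir/file.txt
Path newPath   = new Path(symlink, remaining);   // -> file:///etc/dir/file.txt
{code}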

@Raghu: UnresolvedPathException is a namenode-private class. It is not thrown to the dfs client. The Namenode catches this exception, maps it to an FSLink object and returns that to the client.
  

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638330#action_12638330 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

Owen> I am not buying (yet) Doug's argument of saying that not-throwing exceptions inside the namenode is a better programming style.

One could in theory use exceptions to handle any condition.  For example, we could, if you attempt to open a directory, return the directory listing in an exception.  But, rather, we prefer to use data structures unless exceptions are clearly more appropriate.  So, the question is, when are exceptions appropriate?

A primary advantage of exceptions is their long-distance throwing.  The maxim is, "throw early", "catch late", since in most cases it's better to pass exceptions through and let the highest layers deal with them.  So one sign that exceptions are appropriate is when you don't know who should handle them.  That is not the case here.  There's only one place where this is intended to be caught, in boilerplate methods of FileSystem.java.

In http://www.onjava.com/pub/a/onjava/2003/11/19/exceptions.html they list just three cases where exceptions are appropriate: programming errors, client code errors and resource failures.  Links are clearly not programming errors or client code errors.  But are they resource failures?  Examples of resource failures are things like network timeouts, out of memory, lack of permission, etc.  These generally concern resources that are out of the direct control of the application, and hence present unexpected outcomes.  Reinforcing that interpretation, one finds sentences like, "This is a misuse of the idea of exceptions, which are meant only for defects or for items outside the direct control of the program.", in one of the links I provided earlier.  So, are links a condition outside the direct control of the program?  Are they external resources?  I don't see that.  The same component that implements open() also directly implements the storage of the link.  So the fact that something is a link is entirely within the filesystem's domain.  So, again, using exceptions does not seem justified here.

Sanjay has argued that it is abnormal for a filesystem to link to a different filesystem.  This misses the intended sense of "normal".  The value of a link is normal data, under the control of the filesystem implementation, regardless of whether it points to a different filesystem.  As I tried to elaborate above, "abnormal", with regard to exceptions, is defined as out of the direct control of the module, or in error.  One could reasonably call directories "abnormal" in some sense--they're not files--but they are not errors nor are they conditions out of the control of the application, nor are they issues that are best handled at some unknown spot higher on the stack.  Directories, like links, are thus best modelled directly by the API, not as exceptions.

The primary argument folks have provided for the superiority of exceptions is that they better preserve the current service provider interface, and that this interface more obviously describes the actions in question.  It does describe the non-link part of these actions more concisely, but it hides the link part, so preserving that API is not a clear win.

This proposed use of exceptions in fact seems to me a poster child for how *not* to use exceptions.  If we go down this route, I don't see how we'll be able to argue against exceptions in other places.  Non-local control is not required.  No error is involved.  External conditions are not involved.  None of the standard reasons for using exceptions are met.  Exceptions are used here purely as a convenient control flow mechanism.  It's effectively setjmp/longjmp, whose use we should not encourage.

Owen> we aren't changing the FileSystem API at all.

We're not changing the public API much, but we are changing the service provider API substantially.  We'd like most FileSystems to support links, and to do so they'll need to change what many methods return.  It would be nice to be able to do this back-compatibly, without altering the existing API much.  But I don't see a way to do that and still achieve our performance goals without setting a bad precedent for the appropriate use of exceptions.

I'm not trying to be an uncompromising ass here.  ("You don't have to try, it comes naturally", I hear you say.)  If you look at the early history of this issue, you'll see that I only with reluctance and skepticism originally outlined the return type approach, because it is clearly a big, disruptive change that's hard to justify.  But I really can see no other way that doesn't set precedents that we cannot live with as a project.  I'd love for someone to provide a "hail mary" solution, that leaves the API's return types mostly alone, performs well and doesn't set a bad example, but until that happens I don't see an alternative.  (Or for someone to convince me that this is actually a reasonable, appropriate, intended use of exceptions.)  But until then, I think we just need to accept that FileSystem's SPI must get more complex in order to efficiently support links.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637268#action_12637268 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> The code looks bizarre (or unclean) with all the FSLink*** return types.

It does look different than the FS API always has looked, and different than other FS apis.  But to reject it for just looking different, to call it bizarre and unclean, is xenophobia.  It *is* different.  Other FSs (e.g., linux VFS) don't try to do this in a single call, but rather loop, checking whether each element in the path is a link.  If we wish to avoid that, then we must pack more data into the return types of our API, and our API will look different.  We can try to hack around that by using exceptions or thread locals or output parameters, but those are all just attempts to try to make this API look like other APIs rather than embrace its differences.

So, do we really want to handle links in a single call?  I think so.
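
For contrast, a rough sketch of the per-element style of lookup alluded to above (isLink() and getLinkTarget() are the accessors discussed earlier in this thread; loop detection and relative targets are glossed over):
{code}
// One getFileStatus() call per path component, VFS-style ('fs' and 'p' as in the surrounding discussion)
Path current = new Path("/");
for (String component : p.toUri().getPath().split("/")) {
  if (component.isEmpty()) continue;              // skip the empty component before the leading '/'
  current = new Path(current, component);
  FileStatus st = fs.getFileStatus(current);
  if (st.isLink()) {
    current = st.getLinkTarget();                 // follow the link and keep walking
  }
}
{code}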


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637311#action_12637311 ] 

szetszwo edited comment on HADOOP-4044 at 10/6/08 5:13 PM:
-------------------------------------------------------------------------

bq. Good question. I prefer FsXxx, but we've not been consistent in Hadoop, nor is Java. I think java.net.URL was a mistake, that it should have been named java.net.Url, that acronyms in class names should be capitalized, not all caps. This makes them easier to parse.

It is time to rename all the FSXxx classes since the public API is going to be changed a lot in 0.20 anyway.  I have created HADOOP-4357 for renaming the classes.

      was (Author: szetszwo):
    bq. Good question. I prefer FsXxx, but we've not been consistent in Hadoop, nor is Java. I think java.net.URL was a mistake, that it should have been named java.net.Url, that acronyms in class names should be capitalized, not all caps. This makes them easier to parse.

It is the time to rename all the FSXxx classes since the pubic API is going to be changed a lot in 0.20 anyway.
  
> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637286#action_12637286 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4044:
------------------------------------------------

I think it is wrong to combine FSLink and the original return types like boolean, BlockLocations, etc. to create new classes.  The problem here is that we need to return two items in a single method call. The usual approach is to return a container, say FsReturnType<T>.  I suggest the following API:
{code}
//T is the original type
public class FsReturnType<T> {
  public T getValue() {
    ...
  }

  Path getLink() throws IOException {
    ...
  }
}
{code}

BTW, should our naming convention be FsXxx or FSXxx?

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637311#action_12637311 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4044:
------------------------------------------------

bq. Good question. I prefer FsXxx, but we've not been consistent in Hadoop, nor is Java. I think java.net.URL was a mistake, that it should have been named java.net.Url, that acronyms in class names should be capitalized, not all caps. This makes them easier to parse.

It is the time to rename all the FSXxx classes since the pubic API is going to be changed a lot in 0.20 anyway.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629272#action_12629272 ] 

rangadi edited comment on HADOOP-4044 at 9/8/08 1:13 PM:
--------------------------------------------------------------

> That imposes restriction on what SymLink can point to.
or may be not.. if an implementation (e.g. DistributedFileSystem) has all the information to instantiate another file system. 

One alternative is to have open() return {{'SymLinkFSInputStream extends FSInputStream'}} when a link is not followed by the implementation. A read on it will return bytes representing the link (and it can have a getSymLink() method).
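
For concreteness, such a stream could look roughly like the sketch below.  The class shape and method bodies are illustrative assumptions only (it assumes org.apache.hadoop.fs.FSInputStream and Path), not code from any attached patch:
{code}
// Illustrative only: a stream whose bytes are the textual link target,
// plus an accessor for callers that detect they opened a link.
public class SymLinkFSInputStream extends FSInputStream {
  private final Path link;
  private final byte[] bytes;
  private int pos = 0;

  public SymLinkFSInputStream(Path link) {
    this.link = link;
    this.bytes = link.toString().getBytes();   // bytes returned by read()
  }

  public Path getSymLink() { return link; }    // the unresolved link target

  public int read() { return pos < bytes.length ? (bytes[pos++] & 0xff) : -1; }

  public void seek(long p) { pos = (int) p; }
  public long getPos() { return pos; }
  public boolean seekToNewSource(long targetPos) { return false; }
}
{code}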

      was (Author: rangadi):
    > That imposes restriction on what SymLink can point to.
or may be not.. if an implementation (e.g. DistributedFileSystem) has all the information to instantiate another file system. 

One alternative is to have open() return '{{ SymLinkFSInputStream extends FSInputStream }}' when a link is not followed by the implementation. A read on it will return bytes representing the link (and it can have a getSymLink() method).
  
> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637753#action_12637753 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

Doug says>>>>>>
>> Doug should we pay the cost and do it the "right way" now?
>Yes, of course.
>> I prefer to use exceptions [ ... ]
>But why? You've said it's "cleaner" and "simpler". I don't follow your argument. "Simpler" could mean fewer lines of code. I don't think it will actually change the lines of code much. Changing the return types will change more lines of code, hence the change is more complex, but the resulting code will probably be around the same size, regardless of which approach is taken. "Cleaner" is subjective, like "better", and requires elaboration.
>My argument is that we should model normal functionality with normal programming: methods, parameters, return values and data structures. Do you disagree with this? Or do you think that links are not normal functionality?
>You keep arguing that "exceptions can be used for recoverable conditions". Indeed they can. But is a link really best modeled as a "recoverable condition"? This sounds to me like a rationalization for using exceptions, not a reason to use exceptions over data structures.
>Please provide a reasoned explanation for your disagreement. Otherwise it is hard to move forward.

*Two-part answer below: (1) a response to the above, and (2) in the spirit of moving forward.*

This is a very one-sided evaluation of the discussion that has occurred above.
First, I have said that we have a difference of opinion (essentially granting you that you have a reasonable point of view).
I have *not* come out and said that you are not offering any reasonable arguments.
Have reasonable arguments been offered for my side? Yes; you are just not willing to see them because you think the use of exceptions in this situation is blasphemous.
I am not the only one who finds the use of exceptions clean and elegant and the alternate view not so; others have said similar things:
- Konstantin
- Raghu
- Dhruba - "Dhruba> Thus, exceptions could be used here [NN]. This keeps most of the HDFS code clean and elegant." He argues that we should not use exceptions for RPC, but he does state that the use of exceptions makes the NN code clean and elegant.

Hence many have provided counter-arguments to your view.
Others and I have tried to explain what we mean by "elegant and simple".
Since several of us are saying the same thing, please grant that we are seeing something you are not, even if we are all wrong.
I agree that terms like "simple" and "elegant" are not as precise as one would like, but then neither are "hacky" or "perversion".

What we have is a difference of opinion - both of us are convinced that the other approach is not good or is invalid. (BTW, I don't believe your approach is invalid and wrong; I just prefer mine.)

*In the spirit of moving forward:* 
* symlink vs link  - I am okay with either but prefer symlink. 

* use of exceptions inside the NN
   +1.
   I believe this keeps the code and the internal interfaces "simple and elegant" (others such as Dhruba have said the same).
   It has nothing to do with how much code needs to get changed.
   Let's agree to disagree here.
   Doug, if you want to change the NN code to not use exceptions, I would like an opportunity to sit with you and make one last attempt to help you see my point of view, or get convinced by yours. Either way, we can discuss this over a beer someday (I will buy).

* Not using exceptions for the RPC interface - use the patch approach or Owen's variation on it.
   I can live with this.
   I can justify not using exceptions here to some degree because the impact is limited, and also because some RPC systems do not have exceptions (not a very strong argument, because we already use exceptions in our RPC interface).
   I am okay with the way the patch has done it, or with the alternative that Owen has offered.
   (It's six of one, half a dozen of the other.) As I noted, Owen's approach is potentially better for evolution and versioning.

* Having the NN compose the symlink and the rest of the path
   I can live with either - I prefer to keep them separate and have the client side do the composition, since that preserves information and also allows us to deal with other ways of composing the two parts (I have no concrete examples of needing this at this stage).

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638436#action_12638436 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> we could define the API open(Path f) such that the parameter f is supposed to be a non-linked path.

Does calling something an "error" really make it an "error"?  It seems like the only motivation to call this an error is to justify using exceptions to handle links, not that it would in fact be a user or program error to call it in this way.  Rather, it would be an expected, documented, supported event, but we'd call it an error anyway, because otherwise we'd have to actually model the functionality in our API.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638405#action_12638405 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

Konstantin> this argument is about the programming style rather than performance or anything else

It's clearly about both of those.  We wish to make a change motivated by performance, and the question is how best to implement that change.  We have two proposals on the table, one that is less invasive but uses a discouraged style, and one that involves substantial changes but avoids that style problem.  I tried earlier to remove the discussion from style guidelines to the underlying reason for those guidelines --that an API's purpose is to model expected control and data flow--but that argument seems to have caused more confusion than it has helped, so I've mostly reverted to citing style guidelines.

Konstantin> FileSystem is the first layer for which link values make sense.

The fact that that layer is fully-resolved and known indicates that links should be a first-class value in the API.  All of the other layers you present are part of a single module (HDFS) whose job is to present itself as a FileSystem.  If HDFS is implementing the FileSystem API, then FileSystem's datatypes are not alien to HDFS.  This is the same twist that Sanjay pursued.  Yes, the link URI is to a file outside of that particular HDFS filesystem, but link URIs are normal data that HDFS accepts as parameters (when adding a symbolic link), stores, and returns (when dereferencing a link).  So this is not an alien object or an unexpected situation.  A link to another filesystem is a sensible, first class value within HDFS.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637170#action_12637170 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> Here the author is distinguishing between declared and undeclared - but the point is that declared exceptions are okay for recoverable conditions.

Unfortunately that is not the question here.  He, among many others, says not to use exceptions for normal control flow.  Links in a filesystem are normal.  They are not conditions out of the control of the code in question.

> otherwise all the interfaces and code would have become unclean.

What does 'unclean' mean?  The interfaces should reflect their function.  We're making a fundamental change to these interfaces.  The return types of all of these operations must change.  We could implement a return type change with exceptions or thread locals.  Either would require fewer changes to the code, but both would be terrible programming style.  We need to suck it up and change the return types consistently to reflect the change in semantics.  Let's not call something 'unclean' just because it is more work.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637582#action_12637582 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

Dhruba says:
>If an exception is thrown by the ClientProtocol, it is logged by the RPC subsystem into an error log. 
>This is a good characteristic to have in a distributed system, makes debugging easy because a scan in the error logs
>pinpoints the exceptions raised by the API. 
>Access control checks or disk-full conditions raise exceptions, and they are logged by the RPC subsystem. We do not want every call to "traverse a 
>symbolic link" to log an exception message in the error logs, do we? (Of course, we can special case it and say that we will not log >UnresolvedPathException; but by
>special-casing it, we are acknowledging that this exception is not an abnormal behaviour).

The NN (or any server) should log server errors (like disk full) but not user exceptions (like a non-existent path name being supplied).
The reason access-violation exceptions are logged is that they are needed for security audits - security is a special case, and that information may even be logged in a separate file.
Hence, in general, the server side needs to filter what is logged.
If we had some inheritance hierarchy in our exceptions, we could make the filter simpler.
The special-casing argument is not valid, since we should add some sort of filter anyway at some point.
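
As a sketch of the kind of filter meant here, purely illustrative (ServerException/UserException are hypothetical names, not existing Hadoop classes):
{code}
// Hypothetical exception hierarchy that makes server-side log filtering trivial.
abstract class ServerException extends IOException {}   // e.g. disk full, internal errors
abstract class UserException   extends IOException {}   // e.g. bad path, unresolved link

// In the server's RPC error handling:
void logIfServerError(IOException e, Log log) {
  if (e instanceof ServerException) {
    log.error("server-side failure", e);
  }
  // UserException subclasses are not logged as errors; security-related
  // exceptions would instead go to the audit log.
}
{code}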

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630242#action_12630242 ] 

rangadi edited comment on HADOOP-4044 at 9/11/08 9:23 AM:
---------------------------------------------------------------

bq. @Raghu: The FSDataInputStream will implement another interface StreamType that will store the information on whether this is a symbolic link. No extra RPC is needed.

Ok, this is one of the options mentioned earlier that works just for open(). As the discussion in this Jira shows, symlinks are a fairly fundamental change to the filesystem and affect various parts. Just like open(), once the various options for general handling of links are discussed here, I don't think the patch will be much more complicated than what you have attached.

Symlinks have been used for many decades, and I don't think there is any lack of use cases. I don't think HADOOP-4058 alone should be the criterion for this Jira.


      was (Author: rangadi):
    bq. @Raghu: The FSDataInputStream will implement another interface StreamType that will store the information on whether this is a symbolic link. No extra RPC is needed.

Ok, this is one of the options mentioned earlier that works just for open(). As the earlier discussion in the Jira shows, symlinks are a fairly fundamental change to the filesystem and affect various parts. Just like open(), once the various options for general handling of links are discussed here, I don't think the patch will be much more complicated than what you have attached.

Symlinks have been used for many decades, and I don't think there is any lack of use cases. I don't think HADOOP-4058 alone should be the criterion for this Jira.

  
> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629271#action_12629271 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> That imposes restriction on what SymLink can point to.

I don't follow what you mean here.

> we could of course propose another FS.open() that returns something more than an FSInputStream.

Yes, I drafted this and deleted it, since it's rather hairy and not clearly an advantage to anything but HDFS.  Here it is:

{code}
public class FileOpening {
  public FileOpening(boolean isLink, Path link) { ... }
  public boolean isLink() { ... }
  public Path getLink() { ... }
}

public FileOpening getFileOpening(Path f) throws IOException {
  FileStatus stat = getFileStatus(f);
  return new FileOpening(stat.isLink(), stat.getLink());
}

public FSDataInputStream open(Path f, int bufferSize) throws IOException {
  FileSystem fs = this;
  FileOpening opening = fs.getFileOpening(f);
  while (opening.isLink()) {
    f = opening.getLink();
    fs = f.getFileSystem(getConf());
    opening = fs.getFileOpening(f);
  }
  return fs.openData(f, opening, bufferSize);
}

public abstract FSDataInputStream openData(Path f, FileOpening opening, int bufferSize)
  throws IOException;
{code}

HDFS would override getFileOpening() to return a subclass that also contains block locations.  Phew!  Would anyone leverage this but HDFS?

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637681#action_12637681 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> So is replacing 'return false;' with 'return new FSLinkBoolean(false)'. 

That's not a fair comparison.  I was talking about shorter names versus longer names.  You are talking about throwing exceptions versus adding a bit of code to return a value.  Both involve more-or-less code, but in one case that's the only issue, in the other there's a lot more to it.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639652#action_12639652 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4044:
------------------------------------------------

Sanjay> if someone happens to create a dir called "hdfs:" and ...

Currently, ":" is probably not supported in path.  See HADOOP-2066.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4044:
-------------------------------------

    Attachment: symLink14.patch

1. Performance testing: I have run DFSIOTest on only 10 nodes. I did not see any performance degradation for any operation, including creates, deletes, reads or writes. I can re-run the tests and post the results here.

2. The problem is that FileSystem.FSLinkResolver and FileSystem.resolve() will become more complicated. Also, every operation in DFSClient will have to convert a FsResultWritable object into a FsResult object.

3. Fixed the issues with the FsResult constructor as you suggested.

4. I would like to remove the deprecated method DFSClient.getHints as part of a separate JIRA. 

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.20.0
>
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink13.patch, symLink14.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638610#action_12638610 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

Oops.  That last method should instead read:
{code}
public FSInputStream open(Path p, ...) throws IOException {
  // assumes ResolvedPath exposes the fully resolved target path via getPath()
  Path resolved = getResolvedPath(p).getPath();
  return resolved.getFileSystem(conf).open(resolved);
}
{code}


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638357#action_12638357 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

I mistakenly attributed a comment above to Owen, when it was really Dhruba.  Sorry!  I should be more careful.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638156#action_12638156 ] 

Konstantin Shvachko commented on HADOOP-4044:
---------------------------------------------

So the question is: why is this approach not acceptable to you (if it is not)?
Is it because the programming style is not good, or is it the performance, or is it something else?
I am sorry that I could not understand that from your previous comments.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4044:
-------------------------------------

    Attachment: symLink15.txt

Patch merged with the latest trunk. There are still some unit test failures.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink13.patch, symLink14.patch, symLink15.txt, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632696#action_12632696 ] 

Allen Wittenauer commented on HADOOP-4044:
------------------------------------------

1) Am I missing it, or is there no equivalent of chown -h / chmod -H?

2) What does the audit log look like when someone does operations on a symlink?  Does it show the resolved path or does it show the symlink when it is read?  What does symlink creation look like? 

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631045#action_12631045 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

Symlink vs mount distinction.
-------------------------------------------
  The reason the two concepts merge in Hadoop (while they are distinct in other systems) is that Hadoop has file URIs that can point to arbitrary files/dirs in arbitrary volumes. This allows a mount to be specified merely by a URI of the target (as opposed to a server name, protocol, etc.).

I think Allen has a valid point that remote symlinks to dirs are more useful than symlinks to files, especially for moving load (storage load or operational load) off to other name servers.

@JoyDeep's component-by-component lookup file system.
  He proposes a two-step approach -- if an error occurs while following a symlink, then do a component-by-component lookup. If one is willing to return an error on a symlink, then it is equally easy to send more information with the error; the error information can denote the remaining path, and one can then implement symlinks efficiently.
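
To make that concrete, such an error could carry both the link target and the not-yet-resolved remainder of the path; a minimal sketch (all names are illustrative, not the attached patch's classes):
{code}
// Illustrative only: an exception reporting an unresolved symlink together with
// the remaining portion of the requested path, so the client can resume resolution.
public class UnresolvedLinkException extends IOException {
  private final String linkTarget;     // what the symlink points to (possibly a full URI)
  private final String remainingPath;  // path components after the symlink

  public UnresolvedLinkException(String linkTarget, String remainingPath) {
    super("unresolved link to " + linkTarget);
    this.linkTarget = linkTarget;
    this.remainingPath = remainingPath;
  }

  public String getLinkTarget()    { return linkTarget; }
  public String getRemainingPath() { return remainingPath; }
}
{code}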


Implementing a subset
------------------------------
 We should do a design that works for symlinks to dirs or files; then implementing a files-only subset is fine.

The use case for symlinks to directories is strong: 
 * HADOOP-4058 is made stronger by symlinks to dirs, as Allen points out. Offloading entire dirs to a remote volume is as useful as, if not more useful than, offloading individual files.
 * Symlinks to archive file systems (har://...) are useful and help make the archive transparent.
 * Filesystems that have symlinks to files also have symlinks to dirs. (My own use of symlinks in Unix is mostly to dirs rather than files.)
 * Most file systems have a mount operation, and its use case is well established.
   Mounts, as pointed out above, are best achieved in Hadoop as symlinks to dirs due to Hadoop's URI names. A separate API for mounts is not needed.

I am a little puzzled about why folks think kickbacks are hard. It will take a few discussions to get the internal interfaces right, but it is not that complicated. Hence it is hard to do this for 0.19, but we can do it for 0.20.
Dhruba, it looks like Facebook wants this subset feature in 0.19 - otherwise it is best to wait for 0.20 and do it right.

My vote is that if we can determine that the API for symlinks to files is a strict subset, then we can go ahead and do the subset in 0.19. Otherwise, wait until 0.20 for the full solution.
(I think we can do the dot-relative symlinks later, since Facebook's use case does not need them in 0.19.)

We also need to determine the semantics of other file system operations besides open.
A good way to proceed is to create a table with a column for the full solution and another for the symlinks-to-files subset.
Q: If the symlink target happens to be a directory (or later gets changed to a dir), do we throw an exception?

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630409#action_12630409 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> component by component lookup is only done when sym links are encountered.

It depends on what you mean by "component-by-component".  If, in hdfs://nn/foo/bar/baz/boo, baz is a symbolic link to a different filesystem, then we should only make a single RPC to nn to resolve this link, not one for /foo, one for /foo/bar, etc.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639629#action_12639629 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

Sanjay> if someone happens to create a dir called "hdfs:" and valid path foo/bar under it then the local file system will follow it and not return an error. Is this a concern?

Folks would have to use a double slash in the symbolic link (which Unix ignores) for Hadoop to interpret it as an external link.  So folks would have to use a single slash when linking to a directory whose name ends in a colon, to keep behavior consistent.  That doesn't seem like a big imposition.

Sanjay> Hence, on a "file does not exist" error, the client side lib will have to check if there are symlinks in the path.

Good point.  If we leverage Unix's automatic link following, then we'd have to handle external links by catching FileNotFoundException.  We could instead loop calling lstat(), following links that are internal, and returning when either a link to a non-file:// URI or an actual file is found.  This would require native code, but this is a Unix-only feature and we already provide native code for Linux, so maybe that's acceptable.  Or we could call out to a shell, but that would probably be too slow.  Note that Java 7 adds API support for symbolic links, which will permit such looping logic, so that will provide a good way to implement this.

FileNotFoundException doesn't actually make this any easier, since it doesn't contain the path we need, as far as I can tell.  We'd still have to loop in native code or a shell script to find the external link, no?  So maybe the native code route is best?  We could implement just this bit of the Java 7 API, so when that arrives we can use it without changing much?
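
As a rough sketch of that looping logic against the Java 7 java.nio.file API (the external-link test here is a simplification assumed for illustration, not the patch's actual check):
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Follow local symlinks in place; stop and return the target string the first
// time a link points outside the local file system.
public class LocalLinkFollower {
  public static String followLocalLinks(String pathname) throws IOException {
    Path p = Paths.get(pathname);
    while (Files.isSymbolicLink(p)) {
      Path target = Files.readSymbolicLink(p);
      String s = target.toString();
      if (s.contains("://") && !s.startsWith("file://")) {
        return s;                         // external link: caller resolves it via FileSystem
      }
      p = p.resolveSibling(target);       // relative targets resolve against the link's parent
    }
    return p.toString();                  // an ordinary local file or directory
  }
}
{code}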


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645341#action_12645341 ] 

dhruba borthakur commented on HADOOP-4044:
------------------------------------------

Thanks Sanjay and Robert. I will incorporate your feedback and post a new patch.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628943#action_12628943 ] 

dhruba borthakur commented on HADOOP-4044:
------------------------------------------

> This means that a HDFS symbolic link can point only to other hdfs pathnames.

@Doug: I meant to say that "an HDFS symbolic link that does not have a full URI (absolute, with scheme, etc.) will point to a path in the same HDFS instance".

@Raghu: Yes, it should be handled at the FileSystem level.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649532#action_12649532 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

Dhruba wrote:
* Archives - current implementation does not follow symbolic links.
* distcp - pure FileSystem client, does not follow symbolic links

What is the desirable default behavior when a user does a distcp or an archive of a subtree that has symlinks to other 
volumes?
To avoid accidental copying of large amounts of data, I suspect that the default should be to copy the symlink as a symlink
rather than follow the symlink and copy its target.
If this is indeed the desired default behavior then the patch will need to fix that.


On a related note, symlinks to an archive will throw an exception/error when one tries to follow them. That would require a fix to the archive file system; we should file a separate JIRA to address that in the future.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.20.0
>
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink13.patch, symLink14.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637315#action_12637315 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

> Let's not call something 'unclean' just because it is more work.

_It has nothing to do with more work -- I am always willing to do more work for the right, clean solution._
 Konstantin also said it is "bizarre" that all the methods return FSLink. I am not the only one finding this "unclean" or "bizarre".


_There is a fundamental difference of opinion here:_ several of us are arguing that the use of exceptions is fine for recoverable conditions. It keeps the interface simple and clean.

BTW, do you find such use of exceptions to be acceptable inside the name node? Both Raghu and I raised this.
Are you suggesting that the namenode side of the code be changed to not use exceptions?
If so, you are consistent, and it will be some work to change the internal name node code to use the FSLink-style approach; dhruba's patch does not do this.
If not, why the inconsistency in when exceptions are and are not appropriate?
  
>The interfaces should reflect their function.
Having create() return FSLink does not help one understand its function.
Return types like "FSLinkBoolean" and "FSLinkBlockLocations" confuse the function of the methods in an interface.
The symbolic link issue cuts across all interfaces that have path names. Overloading the return type of each method that has path-name parameters dilutes how well the method signature reflects its function.
The method signatures and datatypes are easier to understand if we use exceptions (which, in the opinion of some of us, are acceptable for recoverable conditions).
> Other FSs (e.g., linux VFS) don't try to do this in a single call, but rather loop, checking whether each element in the path is a link.
No one is suggesting that we handle it as a single call -- with exceptions it will loop in exactly the same fashion. Perhaps you have misunderstood the alternate solution being proposed.

>We can try to hack around that by using exceptions or thread locals or output parameters,
> but those are all just attempts to try to make this API look
>like other APIs rather than embrace its differences.
(Q. By overloading the return type, aren't you using output parameters? The return type is an output parameter.)
Many will consider the overloading of return types hacky as well (and this will get worse when, in the future, the client side needs to perform recovery functions for other reasons).  Clearly we have a difference of opinion on what is hacky or not.

Comparing to other file systems' APIs is not fair here, for two reasons:
  1) they do not use a programming language that has first-class support for exceptions (both checked and unchecked);
  2) mounts and symbolic links in most systems are handled on the server side - in most FSs, clients cannot process links.
The only possible exception (no pun intended) is slash-relative symbolic links in NFS (I don't know whether NFS supports slash-relative symbolic links that are kicked back to be resolved relative to the client's view of slash (root)).

In the two approaches being discussed in this Jira, both loop to process the operation across links/mount points. The difference is how the
condition is communicated back to the caller and the impact it has on the method signatures and the data types used as parameters or return types.
There is a clear difference of opinion here.
I am interpreting your referenced URL differently - I am reading it and seeing that using exceptions for recovery is acceptable. Further, following a mount point link is *not normal*, and hence the use of an exception is acceptable. Joshua Bloch's _Effective Java_ also, as per my reading, advocates the use of checked exceptions for recoverable conditions. You will probably interpret that text differently or disagree with it.


 



> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637324#action_12637324 ] 

Owen O'Malley commented on HADOOP-4044:
---------------------------------------

My preference is toward not using exceptions for nominal cases, and I agree with Doug that links seem like nominal cases. Clearly, however, it isn't a slam dunk.

At the risk of crossing streams, I think that classes like FSLinkBoolean are problematic. I think we are better off creating a distinct type for each method's return value. I also think that a lot of these return types should implement a common class/interface (*Owen ducks*) that provides the shared way of dereferencing the links.

The reason for having

{code}
RenameResult rename(Path src, Path dst) throws IOException;
{code}

rather than

{code}
FSLinkBoolean rename(Path src, Path dst) throws IOException;
{code}

would be that, if we want our protocol to be future-proof, rename may well need to add additional return values. If we have a bunch of methods all returning the same type, that is hard.

Thoughts?
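To make the shape of that suggestion concrete, here is a minimal sketch; the names (LinkResolvable, RenameResult) are hypothetical and not from the patch:

{code}
import org.apache.hadoop.fs.Path;

/** Hypothetical common interface for results that may carry a link to chase instead of a value. */
interface LinkResolvable {
  /** Returns the unresolved link target, or null if the operation completed normally. */
  Path getLink();
}

/** Hypothetical per-method result type; rename-specific fields can be added later
 *  without touching the return types of other methods. */
class RenameResult implements LinkResolvable {
  private final boolean renamed;
  private final Path link;   // non-null when resolution stopped at a symlink

  RenameResult(boolean renamed, Path link) {
    this.renamed = renamed;
    this.link = link;
  }

  public boolean isRenamed() { return renamed; }
  public Path getLink() { return link; }
}
{code}

A generic FileSystem-level helper could then dereference links via the common interface, while each method keeps a return type that names its own function.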


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630292#action_12630292 ] 

Joydeep Sen Sarma commented on HADOOP-4044:
-------------------------------------------

Wrt Allen's comment - I think it's not without merit. He's probably thinking about moving things within the same HDFS instance, in which case having to have a symlink per leaf is going to double the number of inodes (for the subtree being moved).

A couple of orthogonal ideas:
1. Would it not be easy to write a new file system client that does component-by-component lookup (instead of the current DFSClient behavior)? This would not seem to require any namenode changes.
2. Support symlinks only within the same HDFS instance. The namenode can resolve things internally - no changes to the client.

We can then combine these two approaches. The namenode can return a specific error if it finds that the path being resolved refers to an external fs (using the work done in #2). The regular DFSClient can revert to the special client file system that does component-by-component lookup (the work done in #1) when it sees this error. This avoids the overhead of component-by-component lookup unless one hits a symlink that crosses a filesystem boundary.
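A rough sketch of how the combined approach could look on the client side, assuming a hypothetical ExternalLinkException that the namenode raises when a path leaves its namespace (none of these names are from the patch):

{code}
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical error the namenode could return when the path refers to an external fs.
class ExternalLinkException extends IOException {
  final URI target;
  ExternalLinkException(URI target) { this.target = target; }
}

class LinkAwareClient {
  private static final int MAX_HOPS = 32;   // guard against link cycles

  static FSDataInputStream open(FileSystem fs, Path p, Configuration conf) throws IOException {
    for (int hops = 0; hops < MAX_HOPS; hops++) {
      try {
        return fs.open(p);                     // fast path: no cross-fs symlink hit
      } catch (ExternalLinkException e) {
        fs = FileSystem.get(e.target, conf);   // slow path: switch to the target fs
        p = new Path(e.target.toString());     // and retry from there
      }
    }
    throw new IOException("too many symbolic links: " + p);
  }
}
{code}

The overhead of the slow path is only paid when a link actually crosses a filesystem boundary, which is the point of combining #1 and #2.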


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637608#action_12637608 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> Doug should we pay the cost and do it the "right way" now?

Yes, of course.

> I prefer to use exceptions [ ... ]

But why?  You've said it's "cleaner" and "simpler".  I don't follow your argument.  "Simpler" could mean fewer lines of code.  I don't think it will actually change the lines of code much.  Changing the return types will *change* more lines of code, hence the change is more complex, but the resulting code will probably be around the same size, regardless of which approach is taken.  "Cleaner" is subjective, like "better", and requires elaboration.

My argument is that we should model normal functionality with normal programming: methods, parameters, return values and data structures.  Do you disagree with this?  Or do you think that links are not normal functionality?

You keep arguing that "exceptions can be used for recoverable conditions".  Indeed they can.  But is a link really best modeled as a "recoverable condition"?  This sounds to me like a rationalization for using exceptions, not a reason to use exceptions over data structures.

Please provide a reasoned explanation for your disagreement.  Otherwise it is hard to move forward.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630444#action_12630444 ] 

Allen Wittenauer commented on HADOOP-4044:
------------------------------------------

-1 to both, because I think we're rushing this without a significant amount of thought as to what a multi-filesystem directory structure should look like.

In my mind, cross-filesystem "symlinks" really don't seem like "real" symlinks to me, but a sort of hacked version of autofs support.  How do you determine if a file is "real" or not?  How do I "undo" a symlink?  What does this look like from an audit log perspective? Is this really just a "short cut" to providing the equivalent of mount?

If we ignore what I said above :), I'm trying to think of a use case where I wouldn't want to symlink an entire dir.  A single file?  Really?  

I think it is much more interesting to provide a symlink to a directory.  I can have "data/v1.423" and then have a symlink called LATEST that points to the latest version of a given versioned data set.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630428#action_12630428 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

-1 for the reasons for not doing the full symlinks: it will miss 0.19 and/or it is too complex.

-0  for Proposal1
-1 for Proposal2 

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629284#action_12629284 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

We can sort of consider FileSystem to be at a similar level to a VFS in Linux. In order to handle similar extensions in the future as well, I think this might be a good time for an implementation to take an object (wrapping the arguments) and return an object that contains the relevant info. The main goal is to avoid requiring actual implementation API changes when something like this is added.


Symlinks need to be handled for the other file system operations as well: create(), getListing(), etc.
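A small sketch of the request/response-object idea, with all names hypothetical; the point is that new fields can be added to the wrappers without changing the SPI method signatures:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Hypothetical argument wrapper: future options become new fields, not new overloads.
class CreateRequest {
  Path path;
  FsPermission permission;
  boolean overwrite;
  short replication;
  long blockSize;
}

// Hypothetical result wrapper: room for status codes, link targets, or later extensions.
class CreateResponse {
  boolean created;
  Path unresolvedLink;   // set when resolution stopped at a symlink
}

// The implementation-facing method keeps a stable shape as features like symlinks are added.
interface FileSystemSpi {
  CreateResponse create(CreateRequest request) throws IOException;
}
{code}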

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637974#action_12637974 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

Stepping back a bit, I don't believe this dispute is fundamentally about programming style.  Programming style, like Owen's arguments about RPCs, is a supporting argument.  The primary issue is that we're changing the FileSystem API in a fundamental way.  Instead of direct functions that map parameters to values, we wish to implement conditional functions that do one of two things and indicate in their response which path was taken.  This is a very different style, one that is forced on us for performance reasons.  The alternative is to at least double the number of RPCs per file operation, which is unacceptable.

If we're using a different style, then it should look different.  It should not look like the direct mapping of the existing API.  We should not attempt to hide this, as that will lead to confusion.  We should make it apparent to readers of the code that these methods return not a single type of value, but one of two kinds of values.  This is not a matter of exception style, it's a matter of expressiveness and readability.  Code should make clear what it does.  Style guidelines derive from this.  The reason exceptions are discouraged for normal control flow is that they hide control flow.  We are factoring our FileSystem API in a different way, and that factoring should be apparent to all as an obvious, natural attribute of the API.



> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637318#action_12637318 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> BTW Do you find such use of exceptions to be acceptable inside the name node?

No.

I thought I made it clear that I had not reviewed the namenode code changes, but only those in the public APIs.

> No one is suggesting that we handle it as a single call 

Linux's VFS does not handle both open and isLink in a single call, the way we'd like to.

> I am interpreting your reference URL link differently - I am reading it and seeing that using exceptions for recovery is acceptable.

That's a separate issue: recovery from an unexpected condition outside the control of the code in question.  That's not the case here.  Links are an expected, normal condition.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630239#action_12630239 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> There will be no enforcement of whether the target is a directory or not.

To be clear, under your proposal, if one links hdfs:///foo/bar to s3:///bar, and s3:///bar/baz exists, then trying to list hdfs:///foo/bar/baz will produce a FileNotFound exception.  Is that right?  If so, it seems like a major loss of functionality.



> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637140#action_12637140 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

Some minor comments on FileSystem side of the patch.

- I prefer the name symlink instead of link. Unix terminology uses link for hard link and symlink for symbolic link. I think we should stick to that
regardless of whether or not we plan to add hard links in the future.
- Dhruba, you left some of the implementation methods public; they should be protected.
- Minor -- the method name next() suggests that one follows the path to the next symbolic link. It does not, strictly speaking. I suggest calling it doOperation() (this is a
style issue and so merely a soft suggestion).

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638017#action_12638017 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

Assuming that the vote for not using exceptions in the RPC interface passes (I have voted +0), should we consider modifying the
RPC interface a little further:
- Return all HDFS exceptions as part of the return status rather than as exceptions. The only exception returned is the IOException thrown when the RPC itself fails for some reason.
- This would make the approach consistent - all status and information returned from the NN is part of the XXXResult object for each method XXX.
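A sketch of what that could look like for a single method, with hypothetical names; the only thing that travels as an exception is a transport-level failure of the RPC itself:

{code}
import org.apache.hadoop.fs.Path;

// Hypothetical in-band status codes, replacing remote exceptions.
enum FsStatus { OK, FILE_NOT_FOUND, PERMISSION_DENIED, UNRESOLVED_LINK }

// Hypothetical result for mkdirs(); every method XXX would get its own XXXResult.
class MkdirsResult {
  FsStatus status;
  boolean created;        // meaningful only when status == OK
  Path unresolvedLink;    // meaningful only when status == UNRESOLVED_LINK
}
{code}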

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637670#action_12637670 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> I was only wondering why we are ok with compromise in one aspect and not about the other.

What compromise is this?

> it makes perfect sense. My objection has been calling it open() and returning link some times.

So it was all about the method name?  Okay, then please suggest new method names and return type names, rather than suggesting we use exceptions to return normal data.  I personally find the 'xOry' names useful in pseudo code, when I'm trying to make a point, but a bit verbose in real code.  We don't call FileStatus NameAndModifiedDateAndOwnerAnd....

> This also allows HDFS to use exceptions since we don't want to change the names.

What?  I don't follow your logic here.  Why can't HDFS change method names?



> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628735#action_12628735 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

I think it will result in a cleaner design/implementation to handle this at the FileSystem level (thus handling arbitrary URIs).

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4044:
-------------------------------------

    Attachment: symLink8.patch

Incorporated Doug's review comments. More unit tests coming soon.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637676#action_12637676 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

> Why can't HDFS change method names?

Right. They can.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637319#action_12637319 ] 

Konstantin Shvachko commented on HADOOP-4044:
---------------------------------------------

> public class FsReturnType<T> 

Interesting idea. You would rather have something like
{code}
public class FSReturnType<T extends Writable> implements Writable {
  public T getValue();
  Path getLink() throws IOException;
}
{code}
You would then have to implement the serialization methods and think about how to define a WritableFactory for the parameterized class.
The protocol methods would then look like this:
{code}
public FSReturnType<BooleanWritable> setReplication(String src,...); 
{code}

This is better than creating a new class for every new return type, but seems rather complex compared to one new protocol method plus the UnresolvedPathException that I was proposing.
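For illustration, a fleshed-out sketch of such a parametrized type, including the serialization mentioned above; how the reader learns the concrete type of T (the WritableFactory question) is handled here by requiring the caller to supply an instance to deserialize into, which is only one possible answer:

{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class FSReturnType<T extends Writable> implements Writable {
  private T value;
  private Text link = new Text();   // empty when no link was encountered

  public FSReturnType(T emptyValue) {          // caller supplies an instance to deserialize into
    this.value = emptyValue;
  }

  public FSReturnType(T value, Path link) {
    this.value = value;
    this.link = new Text(link == null ? "" : link.toString());
  }

  public T getValue() { return value; }

  public Path getLink() throws IOException {
    if (link.getLength() == 0) {
      throw new IOException("path was fully resolved; no link to return");
    }
    return new Path(link.toString());
  }

  public void write(DataOutput out) throws IOException {
    link.write(out);
    value.write(out);
  }

  public void readFields(DataInput in) throws IOException {
    link.readFields(in);
    value.readFields(in);
  }
}
{code}

A setReplication() call could then return an FSReturnType<BooleanWritable>, as in the snippet above.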

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638108#action_12638108 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> why don't we accept the solution that does not require any fundamental changes to the FileSystem API

But it *does* change the API fundamentally.  All of these methods are now conditional.  They now either return the declared value or a boolean indicating that the declared value is to be ignored and that one should follow up with a request for the link value.  That's a substantial change that shouldn't be cloaked in exception handling.

We might instead consider changing the contract of these methods to permit the return of a null value, and interpret a null value as the need to request the link value.  Unfortunately that won't work with the boolean methods, and it's a fragile approach, since folks might forget to check for null.  It's better to make it explicit with a data structure.
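A minimal sketch of the explicit-data-structure alternative, using hypothetical names (OpenResult, openInternal): the two possible outcomes are visible in the type and in the caller's loop rather than hidden in a catch block or a null check.

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;

// Hypothetical result: exactly one of 'stream' and 'link' is non-null.
class OpenResult {
  final FSDataInputStream stream;   // set when the open completed
  final Path link;                  // set when resolution stopped at a symlink
  OpenResult(FSDataInputStream stream, Path link) {
    this.stream = stream;
    this.link = link;
  }
}

abstract class LinkFollowingOpen {
  // The link value comes back with the same call, so there is one call per link traversed.
  FSDataInputStream open(Path p) throws IOException {
    for (int hops = 0; hops < 32; hops++) {
      OpenResult r = openInternal(p);   // hypothetical single-call primitive
      if (r.link == null) {
        return r.stream;
      }
      p = r.link;                       // follow the link and try again
    }
    throw new IOException("too many symbolic links: " + p);
  }

  abstract OpenResult openInternal(Path p) throws IOException;
}
{code}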

> My proposal does "not double the number of RPCs per file operation".

I did not mean to imply that it did.  I'm sorry that you thought I did.  When I referred to doubling the number of RPCs I was contrasting with the naive approach of calling resolveLink() or some such explicitly before open(), so that open() is never passed a path that contains a link, as is done in Unix and other VFSs.

Your approach would add an RPC, relative to the other approaches, whenever a symbolic link is encountered, since the link value would not be returned directly with the call to open() but would require a second RPC, right?

> I keep wondering what is your selective criteria for ignoring some arguments and answering other.

What arguments am I ignoring?  I am sorry if you feel I've ignored some of your arguments.  Please highlight them, if you like.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638367#action_12638367 ] 

Konstantin Shvachko commented on HADOOP-4044:
---------------------------------------------

I understand your answer to my question [HADOOP-4044#action_12638156] to be that
_this argument is about the programming style_
rather than performance or anything else. Or would you characterize it differently?

Quoting one of the links you provided:
"Checked exceptions : represent invalid conditions in areas outside the immediate control of the program"

If a program cannot handle some condition, it throws a checked exception.
So the question of whether the value of a link is normal data is actually layer-dependent.
- Can it be handled within the name-node? Yes for local links - handle them. No for external links - throw the exception.
- Can it be handled by ClientProtocol? No. Throw the exception.
- Can it be handled by DFSClient? No. Throw the exception.
- Can it be handled by DistributedFileSystem? No. Throw the exception.
- Can it be handled by FileSystem? Yes! Because FileSystem caches file system implementations.

FileSystem is the first layer for which link values make sense.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635009#action_12635009 ] 

dhruba borthakur commented on HADOOP-4044:
------------------------------------------

I vote for using exceptions internal to the NameNode to keep internal interfaces clean and elegant. 

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638193#action_12638193 ] 

dhruba borthakur commented on HADOOP-4044:
------------------------------------------

From my perspective, I am not (yet) buying Doug's argument that not throwing exceptions inside the namenode is a *better* programming style. If there is a proposal to change all internal namenode methods to return object return types instead of throwing UnresolvedPathException, then I am against it.

However, for the RPC interfaces, I like Doug's programming style of returning objects as return types rather than throwing exceptions, especially in the case of symbolic links.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638614#action_12638614 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4044:
------------------------------------------------

> That last method should instead read ...

I guess we need an openImpl(...) method.  Otherwise, it would be an infinite loop.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638616#action_12638616 ] 

Konstantin Shvachko commented on HADOOP-4044:
---------------------------------------------

I am replying to Doug's comment HADOOP-4044#action_12638405.
I was thinking about it all night and have to admit I am not quite sure how to understand it.

We were talking about the mount point exception handling on different layers of the implementation of HDFS.
It looks like, based on the references you provide, checked exceptions can be used when a particular layer does not have an immediate response to the condition causing the exception.

If your comment means that you can tolerate the mount point exception on all levels of the HDFS implementation up to (but excluding) the FileSystem api, then I anticipate a big sigh of relief from a large group of developers. And we can turn to the discussion of the FileSystem api return types, which Owen is already successfully conducting.

Otherwise, my understanding of the comment is that you are introducing a new argument (line of defense) for your dual-return-type approach, which means we are back at the beginning and will have to start the whole discussion over again.
Please clarify.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638334#action_12638334 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

To make my position clear: I am -1 on using exceptions to signal links in the FileSystem SPI.  I would also recommend against using them in the Namenode, but, as I stated before, that's not a public API, and we can clean it up later.  I believe we should not use exceptions for control flow there either, but I would not veto this patch for that.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630354#action_12630354 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> would it not be easy to write a new file system client that does component by component lookup?

Sure, that's possible, but it would result in lots more RPCs per file operation.  We don't want to multiply the load on the namenode in this way.  So this should only involve one RPC per foreign link traversed rather than per component.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630395#action_12630395 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

> doing it in a way where the component by component lookup is only done when sym links are encountered. 

+1. This was implicit in most of the proposals, I think.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638460#action_12638460 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4044:
------------------------------------------------

> It seems like the only motivation to call this an error is to justify using exceptions to handle links, not that it would in fact be a user or program error to call it in this way.

There is nothing wrong with assuming that the parameter f is a non-linked path.  For performance reasons, clients are responsible for resolving links before using them, since resolving links is expensive, especially for cross-file-system links.

Suppose we have the following link:
  hdfs://clusterA/project/xxx ->  hdfs://clusterB/yyy

If a client submits a job with hdfs://clusterA/project/xxx/input and hdfs://clusterA/project/xxx/output, every input and output file access involves two clusters and requires two calls.  However, if the client uses hdfs://clusterB/yyy directly, only one cluster is involved.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4044:
-------------------------------------

    Attachment: symLink15.txt

The unit test TestLink was failing because dfs.support.append was not set. Fixed.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink13.patch, symLink14.patch, symLink15.txt, symLink15.txt, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645124#action_12645124 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

Went through the patch. Looks good.
Feedback:
 1) Why does createSymLink set the group name to "none" rather than passing a null parameter to use the
group name from the parent dir, as the create and mkdir operations do?

2) rename and createSymlink both take two path parameters - either of the paths can traverse a symlink.
   createSymlink does not handle this while rename does.
   An alternate way to handle this (compared to how rename does it) is to have the FsResult return a flag saying which of the two paths crosses a symlink.

3) getFileInode throws UnresolvedPathException. However, LeaseManager:findPath() was changed to
throw IOException (it threw no exceptions before). findPath should be declared to throw only the UnresolvedPathException.

4) There is a test to check that symlinks are correctly persisted in fsimage/editslogs. 
Please add a TestSymlinks that tests symlink handling for each of the FileSystem methods that take one or more path names. For those that take multiple path names, the test should cover symlinks in either parameter.



> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634978#action_12634978 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

+1 for having "impl" methods for FileSystem implementations.

Currently the patch does both of these:
 - Implementations throw {{UnresolvedPathException}}
 - Many return types are changed to return this path information.

Are both of these required, or just one? I think throwing the exception is enough and might be cleaner w.r.t. the interface. The API can return what makes sense for it rather than worrying about a link-related class.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639653#action_12639653 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

Owen> I would probably use a thread local variable to measure the depth of the traversal. It isn't great, but it would be encapsulated in LinkResult.

Doug> I see. Yuck.

I second the Yuck.

Doug> Perhaps we're prematurely optimizing here. Perhaps the extra RPCs to resolve paths won't hurt things that much. 

The above was in reference to Dhruba's approach of using the FSLinkResolver abstract class versus Owen's more inline approach.
While Dhruba's FSLinkResolver is a little more complex, it is not complex enough to warrant doing the additional RPC.
We might be able to further simplify it by using a LinkableFileSystem base class as Doug suggested.

I really don't like the extra RPC for any method that has a pathname as a parameter, regardless of whether or not a link is in the path.
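
A very rough sketch of the resolver-callback shape being discussed (all names and signatures below are illustrative guesses, not the patch's actual FSLinkResolver; imports omitted):

{code}
// The per-call work is supplied as a callback; the resolve step happens only when a
// link is actually encountered, so plain paths pay no extra round trip.
abstract class LinkResolver<T> {
  /** One attempt against a concrete path; throws if the path crosses a link. */
  abstract T doCall(Path p) throws UnresolvedPathException, IOException;

  /** Follows links until doCall succeeds. */
  T resolve(FileSystem fs, Path p) throws IOException {
    while (true) {
      try {
        return doCall(p);
      } catch (UnresolvedPathException e) {
        p = fs.getLinkTarget(p);   // hypothetical helper: fetch the link target and retry
      }
    }
  }
}
{code}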

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630123#action_12630123 ] 

dhruba borthakur commented on HADOOP-4044:
------------------------------------------

@Raghu: The FSDataInputStream will implement another interface StreamType that will store the information on whether this is a symbolic link. No extra RPC is needed.

@Doug: By definition, the filesystem has no control about the target of a symbolic link. The user could make it point to a non-existent path too. There will be no enforcement of whether the target is a directory or not.

@Allen: This proposal should not introduce additional grief for the namenode. If you replace a file with a symbolic link, it still occupies one inode. No additional inodes are necessary. I agree that admins might like symbolic links to directories, but maybe we can do that at a later date when the use-case becomes more concrete. Does that sound acceptable?

The simplicity for the filesystem is that no "kickbacks" while traversing paths are needed.

{noformat}

  /**
   * Opens an FSDataInputStream at the indicated Path.
   * This does not follow symbolic links.
   * @param f the file name to open
   * @param bufferSize the size of the buffer to be used.
   */
  public abstract FSDataInputStream openfs(Path f, int bufferSize)
    throws IOException;

  /**
   * Opens an FSDataInputStream at the indicated Path.
   * If the specified path is a symbolic link, then open the
   * target of the symbolic link.
   * @param f the file to open
   */
  public FSDataInputStream open(Path f) throws IOException {
    return open(f, getConf().getInt("io.file.buffer.size", 4096));
  }

  /**
   * Opens an FSDataInputStream at the indicated Path.
   * @param f the file to open
   * @param bufferSize the size of the buffer to be used.
   */
  public FSDataInputStream open(Path f, int bufferSize)
    throws IOException {
    FileSystem fs = this;
    FSDataInputStream in = fs.openfs(f, bufferSize);
    while (in.isSymlink()) {
      // construct new path pointed to by symlink
      Path newpath = new Path(in.getSymlink());
      if (!newpath.isAbsolute()) {
        newpath = new Path(f.getParent(), newpath);
      }
      in.close();
      f = newpath;
      fs = f.getFileSystem(getConf());
      LOG.debug("Opening symlink at " + f);
      in = fs.openfs(f, bufferSize);
    }
    return in;
  }


public class FSDataInputStream extends DataInputStream
    implements Seekable, PositionedReadable, StreamType {
....
}


/** Types of streams */
interface StreamType {
  /**
   * Is this stream a symbolic link?
   */
  public boolean isSymlink() throws IOException;

  /**
   * Return the contents of the symlink
   */
  public String getSymlink() throws IOException;
}

{noformat}

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635796#action_12635796 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

This is looking good.  A few nits:
- let's call it just 'link' not 'symlink', okay?  If we ever implement generic hard links, we'd make those the qualified case.
- several implementations of getSymlink() still do not throw exceptions when isLink() is false.
- FSDataInputStream and FSDataOutputStream should not directly implement FSLinkable.  Rather, openImpl, createImpl and appendImpl should return FSDataInputStreamLink and FSDataOutputStreamLink, classes which wrap a stream and a link, then the public method can dereference this to return FSDataInputStream and FSDataOutputStream as before.
 - shouldn't all of the implementations of Impl methods be protected, not public?

Note: I am reviewing only the src/core sections of this patch, not the src/hdfs.
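
A rough sketch of the wrapper class shape described in the third bullet above (only the class name comes from the comment; the fields and accessors are guesses):

{code}
// Returned by the impl methods (openImpl etc.); the public FileSystem method unwraps it,
// following the link if one was hit, and hands back a plain FSDataInputStream as before.
class FSDataInputStreamLink {
  private final FSDataInputStream stream;   // non-null when the path resolved to a file
  private final Path link;                  // non-null when the path resolved to a link

  FSDataInputStreamLink(FSDataInputStream stream, Path link) {
    this.stream = stream;
    this.link = link;
  }
  boolean isLink() { return link != null; }
  Path getLink() { return link; }
  FSDataInputStream getStream() { return stream; }
}
{code}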


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637672#action_12637672 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

>[...] but a bit verbose in real code. 

So is replacing 'return false;' with 'return new FSLinkBoolean(false)'. 

And as developers we have to look at it and deal with it every day :). 

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638625#action_12638625 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

Konstantin> checked exceptions can be used when a particular layer does not have an immediate response to the condition causing the exception

First, in this case, I consider HDFS to be a single layer, or module.  HDFS controls all of its components.  The calls on the stack above that are still part of HDFS are known to HDFS, are under HDFS's control and are modified in a coordinated manner, as a unit.  In particular, all of HDFS knows that a link can be to an external FileSystem, since this is a generic property of all FileSystems.  This is a safe assumption for any code in HDFS to make, and does not violate any abstraction boundaries within HDFS.

An exception is justified when the best handler for it is unknown.  It is not the immediacy of the handler, or the proximity of the handler, but whether the handler is fully determined.  If the handler is known, then exceptions are just being used as a convenient message passing channel.  When the destination is known, methods, parameters and return values are the preferred message passing mechanisms in object-oriented programming.

Does that address your question?


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4044:
-------------------------------------

    Status: Open  (was: Patch Available)

Investigate failed unit test and findbugs problems.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4044:
-------------------------------------

    Status: Patch Available  (was: Open)

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649196#action_12649196 ] 

Nigel Daley commented on HADOOP-4044:
-------------------------------------

Dhruba, I'm having trouble enumerating all the test cases that are covered by the unit tests.  Can you write a test plan that outlines the test cases?  More than that, I'm having trouble understanding the design of this 200Kb patch from the code and the last 2.5 months of comments on this Jira.  Is there a design document for this?

At first glance, I don't see test cases for these:
 - deleting a symlink that points to a file
 - deleting a symlink that points to a directory
 - symlink to a symlink
 - any test cases with relative links
 - delete linked to file, but not link, access link
 - delete linked to file, recreate the same file, access link
 - links to other filesystems
 - archives containing links
 - rename a file to an existing symlink
 - rename a symlink to an existing file
 - setTimes and setReplication cases should check the result on both the symlink and linked to file
 - tests involving lack of permissions to create symlinks
 - tests that try to createSymlinks with filesystems that don't support them
 - ...

I'm also left with a number of questions:

1. Do setReplication, setTimes, chmod, chgrp, and chown apply to the symlink or to the underlying file/dir that the symlink points to?
2. Does fsck report symlinks?
3. Does har:// protocol support symlinks in the path leading to har file?  In har file itself?
4. Why doesn't the main method for this patch, FileSystem.createSymlink(), which is declared to throw IOException, have any @throws clause in the javadoc?
5. Under what conditions does FileSystem.createSymlink() throw IOException?
...and many more.
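
A hypothetical sketch of two of the missing cases above (dangling symlink, symlink to a symlink), assuming the FileSystem.createSymlink(target, link) call proposed in this patch; the final method name and signature may differ:

{code}
// JUnit-style fragment: verifies that a dangling link fails to open and that a
// link chain resolves once the target exists.
public void testDanglingAndChainedLinks() throws IOException {
  Configuration conf = new Configuration();
  MiniDFSCluster cluster = new MiniDFSCluster(conf, 1, true, null);
  try {
    FileSystem fs = cluster.getFileSystem();
    Path target = new Path("/user/test/data.txt");
    Path link   = new Path("/user/test/link");
    Path link2  = new Path("/user/test/link2");

    fs.createSymlink(target, link);          // dangling: target does not exist yet
    try {
      fs.open(link).close();
      fail("open of a dangling link should fail");
    } catch (FileNotFoundException expected) {
    }

    FSDataOutputStream out = fs.create(target);
    out.writeBytes("hello");
    out.close();

    fs.createSymlink(link, link2);           // link to a link
    fs.open(link2).close();                  // should resolve through both links
  } finally {
    cluster.shutdown();
  }
}
{code}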

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.20.0
>
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink13.patch, symLink14.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4044:
-------------------------------------

    Attachment: symLink12.patch

Incorporated Sanjay's review comments. The FileSystem.createSymLink() does take two parameters, but only the first one needs to be "resolved". The second one can be anything (just a blob) and does not need to be resolved. In fact, the second parameter could be a dangling non-existent path.

Included test cases for all FileSystem APIs that were modified. 

From Rob's list of tests:
    *  Shell commands  -- unit test included
    * Trash -- unit test included (because Trash is a pure FileSystem client)
    * Archives -- current implementation does not follow symbolic links. 
    * distcp -- pure FileSystem client, does not follow symbolic links 
    * chmod, chown, chgrp APIs -- unit test included
    * Rename - unit test included


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4044:
-------------------------------------

    Attachment: symLink1.patch

This patch demonstrates the proposed changes to the FileSystem API. I would like early comments on the changes to the public API before I implement support for symlinks in HDFS.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637328#action_12637328 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

Doug wrote :
> Exceptions are intended for cases where you're not sure who will catch them. [...]

UnresolvedLinkException seems like such a case: when it is thrown, we don't know whether it will be handled inside the NameNode, or in DFS, or in FS on top of the namenode; it might be the same with other FS implementations too.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637335#action_12637335 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

Owen says:
>My preference is toward not using exceptions for nominal cases, and I agree with Doug that links seem like nominal cases. Clearly, however, it isn't a slam dunk.
>At the fear of crossing streams, I think that the classes like FSLinkBoolean are problematic. I think we are better off creating a type for each method's return type. I think that a lot of these return types should implement a class/interface (Owen ducks) that provides the common interface for dereferencing the links.

What you lose with Owen's class/interface suggestion is that the signature no longer indicates
that the method returns a boolean or an array of LocatedBlocks or a file descriptor, etc.
I think Dhruba was trying to preserve that aspect of the original interfaces, whereby, from the name of the method and the names and types of the
input and output parameters, one can easily guess the semantics of the method.
(The eraser I threw at Owen bounced off the screen.)
But Owen is right in that his approach is a little more future proof (and exceptions are more future proof :-)

Owen, what do you propose for the internal interfaces in the name node -- do you suggest that we remove the symlink-related exception and use your structure everywhere in the FS and HDFS code?
If not, why the inconsistency?
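
To make the trade-off concrete, here is an illustrative sketch of the per-method return types being discussed (all names are invented for illustration; they are not from any attached patch):

{code}
// Each method keeps a result type named after its natural payload, and every result
// type shares a small interface for asking whether a link was crossed.
interface LinkAware {
  boolean isLink();
  Path getLink();
}

class DeleteResult implements LinkAware {
  private final boolean deleted;   // the method's natural boolean result
  private final Path link;         // non-null when the path crossed a link
  DeleteResult(boolean deleted, Path link) { this.deleted = deleted; this.link = link; }
  public boolean isLink() { return link != null; }
  public Path getLink() { return link; }
  public boolean getValue() { return deleted; }
}
{code}

The downside Sanjay points out is visible here: the signature advertises DeleteResult rather than boolean, so the natural return type is hidden behind the wrapper.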

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630015#action_12630015 ] 

dhruba borthakur commented on HADOOP-4044:
------------------------------------------

I would like to propose that the symbolic links point only to files (and not to directories). Why this restriction:

1. Symbolic links that point to directories are more like mount-points.
2. My current use-case (HADOOP-4058) does not require symbolic links to point to directories.
3. Directories in HDFS are more like object-references to file(s); they do not yet have all POSIX semantics associated with them. The clients do not parse components of the directory path; instead they send the entire pathname to the server for a lookup.
4. Avoiding symbolic links that point to directories keeps the code simple (till we really need it). An open(/foo/bar/zoo) where bar is a symbolic link does not arise.

In this proposal, a symbolic link can still be relative or absolute. No kickback (via exception or otherwise) is needed to implement this proposal. Also, this proposal does not preclude implementing the "symbolic links to directories" in the future.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637620#action_12637620 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> Another trueism is that returning unrelated values is API pollution.

Sure, but the values here are not unrelated.  They're core.

> I guess one end of the opinion is that retuning link is normal and other end thinks it isn't.

Okay, so, if that's the case, how is it abnormal?  The traditional way to model this would be something like:
{code}
public FileHandle open(Path path) {
  for (element in path) {                 // schematically: walk the path component by component
    FileStatus stat = fsImpl.getStatus(element);
    if (stat.isLink()) {
      return open(stat.getLinkTarget());
    } else if (stat.isFile()) {
      return fsImpl.open(element);
    }
  }
}
{code}
I don't think anyone would argue that the values of isLink() and getLinkTarget() are not normal data.  But we don't want to make this many calls to our FS implementations, since they're RPCs.  So, for performance, we want to package that same data in a different way, with something like:
{code}
public FileHandle open(Path path) {
    LinkOrFileHandle linkOrFileHandle = getNextLinkOrOpen(path);
    while (linkOrFileHandle.isLink()) {
       linkOrFileHandle = getNextLinkOrOpen(linkOrFileHandle.getLink());
    }
    return linkOrFileHandle.getFileHandle();
}
{code}
Why does this suddenly make that same data non-normal?


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632744#action_12632744 ] 

Allen Wittenauer commented on HADOOP-4044:
------------------------------------------

chmod -H (under BSD, at least):

     -H    If the -R option is specified, symbolic links on the command line
             are followed.  (Symbolic links encountered in the tree traversal
             are not followed by default.)

This likely only makes sense if the symlink is pointing to the local file system or another HDFS, though.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630026#action_12630026 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> symbolic links point only to files (and not to directories).

Isn't that hard to control?  How would we stop someone from replacing the target of a symlink with a directory?


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637702#action_12637702 ] 

Konstantin Shvachko commented on HADOOP-4044:
---------------------------------------------

Doug> whether to use an exception or a data structure to pass the return value.

!!! No need to pass the return value with the exception !!!
{code}
public FileHandle open(Path path) {
  while (path != null) {
    try {
      return fsImpl.open(path); // the underlying open; succeeds if there are no links in the path
    } catch (XMountPointException eX) { // thrown only when a mount point is crossed
      path = getLinkTarget(path);
    }
  }
  return null;
}
{code}
There is no need to change the API with this; only add a new method getLinkTarget().
And XMountPointException does not encode or pass any values.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700905#action_12700905 ] 

Edward Capriolo commented on HADOOP-4044:
-----------------------------------------

I am using/helping the hadoop-hive subproject. I wanted to share a use case for symlinks.

For example, suppose a directory inside Hadoop:
/user/edward/weblogs/{web1.log,web2.log,web3.log}. I can use a Hive EXTERNAL
table to point to the parent directory and then use Hive to query this external table. This is very powerful. It breaks down, however, if another file with a different format, such as web_logsummary.csv, is also in the directory (this is my case).

Being able to drop in a 'symlink' where a file would go could be used to create structures from already existing data. Imagine a user with a large Hadoop deployment who wishes to migrate to, or start using, Hive. An external table is constrained to one directory, so they would need to recode application paths and/or move files. With a 'symlink' concept, anyone could start using Hive without re-organizing or copying data.

Right now, Hive has a lot of facilities for dealing with input formats, such as specifying delimiters, but forcing the data either into a warehouse or into an external table is limiting. 'Symlinks', tied together with Hive's current input format capabilities, would make Hive more versatile.
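
A hypothetical sketch of how this use case might look, assuming the FileSystem.createSymlink(target, link) call proposed in this patch (the final API may differ):

{code}
// Expose only the compatible .log files under the directory the Hive EXTERNAL TABLE
// points at, without moving or copying the original data.
FileSystem fs = FileSystem.get(new Configuration());
Path tableDir = new Path("/user/hive/external/weblogs");
fs.mkdirs(tableDir);
for (FileStatus st : fs.listStatus(new Path("/user/edward/weblogs"))) {
  String name = st.getPath().getName();
  if (name.endsWith(".log")) {               // skip web_logsummary.csv and other formats
    fs.createSymlink(st.getPath(), new Path(tableDir, name));
  }
}
{code}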

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink13.patch, symLink14.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637240#action_12637240 ] 

Konstantin Shvachko commented on HADOOP-4044:
---------------------------------------------

The code looks bizarre (or unclean) with all the FSLink*** return types.
I do not like when exceptions are used for normal control flow.
But looking at what it takes to avoid it, I lean toward the idea that maybe this
is the exceptional case where we should diverge from the general principles.

I see two compromise solutions.

# Introduce a new method {{Path getSymLink(Path src)}}, which returns the first symbolic link in the {{src}} path.
Then all methods ({{create(), getBlockLocations(), rename()}}, etc.) can throw {{UnresolvedPathException}} all the way to the application via the FileSystem API. The application can then call {{getSymLink()}} to resolve the mount point. 
{{UnresolvedPathException}} should be a regular subclass of IOException without any hints on how the sym link should be resolved. 
I understand that local links are resolved on the name-node. 
So {{UnresolvedPathException}} is thrown only when a mount point is crossed. This is uncommon and should not be optimized. 
The application itself should call {{getSymLink()}} after either receiving the {{UnresolvedPathException}} or if it knows about the mount point from its previous activity.
# Another "clean" way to handle this would be to introduce output parameters in RPC framework.
In current RPC all arguments are input only and the only way to return a value is by explicitly specifying the return value of the protocol method. 
And if you need to return several values you have to wrap them into an object.
This is exactly what makes FSLink*** so ugly.
If we had a way to specify that some arguments are output values then the API would be more reasonable. E.g.
{code}
public FSLink getFileInfo(String inSrc, FileStatus outStatus) throws IOException;
{code}
vs 
{code}
public FSLinkFileStatus getFileInfo(String src) throws IOException;
{code}


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639603#action_12639603 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

_Doug Says: >Internal symlinks are supported automatically by local FS, at least on unix. And support for remote symlinks on the local FS might be very cool. Unix will gladly let me create symbolic links to an HDFS URI (Try 'ln -s hdfs://foo/bar foo; ls -l foo'). These cannot be resolved by a unix shell, but they could be by 'bin/hadoop fs -ls', by MapReduce programs etc. One could, e.g., keep in one's home directory on unix links to the HDFS directories that one commonly uses .._

 It would be very cool to support them. (I guess unix thinks the : is simply part of the name.)
BTW, if someone happens to create a dir called "hdfs:" and a valid path foo/bar under it, then the local file system will follow it and not return an error. Is this a concern?

When trying to go through the remote mount symlink, the local file system reports that the file does not exist. Hence, on a "file does not exist" error, the client-side lib will have to check if there are symlinks in the path. The only snag is that "file not found" errors will take a little longer to report as the
library tries to check if there is a symlink in the path. Is this an issue? Most apps consider "file not found" to be an error, and hence the speed of reporting it is not a concern; however, there are some apps that create files if they are missing, and their performance will suffer a little.
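
A hypothetical sketch of that client-side check (it assumes the isLink() accessor this patch proposes for FileStatus; the helper and its name are invented here):

{code}
// After the local FS reports "file not found", walk the ancestors of the path looking
// for the deepest component that is a link; return null if it is a genuine miss.
Path findLinkInPath(FileSystem fs, Path p) throws IOException {
  for (Path q = p; q != null; q = q.getParent()) {
    try {
      FileStatus st = fs.getFileStatus(q);
      if (st.isLink()) {
        return q;                  // this component is a link; resolve and retry
      }
    } catch (FileNotFoundException e) {
      // this component does not exist either; keep walking up
    }
  }
  return null;                     // no link anywhere in the path
}
{code}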

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637592#action_12637592 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

Symlink vs link  -  I am indifferent and can go either way.


One advantage of Owen's proposal over the current patch: it will help with versioning in the future as each method can evolve its return type 
independently of the return types of other methods. 


Dhruba> Thus, exceptions could be used here. This keeps most of the HDFS code clean and elegant.

Doug> Yes, they could be used here, and it could be fixed later without breaking user code, but that doesn't make it clean and elegant.
> Rather it is a quick and dirty hack. Returning normal values through exceptions is a perversion.
> The Namenode should be revised to support links as first-class events, not as exceptions.

It is highly unlikely to get "fixed" later, and over time it will get harder and harder to "fix".
Doug, should we pay the cost and do it the "right way" now?
(BTW I prefer to use exceptions and don't agree with Doug's view; hence my use of quotes above.)



> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4044:
-------------------------------------

    Attachment: symLink4.patch

First version of the patch that implements the entire symlink solution. Please review.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638694#action_12638694 ] 

Konstantin Shvachko commented on HADOOP-4044:
---------------------------------------------

Doug> Does that address your question?

Yes it does in the way that I see you are changing the subject.
It was about exception usage, now it is about monolithic layers of HDFS.
Since programming style correctness, unlike performance, is not measurable, there is no definite answer on what is correct/incorrect, and therefore the discussion can clearly go on forever. If one argument is exhausted, another can be raised or linked to the previous, and so on.

Although the question you raise is interesting by itself.
HDFS is not a single monolithic layer, in the same way that Hadoop is not a single layer. MapReduce is a layer on top of HDFS, but we do not impose the knowledge of splits upon the file system. Going further up the stack, we also do not require HDFS to recognize HBase records or files corresponding to columns in a table store.

HDFS layers are clearly different abstractions. E.g., NameNode does not know about input and output streams. Sure, the layers have common structures, say FileStatus is visible to all layers, but ContentSummary is constructed on the top FileSystem level.

I don't think this argument works in favor of your approach and justifies passing link values through the whole hierarchy of HDFS layers.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637689#action_12637689 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

I didn't mean to say the implementation was wrong. I meant that the implementation is a result of the interface.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628272#action_12628272 ] 

dhruba borthakur commented on HADOOP-4044:
------------------------------------------

The FileStatus object will indicate whether a path is a symbolic link. The NameNode will  have a new INode type called INodeSymbolicLink. It will store the contents of the symbolic link in the INodeSymbolicLink object. This information will also be stored in the fsimage.

The FileSystem.open() and FileSystem.create() calls will detect if the pathname is a symbolic link. If so, they will transparently open the path pointed to by the symbolic link.

Symbolic links can be either relative or absolute. If the symbolic link is a complete URI, then nothing needs to be done. If it is a relative pathname, the dfs-client-side code will invoke makeAbsolute to make it a full pathname.
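
A minimal sketch of that resolution step (the helper name is invented; it only illustrates what makeAbsolute would do for a link target):

{code}
// Resolve a relative link target against the directory containing the link itself;
// an absolute target (or a full URI) is returned unchanged.
Path resolveLinkTarget(Path link, String target) {
  Path t = new Path(target);
  return t.isAbsolute() ? t : new Path(link.getParent(), t);
}
{code}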





> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638707#action_12638707 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> I see you are changing the subject.

I am saddened that you think that.  I have tried to say the same thing in many different ways, to make myself better understood.  I will say it one last time: I will veto any patch that uses exceptions to communicate links within the FileSystem implementation.  Exceptions are an inappropriate mechanism for communicating normal, expected values.  In the namesystem sub-component of HDFS, which stores links, and any other sub-components between that and the generic FileSystem API, links are normal, expected events, even links to different FileSystems.  There.  I've said it again.  I am now done saying it.  Do not expect me to respond again about the issue of using exceptions to communicate the presence of links in HDFS.  My answer will not change.  Let's move on.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637158#action_12637158 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

If throwing an UnresolvedPathException is considered clean style inside the namenode, then it should be clean for the RPC interface.
We need to be consistent here. It is either clean style or it is not.

In my opinion, throwing the UnresolvedPathException across the RPC interface is clean -- it is one of many internal interfaces.

Joshua Bloch in "Effective Java" (item 40) argues that exceptions can be used for recoverable conditions. Processing the symbolic link is
a recoverable condition. Later in the same item he argues that such exceptions can provide information to allow the recovery (in this case the remaining
pathname to resolve).

Indeed the link that Doug gave argues in favour of using declared exceptions for recoverable conditions
  [ http://192.220.96.201/essays/java-style/exceptions.html]:
_"My rule of thumb for whether I make an exception declared or undeclared is whether localized recovery from the exception being thrown is sensible. If the calling method (or one of its recent callers) of the code is the right place to handle a given failure type, I represent that failure with a declared exception. If, on the other hand, if the failure case is best handled by a global handler which catches all the exceptions for a given component of a program and treats them all as failures of that subsystem, I use undeclared exceptions"._ 
In the case of symbolic links, localized recovery by following the symbolic link is possible, and hence using an exception is acceptable.
Here the author is distinguishing between declared and undeclared -- but the point is that declared exceptions are okay for recoverable conditions. 

If Java did not have the distinction between declared and undeclared exceptions, and exceptions were only for errors, then Doug's argument would have been valid.


Dhruba made the right choice in throwing the exception inside the name node - otherwise all the interfaces and code would have become unclean.
The same style should have been continued for the RPC operations.

The RPC interfaces in the patch have been changed as follows:

  void create(...) becomes FSLink create(...)

  and boolean setReplication(...) becomes FSLinkBoolean setReplication(...)
  ... and similarly for many other methods.

The original interfaces were intuitive -- the input parameters and the return values are obvious for each method.
The newer ones have this FSLinkXXX return value that, IMHO, is not clean.
They will get worse with the union overloading over time if we add new reasons to recover on the client side.
The use of exceptions would have kept the interface clean (just as was done inside the name node code).
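
For illustration, a minimal sketch of the exception style being argued for here. The class name matches the one discussed in this thread, but the fields and accessors are only assumptions about what such an exception might carry:

{code}
import java.io.IOException;

// Hypothetical declared exception carrying enough state for localized recovery:
// the caller learns what the link points to and which part of the path is left.
class UnresolvedPathException extends IOException {
  private final String linkTarget;     // contents of the symbolic link that was hit
  private final String remainingPath;  // portion of the original path still unresolved

  UnresolvedPathException(String linkTarget, String remainingPath) {
    super("Unresolved link to " + linkTarget);
    this.linkTarget = linkTarget;
    this.remainingPath = remainingPath;
  }

  String getLinkTarget()    { return linkTarget; }
  String getRemainingPath() { return remainingPath; }
}
{code}

With this style the RPC signatures keep their original, intuitive shapes (create() stays void, setReplication() stays boolean, each merely declaring that it may throw the exception), instead of every return value being wrapped in an FSLinkXXX type.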





> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640650#action_12640650 ] 

Konstantin Shvachko commented on HADOOP-4044:
---------------------------------------------

> I would like to avoid a design that incurs an overhead of an additional RPC every time a link is traversed.

+1. This will affect not only NNBench but all benchmarks including DFSIO and especially NNThroughputBenchmark.
GridMix and Sort will probably be less affected, but will suffer too.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640349#action_12640349 ] 

dhruba borthakur commented on HADOOP-4044:
------------------------------------------

I would like to avoid a design that incurs an overhead of an additional RPC every time a link is traversed. I will post a new patch that incorporates Nicholas's idea of FsResult<T>-style return-type classes for the SPI methods. I am hoping that this will be acceptable to all.
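
As a rough sketch of what such a wrapper might look like (the accessor names here are illustrative and not necessarily those used in the patch):

{code}
// Hypothetical generic SPI return wrapper: either the real result of the call,
// or the link information the client needs in order to retry elsewhere.
class FsResult<T> {
  private final T value;       // non-null when the call completed normally
  private final String link;   // non-null when a symbolic link was encountered

  private FsResult(T value, String link) {
    this.value = value;
    this.link = link;
  }

  static <T> FsResult<T> of(T value)       { return new FsResult<T>(value, null); }
  static <T> FsResult<T> link(String link) { return new FsResult<T>(null, link); }

  boolean isLink()  { return link != null; }
  T get()           { return value; }
  String getLink()  { return link; }
}
{code}

A single generic type like this lets one isLink() check serve every SPI method, while avoiding the extra round trip per call that is being ruled out above.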

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637679#action_12637679 ] 

Owen O'Malley commented on HADOOP-4044:
---------------------------------------

I don't think it is reasonable at all to use an extra RPC to always test for the existence of links.

To me it boils down to three things:
  1. Hitting a link in a traversal isn't an error or an exceptional situation. 
  2. Exceptions are not handled well by rpc.
  3. The logging by the rpc package is confusing for non-system exceptions.

Most of the debate seems to be about #1, but it is almost entirely a style issue. You either think it is an appropriate use of exceptions or not. I haven't seen anyone switch sides based on the arguments on this JIRA.

On #2, currently all exceptions are stringified and sent as RemoteException. That isn't very useful for throwing and catching. If we switch to Thrift in the future, it would, again, be unclear what the exception semantics are. Return types are well understood and, under Thrift, are versioned, which would be a very good thing.

On #3, we can probably use minor changes to RPC to avoid it. (A marker interface that says not to log the exception when it is thrown across RPC? That would still be awkward for the standard Java exceptions that we are re-using.)
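
As a very rough sketch of the marker-interface idea mentioned in #3 above (the interface name and the logging check are purely illustrative):

{code}
// Hypothetical marker interface: exceptions implementing it would be relayed to
// the caller by the RPC server without being logged as server-side errors.
interface NotAnError {
  // intentionally empty; its presence on an exception class is the signal
}

// The RPC server's handler would then check for it before logging, roughly:
//   } catch (Throwable t) {
//     if (!(t instanceof NotAnError)) {
//       LOG.info("handler caught exception", t);
//     }
//     ... relay t to the caller as before ...
//   }
{code}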


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629263#action_12629263 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> Both have extra RPC (or system call in case of LocalFS) for open of a normal file (common case).

Exceptions should not be used for normal program control.  If the extra RPC is a problem, then a FileSystem can implement open() directly itself as an optimization.  LocatedBlocks could be altered to optionally contain a symbolic link.  LocalFileSystem can also override open(), since the native implementation already does the right thing.  Would the code above do something reasonable on S3 & KFS, or would we need to override open() for those too?  If we end up overriding it everywhere then it's probably not worth having a default implementation and having openData(), but we should rather just require that everyone implement open() to handle links.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637534#action_12637534 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

Dhruba> the NameNode is not future-proof for backward compatibility.
Dhruba> Thus, exceptions could be used here. This keeps most of the HDFS code clean and elegant.

Yes, they could be used here, and it could be fixed later without breaking user code, but that doesn't make it clean and elegant.  Rather it is a quick and dirty hack.  Returning normal values through exceptions is a perversion.  The Namenode should be revised to support links as first-class events, not as exceptions.

Konstantin> This is universal, but not a transparent api anymore.

It's no less transparent than 'FileStatus getFileStatus(Path)'.  In that case we combined the value of many methods (isDirectory(), size(), owner(), etc.) into the value of a single method with a very generic name.

In this case we wish to collapse three logical calls into one.  We want to perform 'isLink()', and then either 'getLink()' or, e.g., 'open()', and return sufficient information so that the caller can tell what happened.  In such cases, when multiple values must be grouped together to represent something, we use a data structure, not some other channel like exceptions or global variables.  With a data structure comes some loss of transparency, also called abstraction.



> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638422#action_12638422 ] 

Owen O'Malley commented on HADOOP-4044:
---------------------------------------

Getting back to the nuts and bolts of this patch, I don't really like the structure of this implementation. I think the LinkResolvers and the myriad anonymous classes that extend them are very hard to read. I'd much rather have something like:

{code}
abstract class LinkResult<T> {
  abstract boolean isLink();        // did the call hit a symbolic link?
  abstract T getResult();           // the normal result, when no link was hit
  abstract FileSystem getFileSystem() throws IOException;
  /** Throws an exception if too deep. */
  abstract Path getNewPath() throws IOException;
}

class DistributedFileSystem extends FileSystem {
  ...
  public FSDataInputStream open(Path p, int buffer) throws IOException {
     LinkResult<FSDataInputStream> result = dfs.open(p, buffer, verifyChecksum, statistics);
     if (result.isLink()) {
       return result.getFileSystem().open(result.getNewPath(), buffer);
     } else {
       return result.getResult();
     }
  }
}
{code}

It is *much* easier to follow and understand and forces no changes on FileSystems that don't support links.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637320#action_12637320 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

>> Are you saying that getSymlink() can return file:///etc/dir/file.txt?

>Yes.

Can the name node always know how to compose the two parts? I recall something bizarre about composing archive filesystem paths.
If there is a well-defined way of composing the two parts, then I am fine with the alternative proposed above.
If not, the two parts may have to be passed to the file system.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637324#action_12637324 ] 

owen.omalley edited comment on HADOOP-4044 at 10/6/08 5:37 PM:
----------------------------------------------------------------

My preference is toward not using exceptions for nominal cases, and I agree with Doug that links seem like nominal cases. Clearly, however, it isn't a slam dunk.

At the risk of crossing streams, I think that the classes like FSLinkBoolean are problematic. I think we are better off creating a type for each method's return type. I think that a lot of these return types should implement a class/interface (*Owen ducks*) that provides the common interface for dereferencing the links.

The reason for having

{code}
RenameResult rename(Path src, Path dst) throws IOException;
versus
FSLinkBoolean rename(Path src, Path dst) throws IOException;
{code}

would be that if we want our protocol to be future-proof, rename may well need to add additional return values. If we have a bunch of methods all returning the same type, that is hard.

Thoughts?


      was (Author: owen.omalley):
    My preference is toward not using exceptions for nominal cases, and I agree with Doug that links seem like nominal cases. Clearly, however, it isn't a slam dunk.

At the fear of crossing streams, I think that the classes like FSLinkBoolean are problematic. I think we are better off creating a type for each method's return type. I think that a lot of these return types should implement a class/interface (*Owen ducks*) that provides the common interface for dereferencing the links.

The reason for having

{code}
RenameResult rename(Path src, Path dst) throws IOException;
FSLinkBoolean rename(Path src, Path dst) throws IOException;
{code}

would be that if we want our protocol to be future-proof, rename may well need to add additional return values. If we have a bunch of methods all returning the same type, that is hard.

Thoughts?

  
> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629584#action_12629584 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

Volume-relative symbolic links (e.g. dot-relative links) can be handled _within_ the NN. However, whenever a symbolic link with a remote target is crossed, the client side needs to handle it - I am calling this a kickback.
Note that a symbolic link can occur during an open, create, chmod or any other operation that supplies a path name. Further, it can occur _anywhere_ in the path.
Hence:
   open("/foo/bar/zoo")
   bar may be a symbolic link to a remote volume.
  A getFileStatus("/foo/bar/zoo") cannot return a status of "is symbolic link": zoo is not a symbolic link; bar is the symbolic link.

Any operation that involves a path name can issue a kickback which says: "I crossed a symbolic link whose target is another file system. I processed a part of the path and the remaining path is xxx; please resolve the remaining path xxx at the target of the symbolic link."
In the above example the xxx (rest of path to process) is "zoo".

One way to provide this kickback is to use an exception. (I consider an alternative below.)
_The issue raised in the other comments in this Jira is whether or not an exception is suitable here - i.e. is this normal or abnormal operation of the NN?_
Following a *remote* symbolic link is not normal for a NN; it is not normal for a NN to recursively call another *remote* NN, hence the exception is a reasonable way to deal with this situation.
The FileSystem interface should clearly NOT throw such an exception, because following symbolic links is a normal part of that interface; in this case I am not suggesting that the FileSystem throws an exception, merely that the NN throws that exception.

Many NameServices handle remote mounts as exceptions or kickbacks. For example, the Spring name service at Sun has an optional trust relationship between name servers. If a symbolic link pointed to a name server with which it had a trust relationship, then it followed the symbolic link recursively (using DNS terminology); otherwise it did a kickback via an exception and the client side followed the links iteratively (using DNS terminology again).

An alternate way (really Owen's suggestion) is to have open return an "FD"-style object that contains the kickback information inside the FD object. The problem is that we will need a similar object for all operations that involve a pathname: open, stat, chmod, rename, etc. (Owen, please comment on this.)
Hence I feel the use of the exception is a better way to do this.

BTW I have a document on the above that I started a few months ago at Yahoo, but it is not completed. Dhruba suggested that I attach it above; I will do so over the next week or so.
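
For concreteness, a minimal sketch of the iterative client-side handling of such a kickback. UnresolvedPathException (with getLinkTarget()/getRemainingPath() accessors) and namenodeOpen() are hypothetical stand-ins, not names taken from the patch:

{code}
// Hypothetical client-side loop: re-issue the operation at the link target until
// no more kickbacks arrive, with a depth bound to catch symlink cycles.
FSDataInputStream openFollowingLinks(Path p) throws IOException {
  final int MAX_LINK_DEPTH = 16;                 // guard against link loops
  for (int i = 0; i < MAX_LINK_DEPTH; i++) {
    try {
      return namenodeOpen(p);                    // stand-in for the real RPC call
    } catch (UnresolvedPathException e) {
      // "I processed part of the path; resolve the remaining path at the target."
      p = new Path(e.getLinkTarget(), e.getRemainingPath());
    }
  }
  throw new IOException("Too many levels of symbolic links: " + p);
}
{code}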

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630246#action_12630246 ] 

dhruba borthakur commented on HADOOP-4044:
------------------------------------------

@Doug: Your example scenario is correct. It could be a major loss in functionality. But maybe we can put in the additional complexity and reap the additional benefits at a later date. I am assuming that the need for the additional functionality can wait for some time. 

@Raghu: I agree that symlinks have many general-purpose use cases. But there are also file systems (very similar to Hadoop :-)) that do not support symlinks/hardlinks at all.

For symlinks supported by various OSes: the OS calls the file system with a ??lookup?? call for every path component. The filesystem resolves only one piece of the path at a time. In HDFS, the FileSystem is passed the entire path. Thus, the technique to resolve partial paths becomes messy and does not have any precedent. That's why I am proposing that we delay this part of the functionality.

Comments/feedback welcome.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634968#action_12634968 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

I still much prefer the name 'link' to 'vfs'.  You use the name 'vfs' in support of symbolic links only, and so it should be named accordingly.  There's no need to invent another name here.  No one knows what a 'vfs' is, and we don't need to define it or confuse folks with this notion.  Everyone already knows what a link is.

Can we get away with only a single implementation of the link resolver?  You already create a bogus FileStatus in the status-based link resolver, so can't we use the same approach to implement the path-based link resolver in terms of the status-based one, so that we only have a single implementation of the loop-detection logic, not two?

There are still a few implementations of getSymlink() that don't throw exceptions for non-links.

You still have not explained the necessity of getRemainingPath().


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Bryan Duxbury (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630440#action_12630440 ] 

Bryan Duxbury commented on HADOOP-4044:
---------------------------------------

We would use filesystem-internal symlinks as a sort of fast-copy in situations where parts of our dataset are actually processed and others are merely copied. Technically, a hard link or copy-on-write scheme (ala GFS) would be more suited, but we could make do with symlinks.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645308#action_12645308 ] 

Robert Chansler commented on HADOOP-4044:
-----------------------------------------

Just a list of important test cases saved from a hallway conversation:
* Shell commands
* Trash
* Archives
* distcp
* chmod, chown, chgrp APIs
* Rename


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629272#action_12629272 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

> That imposes restriction on what SymLink can point to.
Or maybe not... if an implementation (e.g. DistributedFileSystem) has all the information to instantiate another file system.

One alternative is to have open() return '{{ SymLinkFSInputStream extends FSInputStream }}' when a link is not followed by the implementation. A read on it will return bytes representing the link (and it can have a getSymLink() method).
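
A minimal sketch of that alternative, assuming the stream simply serves up the bytes of the link target (only the class and method names come from the description above; the rest is illustrative):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSInputStream;

// Hypothetical stream returned by open() when the implementation did not
// follow a symbolic link: read() yields the bytes of the link target itself.
class SymLinkFSInputStream extends FSInputStream {
  private final byte[] link;   // bytes of the link target
  private int pos = 0;

  SymLinkFSInputStream(String target) {
    this.link = target.getBytes();
  }

  public String getSymLink() {
    return new String(link);
  }

  public int read() throws IOException {
    return pos < link.length ? (link[pos++] & 0xff) : -1;
  }

  public void seek(long newPos) throws IOException { pos = (int) newPos; }

  public long getPos() throws IOException { return pos; }

  public boolean seekToNewSource(long targetPos) throws IOException { return false; }
}
{code}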

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636757#action_12636757 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------


> I vote for using exceptions internal to the NameNode to keep internal interfaces clean and elegant. 

Why should the NameNode not be clean and elegant :)? I actually don't think returning some base class unrelated to member/api is clean... it is essentially the same as returning "Object" and letting the upper layers check the type.

> can somebody from dfsland please review this too? Thanks.
Because of my above prejudice, I am probably not the right person to review.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637655#action_12637655 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------


Let's take another hypothetical case: our RPC gets rewritten, and for some reason it wants to inform callers to retry. Will all the RPC calls have a prefix 'retryOr...()', and will all the returned values have an extra field to indicate whether the call needs to be retried? This is obviously an exaggerated example, but hopefully it makes a point.

Also I realize that it is hard to be too objective about interfaces... 

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4044:
-------------------------------------

    Attachment: symLink11.patch

The Client Protocol uses types of the form FsResult<BooleanWritable>, FsResult<LocatedBlocks>, etc. This patch is ready for review.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636772#action_12636772 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> I actually don't think returning some base class unrelated to member/api is clean. [ ... ]

Except it is related.  The contract of all of these methods has changed, so that they can either return, e.g., an open file or a link.  We need a uniform way to represent this.   A two-step API is unacceptable for performance reasons.  A base return class thus seems like a fine way to do this.  What would you suggest instead?


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636790#action_12636790 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------


I deal with and care about _both_ FileSystem and HDFS every day. We have two very different approaches to the same problem in these two places (changing return types vs. throwing an exception). It does not feel quite right for me to +1, though I prefer only one of the approaches.  This does not affect the external API, but for developers internal APIs are just as important.

A two-step API is of course not good (as I pointed out in the early proposals), but I don't think that is related to the 'return type' vs. exception issue.

My preference is to have the same approach as in HDFS. Any API that can throw IOException can throw UnresolvedLinkException. Upper layers that can handle it, handle it. No change to return types is required and we don't need extra classes like FSLinkBoolean, FSInputStreamLink, etc.  As a minor side benefit, we also remove all the extra objects created, even when no links are involved.

Finally, I don't think exceptions are meant only for hard errors (FileNotFound, for example). I doubt an exception always means an error, but my experience with Java is less than 2 years :).

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637555#action_12637555 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

Sanjay says:
>> Are you saying that getSymlink() can return file:///etc/dir/file.txt?

>Yes.

>Can the name node always know how to compose the two parts. I recall something bizzare about the composing archive filesystem paths.
>If there is well defined way of composing the two parts then I am fine with the alternative proposed above.
>If not, the two parts may have to be passed to the file system.

I checked - the above scheme will work with archives. It uses the standard separator.
+1 on getSymlink returning the composed pathname.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632741#action_12632741 ] 

dhruba borthakur commented on HADOOP-4044:
------------------------------------------

1. Can you please point me to what chmod -H should do?

2. Creation of a symlink shows up this way in the audit log:

ugi=dhruba,users,engineers,svnuser	ip=/1.2.3.4	cmd=createSymlink	src=/user/dhruba/foo	dst=../test1 perm=dhruba:none:rw-rw-rw-

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629609#action_12629609 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> a symbolic link can occur during an open, create, chmod [ ... ] anywhere in the path.

Thanks for clarifying.  I had not considered all of these cases.

> I am not suggesting that the FileSystem throws an exception, merely the NN throws that exception.

That's less of an issue, indeed.

> The problem is that we will need a similar object for all operations that involve a pathname: open, stat, chmod, rename, etc.

Could we use a common base class for the return value of all of these RPC methods?  The base class could have a "symLink" flag and fields that the client could check.  Is that better than exceptions?

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647492#action_12647492 ] 

Hadoop QA commented on HADOOP-4044:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12393857/symLink12.patch
  against trunk revision 713847.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 62 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 3 new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3588/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3588/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3588/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3588/console

This message is automatically generated.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630390#action_12630390 ] 

Joydeep Sen Sarma commented on HADOOP-4044:
-------------------------------------------

Doug - that's what I was proposing: doing it in a way where the component-by-component lookup is only done when symlinks are encountered.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634965#action_12634965 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> However, my thinking is that the VfsStatusBase can be used not only for links but for other items too.

But aren't all those other items also cases where the return value can be a link?  I'll try to review the patch ASAP.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630447#action_12630447 ] 

Konstantin Shvachko commented on HADOOP-4044:
---------------------------------------------

I am -1 on partial implementations, because your case of file-only symlinks can be handled on the application level.

In fact, path resolution as Joydeep proposed can be handled outside the file system without checking each path component. 
We just need to throw FileNotFoundException with the exact path prefix that has not been found.
E.g. you want to open a file {{path = /dir0/link/dir1/file}} where {{link}} is a file containing a symlink {{/dir3/dir4/}}. 
If you call {{open(path)}} a {{FileNotFoundException}} will be thrown with the message "File is not found: {{/dir0/link/dir1/}}".
Now the application reads the contents of file {{/dir0/link}}, creates a new {{path = /dir3/dir4/dir1/file}}, and calls {{open(path)}} again.
An optimization to this can be that the name-node throws a new {{DirectoryNotFoundException("Directory is not found: /dir0/link")}};
then the client knows for sure that "link" does exist, but is a file rather than a directory, and therefore can be a symlink.
Seems to me this
- suits your case, 
- requires minimal changes to the fs internals, and 
- does not affect existing hdfs behavior when links are not used.
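
A minimal sketch of what that application-level resolution loop might look like. DirectoryNotFoundException, getUnresolvedPrefix() and readLinkTarget() are hypothetical names used only to illustrate the proposal above; they are not existing FileSystem API:
{code}
// Sketch only: the exception type, its accessor and the link-reading helper are assumptions.
public static FSDataInputStream openFollowingLinks(FileSystem fs, Path path)
    throws IOException {
  for (int hops = 0; hops < 16; hops++) {                      // guard against link loops
    try {
      return fs.open(path);
    } catch (DirectoryNotFoundException e) {
      String prefix = e.getUnresolvedPrefix();                 // e.g. /dir0/link
      String target = readLinkTarget(fs, new Path(prefix));    // contents of the link file, e.g. /dir3/dir4/
      String rest = path.toString().substring(prefix.length()); // e.g. /dir1/file
      if (target.endsWith("/")) {
        target = target.substring(0, target.length() - 1);
      }
      path = new Path(target + rest);                          // retry with /dir3/dir4/dir1/file
    }
  }
  throw new IOException("Too many levels of symbolic links: " + path);
}
{code}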

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4044:
-------------------------------------

    Attachment: symLink9.patch

Incorporated Doug's latest review comments.

Can somebody from dfsland please review this too? Thanks.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641181#action_12641181 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

Doug says> Sanjay has argued that it is abnormal for a filesystem to link to a different filesystem. This misses the intended sense of "normal". The value of a link is normal data, under the control of the filesystem implementation, regardless of whether it points to a different filesystem.

_Don't want to restart the debate,_ but need to correct something that was attributed to me:
I never said that symlinks to remote files were not normal. Processing them (i.e. following them to a different NN) is not normal for a NN: some NNs may follow through and some may not. Indeed, DNS does both (follows or redirects the caller).

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638348#action_12638348 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

Konstantin> open() does not literally return a boolean, you mean that throwing an exception should be treated as an (boolean) indicator to whether the link should be retrieved, right?

Yes, that's what I meant.  We'd be using an exception to do something that could equivalently be done with a boolean.

> But the methods have always been conditional in that sense, because they are throwing AccessControlException, QuotaExceededException, and FileNotFoundException. And XMountPointException is just another exception of the same nature.

I think its nature is different.  The former are all cases where the long-distance throwing is valuable; they are intended to be caught and processed at a higher level, and in the successful execution of a typical application one would not expect to see them.  An XMountPointException, on the other hand, would never be handled at some unknown higher level; it is always handled at a known point, and it is expected to occur frequently in the course of a typical application's execution as part of the intended control path.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4044:
-------------------------------------

    Attachment: symLink5.patch

Merged patch with latest trunk.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628486#action_12628486 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

A great use of symbolic links might be to enhance archives to link to their content.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637613#action_12637613 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

> Returning normal values through exceptions is a perversion. 
Hardly anyone disagrees with that. Another truism is that returning unrelated values is API pollution. I guess one end of the opinion is that returning a link is normal and the other end thinks it isn't. The issue is not about returning values vs. exceptions 'in general', but only about this particular case.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637296#action_12637296 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> public class FsReturnType<T> 

This looks good to me.

> BTW, should our naming convention be FsXxx or FSXxx?

Good question.  I prefer FsXxx, but we've not been consistent in Hadoop, nor is Java.  I think java.net.URL was a mistake, that it should have been named java.net.Url, that acronyms in class names should be capitalized, not all caps.  This makes them easier to parse.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629680#action_12629680 ] 

Konstantin Shvachko commented on HADOOP-4044:
---------------------------------------------

sanjay> merely the NN throws that exception.

Maybe a symlink exception from the NN is not bad, but wrapping the SymLink data inside the exception seems to be a bad way of returning data back to the client. I mean, this part of the code proposed above does not look right:
{code}
catch (SymLinkNotFollowedException e) {
        f = e.getSymLink();
}
{code}
I would definitely prefer to have a common base class returned as Doug proposes.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629575#action_12629575 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

Right, we should optimize dot-relative and volume-root-relative links so that they're resolved with a single RPC.

In all cases, with HDFS links into the same filesystem, we should make just a single RPC to its namenode on open.  We used to have an open() RPC, but that was replaced with getBlockLocations() when we noticed that open was no different than getBlockLocations().  With symlinks, we should probably add an open() call again.

Open() could return a struct with 3 fields: isLink, linkTarget and blockLocations.  Dot-relative and volume-root-relative links can be resolved on the NN, as follows:

||file type||isLink||linkTarget||blockLocations||
|regular|false|null|non-null|
|dot relative link|true|resolved path|non-null|
|volume-root relative link|true|resolved path|non-null|
|link to other filesys|true|foreign path|null|

Does that make sense?
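
For illustration, the returned struct might look something like this; the class and field names are placeholders, not a committed protocol type:
{code}
// Placeholder sketch of the 3-field open() result described in the table above.
public class OpenFileInfo implements Writable {
  private boolean isLink;               // true for any of the three link rows
  private String linkTarget;            // resolved path, or foreign path for cross-filesystem links
  private LocatedBlocks blockLocations; // null only for links into another filesystem

  public boolean isLink() { return isLink; }
  public String getLinkTarget() { return linkTarget; }
  public LocatedBlocks getBlockLocations() { return blockLocations; }

  public void write(DataOutput out) throws IOException { /* serialize the three fields */ }
  public void readFields(DataInput in) throws IOException { /* deserialize the three fields */ }
}
{code}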



> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-4044:
-------------------------------------------

    Attachment: 4044_20081030spi.java

FsResult can be divided into two classes since there are two kinds of results, Writables (e.g. FsResult<BooleanWritable>) and non-Writables (e.g. FSDataOutputStreamLink).  I suggest the following layout:
{code}
public class FsResult<T> implements FSLinkable {
  ...
}

public class FsResultWritable<T extends Writable> extends FsResult<T> implements Writable {
  ...
}
{code}
Methods like appendImpl(...) should return FsResult<FSDataOutputStream> and methods like setReplication(...) should return FsResultWritable<BooleanWritable>.

Then, we don't need classes like FSDataInputStreamLink and FSDataOutputStreamLink.  We also probably don't need FSLinkable anymore.

4044_20081030spi.java shows the complete source code for FsResult and FsResultWritable.  I have not removed FSLinkable yet.
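
For example, the affected method signatures might then read as follows; the parameter lists here are abbreviated for illustration only and are not copied from the attached file:
{code}
// Abbreviated signatures illustrating the split above.
protected abstract FsResult<FSDataOutputStream> appendImpl(Path f, int bufferSize)
    throws IOException;               // stream result: not a Writable, so plain FsResult

protected abstract FsResultWritable<BooleanWritable> setReplicationImpl(Path f, short replication)
    throws IOException;               // boolean result: Writable, so it can go over RPC directly
{code}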


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: 4044_20081030spi.java, HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630470#action_12630470 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

Also, this is a useful and important enough feature to deserve an exception for a 0.19.x release if 0.20.0 is too late or has too much other stuff. 

> In my mind, cross-filesystem "symlinks" really don't seem like "real" symlinks to me, but a sort of hacked version of autofs support.
If we think of Hadoop FileSystem as a VFS, and HDFS, S3, LocalFS etc. as various filesystems, then cross-FS symlinks look more like real symlinks (like a link in ext3 pointing to a directory in NTFS). Yes, they are still not quite real since a symlink in LocalFS cannot point to "hdfs://users" (actually it is possible to handle those to a certain extent). 

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629246#action_12629246 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

I think a better API might be:

{code}
public FSDataInputStream open(Path f, int bufferSize) throws IOException {
  FileSystem fs = this;
  FileStatus stat = fs.getFileStatus(f);
  while (stat.isSymLink()) {   // follow links until we reach a non-link
    f = stat.getSymLink();
    fs = f.getFileSystem(getConf());
    stat = fs.getFileStatus(f);
  }
  return fs.openData(stat, bufferSize);
}

public abstract FSDataInputStream openData(FileStatus stat, int bufferSize) throws IOException;
{code}

We could, for back-compatibility, have openData() default to calling open() for one release, but I think most if not all FileSystem implementations are included in Hadoop, so we're mostly concerned about client back-compatibility here, not implementation back-compatibility.

Also, FileStatus#getSymLink() should return a Path, not a String, no?

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638094#action_12638094 ] 

Konstantin Shvachko commented on HADOOP-4044:
---------------------------------------------

Doug> Instead of direct functions that map parameters to values, we wish to implement conditional functions, that do one of two things, and indicate in their response which path was taken.

My proposal allows us to keep the code within the current API style: "direct functions that map parameters to values".
[HADOOP-4044#action_12637702]
[HADOOP-4044#action_12637610]
[HADOOP-4044#action_12637240]

Doug> The alternative is to double or more the number of RPCs per file operation, which is unacceptable. 

My proposal does "not double  the number of RPCs per file operation".
[HADOOP-4044#action_12637702]

So if this dispute is not about programming style, then why don't we accept the solution that does not require any fundamental changes to the FileSystem API and does not have performance penalties for the regular case? Or is it nevertheless about the programming style?
I keep wondering what your criteria are for selectively ignoring some arguments and answering others.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639908#action_12639908 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

One of the issues with the 'resolve then open' approach is correctness. What happens if a link or directory is changed between these two operations? open() fails even though it should not.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630279#action_12630279 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

> I agree that symlinks have many general-purpose use-cases. But there are also file systems (very similar to Hadoop) that do not support symlinks/hardlinks at all. 

Yes. They probably don't have the symlinks proposed in this jira either. I mainly wanted to say that once the interface is there, the implementation of real symlinks is not much more complicated than the attached patch. I could be wrong... 

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640332#action_12640332 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------


> The same thing that happens if that change is made just after a file is opened. If you open a file, then someone else deletes it, subsequent accesses to that file will fail.

Yes, that's an HDFS implementation problem which could be fixed. That is qualitatively different from similar issues 'forced' by the FS layer/API (which could also be fixed in the future).

I don't have a strong preference either way regarding the implementation.

With the extra RPC, some of the negatives I see (nothing too urgent): 
  - We won't see many negative effects since there are very few loads that are gated by the NameNode currently. That might change.
  - Every couple of months we might need to answer a "why is this so..?" question on the list :)
  - In my experience, any extra work done that could be avoided will end up being a problem in the future, especially in large distributed systems. So sooner or later we will end up 'correcting' it.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630437#action_12630437 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

What use cases are there for filesystem-internal symlinks?  The really interesting use cases are cross-filesystem, since that could make archives transparent and also permit a volume-mounting-like style, allowing a single namespace to span multiple clusters and potentially making the namenode less of a bottleneck.

Without some strong use cases I'm -1 for any implementation that does not support cross-filesystem links.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629656#action_12629656 ] 

Robert Chansler commented on HADOOP-4044:
-----------------------------------------

Is there a presumption that if (as in Sanjay's example) /foo/bar is a link to "X", then /foo/bar/zoo is exactly "X/zoo"? This might be problematic if X is a reference to another system, introducing a requirement for interfaces everywhere to have "root" and "remaining" parameters. Or maybe as a practical matter this is unimportant. In any case the rules need to be clear. Is X="/a/b" different from X="/a/b/" when resolving /foo/bar/zoo? Are there cases where "/foo/bar/zoo" is not permitted as a path on the local system, requiring that path syntax checking be done late just in case /foo/bar is a link?



> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639892#action_12639892 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

> First, we implement the naive approach, benchmark it, and, if it's too slow, optimize it with a pre-fetch cache of block locations.

I am almost certain that it won't affect any benchmark other than NNBench.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629681#action_12629681 ] 

Konstantin Shvachko commented on HADOOP-4044:
---------------------------------------------

Another observation is that usually symbolic links are local (volume-relative), but they can point to mounted namespaces. So maybe we should support 2 extra inode types:
- symlink: volume-relative only
- mount points.
That of course does not eliminate the problems discussed, but it introduces a clear distinction between volume-local and remote file system symlinks.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Pete Wyckoff (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629291#action_12629291 ] 

Pete Wyckoff commented on HADOOP-4044:
--------------------------------------

Keep in mind that we will want to implement an API for this in libhdfs that is as close to POSIX as possible: https://issues.apache.org/jira/browse/HADOOP-4118. We may not expose all the functionality of general URIs, but for the case of hdfs only, I think the API should be as simple as possible.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631071#action_12631071 ] 

dhruba borthakur commented on HADOOP-4044:
------------------------------------------

+1 to Sanjay's proposal. I am all for symlinks to any arbitrary path (dirs or files). My initial proposal was to gauge the feeling of the community on whether it is better to get symlinks to files into 0.19 and then continue working to support symlinks to any arbitrary path in 0.20. The code is not difficult, but there will be some changes to existing interfaces for DistributedFileSystem.java.

"Kickbacks" are hard, not from a design point of view but from a coding point of view, especially because we do not want to throw UnresolvedPathException all the way back to the client. This means the NameNode has to send back the "kickback" information as part of the result code for almost every RPC. This, in turn, means that almost all APIs in ClientProtocol will have to be modified.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637610#action_12637610 ] 

Konstantin Shvachko commented on HADOOP-4044:
---------------------------------------------

Dhruba> In current Hadoop RPC, the only way to serialize/deserialize an exception object is to serialize its message string.

I am not sure my proposal was understood completely.
- I have an exception XMountPointException, which does not encode the path (or any special fields) in its message.
- If I call {{getFileStatus(src)}} and src crosses a mount point, XMountPointException is thrown by the name-node and passed through RPC to the application level.
- The application catches XMountPointException and calls {{linkSrc = getSymLink(src)}}, which returns the first link on the path.
- The application then calls {{getFileStatus(linkSrc)}}. 
- Encoding linkSrc inside XMountPointException would be an optimization: unnecessary here because crossing a mount point is not common.
- If an application knows and expects to cross a mount point, it can call getSymLink() directly, avoiding catching XMountPointException.

Is this the right usage of exceptions?
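
In code, the flow above would look roughly like this; XMountPointException and getSymLink() are the proposed additions described here, not part of the current FileSystem API:
{code}
// Sketch of the proposed usage; neither the exception nor getSymLink() exists yet.
FileStatus stat;
try {
  stat = fs.getFileStatus(src);
} catch (XMountPointException e) {
  Path linkSrc = fs.getSymLink(src);   // first link on the path; no data hidden in the exception
  stat = fs.getFileStatus(linkSrc);
}
{code}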

Doug> we combined the value of many methods into the value of a single method with a very generic name.

The point of RPC is to make the types passed to and returned by a method transparent.
As said, {{Object execute(Object request)}} replaces all RPCs by one generic call.
It is like sending and receiving blob buffers and casting them into appropriate structures.
Bad practice in C, unavoidable in Cobol; it cannot qualify as a remote procedure call.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632744#action_12632744 ] 

aw edited comment on HADOOP-4044 at 9/19/08 10:09 AM:
--------------------------------------------------------------------

chmod -H (under BSD, at least):

     -H    If the -R option is specified, symbolic links on the command line
             are followed.  (Symbolic links encountered in the tree traversal
             are not followed by default.)

This likely only makes sense if the symlink is pointing to the local file system or another HDFS, though.

BSD also supports -L which might be useful as well:

     -L      If the -R option is specified, all symbolic links are followed.


      was (Author: aw):
    chmod -H (under BSD, at least):

     -H    If the -R option is specified, symbolic links on the command line
             are followed.  (Symbolic links encountered in the tree traversal
             are not followed by default.)

This likely only makes sense if the symlink is pointing to the local file system or another HDFS, though.

  
> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629714#action_12629714 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

> catch (SymLinkNotFollowedException e) { [...]
Just to clarify: the above was only mentioned as a way for _DFSClient_ to inform _FileSystem_, to overcome the limitation of the return type for open(), not as an exception thrown by the NameNode to the client. 

Of course, the exception is not necessary if open() and friends are modified to return a class/struct etc.
 

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628717#action_12628717 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

> This means that a HDFS symbolic link can point only to other hdfs pathnames. 

What's the point of this restriction?  Why not permit symbolic links to arbitrary URIs?

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638171#action_12638171 ] 

Owen O'Malley commented on HADOOP-4044:
---------------------------------------

Doug,
   I don't see your point. Clearly this is not black and white at all or it wouldn't have caused so much disagreement among the developers.  I still think the core disagreement is whether hitting a link is enough of an exception to be reasonably implemented as a Java exception. Most of the rest of it I consider supporting arguments. *smile*

In particular, we aren't changing the FileSystem API at all. No client code should be able to detect the difference between the versions before and after. What is changing is the RPC layer and I agree that we effectively have a tagged union between the link result and the actual result. I still don't think either is very nice looking, which is why I was falling back to supporting arguments.

I think defining result types will give us significant advantages going forward.
  1. It is easy to add additional information in a forward/backward compatible manner.
  2. It is declared and strongly type checked.
  3. It doesn't throw exceptions. *grin*
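
As a sketch of what I mean by a tagged union here (class and field names are invented for this example):
{code}
// Illustrative shape of a "tagged union" RPC result: exactly one of the two fields is set.
public class OpenOrLinkResult {
  private LocatedBlocks blocks;   // set when the path resolved to an ordinary file
  private String link;            // set when resolution stopped at a symlink

  public boolean isLink() { return link != null; }
  public LocatedBlocks getBlocks() { return blocks; }
  public String getLink() { return link; }
}
{code}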

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4044:
-------------------------------------

    Attachment: symLink1.patch

This patch implements support for symbolic links to files (not directories). It is attached to demonstrate, for discussion, how simple the code for this proposal is.

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637348#action_12637348 ] 

Konstantin Shvachko commented on HADOOP-4044:
---------------------------------------------

Owen is proposing a reasonable renaming of the FSLink*** classes, but the implementations of these classes remain the same.
Going further this way, we could make the API look like this:
{code}
RenameReply rename(RenameRequest arg) throws IOException;
{code}
This is universal, but no longer a transparent API.
The next step, completely generalizing (and exaggerating) this, would be to declare one single protocol method:
{code}
Reply execute(OpCode opCode, Request arg) throws IOException;
{code}
Then the API would have to be defined by a pair of Request/Reply classes.
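(For illustration only: the request/reply pair for rename might end up looking like the sketch below; the names and fields are hypothetical, and Writable serialization is omitted.)
{code}
import org.apache.hadoop.fs.Path;

// Hypothetical request/reply pair for rename; the names and fields are
// invented, and Writable serialization is omitted.
class RenameRequest {
  final Path src;
  final Path dst;
  RenameRequest(Path src, Path dst) { this.src = src; this.dst = dst; }
}

class RenameReply {
  final boolean success;
  final Path unresolvedLink;   // non-null if resolution stopped at a symlink
  RenameReply(boolean success, Path unresolvedLink) {
    this.success = success;
    this.unresolvedLink = unresolvedLink;
  }
}
{code}
Every operation would then carry its own such pair, which is precisely the loss of transparency being pointed out here.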

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639888#action_12639888 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

Sanjay> I really don't like the extra rpc [ ... ]

What did you think of my suggestion above that we might use a cache to avoid this? First, we implement the naive approach and benchmark it; if it's too slow, we optimize it with a pre-fetch cache of block locations.
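
(Sketch only, not from any attached patch: such a pre-fetch cache could be as simple as a small client-side LRU map from resolved paths to their block locations; the class and size below are purely illustrative.)
{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative LRU cache from resolved paths to block locations.  The key and
// value types are left generic; a real client would likely key on Path and
// store block locations, but nothing here is taken from an attached patch.
class BlockLocationCache<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries;

  BlockLocationCache(int maxEntries) {
    super(16, 0.75f, true);            // access-ordered, so iteration is LRU-first
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maxEntries;        // evict the least-recently-used entry
  }
}
{code}
The client would consult such a cache before issuing the extra block-location RPC for a resolved link target, falling back to the namenode on a miss; that is what makes the naive approach worth benchmarking first.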

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Sanjay Radia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641174#action_12641174 ] 

Sanjay Radia commented on HADOOP-4044:
--------------------------------------

> I would like to avoid a design that incurs the overhead of an additional RPC every time a link is traversed.

>+1. This will affect not only NNBench but all benchmarks including DFSIO and especially NNThroughputBenchmark.
>GridMix and Sort will probably be less affected, but will suffer too.

+1. I would also like to avoid the extra RPC, since avoiding it is straightforward.

Doug> What did you think of my suggestion above that we might use a cache to avoid this? First, we implement the naive approach and benchmark it; if it's too slow, we optimize it with a pre-fetch cache of block locations.

Clearly your cache solution deals with the extra-RPC issue.
Generally, I see a cache as a way of improving the performance of an already good design or algorithm. I don't like using a cache as part of a design just to make an algorithm work when alternative good designs are available that don't need one. Would we have come up with this design if we hadn't had such an emotionally charged discussion about exceptions?

We have a good design in which, if resolution stops at a symlink, we return that information to the caller. It does not require a cache.
We are divided only over how to return this information: via the return status or via an exception.
The cache solution is a way of avoiding a painful, emotionally charged decision for the Hadoop community.
I don't want to have to explain to Hadoop developers, again and again down the road, why we use the cache.
We should not avoid the decision, but make it.
A couple of weeks ago I was confident that a compromise vote would pass. I am hoping the same is true now.


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: HADOOP-4044-strawman.patch, symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630023#action_12630023 ] 

Raghu Angadi commented on HADOOP-4044:
--------------------------------------

> In this proposal, a symbolic link can still be relative or absolute. No kickback (via exception or otherwise) is needed to implement this proposal.
Does this mean an extra RPC, as in the patch? If not, what does open() look like?

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4044) Create symbolic links in HDFS

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633842#action_12633842 ] 

Doug Cutting commented on HADOOP-4044:
--------------------------------------

Some comments:
 - I don't like the 'vfs' package and naming.  Symbolic links should not be a distinguished portion of the FileSystem API, but seamlessly integrated.  So I suggest that vfs/VfsStatus be renamed to FSLinkable, vfs/VfsStatusBase to FSLink, vfs/VfsStatusBoolean to FSLinkBoolean, vfs/VfsStatusFileStatus to LinkableFileStatus, etc.  If these are to be in a separate package, it might be called 'fs/spi', since they are primarily needed only by implementors of the FileSystem API, not by users.  The protected implementation methods should be called openImpl(), appendImpl(), etc.
 - getLink() should return a Path, not a String.
 - getLink() should throw an exception when isLink() is false.
 - The check for link cycles is wrong.  If the loop starts after the first link traversed, it will not be detected.  A common approach is simply to limit the number of links traversed to a constant.  Alternatively, you can keep a 'fast' and a 'slow' pointer, advancing the fast pointer through the chain twice as fast as the slow one.  If they are ever equal, there is a loop.  This detects all loops (a sketch of this check follows the list below).
 - I don't see the need for both getLink() and getRemainingPath().  Wouldn't it be simpler to always have getLink() return a fully-qualified path?  Internally a FileSystem might support relative paths, but why do we need to expose these?
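
(Illustrative sketch of the fast/slow check above, assuming a hypothetical follow() helper that returns a path's link target, or null when the path is not a link; this is not code from the patch.)
{code}
import java.io.IOException;
import org.apache.hadoop.fs.Path;

// Floyd-style 'fast'/'slow' cycle check over a chain of symlinks.  follow(p)
// is a hypothetical helper returning the link target of p, or null if p is
// not a link; in practice each call would cost a namenode lookup.
abstract class LinkCycleChecker {

  protected abstract Path follow(Path p) throws IOException;

  boolean hasLinkCycle(Path start) throws IOException {
    Path slow = start;
    Path fast = start;
    while (fast != null && follow(fast) != null) {
      slow = follow(slow);                  // slow advances one link per step
      fast = follow(follow(fast));          // fast advances two links per step
      if (fast != null && slow.equals(fast)) {
        return true;                        // the pointers met: the chain loops
      }
    }
    return false;                           // the chain terminated: no cycle
  }
}
{code}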

Instead of repeating the link-resolving loop in every method, we might use a "closure", e.g.:
{code}
public FSInputStream open(Path p, final int bufferSize) throws IOException {
  return resolve(p, new FSLinkResolver<FSInputStream>() {
    FSInputStream next(Path p) throws IOException { return openImpl(p, bufferSize); }
  });
}
{code}
where FSLinkResolver#resolve implements the loop-detection algorithm, calling #next to traverse the list.
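
(A hedged sketch of what such a resolve() helper might look like, using the constant-limit approach from the comment above; UnresolvedLinkException and its getLinkTarget() accessor are hypothetical names for "the operation stopped at a symlink and reports its target", and none of this is taken from the attached patches.)
{code}
import java.io.IOException;
import org.apache.hadoop.fs.Path;

// Hypothetical signal that an operation stopped at a symlink.
class UnresolvedLinkException extends IOException {
  private final Path target;
  UnresolvedLinkException(Path target) { this.target = target; }
  Path getLinkTarget() { return target; }
}

// Hedged sketch of the resolver: re-drives next() across link targets,
// bounding traversal by a constant instead of trying to detect cycles exactly.
abstract class FSLinkResolver<T> {
  private static final int MAX_LINK_DEPTH = 32;   // constant limit on links followed

  /** The underlying, link-unaware operation, e.g. openImpl(p, bufferSize). */
  abstract T next(Path p) throws IOException;

  T resolve(Path p) throws IOException {
    for (int depth = 0; depth < MAX_LINK_DEPTH; depth++) {
      try {
        return next(p);                           // run the real operation
      } catch (UnresolvedLinkException e) {
        p = e.getLinkTarget();                    // stopped at a link: follow it and retry
      }
    }
    throw new IOException("Too many levels of symbolic links: " + p);
  }
}
{code}
The two-argument resolve(p, resolver) used in the snippet above would then simply delegate to resolver.resolve(p).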


> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch, symLink5.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (HADOOP-4044) Create symbolic links in HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632741#action_12632741 ] 

dhruba edited comment on HADOOP-4044 at 9/19/08 10:00 AM:
--------------------------------------------------------------------

1. Can you please point me to what chmod -H should do?

2. Creation of a symlink shows up this way in the audit log:

ugi=dhruba,users,engineers,svnuser	ip=/1.2.3.4	cmd=createSymlink	src=/user/dhruba/foo	dst=../test1 perm=dhruba:none:rw-rw-rw-

> Create symbolic links in HDFS
> -----------------------------
>
>                 Key: HADOOP-4044
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4044
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: symLink1.patch, symLink1.patch, symLink4.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.