Posted to common-dev@hadoop.apache.org by sanjay Radia <sa...@hortonworks.com> on 2013/10/03 20:39:33 UTC

Re: symlink support in Hadoop 2 GA

There are a number of issues (some minor, some more than minor).
GA is close and we are still in discussion on some of them; while I believe we will close on these very shortly, a code change like this so close to GA is dangerous.

I suggest we do the following:
1) Disable symlinks in 2.2 GA: throw an unsupported-operation exception on createSymlink in both FileSystem and FileContext.
2) Deal with isDir() in 2.2 GA, in preparation for item 3 coming after GA:
   a) Deprecate isDir().
   b) Add a new API that returns an enum (see FileContext).
3) Fix symlinks in a future release, hopefully the very next one after 2.2 GA:
   a) Change the stack to use the new API that replaces isDir().
   b) Fix isDir() to do something smarter (we can detail this later, but there is a solution that has been discussed). This helps customer applications that call isDir().
   c) Remove isDir() in a future release, once customers have had sufficient time to migrate.
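
A minimal Java sketch of items 1 and 2 above, for illustration only. The names SketchFileSystem, EntryType, and getEntryType are hypothetical stand-ins and are not Hadoop's actual classes or methods; the real FileSystem and FileContext signatures differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entry kinds; an assumption for this sketch, not Hadoop's enum.
enum EntryType { FILE, DIRECTORY, SYMLINK }

// Hypothetical toy filesystem used only to illustrate the proposal.
class SketchFileSystem {
    private final Map<String, EntryType> entries = new HashMap<>();

    void add(String path, EntryType type) {
        entries.put(path, type);
    }

    // Item 1: symlink creation is disabled and fails loudly.
    void createSymlink(String target, String link) {
        throw new UnsupportedOperationException("Symlinks are not supported in this release");
    }

    // Item 2b: a new API that returns an enum, so callers can tell a
    // symlink apart from a file or a directory.
    EntryType getEntryType(String path) {
        EntryType type = entries.get(path);
        if (type == null) {
            throw new IllegalArgumentException("No such path: " + path);
        }
        return type;
    }

    // Item 2a: the boolean check stays for now, but is deprecated.
    @Deprecated
    boolean isDir(String path) {
        return getEntryType(path) == EntryType.DIRECTORY;
    }
}

class SymlinkProposalDemo {
    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        SketchFileSystem fs = new SketchFileSystem();
        fs.add("/data", EntryType.DIRECTORY);
        System.out.println(fs.getEntryType("/data"));
        System.out.println(fs.isDir("/data"));
        try {
            fs.createSymlink("/data", "/link");
        } catch (UnsupportedOperationException e) {
            System.out.println("createSymlink rejected: " + e.getMessage());
        }
    }
}
```

A boolean isDir() cannot say "this is a symlink", which is why the sketch routes it through the enum-returning call; what "something smarter" means in item 3b is left open here, as it is in the proposal.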

sanjay

PS. J Rottinghuis expressed a similar sentiment in a previous email in this thread:



On Sep 18, 2013, at 5:11 PM, J. Rottinghuis wrote:

> I like symlink functionality, but in our migration to Hadoop 2.x this is a
> total distraction. If the APIs stay in 2.2 GA we'll have to choose to:
> a) Not uprev until symlink support is figured out up and down the stack,
> and we've been able to migrate all our 1.x (equivalent) clusters to 2.x
> (equivalent). Or
> b) rip out the API altogether. Or
> c) change the implementation to throw an UnsupportedOperationException
> I'm not sure yet which of these I like least.


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: symlink support in Hadoop 2 GA

Posted by Andrew Wang <an...@cloudera.com>.
Colin posted a summary of our phone call yesterday (attendees: myself,
Colin, Daryn, Nathan, Jason, Chris, Suresh, Sanjay) on HADOOP-9984:

https://issues.apache.org/jira/browse/HADOOP-9984?focusedCommentId=13785701&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13785701

Pasted here:


   - We discussed alternatives to HADOOP-9984
   <https://issues.apache.org/jira/browse/HADOOP-9984>, but concluded that
   they weren't workable.
   - We agreed that doing the symlink resolution in each Filesystem
   subclass is what we ought to do in 9984, in order to keep compatibility
   with out-of-tree filesystems.
   - We agreed to disable symlink resolution in Hadoop 2 GA. We will spend
   a few weeks ironing out all the bugs and enable it in Hadoop 2.3. However,
   we would like to make all backwards-incompatible API changes prior to
   Hadoop 2 GA.
   - We agreed that HADOOP-9972
   <https://issues.apache.org/jira/browse/HADOOP-9972> (new symlink-aware
   API for globStatus) should get into Hadoop 2 GA.
   - We discussed the issue of returning resolved paths versus unresolved
   paths, but were unable to come to any conclusion. Everyone agreed that
   there would be serious performance problems if we returned unresolved
   paths, but some claimed that programs would break when encountering
   resolved paths.
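
To make the last point concrete, here is a small Java sketch of one way a program could break on resolved paths. It is an illustration under assumptions, not the behavior of any Hadoop API: ListingSketch, its directory contents, and the paths are all made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ListingSketch {
    // Hypothetical directory contents: child name -> symlink target,
    // or null when the child is a regular file.
    static final Map<String, String> CHILDREN = new LinkedHashMap<>();
    static {
        CHILDREN.put("a.txt", null);
        CHILDREN.put("latest", "/archive/2013/b.txt");
    }

    // Lists dir, returning either the paths as named in dir (unresolved)
    // or the symlink targets in place of the links (resolved).
    static List<String> list(String dir, boolean resolve) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : CHILDREN.entrySet()) {
            String target = e.getValue();
            out.add(resolve && target != null ? target : dir + "/" + e.getKey());
        }
        return out;
    }

    static String parentOf(String path) {
        return path.substring(0, path.lastIndexOf('/'));
    }

    public static void main(String[] args) {
        for (String p : list("/data", false)) {
            System.out.println("unresolved: " + p + " parent=" + parentOf(p));
        }
        for (String p : list("/data", true)) {
            System.out.println("resolved:   " + p + " parent=" + parentOf(p));
        }
    }
}
```

A caller that assumes every listed path lives under the directory it listed sees that assumption fail for the resolved entry. With unresolved paths the assumption holds, but the link still has to be followed on each later access, which is presumably where the performance concern comes from.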


There's also a new umbrella issue at HADOOP-10019 tracking ongoing
symlink changes.

Best,
Andrew


On Thu, Oct 3, 2013 at 2:08 PM, Daryn Sharp <da...@yahoo-inc.com> wrote:

> I reluctantly agree that we should disable symlinks in 2.2 until we can
> sort out the compatibility issues.  I'm reluctant in the sense that it's a
> feature users have long wanted, and it's something we'd like to use from an
> administrative view.  However, I don't see all the issues being sorted out
> in the very near future.
>
> I filed some jiras today that have led me to believe that the current
> implementation of fs symlinks is irreparably flawed.  Adding optional
> primitives to filesystems to make them symlink capable is OK.  However,
> adding symlink resolution to individual filesystems is fundamentally
> broken.  It doesn't work for stacked filesystems (viewfs, chroots, filters,
> etc.) because the resolution must occur at the highest level, not within an
> individual filesystem itself.  Otherwise the abstraction of the top-level
> filesystem is violated and all kinds of unexpected behavior, like walking
> out of chroots, become possible.
>
> Daryn
>
> On Oct 3, 2013, at 1:39 PM, sanjay Radia wrote:
>
> > There are a number of issues (some minor, some more than minor).
> > GA is close and we are still in discussion on some of them; while I
> > believe we will close on these very shortly, a code change like this so
> > close to GA is dangerous.
> >
> > I suggest we do the following:
> > 1) Disable symlinks in 2.2 GA: throw an unsupported-operation exception
> > on createSymlink in both FileSystem and FileContext.
> > 2) Deal with isDir() in 2.2 GA, in preparation for item 3 coming after GA:
> >    a) Deprecate isDir().
> >    b) Add a new API that returns an enum (see FileContext).
> > 3) Fix symlinks in a future release, hopefully the very next one after
> > 2.2 GA:
> >    a) Change the stack to use the new API that replaces isDir().
> >    b) Fix isDir() to do something smarter (we can detail this later, but
> >    there is a solution that has been discussed). This helps customer
> >    applications that call isDir().
> >    c) Remove isDir() in a future release, once customers have had
> >    sufficient time to migrate.
> >
> > sanjay
> >
> > PS. J Rottinghuis expressed a similar sentiment in a previous email in
> > this thread:
> >
> >
> >
> > On Sep 18, 2013, at 5:11 PM, J. Rottinghuis wrote:
> >
> >> I like symlink functionality, but in our migration to Hadoop 2.x this
> >> is a
> >> total distraction. If the APIs stay in 2.2 GA we'll have to choose to:
> >> a) Not uprev until symlink support is figured out up and down the stack,
> >> and we've been able to migrate all our 1.x (equivalent) clusters to 2.x
> >> (equivalent). Or
> >> b) rip out the API altogether. Or
> >> c) change the implementation to throw an UnsupportedOperationException
> >> I'm not sure yet which of these I like least.
> >
> >
>
>

Re: symlink support in Hadoop 2 GA

Posted by Daryn Sharp <da...@yahoo-inc.com>.
I reluctantly agree that we should disable symlinks in 2.2 until we can sort out the compatibility issues.  I'm reluctant in the sense that it's a feature users have long wanted, and it's something we'd like to use from an administrative view.  However, I don't see all the issues being sorted out in the very near future.

I filed some jiras today that have led me to believe that the current implementation of fs symlinks is irreparably flawed.  Adding optional primitives to filesystems to make them symlink capable is OK.  However, adding symlink resolution to individual filesystems is fundamentally broken.  It doesn't work for stacked filesystems (viewfs, chroots, filters, etc.) because the resolution must occur at the highest level, not within an individual filesystem itself.  Otherwise the abstraction of the top-level filesystem is violated and all kinds of unexpected behavior, like walking out of chroots, become possible.
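
The chroot case can be sketched in a few lines of Java. This is a toy model under stated assumptions, not Hadoop code: InnerFs, ChrootFs, and their methods are hypothetical, paths are plain strings, and only absolute link targets are handled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical inner filesystem: plain maps stand in for files and symlinks.
class InnerFs {
    final Map<String, String> files = new HashMap<>();     // path -> contents
    final Map<String, String> symlinks = new HashMap<>();  // link path -> target path

    // Primitives only: no link following.
    String readNoFollow(String path) { return files.get(path); }
    String getLinkTarget(String path) { return symlinks.get(path); }

    // Resolution inside the individual filesystem: link targets are
    // interpreted in this filesystem's own namespace.
    String readResolvingInternally(String path) {
        String p = path;
        for (int hops = 0; symlinks.containsKey(p); hops++) {
            if (hops >= 8) throw new IllegalStateException("too many links: " + path);
            p = symlinks.get(p);
        }
        return files.get(p);
    }
}

// Hypothetical chroot wrapper: "/x" in the view maps to root + "/x" below.
class ChrootFs {
    private final InnerFs inner;
    private final String root;

    ChrootFs(InnerFs inner, String root) {
        this.inner = inner;
        this.root = root;
    }

    private String toInner(String viewPath) { return root + viewPath; }

    // Delegating resolution to the inner filesystem lets a link target
    // bypass the chroot mapping.
    String readDelegating(String viewPath) {
        return inner.readResolvingInternally(toInner(viewPath));
    }

    // Resolving at the top level re-applies the chroot mapping to every
    // link target before touching the inner filesystem.
    String readResolvingAtTop(String viewPath) {
        String p = viewPath;
        for (int hops = 0; inner.getLinkTarget(toInner(p)) != null; hops++) {
            if (hops >= 8) throw new IllegalStateException("too many links: " + viewPath);
            p = inner.getLinkTarget(toInner(p));
        }
        return inner.readNoFollow(toInner(p));
    }
}

class ChrootEscapeDemo {
    static ChrootFs build() {
        InnerFs inner = new InnerFs();
        inner.files.put("/secret", "outside the chroot");
        inner.files.put("/jail/secret", "inside the chroot");
        inner.symlinks.put("/jail/link", "/secret");
        return new ChrootFs(inner, "/jail");
    }

    public static void main(String[] args) {
        ChrootFs fs = build();
        System.out.println(fs.readDelegating("/link"));     // prints "outside the chroot"
        System.out.println(fs.readResolvingAtTop("/link")); // prints "inside the chroot"
    }
}
```

In the delegating variant the inner filesystem follows /jail/link to /secret in its own namespace and walks out of the chroot; in the top-level variant the target /secret is mapped back to /jail/secret first.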

Daryn

On Oct 3, 2013, at 1:39 PM, sanjay Radia wrote:

> There are a number of issues (some minor, some more than minor).
> GA is close and we are still in discussion on some of them; while I believe we will close on these very shortly, a code change like this so close to GA is dangerous.
> 
> I suggest we do the following:
> 1) Disable symlinks in 2.2 GA: throw an unsupported-operation exception on createSymlink in both FileSystem and FileContext.
> 2) Deal with isDir() in 2.2 GA, in preparation for item 3 coming after GA:
>    a) Deprecate isDir().
>    b) Add a new API that returns an enum (see FileContext).
> 3) Fix symlinks in a future release, hopefully the very next one after 2.2 GA:
>    a) Change the stack to use the new API that replaces isDir().
>    b) Fix isDir() to do something smarter (we can detail this later, but there is a solution that has been discussed). This helps customer applications that call isDir().
>    c) Remove isDir() in a future release, once customers have had sufficient time to migrate.
> 
> sanjay
> 
> PS. J Rottinghuis expressed a similar sentiment in a previous email in this thread:
> 
> 
> 
> On Sep 18, 2013, at 5:11 PM, J. Rottinghuis wrote:
> 
>> I like symlink functionality, but in our migration to Hadoop 2.x this is a
>> total distraction. If the APIs stay in 2.2 GA we'll have to choose to:
>> a) Not uprev until symlink support is figured out up and down the stack,
>> and we've been able to migrate all our 1.x (equivalent) clusters to 2.x
>> (equivalent). Or
>> b) rip out the API altogether. Or
>> c) change the implementation to throw an UnsupportedOperationException
>> I'm not sure yet which of these I like least.
> 
> 

