Posted to general@hadoop.apache.org by Arun C Murthy <ac...@yahoo-inc.com> on 2011/04/08 00:52:44 UTC

Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

On Feb 14, 2011, at 1:34 PM, Arun C Murthy wrote:
>
> As the final installment in this process, I've started a discussion on
> us contributing a re-factor of Map-Reduce in
> https://issues.apache.org/jira/browse/MAPREDUCE-279.



Hi Folks,

We wanted to share our thoughts around the co-development of the
NextGen MapReduce branch (Jira MR-279), maintaining
branch-0.20-security, and merging the work on the security branch with
trunk.  We've concluded that it does not make sense for us to port a
very small subset of the work from branch-0.20-security to the Hadoop
mainline.  The JIRAs we don't plan to port all affect areas of the
mainline that are going to be replaced by work in the NextGen
MapReduce branch
(http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MR-279/).

We've been working on the NextGen MapReduce branch (MAPREDUCE-279)
within Apache for a while now and are excited about its progress.  We
think that this branch will be a huge improvement in scalability,
performance and functionality.  We are now confident that we can get
it ready for release in the next few months.  We believe that the
next major release of Apache Hadoop we will test at Yahoo will include
the work in this branch and we are committed to merging the NextGen
branch into the mainline after the PMC approves the merge.

Meanwhile, we have continued to find and fix bugs on
branch-0.20-security and have been working to port that work into the
Hadoop mainline.  Most of this work is done and we've also brought all
the patches in from our GitHub branch into Apache Subversion, so that it
is easy for everyone to see the work remaining.  What we've found is
that some of the work in branch-0.20-security is in code sections that
have been completely replaced or refactored in the NextGen MapReduce
branch.  Since we are committed to the NextGen branch, we don't think
there is any upside in porting this code into portions of mainline we
expect to discard.  All of these JIRAs will be fixed in the NextGen
MapReduce branch and, from there, ultimately in trunk (assuming the
PMC approves the merge).

So at this point it is our intent to not port the JIRAs listed above  
to trunk, but to wait until we merge NextGen into trunk to resolve  
these issues there.  If you are interested in seeing these issues  
ported to mainline, let us know.  We are happy to help review your  
patches and explain context to anyone who is interested in doing this  
work.

Arun and Eric

Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
On Apr 8, 2011, at 11:08 AM, Todd Lipcon wrote:

> These all have patches that are pretty small, and I'd imagine would  
> apply pretty easily to trunk. Let me know if you'd like any help  
> forward-porting.
>

Thanks Todd, I'm happy to help review etc.

> The other ones, as new features/improvements, I'd agree it makes  
> sense not to waste effort re-implementing them for trunk MR, but  
> rather to make sure they're incorporated in next-gen.

Yep, exactly. Glad to know it makes sense.

thanks,
Arun

Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

Posted by Eric Baldeschwieler <er...@yahoo-inc.com>.
Thanks Todd, your help with the jiras you IDed would be welcome!

---
E14 - typing on glass

On Apr 8, 2011, at 11:09 AM, "Todd Lipcon" <to...@cloudera.com> wrote:

> On Fri, Apr 8, 2011 at 10:34 AM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
> 
>> 
>> On Apr 7, 2011, at 4:22 PM, Todd Lipcon wrote:
>> 
>>> Is there a list available of which patches you've made this decision
>>> about? I'm curious, for example, about MAPREDUCE-2178 -- as of today, the MR
>>> security in trunk has a serious vulnerability. Do we plan on fixing it, or
>>> will the answer be that, if anyone needs security, they must update to "MR
>>> Next Gen"?
>>> 
>> 
>> Apologies if my original message was abstruse - I want to ensure that there
>> is no confusion between 'forward-port' and 'merge from yahoo-merge branch'.
>> 
>> Let me try to explain again: there are several forward ports from the
>> hadoop-0.20.2xx (branch-0.20-security) which are complete, including
>> MAPREDUCE-2178. They are currently part of the 'yahoo-merge' branch in
>> MapReduce. These are awaiting a merge into trunk. Trunk (with a few merges
>> from yahoo-merge) will have a complete security implementation.
>> 
> 
> Ah, OK, I see. That makes sense.
> 
> 
>> 
>> My message was intended to highlight some small number of features/bugs
>> which are/will-be in hadoop-0.20.2xx. Here is a nearly complete list of such
>> jiras: MAPREDUCE-517, MAPREDUCE-1872, MAPREDUCE-291, MAPREDUCE-2418,
>> MAPREDUCE-2409, MAPREDUCE-2411. I'll check to ensure there aren't others.
>> 
>> 
>> 
> Looking briefly at those, it seems that the ones that are clear bugs (with
> small fixes) should be put in the current MR implementation:
> MAPREDUCE-2411
> MAPREDUCE-2409
> MAPREDUCE-2418 (maybe)
> 
> These all have patches that are pretty small, and I'd imagine would apply
> pretty easily to trunk. Let me know if you'd like any help forward-porting.
> 
> The other ones, as new features/improvements, I'd agree it makes sense not
> to waste effort re-implementing them for trunk MR, but rather to make sure
> they're incorporated in next-gen.
> 
> -Todd
> -- 
> Todd Lipcon
> Software Engineer, Cloudera

Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

Posted by Todd Lipcon <to...@cloudera.com>.
On Fri, Apr 8, 2011 at 10:34 AM, Arun C Murthy <ac...@yahoo-inc.com> wrote:

>
> On Apr 7, 2011, at 4:22 PM, Todd Lipcon wrote:
>
>> Is there a list available of which patches you've made this decision
>> about? I'm curious, for example, about MAPREDUCE-2178 -- as of today, the MR
>> security in trunk has a serious vulnerability. Do we plan on fixing it, or
>> will the answer be that, if anyone needs security, they must update to "MR
>> Next Gen"?
>>
>
> Apologies if my original message was abstruse - I want to ensure that there
> is no confusion between 'forward-port' and 'merge from yahoo-merge branch'.
>
> Let me try to explain again: there are several forward ports from the
> hadoop-0.20.2xx (branch-0.20-security) which are complete, including
> MAPREDUCE-2178. They are currently part of the 'yahoo-merge' branch in
> MapReduce. These are awaiting a merge into trunk. Trunk (with a few merges
> from yahoo-merge) will have a complete security implementation.
>

Ah, OK, I see. That makes sense.


>
> My message was intended to highlight some small number of features/bugs
> which are/will-be in hadoop-0.20.2xx. Here is a nearly complete list of such
> jiras: MAPREDUCE-517, MAPREDUCE-1872, MAPREDUCE-291, MAPREDUCE-2418,
> MAPREDUCE-2409, MAPREDUCE-2411. I'll check to ensure there aren't others.
>
>
>
Looking briefly at those, it seems that the ones that are clear bugs (with
small fixes) should be put in the current MR implementation:
MAPREDUCE-2411
MAPREDUCE-2409
MAPREDUCE-2418 (maybe)

These all have patches that are pretty small, and I'd imagine would apply
pretty easily to trunk. Let me know if you'd like any help forward-porting.

The other ones, as new features/improvements, I'd agree it makes sense not
to waste effort re-implementing them for trunk MR, but rather to make sure
they're incorporated in next-gen.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera

Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
Todd,

On Apr 7, 2011, at 4:22 PM, Todd Lipcon wrote:

> Is there a list available of which patches you've made this decision  
> about? I'm curious, for example, about MAPREDUCE-2178 -- as of  
> today, the MR security in trunk has a serious vulnerability. Do we  
> plan on fixing it, or will the answer be that, if anyone needs  
> security, they must update to "MR Next Gen"?

Apologies if my original message was abstruse - I want to ensure that
there is no confusion between 'forward-port' and 'merge from
yahoo-merge branch'.

Let me try to explain again: there are several forward ports from the
hadoop-0.20.2xx (branch-0.20-security) which are complete, including
MAPREDUCE-2178. They are currently part of the 'yahoo-merge' branch in
MapReduce. These are awaiting a merge into trunk. Trunk (with a few
merges from yahoo-merge) will have a complete security implementation.

My message was intended to highlight some small number of
features/bugs which are/will-be in hadoop-0.20.2xx. Here is a nearly
complete list of such jiras: MAPREDUCE-517, MAPREDUCE-1872,
MAPREDUCE-291, MAPREDUCE-2418, MAPREDUCE-2409, MAPREDUCE-2411. I'll
check to ensure there aren't others.

Hope that makes sense. Again, apologies for any confusion I've caused.

thanks,
Arun


Re: [ANNOUNCEMENT] Yahoo focusing on Apache Hadoop, discontinuing "The Yahoo Distribution of Hadoop"

Posted by Todd Lipcon <to...@cloudera.com>.
Is there a list available of which patches you've made this decision about?
I'm curious, for example, about MAPREDUCE-2178 -- as of today, the MR
security in trunk has a serious vulnerability. Do we plan on fixing it, or
will the answer be that, if anyone needs security, they must update to "MR
Next Gen"?

-Todd

On Thu, Apr 7, 2011 at 3:52 PM, Arun C Murthy <ac...@yahoo-inc.com> wrote:

>
> On Feb 14, 2011, at 1:34 PM, Arun C Murthy wrote:
>
>>
>> As the final installment in this process, I've started a discussion on
>> us contributing a re-factor of Map-Reduce in
>> https://issues.apache.org/jira/browse/MAPREDUCE-279
>> .
>>
>
>
>
> Hi Folks,
>
> We wanted to share our thoughts around the co-development of the NextGen
> MapReduce branch (Jira MR-279), maintaining the branch-0.20-security and
> merging the work on the security branch with trunk.  We've concluded that it
> does not make sense for us to port a very small subset of the work from the
> branch-0.20-security to the Hadoop mainline.  The JIRAs we don't plan to
> port all affect areas of the mainline that are going to be replaced by work
> in the NextGen MapReduce branch (
> http://svn.apache.org/viewvc/hadoop/mapreduce/branches/MR-279/).
>
> We've been working on the NextGen MapReduce branch (MAPREDUCE-279) within
> Apache for a while now and are excited about its progress.  We think that
> this branch will be a huge improvement in scalability, performance and
> functionality.  We are now confident that we can get it ready for release in
> the next few months.  We believe that the next major release of Apache
> Hadoop we will test at Yahoo will include the work in this branch and we are
> committed to merging the NextGen branch into the mainline after the PMC
> approves the merge.
>
> Meanwhile, we have continued to find and fix bugs on branch-0.20-security
> and have been working to port that work into the Hadoop mainline.  Most of
> this work is done and we've also brought all the patches in from our github
> branch into apache subversion, so that it is easy for everyone to see the
> work remaining.  What we've found is that some of the work in
> branch-0.20-security is in code sections that have been completely replaced
> / refactored in the NextGen MapReduce branch.  Since we are committed to the
> NextGen branch, we don't think there is any upside in porting this code into
> portions of mainline we expect to discard. All of these JIRAs will be fixed
> in the NextGen MapReduce branch and through there ultimately in trunk
> (assuming the PMC approves the merge).
>
> So at this point it is our intent to not port the JIRAs listed above to
> trunk, but to wait until we merge NextGen into trunk to resolve these issues
> there.  If you are interested in seeing these issues ported to mainline, let
> us know.  We are happy to help review your patches and explain context to
> anyone who is interested in doing this work.
>
> Arun and Eric
>



-- 
Todd Lipcon
Software Engineer, Cloudera