Posted to mapreduce-dev@hadoop.apache.org by Eli Collins <el...@cloudera.com> on 2011/07/07 18:58:19 UTC

MR1 next steps

Hey gang,

Had some discussion about what to do with MR1 with Arun at the summit,
wanted to move it on-list. Was thinking we should sort these out some
on mr-dev before discussing/announcing a decision on general.

The question is, now that we'll soon have MR2 merged (hurray!), to
what extent do we want to support MR1?  By MR1 I mean the JT and TT,
not the old MR API, which MR2 supports. I.e., this isn't about job API
compatibility, it's about implementation compatibility (e.g. existing
systems which may depend on JT/TT interfaces like metrics). Here are
the options as I see them:

1. Do nothing. MR1 will continue to be a regression, both in terms of
features and stability, against the MR in 203. Eg, MR1 in trunk still
doesn't support security. We would continue to recommend people use
MR1 from 20 (and MR2 from 23). Unclear what the value of having MR1 in
trunk in this shape is.

2. Remove the MR1 code from trunk/23, and just support MR2 in 23.
People who want MR1 can use the current stable release (which, per
option 1, we would recommend even if we left the code in as is).

3. Get MR1 in trunk in shape comparable to MR in 203. This preserves
the additional changes (to JT/TT at least) that have been added in
trunk since 0.20. Not clear if anyone would want to invest the
considerable effort this would take given that we have MR2 now (and
existing releases).

4. Put the MR1 code from 203 into trunk. This overwrites the changes
added to trunk not in 203, and would require some integration, however
it would give us a solid MR1 implementation that could be used in the
same release as MR2. It would be an incompatible change wrt 21/22,
however would be compatible in the sense that there are now both valid
MR1 and MR2 options in a single release.

I think #2 makes the most sense. From a developer perspective, MR2 is
good stuff, there's no need for us to maintain two implementations in
trunk/23 since we're already maintaining MR1 in the current releases.
I'm skeptical that anyone would volunteer to do #3 (lot of work,
unclear gain) or #4 (we already maintain MR1 elsewhere).  This allows
us to focus energy on MR2 instead of investing in MR1 (eg MR-2178,
which hasn't made much progress for ages).  From a user perspective,
MR2 preserves Job compatibility, so it should just be programs that talk
to the JT/TT that are affected. MR2 is a little harder to run
out-of-the-box, however we can fix that and we don't recommend people
use MR1 from 21/22/trunk anyway.

Thoughts?

Thanks,
Eli

Re: MR1 next steps

Posted by Tom White <to...@cloudera.com>.
+1 for #2 as long as the user-level MR API remains compatible.

Cheers,
Tom

On Thu, Jul 7, 2011 at 9:58 AM, Eli Collins <el...@cloudera.com> wrote:
> Hey gang,
>
> Had some discussion about what to do with MR1 with Arun at the summit,
> wanted to move it on-list. Was thinking we should sort these out some
> on mr-dev before discussing/announcing a decision on general.
>
> The question is, now that we'll soon have MR2 merged (hurray!), to
> what extent do we want to support MR1?  By MR1 I mean the JT and TT,
> not the old MR API, which MR2 supports. I.e., this isn't about job API
> compatibility, it's about implementation compatibility (e.g. existing
> systems which may depend on JT/TT interfaces like metrics). Here are
> the options as I see them:
>
> 1. Do nothing. MR1 will continue to be a regression, both in terms of
> features and stability, against the MR in 203. Eg, MR1 in trunk still
> doesn't support security. We would continue to recommend people use
> MR1 from 20 (and MR2 from 23). Unclear what the value of having MR1 in
> trunk in this shape is.
>
> 2. Remove the MR1 code from trunk/23, and just support MR2 in 23.
> People who want MR1 can use the current stable release (which, per
> option 1, we would recommend even if we left the code in as is).
>
> 3. Get MR1 in trunk in shape comparable to MR in 203. This preserves
> the additional changes (to JT/TT at least) that have been added in
> trunk since 0.20. Not clear if anyone would want to invest the
> considerable effort this would take given that we have MR2 now (and
> existing releases).
>
> 4. Put the MR1 code from 203 into trunk. This overwrites the changes
> added to trunk not in 203, and would require some integration, however
> it would give us a solid MR1 implementation that could be used in the
> same release as MR2. It would be an incompatible change wrt 21/22,
> however would be compatible in the sense that there are now both valid
> MR1 and MR2 options in a single release.
>
> I think #2 makes the most sense. From a developer perspective, MR2 is
> good stuff, there's no need for us to maintain two implementations in
> trunk/23 since we're already maintaining MR1 in the current releases.
> I'm skeptical that anyone would volunteer to do #3 (lot of work,
> unclear gain) or #4 (we already maintain MR1 elsewhere).  This allows
> us to focus energy on MR2 instead of investing in MR1 (eg MR-2178,
> which hasn't made much progress for ages).  From a user perspective,
> MR2 preserves Job compatibility, so it should just be programs that talk
> to the JT/TT that are affected. MR2 is a little harder to run
> out-of-the-box, however we can fix that and we don't recommend people
> use MR1 from 21/22/trunk anyway.
>
> Thoughts?
>
> Thanks,
> Eli
>

Re: MR1 next steps

Posted by Robert Evans <ev...@yahoo-inc.com>.
+1 for #2, so long as there are no feature regressions, as was talked about in a different thread.

--Bobby

On 7/7/11 7:20 PM, "Luke Lu" <ll...@vicaya.com> wrote:

On Thu, Jul 7, 2011 at 9:58 AM, Eli Collins <el...@cloudera.com> wrote:
> I think #2 makes the most sense.

+1. Supporting legacy servers in 0.23 is really redundant, when MR1 is
maintained in 0.20.2xx releases. For people who want to stick to MR1
for a while, a slight revision of #2 would be back-porting critical
MR1 patches in trunk to branch-0.20-security, so some contribution to
MR1 in trunk can be preserved somewhere.

__Luke


Re: MR1 next steps

Posted by Luke Lu <ll...@vicaya.com>.
On Thu, Jul 7, 2011 at 9:58 AM, Eli Collins <el...@cloudera.com> wrote:
> I think #2 makes the most sense.

+1. Supporting legacy servers in 0.23 is really redundant, when MR1 is
maintained in 0.20.2xx releases. For people who want to stick to MR1
for a while, a slight revision of #2 would be back-porting critical
MR1 patches in trunk to branch-0.20-security, so some contribution to
MR1 in trunk can be preserved somewhere.

__Luke

Re: MR1 next steps

Posted by Eli Collins <el...@cloudera.com>.
On Wed, Jul 13, 2011 at 2:52 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
>
> On Jul 13, 2011, at 10:31 AM, Eli Collins wrote:
>>
>> Combining with the MR2 merge makes sense to me.
>>
>> Aside from o.a.h.m.server, what would be removed? Seems like parts of
>> src/contrib no longer apply  (eg fs scheduler)  and others (gridmix,
>> raid) should be able to continue to work.
>
> Good point. Maybe we should do it post merge since we need to evaluate implications on contrib stuff a priori.

You're right, let's not gate the MR2 merge on MR1 removal.

It's been a week since the thread started so if no one objects I'll
start a thread on general@.

Thanks,
Eli

Re: MR1 next steps

Posted by Arun C Murthy <ac...@hortonworks.com>.
On Jul 13, 2011, at 10:31 AM, Eli Collins wrote:
> 
> Combining with the MR2 merge makes sense to me.
> 
> Aside from o.a.h.m.server, what would be removed? Seems like parts of
> src/contrib no longer apply  (eg fs scheduler)  and others (gridmix,
> raid) should be able to continue to work.

Good point. Maybe we should do it post merge since we need to evaluate implications on contrib stuff a priori.

thanks,
Arun

Re: MR1 next steps

Posted by Eli Collins <el...@cloudera.com>.
On Wed, Jul 13, 2011 at 12:32 AM, Arun C Murthy <ac...@hortonworks.com> wrote:
>
> On Jul 13, 2011, at 12:30 AM, Arun C Murthy wrote:
>
>>
>> On Jul 7, 2011, at 9:58 AM, Eli Collins wrote:
>>
>>>
>>> I think #2 makes the most sense. From a developer perspective, MR2 is
>>> good stuff, there's no need for us to maintain two implementations in
>>> trunk/23 since we're already maintaining MR1 in the current releases.
>>
>> +1
>>
>> Anyone feel otherwise?
>>
>
> If we reach consensus, I can do this as part of merging MR-279 into trunk - which (see the email I sent out on merging) is a couple of weeks away.
>

Combining with the MR2 merge makes sense to me.

Aside from o.a.h.m.server, what would be removed? Seems like parts of
src/contrib no longer apply  (eg fs scheduler)  and others (gridmix,
raid) should be able to continue to work.

Thanks,
Eli

Re: MR1 next steps

Posted by Todd Lipcon <to...@cloudera.com>.
+1 from me as well. Though it will be a bit of a pain for ops upgrading, I
think the pain of maintaining two versions is worse.

-Todd

On Wed, Jul 13, 2011 at 6:20 AM, Vinod KV <vi...@yahoo-inc.com> wrote:

>
> +1 from my side too. Looks like there is an overwhelming majority from the
> dev side.
>
> Maybe we should call for an explicit vote. Making it a vote calls for
> attention from anyone who might've missed this. This thread seems more of a
> proposal.
>
> Thanks,
> +Vinod
>
>
>
>
> On Wednesday 13 July 2011 01:02 PM, Arun C Murthy wrote:
>
>> On Jul 13, 2011, at 12:30 AM, Arun C Murthy wrote:
>>
>>  On Jul 7, 2011, at 9:58 AM, Eli Collins wrote:
>>>
>>>  I think #2 makes the most sense. From a developer perspective, MR2 is
>>>> good stuff, there's no need for us to maintain two implementations in
>>>> trunk/23 since we're already maintaining MR1 in the current releases.
>>>>
>>> +1
>>>
>>> Anyone feel otherwise?
>>>
>>>  If we reach consensus, I can do this as part of merging MR-279 into
>> trunk - which (see the email I sent out on merging) is a couple of weeks
>> away.
>>
>> thanks,
>> Arun
>>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: MR1 next steps

Posted by Vinod KV <vi...@yahoo-inc.com>.
+1 from my side too. Looks like there is an overwhelming majority from
the dev side.

Maybe we should call for an explicit vote. Making it a vote calls for
attention from anyone who might've missed this. This thread seems more 
of a proposal.

Thanks,
+Vinod



On Wednesday 13 July 2011 01:02 PM, Arun C Murthy wrote:
> On Jul 13, 2011, at 12:30 AM, Arun C Murthy wrote:
>
>> On Jul 7, 2011, at 9:58 AM, Eli Collins wrote:
>>
>>> I think #2 makes the most sense. From a developer perspective, MR2 is
>>> good stuff, there's no need for us to maintain two implementations in
>>> trunk/23 since we're already maintaining MR1 in the current releases.
>> +1
>>
>> Anyone feel otherwise?
>>
> If we reach consensus, I can do this as part of merging MR-279 into trunk - which (see the email I sent out on merging) is a couple of weeks away.
>
> thanks,
> Arun


Re: MR1 next steps

Posted by Arun C Murthy <ac...@hortonworks.com>.
On Jul 13, 2011, at 12:30 AM, Arun C Murthy wrote:

> 
> On Jul 7, 2011, at 9:58 AM, Eli Collins wrote:
> 
>> 
>> I think #2 makes the most sense. From a developer perspective, MR2 is
>> good stuff, there's no need for us to maintain two implementations in
>> trunk/23 since we're already maintaining MR1 in the current releases.
> 
> +1
> 
> Anyone feel otherwise?
> 

If we reach consensus, I can do this as part of merging MR-279 into trunk - which (see the email I sent out on merging) is a couple of weeks away.

thanks,
Arun

Re: MR1 next steps

Posted by Arun C Murthy <ac...@hortonworks.com>.
On Jul 7, 2011, at 9:58 AM, Eli Collins wrote:

> 
> I think #2 makes the most sense. From a developer perspective, MR2 is
> good stuff, there's no need for us to maintain two implementations in
> trunk/23 since we're already maintaining MR1 in the current releases.

+1

Anyone feel otherwise?

Arun


Re: MR1 next steps

Posted by Amareshwari Sri Ramadasu <am...@yahoo-inc.com>.
+1 for option #2.

On 7/7/11 10:28 PM, "Eli Collins" <el...@cloudera.com> wrote:

Hey gang,

Had some discussion about what to do with MR1 with Arun at the summit,
wanted to move it on-list. Was thinking we should sort these out some
on mr-dev before discussing/announcing a decision on general.

The question is, now that we'll soon have MR2 merged (hurray!), to
what extent do we want to support MR1?  By MR1 I mean the JT and TT,
not the old MR API, which MR2 supports. I.e., this isn't about job API
compatibility, it's about implementation compatibility (e.g. existing
systems which may depend on JT/TT interfaces like metrics). Here are
the options as I see them:

1. Do nothing. MR1 will continue to be a regression, both in terms of
features and stability, against the MR in 203. Eg, MR1 in trunk still
doesn't support security. We would continue to recommend people use
MR1 from 20 (and MR2 from 23). Unclear what the value of having MR1 in
trunk in this shape is.

2. Remove the MR1 code from trunk/23, and just support MR2 in 23.
People who want MR1 can use the current stable release (which, per
option 1, we would recommend even if we left the code in as is).

3. Get MR1 in trunk in shape comparable to MR in 203. This preserves
the additional changes (to JT/TT at least) that have been added in
trunk since 0.20. Not clear if anyone would want to invest the
considerable effort this would take given that we have MR2 now (and
existing releases).

4. Put the MR1 code from 203 into trunk. This overwrites the changes
added to trunk not in 203, and would require some integration, however
it would give us a solid MR1 implementation that could be used in the
same release as MR2. It would be an incompatible change wrt 21/22,
however would be compatible in the sense that there are now both valid
MR1 and MR2 options in a single release.

I think #2 makes the most sense. From a developer perspective, MR2 is
good stuff, there's no need for us to maintain two implementations in
trunk/23 since we're already maintaining MR1 in the current releases.
I'm skeptical that anyone would volunteer to do #3 (lot of work,
unclear gain) or #4 (we already maintain MR1 elsewhere).  This allows
us to focus energy on MR2 instead of investing in MR1 (eg MR-2178,
which hasn't made much progress for ages).  From a user perspective,
MR2 preserves Job compatibility, so it should just be programs that talk
to the JT/TT that are affected. MR2 is a little harder to run
out-of-the-box, however we can fix that and we don't recommend people
use MR1 from 21/22/trunk anyway.

Thoughts?

Thanks,
Eli