Posted to user@hadoop.apache.org by Tsuyoshi OZAWA <oz...@gmail.com> on 2013/07/26 08:13:58 UTC

Abstraction layer to support both YARN and Mesos

Hi,

Apache Mesos, a distributed resource manager, is now a top-level
Apache project. Meanwhile, as you know, Hadoop has its own resource
manager, YARN. IMHO, we should make the resource manager pluggable in
MRv2, because MapReduce users in some fields have their own resource
manager they would like to use. I think this work would be useful for
MapReduce users. On the other hand, it could also be a large effort,
because MRv2's code base is currently tightly coupled with YARN.
Thoughts?

- Tsuyoshi
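
To make the proposal above concrete, here is a minimal sketch of what a pluggable resource-manager layer could look like. Every name in it (ResourceBackend, FakeYarnBackend, FakeMesosBackend, BackendSelector) is hypothetical and invented for illustration; none of them is an actual Hadoop, YARN, or Mesos API, and the two backends are stand-ins that make no real cluster calls.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction over a resource manager; not a real Hadoop API. */
interface ResourceBackend {
    /** Short identifier of the backend, e.g. "yarn" or "mesos". */
    String name();

    /** Requests `count` containers of `memoryMb` each; returns opaque ids. */
    List<String> allocate(int count, int memoryMb);
}

/** Stand-in for a YARN-backed implementation (makes no real YARN calls). */
class FakeYarnBackend implements ResourceBackend {
    public String name() { return "yarn"; }

    public List<String> allocate(int count, int memoryMb) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add("yarn-container-" + i);
        }
        return ids;
    }
}

/** Stand-in for a Mesos-backed implementation (makes no real Mesos calls). */
class FakeMesosBackend implements ResourceBackend {
    public String name() { return "mesos"; }

    public List<String> allocate(int count, int memoryMb) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add("mesos-task-" + i);
        }
        return ids;
    }
}

/** Picks a backend by name, as a configuration property might. */
class BackendSelector {
    static ResourceBackend forName(String name) {
        switch (name) {
            case "yarn":  return new FakeYarnBackend();
            case "mesos": return new FakeMesosBackend();
            default:
                throw new IllegalArgumentException("unknown backend: " + name);
        }
    }
}

class PluggableDemo {
    public static void main(String[] args) {
        String choice = args.length > 0 ? args[0] : "yarn";
        ResourceBackend backend = BackendSelector.forName(choice);
        // The caller depends only on the interface, never on a concrete backend.
        System.out.println(backend.name() + " -> " + backend.allocate(2, 1024));
    }
}
```

The point of the sketch is only that the MapReduce side would talk to an interface rather than to YARN directly. How much of MRv2 would have to move behind such an interface is exactly the open question raised in the post.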

Re: Abstraction layer to support both YARN and Mesos

Posted by Tsuyoshi OZAWA <oz...@gmail.com>.
Harsh, yes, I know what you mean :-) Never mind. We should discuss
this topic with MR users.

On Tue, Jul 30, 2013 at 12:08 AM, Michael Segel
<ms...@hotmail.com> wrote:
> Actually,
> I am interested.
>
> Lots of different Apache top-level projects seem to overlap and it can be confusing.
> It's very easy for a good technology to get starved because no one asks how to combine these features into the framework.
>
> On Jul 29, 2013, at 10:06 AM, Michael Segel <ms...@segel.com> wrote:
>
>> Actually,
>> I am interested.
>>
>> Lots of different Apache top-level projects seem to overlap and it can be confusing.
>> It's very easy for a good technology to get starved because no one asks how to combine these features into the framework.
>>
>>
>> On Jul 29, 2013, at 9:58 AM, Tsuyoshi OZAWA <oz...@gmail.com> wrote:
>>
>>> I thought some high availability and resource isolation features in
>>> Mesos are more mature. If no one is interested in this topic, MR
>>> should go with YARN.
>>>
>>> On Fri, Jul 26, 2013 at 7:14 PM, Harsh J <ha...@cloudera.com> wrote:
>>>> Do we have a good reason to prefer Mesos over YARN for scheduling MR
>>>> specifically? At what times would one prefer the other?
>>>>
>>>> On Fri, Jul 26, 2013 at 11:43 AM, Tsuyoshi OZAWA
>>>> <oz...@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> Apache Mesos, a distributed resource manager, is now a top-level
>>>>> Apache project. Meanwhile, as you know, Hadoop has its own resource
>>>>> manager, YARN. IMHO, we should make the resource manager pluggable in
>>>>> MRv2, because MapReduce users in some fields have their own resource
>>>>> manager they would like to use. I think this work would be useful for
>>>>> MapReduce users. On the other hand, it could also be a large effort,
>>>>> because MRv2's code base is currently tightly coupled with YARN.
>>>>> Thoughts?
>>>>>
>>>>> - Tsuyoshi
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>>
>>> --
>>> - Tsuyoshi
>>>
>>
>



-- 
- Tsuyoshi

Re: Abstraction layer to support both YARN and Mesos

Posted by Michael Segel <ms...@hotmail.com>.
Actually,
I am interested.

Lots of different Apache top-level projects seem to overlap and it can be confusing.
It's very easy for a good technology to get starved because no one asks how to combine these features into the framework.

On Jul 29, 2013, at 10:06 AM, Michael Segel <ms...@segel.com> wrote:

> Actually, 
> I am interested. 
> 
> Lots of different Apache top-level projects seem to overlap and it can be confusing.
> It's very easy for a good technology to get starved because no one asks how to combine these features into the framework.
> 
> 
> On Jul 29, 2013, at 9:58 AM, Tsuyoshi OZAWA <oz...@gmail.com> wrote:
> 
>> I thought some high availability and resource isolation features in
>> Mesos are more mature. If no one is interested in this topic, MR
>> should go with YARN.
>> 
>> On Fri, Jul 26, 2013 at 7:14 PM, Harsh J <ha...@cloudera.com> wrote:
>>> Do we have a good reason to prefer Mesos over YARN for scheduling MR
>>> specifically? At what times would one prefer the other?
>>> 
>>> On Fri, Jul 26, 2013 at 11:43 AM, Tsuyoshi OZAWA
>>> <oz...@gmail.com> wrote:
>>>> Hi,
>>>> 
>>>> Apache Mesos, a distributed resource manager, is now a top-level
>>>> Apache project. Meanwhile, as you know, Hadoop has its own resource
>>>> manager, YARN. IMHO, we should make the resource manager pluggable in
>>>> MRv2, because MapReduce users in some fields have their own resource
>>>> manager they would like to use. I think this work would be useful for
>>>> MapReduce users. On the other hand, it could also be a large effort,
>>>> because MRv2's code base is currently tightly coupled with YARN.
>>>> Thoughts?
>>>> 
>>>> - Tsuyoshi
>>> 
>>> 
>>> 
>>> --
>>> Harsh J
>> 
>> 
>> 
>> -- 
>> - Tsuyoshi
>> 
> 


Re: Abstraction layer to support both YARN and Mesos

Posted by Michael Segel <ms...@segel.com>.
Actually, 
I am interested. 

Lots of different Apache top-level projects seem to overlap and it can be confusing.
It's very easy for a good technology to get starved because no one asks how to combine these features into the framework.


On Jul 29, 2013, at 9:58 AM, Tsuyoshi OZAWA <oz...@gmail.com> wrote:

> I thought some high availability and resource isolation features in
> Mesos are more mature. If no one is interested in this topic, MR
> should go with YARN.
> 
> On Fri, Jul 26, 2013 at 7:14 PM, Harsh J <ha...@cloudera.com> wrote:
>> Do we have a good reason to prefer Mesos over YARN for scheduling MR
>> specifically? At what times would one prefer the other?
>> 
>> On Fri, Jul 26, 2013 at 11:43 AM, Tsuyoshi OZAWA
>> <oz...@gmail.com> wrote:
>>> Hi,
>>> 
>>> Apache Mesos, a distributed resource manager, is now a top-level
>>> Apache project. Meanwhile, as you know, Hadoop has its own resource
>>> manager, YARN. IMHO, we should make the resource manager pluggable in
>>> MRv2, because MapReduce users in some fields have their own resource
>>> manager they would like to use. I think this work would be useful for
>>> MapReduce users. On the other hand, it could also be a large effort,
>>> because MRv2's code base is currently tightly coupled with YARN.
>>> Thoughts?
>>> 
>>> - Tsuyoshi
>> 
>> 
>> 
>> --
>> Harsh J
> 
> 
> 
> -- 
> - Tsuyoshi
> 


Re: Abstraction layer to support both YARN and Mesos

Posted by Tsuyoshi OZAWA <oz...@gmail.com>.
I thought some high availability and resource isolation features in
Mesos are more mature. If no one is interested in this topic, MR
should go with YARN.

On Fri, Jul 26, 2013 at 7:14 PM, Harsh J <ha...@cloudera.com> wrote:
> Do we have a good reason to prefer Mesos over YARN for scheduling MR
> specifically? At what times would one prefer the other?
>
> On Fri, Jul 26, 2013 at 11:43 AM, Tsuyoshi OZAWA
> <oz...@gmail.com> wrote:
>> Hi,
>>
>> Apache Mesos, a distributed resource manager, is now a top-level
>> Apache project. Meanwhile, as you know, Hadoop has its own resource
>> manager, YARN. IMHO, we should make the resource manager pluggable in
>> MRv2, because MapReduce users in some fields have their own resource
>> manager they would like to use. I think this work would be useful for
>> MapReduce users. On the other hand, it could also be a large effort,
>> because MRv2's code base is currently tightly coupled with YARN.
>> Thoughts?
>>
>> - Tsuyoshi
>
>
>
> --
> Harsh J



-- 
- Tsuyoshi

Re: Abstraction layer to support both YARN and Mesos

Posted by Harsh J <ha...@cloudera.com>.
Do we have a good reason to prefer Mesos over YARN for scheduling MR
specifically? At what times would one prefer the other?

On Fri, Jul 26, 2013 at 11:43 AM, Tsuyoshi OZAWA
<oz...@gmail.com> wrote:
> Hi,
>
> Now, Apache Mesos, an distributed resource manager, is top-level
> apache project. Meanwhile, As you know, Hadoop has own resource
> manager - YARN. IMHO, we should make resource manager pluggable in
> MRv2, because there are their own field users of MapReduce would like
> to use. I think this work is useful for MapReduce users. On the other
> hand, this work can also be large, because MRv2's code base is tightly
> coupled with YARN currently. Thoughts?
>
> - Tsuyoshi



-- 
Harsh J

Fwd: Abstraction layer to support both YARN and Mesos

Posted by Tim St Clair <ts...@redhat.com>.
In case folks are not on the other lists: I saw this and figured there may be further interest.

Cheers,
Tim

----- Forwarded Message -----
> From: "Vinod Kumar Vavilapalli" <vi...@apache.org>
> To: yarn-dev@hadoop.apache.org
> Cc: "mapreduce-dev" <ma...@hadoop.apache.org>
> Sent: Wednesday, July 31, 2013 11:45:31 AM
> Subject: Re: Abstraction layer to support both YARN and Mesos
> 
> 
> What I thought was the original proposal was to use the existing MR
> client+AM+task code to run on top of Mesos. And like Steve mentioned, today
> all of it is very tightly couple with YARN APIs. Using JobClient against a
> Mesos implementation of MapReduce is easy, changing AM to start getting
> containers from Mesos and launching via Mesos needs more abstractions. And
> at this point of time, again as Steve laid it out clearly, the focus of
> MapReduce project is on stabilizing and shipping together with YARN.
> 
> That said, working on thinking about those abstractions inside MR AM is a
> step forward IF there is enough interest around this. I see a couple of
> people already showing enthusiasm, but it'll be great to see more interest.
> May be a few from Mesos community who understand what those abstractions
> should look like.
> 
> The last thing we want is create unnecessary abstractions now that may never
> get used in the future.
> 
> Thanks,
> +Vinod
> 
> On Jul 31, 2013, at 9:34 AM, Bikas Saha wrote:
> 
> > +1 for Tom's suggestion. That is how we have transparently redirected MR
> > jobs to use Tez as the execution framework.
> > 
> > Bikas
> > 
> > -----Original Message-----
> > From: Tom White [mailto:tom@cloudera.com]
> > Sent: Wednesday, July 31, 2013 8:41 AM
> > To: mapreduce-dev
> > Cc: yarn-dev@hadoop.apache.org
> > Subject: Re: Abstraction layer to support both YARN and Mesos
> > 
> > I can see value in this, since it would allow MR programs and libraries to
> > run on either YARN or Mesos with no recompilation. The value here is
> > really in the libraries since it means library maintainers don't have to
> > maintain two versions of their library.
> > 
> > Note that there is no extra level of indirection required - it's already
> > there in org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider -
> > which is used to switch between submitting jobs to the JobTracker and
> > submitting to YARN's RM. A MesosClientProtocolProvider might be hosted in
> > Mesos - perhaps Mesos developers are already working on this?
> > 
> > Cheers,
> > Tom
> > 
> > On Wed, Jul 31, 2013 at 4:30 PM, Steve Loughran <st...@hortonworks.com>
> > wrote:
> >> On 26 July 2013 07:13, Tsuyoshi OZAWA <oz...@gmail.com> wrote:
> >> 
> >>> Hi,
> >>> 
> >>> Now, Apache Mesos, an distributed resource manager, is top-level
> >>> apache project. Meanwhile, As you know, Hadoop has own resource
> >>> manager - YARN. IMHO, we should make resource manager pluggable in
> >>> MRv2, because there are their own field users of MapReduce would like
> >>> to use. I think this work is useful for MapReduce users. On the other
> >>> hand, this work can also be large, because MRv2's code base is
> >>> tightly coupled with YARN currently. Thoughts?
> >>> 
> >> 
> >> MRv2 is too intimately involved with Hadoop for it to easily be moved,
> >> have a look at the mapreduce package code base to see this. We are
> >> also developing and currently releasing them in sync.
> >> 
> >> Yes, an extra layer of indirection may appear to get MR to work on
> >> Mesos -but things like locality, ongoing dev YARN APIs &c and the
> >> release schedule would push for MRv2 to focus on YARN: data aware job
> >> (and service) scheduling in Hadoop clusters.
> >> 
> >> As an example of how those layers of indirection cause problems, look
> >> at commons-logging. Ubiquitous as the API in front of Log4J, when
> >> using raw Log4J would have been better (look in the hadoop tests code
> >> where the underlying logger is explicitly extracted and tuned for
> >> examples).
> 
> 

Re: Abstraction layer to support both YARN and Mesos

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.
What I thought was the original proposal was to use the existing MR client+AM+task code to run on top of Mesos. And like Steve mentioned, today all of it is very tightly coupled to the YARN APIs. Using JobClient against a Mesos implementation of MapReduce is easy, but changing the AM to start getting containers from Mesos and launching via Mesos needs more abstractions. And at this point in time, again as Steve laid out clearly, the focus of the MapReduce project is on stabilizing and shipping together with YARN.

That said, thinking about those abstractions inside the MR AM is a step forward IF there is enough interest around this. I see a couple of people already showing enthusiasm, but it'll be great to see more interest, maybe from a few people in the Mesos community who understand what those abstractions should look like.

The last thing we want is to create unnecessary abstractions now that may never get used in the future.

Thanks,
+Vinod

On Jul 31, 2013, at 9:34 AM, Bikas Saha wrote:

> +1 for Tom's suggestion. That is how we have transparently redirected MR
> jobs to use Tez as the execution framework.
> 
> Bikas
> 
> -----Original Message-----
> From: Tom White [mailto:tom@cloudera.com]
> Sent: Wednesday, July 31, 2013 8:41 AM
> To: mapreduce-dev
> Cc: yarn-dev@hadoop.apache.org
> Subject: Re: Abstraction layer to support both YARN and Mesos
> 
> I can see value in this, since it would allow MR programs and libraries to
> run on either YARN or Mesos with no recompilation. The value here is
> really in the libraries since it means library maintainers don't have to
> maintain two versions of their library.
> 
> Note that there is no extra level of indirection required - it's already
> there in org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider -
> which is used to switch between submitting jobs to the JobTracker and
> submitting to YARN's RM. A MesosClientProtocolProvider might be hosted in
> Mesos - perhaps Mesos developers are already working on this?
> 
> Cheers,
> Tom
> 
> On Wed, Jul 31, 2013 at 4:30 PM, Steve Loughran <st...@hortonworks.com>
> wrote:
>> On 26 July 2013 07:13, Tsuyoshi OZAWA <oz...@gmail.com> wrote:
>> 
>>> Hi,
>>> 
>>> Now, Apache Mesos, an distributed resource manager, is top-level
>>> apache project. Meanwhile, As you know, Hadoop has own resource
>>> manager - YARN. IMHO, we should make resource manager pluggable in
>>> MRv2, because there are their own field users of MapReduce would like
>>> to use. I think this work is useful for MapReduce users. On the other
>>> hand, this work can also be large, because MRv2's code base is
>>> tightly coupled with YARN currently. Thoughts?
>>> 
>> 
>> MRv2 is too intimately involved with Hadoop for it to easily be moved,
>> have a look at the mapreduce package code base to see this. We are
>> also developing and currently releasing them in sync.
>> 
>> Yes, an extra layer of indirection may appear to get MR to work on
>> Mesos -but things like locality, ongoing dev YARN APIs &c and the
>> release schedule would push for MRv2 to focus on YARN: data aware job
>> (and service) scheduling in Hadoop clusters.
>> 
>> As an example of how those layers of indirection cause problems, look
>> at commons-logging. Ubiquitous as the API in front of Log4J, when
>> using raw Log4J would have been better (look in the hadoop tests code
>> where the underlying logger is explicitly extracted and tuned for
>> examples).


RE: Abstraction layer to support both YARN and Mesos

Posted by Bikas Saha <bi...@hortonworks.com>.
+1 for Tom's suggestion. That is how we have transparently redirected MR
jobs to use Tez as the execution framework.

Bikas

-----Original Message-----
From: Tom White [mailto:tom@cloudera.com]
Sent: Wednesday, July 31, 2013 8:41 AM
To: mapreduce-dev
Cc: yarn-dev@hadoop.apache.org
Subject: Re: Abstraction layer to support both YARN and Mesos

I can see value in this, since it would allow MR programs and libraries to
run on either YARN or Mesos with no recompilation. The value here is
really in the libraries since it means library maintainers don't have to
maintain two versions of their library.

Note that there is no extra level of indirection required - it's already
there in org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider -
which is used to switch between submitting jobs to the JobTracker and
submitting to YARN's RM. A MesosClientProtocolProvider might be hosted in
Mesos - perhaps Mesos developers are already working on this?

Cheers,
Tom

On Wed, Jul 31, 2013 at 4:30 PM, Steve Loughran <st...@hortonworks.com>
wrote:
> On 26 July 2013 07:13, Tsuyoshi OZAWA <oz...@gmail.com> wrote:
>
>> Hi,
>>
>> Now, Apache Mesos, an distributed resource manager, is top-level
>> apache project. Meanwhile, As you know, Hadoop has own resource
>> manager - YARN. IMHO, we should make resource manager pluggable in
>> MRv2, because there are their own field users of MapReduce would like
>> to use. I think this work is useful for MapReduce users. On the other
>> hand, this work can also be large, because MRv2's code base is
>> tightly coupled with YARN currently. Thoughts?
>>
>
> MRv2 is too intimately involved with Hadoop for it to easily be moved,
> have a look at the mapreduce package code base to see this. We are
> also developing and currently releasing them in sync.
>
> Yes, an extra layer of indirection may appear to get MR to work on
> Mesos -but things like locality, ongoing dev YARN APIs &c and the
> release schedule would push for MRv2 to focus on YARN: data aware job
> (and service) scheduling in Hadoop clusters.
>
> As an example of how those layers of indirection cause problems, look
> at commons-logging. Ubiquitous as the API in front of Log4J, when
> using raw Log4J would have been better (look in the hadoop tests code
> where the underlying logger is explicitly extracted and tuned for
> examples).

Re: Abstraction layer to support both YARN and Mesos

Posted by Tom White <to...@cloudera.com>.
I can see value in this, since it would allow MR programs and
libraries to run on either YARN or Mesos with no recompilation. The
value here is really in the libraries since it means library
maintainers don't have to maintain two versions of their library.

Note that there is no extra level of indirection required - it's
already there in
org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider - which is
used to switch between submitting jobs to the JobTracker and
submitting to YARN's RM. A MesosClientProtocolProvider might be hosted
in Mesos - perhaps Mesos developers are already working on this?

Cheers,
Tom

On Wed, Jul 31, 2013 at 4:30 PM, Steve Loughran <st...@hortonworks.com> wrote:
> On 26 July 2013 07:13, Tsuyoshi OZAWA <oz...@gmail.com> wrote:
>
>> Hi,
>>
>> Now, Apache Mesos, an distributed resource manager, is top-level
>> apache project. Meanwhile, As you know, Hadoop has own resource
>> manager - YARN. IMHO, we should make resource manager pluggable in
>> MRv2, because there are their own field users of MapReduce would like
>> to use. I think this work is useful for MapReduce users. On the other
>> hand, this work can also be large, because MRv2's code base is tightly
>> coupled with YARN currently. Thoughts?
>>
>
> MRv2 is too intimately involved with Hadoop for it to easily be moved, have
> a look at the mapreduce package code base to see this. We are also
> developing and currently releasing them in sync.
>
> Yes, an extra layer of indirection may appear to get MR to work on Mesos
> -but things like locality, ongoing dev YARN APIs &c and the release
> schedule would push for MRv2 to focus on YARN: data aware job (and service)
> scheduling in Hadoop clusters.
>
> As an example of how those layers of indirection cause problems, look at
> commons-logging. Ubiquitous as the API in front of Log4J, when using raw
> Log4J would have been better (look in the hadoop tests code where the
> underlying logger is explicitly extracted and tuned for examples).
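
Tom's point about ClientProtocolProvider can be sketched roughly as
follows. This is a simplified, self-contained model and not Hadoop code:
SubmitterProvider, JobSubmitter, YarnProvider and MesosProvider are
hypothetical stand-ins, and the "mesos" framework value is an assumption.
The only names taken from Hadoop are the configuration key
mapreduce.framework.name and its "yarn" value. The idea it illustrates is
that each provider inspects the configuration and returns null unless it
handles the requested framework, so adding a backend means adding a
provider rather than adding a layer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Stand-in for the client-side protocol object returned by a provider.
interface JobSubmitter {
    String submit(String jobName);
}

// Stand-in for ClientProtocolProvider; the real class is handed a Hadoop
// Configuration rather than a Map.
abstract class SubmitterProvider {
    // Returns a submitter if this provider handles the configured
    // framework, otherwise null so that the next provider can be tried.
    abstract JobSubmitter create(Map<String, String> conf);
}

class YarnProvider extends SubmitterProvider {
    @Override
    JobSubmitter create(Map<String, String> conf) {
        if (!"yarn".equals(conf.get(ProviderSketch.FRAMEWORK_KEY))) {
            return null;
        }
        return job -> "yarn:" + job;
    }
}

// Hypothetical Mesos counterpart; no such provider is confirmed to exist.
class MesosProvider extends SubmitterProvider {
    @Override
    JobSubmitter create(Map<String, String> conf) {
        if (!"mesos".equals(conf.get(ProviderSketch.FRAMEWORK_KEY))) {
            return null;
        }
        return job -> "mesos:" + job;
    }
}

class ProviderSketch {
    static final String FRAMEWORK_KEY = "mapreduce.framework.name";

    // Hard-coded list for the sketch; a real deployment would discover
    // providers on the classpath instead.
    static final List<SubmitterProvider> PROVIDERS =
            Arrays.asList(new YarnProvider(), new MesosProvider());

    // Ask each provider in turn and use the first one that accepts.
    static JobSubmitter select(Map<String, String> conf) {
        for (SubmitterProvider provider : PROVIDERS) {
            JobSubmitter submitter = provider.create(conf);
            if (submitter != null) {
                return submitter;
            }
        }
        throw new IllegalStateException(
                "No provider for framework: " + conf.get(FRAMEWORK_KEY));
    }

    public static void main(String[] args) {
        Map<String, String> conf =
                java.util.Collections.singletonMap(FRAMEWORK_KEY, "mesos");
        System.out.println(select(conf).submit("wordcount"));
    }
}
```

Note that this only covers job submission from the client. As Vinod
points out elsewhere in the thread, getting the AM to obtain and launch
containers through Mesos is a separate and larger problem.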

Re: Abstraction layer to support both YARN and Mesos

Posted by Steve Loughran <st...@hortonworks.com>.
On 26 July 2013 07:13, Tsuyoshi OZAWA <oz...@gmail.com> wrote:

> Hi,
>
> Now, Apache Mesos, an distributed resource manager, is top-level
> apache project. Meanwhile, As you know, Hadoop has own resource
> manager - YARN. IMHO, we should make resource manager pluggable in
> MRv2, because there are their own field users of MapReduce would like
> to use. I think this work is useful for MapReduce users. On the other
> hand, this work can also be large, because MRv2's code base is tightly
> coupled with YARN currently. Thoughts?
>

MRv2 is too intimately involved with Hadoop for it to easily be moved; have
a look at the mapreduce package code base to see this. We are also
developing and currently releasing them in sync.

Yes, an extra layer of indirection may appear to get MR to work on Mesos,
but things like locality, ongoing development of the YARN APIs, and the
release schedule would push for MRv2 to focus on YARN: data-aware job (and
service) scheduling in Hadoop clusters.

As an example of how those layers of indirection cause problems, look at
commons-logging. It is ubiquitous as the API in front of Log4J, when using
raw Log4J would have been better (look in the hadoop tests code where the
underlying logger is explicitly extracted and tuned for examples).
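
The commons-logging problem Steve describes can be illustrated with a
small self-contained model. Everything here is a stand-in: Log plays the
role of a logging facade, JulLog plays the role of a binding to a concrete
logger, and java.util.logging is used only so the sketch runs on a plain
JDK (the thread is about Log4J). The point is that the facade exposes no
way to change the level, so callers end up downcasting to reach the real
logger, which defeats the abstraction.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Stand-in for a logging facade: it can log, but cannot be tuned.
interface Log {
    void debug(String msg);
    boolean isDebugEnabled();
}

// Stand-in for a facade binding that wraps a concrete logger.
class JulLog implements Log {
    private final Logger delegate;

    JulLog(String name) {
        this.delegate = Logger.getLogger(name);
    }

    @Override
    public void debug(String msg) {
        delegate.log(Level.FINE, msg);
    }

    @Override
    public boolean isDebugEnabled() {
        return delegate.isLoggable(Level.FINE);
    }

    // Escape hatch: exposes the underlying logger to callers.
    Logger getLogger() {
        return delegate;
    }
}

class FacadeSketch {
    static Log getLog(String name) {
        return new JulLog(name);
    }

    // The facade has no setLevel, so the caller must know the concrete
    // binding and downcast to it. Returns false for any other binding.
    static boolean enableDebug(Log log) {
        if (!(log instanceof JulLog)) {
            return false;
        }
        ((JulLog) log).getLogger().setLevel(Level.FINE);
        return true;
    }

    public static void main(String[] args) {
        Log log = getLog("sketch.demo");
        System.out.println("debug before: " + log.isDebugEnabled());
        enableDebug(log);
        System.out.println("debug after: " + log.isDebugEnabled());
    }
}
```

Code that does this compiles against the facade but only works with one
binding, which is the kind of cost an extra layer of indirection can
carry.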

Re: Abstraction layer to support both YARN and Mesos

Posted by Harsh J <ha...@cloudera.com>.
Do we have a good reason to prefer Mesos over YARN for scheduling MR
specifically? At what times would one prefer the other?

On Fri, Jul 26, 2013 at 11:43 AM, Tsuyoshi OZAWA
<oz...@gmail.com> wrote:
> Hi,
>
> Now, Apache Mesos, an distributed resource manager, is top-level
> apache project. Meanwhile, As you know, Hadoop has own resource
> manager - YARN. IMHO, we should make resource manager pluggable in
> MRv2, because there are their own field users of MapReduce would like
> to use. I think this work is useful for MapReduce users. On the other
> hand, this work can also be large, because MRv2's code base is tightly
> coupled with YARN currently. Thoughts?
>
> - Tsuyoshi



-- 
Harsh J

Re: Abstraction layer to support both YARN and Mesos

Posted by Steve Loughran <st...@hortonworks.com>.
On 26 July 2013 07:13, Tsuyoshi OZAWA <oz...@gmail.com> wrote:

> Hi,
>
> Now, Apache Mesos, an distributed resource manager, is top-level
> apache project. Meanwhile, As you know, Hadoop has own resource
> manager - YARN. IMHO, we should make resource manager pluggable in
> MRv2, because there are their own field users of MapReduce would like
> to use. I think this work is useful for MapReduce users. On the other
> hand, this work can also be large, because MRv2's code base is tightly
> coupled with YARN currently. Thoughts?
>

MRv2 is too intimately involved with Hadoop for it to easily be moved, have
a look at the mapreduce package code base to see this. We are also
developing and currently releasing them in sync.

Yes, an extra layer of indirection may appear to get MR to work on Mesos
-but things like locality, ongoing dev YARN APIs &c and the release
schedule would push for MRv2 to focus on YARN: data aware job (and service)
scheduling in Hadoop clusters.

As an example of how those layers of indirection cause problems, look at
commons-logging. Ubiquitous as the API in front of Log4J, when using raw
Log4J would have been better (look in the hadoop tests code where the
underlying logger is explicitly extracted and tuned for examples).

Re: Abstraction layer to support both YARN and Mesos

Posted by Harsh J <ha...@cloudera.com>.
Do we have a good reason to prefer Mesos over YARN for scheduling MR
specifically? At what times would one prefer the other?

On Fri, Jul 26, 2013 at 11:43 AM, Tsuyoshi OZAWA
<oz...@gmail.com> wrote:
> Hi,
>
> Now, Apache Mesos, an distributed resource manager, is top-level
> apache project. Meanwhile, As you know, Hadoop has own resource
> manager - YARN. IMHO, we should make resource manager pluggable in
> MRv2, because there are their own field users of MapReduce would like
> to use. I think this work is useful for MapReduce users. On the other
> hand, this work can also be large, because MRv2's code base is tightly
> coupled with YARN currently. Thoughts?
>
> - Tsuyoshi



-- 
Harsh J
