You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@mesos.apache.org by Luyi Wang <wa...@gmail.com> on 2014/07/25 20:45:36 UTC

Does Mesos support Hadoop MR V2

Checked the mesos github(https://github.com/mesos/hadoop). It listed
support for MapReduce V1

How about the MR V2?

Right now we are using cloudera to manage hadoop clusters where uses MRV2.
We are planning to migrate all our services to mesos(still in the initial
investigating stage).  Good suggestions, advice and experiences are
welcomed.

Thanks a lot!


-Luyi.

Re: Does Mesos support Hadoop MR V2

Posted by Luyi Wang <wa...@gmail.com>.
Hey Tom:

Really nice to get your reply.  I am also looking forward that. If your
progress can be shared, that would be great.  I will also look into this.
Will report back if I got any progress.

Thanks a lot!



-Luyi.





On Fri, Jul 25, 2014 at 11:54 AM, Tom Arnfeld <to...@duedil.com> wrote:

> Hey Luyi,
>
> That's correct, the Hadoop framework currently only supports Hadoop 2
> MRv1. It also doesn't have great support for the HA jobtracker available in
> newer versions of Hadoop, but I've been working on that the past few weeks.
>
> I'm not sure how Hadoop 2 would play with Mesos, but very interested to
> find out more. Am I correct in thinking MRv2 will only run on top of YARN?
>
> I wonder if anyone else on the mailing list is running YARN on top of
> Mesos...
>
> Tom.
>
>
> On Friday, 25 July 2014, Luyi Wang <wa...@gmail.com> wrote:
>
>> Checked the mesos github(https://github.com/mesos/hadoop). It listed
>> support for MapReduce V1
>>
>> How about the MR V2?
>>
>> Right now we are using cloudera to manage hadoop clusters where uses
>> MRV2. We are planning to migrate all our services to mesos(still in the
>> initial investigating stage).  Good suggestions, advice and experiences are
>> welcomed.
>>
>> Thanks a lot!
>>
>>
>> -Luyi.
>>
>>
>>
>>

Re: Does Mesos support Hadoop MR V2

Posted by Tim St Clair <ts...@redhat.com>.
FWIW - I've been able to setup YARN clusters in containers using bridged networking and relying on zookeeper for Namenode resolution.

The problem is - 'bridged networking' = IP-fiasco at scale. 

However, I still have hopes for MAC VLAN support for Docker: https://groups.google.com/forum/#!topic/docker-dev/6tt1y9FTWKg 

Cheers,
Tim

----- Original Message -----
> From: "Jie Yu" <yu...@gmail.com>
> To: user@mesos.apache.org
> Cc: "mesos" <de...@mesos.apache.org>
> Sent: Saturday, July 26, 2014 5:19:28 PM
> Subject: Re: Does Mesos support Hadoop MR V2
> 
> I am interested in investigating the idea of YARN on top of Mesos. One of
> the benefits I can think of is that we can get rid of the static resource
> allocation between YARN and Mesos clusters. In that way, Mesos can allocate
> those resources that are not used by YARN to other Mesos frameworks like
> Aurora, Marathon, etc, to increase the resource utilization of the entire
> data center. Also, we could avoid running each MRv2 job as a framework
> which I think might cause some maintenance complexity (e.g. for framework
> rate limiting, etc). Finally, YARN currently does not have a good isolation
> support. It only supports cpu isolation right now (using cgroups). By
> porting YARN on top of Mesos, we might be able to leverage the existing
> Mesos containerizer strategy to provide better isolation between tasks.
> Maxime, I am curious why do you think it does not make sense to run YARN
> over Mesos? Since I am not super familar with YARN, I might be missing
> something.
> 
> I have been thinking of making ResourceManager in YARN a Mesos framework
> and making NodeManager a Mesos executor. The NodeManager will launch
> containers using primitives provided by Mesos so that we have a consistent
> containerizer layer. I haven't fully figured out how this could be done yet
> (e.g., nested containers, communication between NodeManager and
> ResourceManager, etc.), but I would love to explore this direction. I would
> like to hear about any feedback/suggestions you guys have about this
> direction.
> 
> Thanks,
> - Jie
> 
> 
> On Fri, Jul 25, 2014 at 1:39 PM, Maxime Brugidou <ma...@gmail.com>
> wrote:
> 
> > We run both mesos and yarn in prod and it does not make sense to run yarn
> > over mesos.
> >
> > However it would be interesting to find a way to run MRv2 jobs on mesos
> > with some custom layer to swap yarn with mesos. Not sure how to start
> > though... MRv2 contains a yarn application master that needs to be
> > rewritten as a mesos framework scheduler. This is probably doable. However
> > with MRv2 every map reduce job would be mapped as a new framework in Mesos.
> > Not sure how many frameworks mesos can run and scale up to. Especially
> > short lived frameworks.
> > On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <to...@duedil.com> wrote:
> >
> >> Hey Luyi,
> >>
> >> That's correct, the Hadoop framework currently only supports Hadoop 2
> >> MRv1. It also doesn't have great support for the HA jobtracker available
> >> in
> >> newer versions of Hadoop, but I've been working on that the past few
> >> weeks.
> >>
> >> I'm not sure how Hadoop 2 would play with Mesos, but very interested to
> >> find out more. Am I correct in thinking MRv2 will only run on top of YARN?
> >>
> >> I wonder if anyone else on the mailing list is running YARN on top of
> >> Mesos...
> >>
> >> Tom.
> >>
> >> On Friday, 25 July 2014, Luyi Wang <wa...@gmail.com> wrote:
> >>
> >>> Checked the mesos github(https://github.com/mesos/hadoop). It listed
> >>> support for MapReduce V1
> >>>
> >>> How about the MR V2?
> >>>
> >>> Right now we are using cloudera to manage hadoop clusters where uses
> >>> MRV2. We are planning to migrate all our services to mesos(still in the
> >>> initial investigating stage).  Good suggestions, advice and experiences
> >>> are
> >>> welcomed.
> >>>
> >>> Thanks a lot!
> >>>
> >>>
> >>> -Luyi.
> >>>
> >>>
> >>>
> >>>
> 

-- 
Cheers,
Timothy St. Clair
Red Hat Inc.

Re: Does Mesos support Hadoop MR V2

Posted by Tom Arnfeld <to...@duedil.com>.
Thanks for the explanation John, that's very useful. I wasn't aware each
"job" in MRv2 was considered it's own entity to the scheduler, that's
interesting... I think Maxime's point about some kind of hadoop compatible
framework would work well, it sounds to me like the
Framework<>Executor<>Task flow might fit well here, perhaps? Is there any
reason an executor couldn't register a framework in Mesos?


On 28 July 2014 01:44, Luyi Wang <wa...@gmail.com> wrote:

> I second john's opinion on the confusing part of different terminology of
> hadoop v2.  That's the reason I asked the question on if mesos support mr
> v2.  As maxime's concern, the decoupling part might be difficult.  After
> reading the mesos mrv1's implementation, I think possibly mrv2 migration
> can be done as if not touching anything related with resource manger(Yarn).
>     Need more time to investigating more on this complication.
>
>
>
> -Luyi.
>
>
>
>
>
> On Sun, Jul 27, 2014 at 10:40 AM, Maxime Brugidou <
> maxime.brugidou@gmail.com> wrote:
>
>> John, i believe that you are 100% correct. Theoretically we should run
>> MRv2 on Mesos but the current implementation of MRv2 on Yarn seem very
>> complex and difficult to decouple from the resource manager/negotiator.
>>
>> It's still something that could be done I guess but maybe as completely
>> independent Hadoop-compatible map reduce framework for Mesos. You could
>> write this from scratch with a custom framework inspired by the MRv2 app
>> master implementation.
>>  On Jul 27, 2014 7:00 PM, "John Omernik" <jo...@omernik.com> wrote:
>>
>>> So excuse my naivety in this space, but my ignorance has never really
>>> stopped me from asking questions:
>>>
>>> I see YARN (Yet another resource negotiator) as very similar to Mesos.
>>> I.e. something to manage resources on a cluster of machines. So when I hear
>>> talk of running "YARN" on Mesos it's seems very redundant indeed, and I ask
>>> myself, what are we actually getting out of this setup?
>>>
>>> So, going to the mapr/reduce question, I see Mapr Reduce V1 and
>>> MaprReduce V2 like this:  Map Reduce V2 is an application that runs on
>>> YARN. I.e. if you run a job, it creates an application master, that
>>> application master requests resources, and the job gets run.  It differs
>>> from Map Reduce V1 is there is no long running Job Tracker (other than the
>>> YARN Resource Manager, but that is managing resources for all applications,
>>> not just Map Reduce Applications).  Ok, so Mesos, why can't there be a
>>> Mesos Application that is similar to a Map Reduce V2 Application in YARN?
>>>  Why do we need to run YARN on Mesos? That doesn't really make sense.
>>>  Basically, for M/R V2 vs M/R V1, the only difference is to mimic M/R V1 we
>>> need task trackers and job trackers running as Mesos applications (which we
>>> have).  So in M/R v2, we just need the equivalent of an application master
>>> running on Yarn, requesting resources across the cluster.
>>>
>>> Fundamentally, YARN is confusing because I think they coupled running
>>> Map Reduce jobs with the resource manager and called it "Hadoop v2".  By
>>> coupling the two, people look at YARN as Map Reduce V2, but it's not
>>> really.  It's a way to running jobs on a cluster of machines (ala Mesos)
>>> with a "application" that is the equivalent of Map Reduce V1.   The names
>>> being given seem to be confusing to me, it makes people who have invested
>>> in Hadoop (Map Reduce V1) be very interested in YARN because it's called
>>> "Hadoop V2".  While Mesos is seen as the "Other"
>>>
>>>
>>> Just for my sake I summarized a TL;DR form so if someone wants to
>>> correct my understanding they can
>>>
>>> Mesos = Tool to manage resources
>>>
>>> YARN = Tool to manage resources it's also called Hadoopv2
>>>
>>> Map Reduce V1 = Job trackers/Task Trackers it's what we know. It can run
>>> on Hadoop clusters, and Mesos.  It's also called Hadoopv1
>>>
>>> Map Reduce V2 =  Application that can run on YARN that mimics Map Reduce
>>> V1 on a YARN Cluster. This + YARN has been called Hadoopv2.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Jul 27, 2014 at 4:10 AM, Maxime Brugidou <
>>> maxime.brugidou@gmail.com> wrote:
>>>
>>>> When I said that running yarn over mesos did not make sense I meant
>>>> that running a resource manager in a resource manager was very sub-optimal.
>>>> You will eventually do static allocation of resources for the Yarn
>>>> framework in Mesos or have complex logic to determine how much resource
>>>> should be given to yarn. You will also have the same burden of managing 2
>>>> different clusters instead of one, even if yarn is sort of hidden as mesos
>>>> framework.
>>>>
>>>> However yes I believe its easier to run yarn on mesos than to run mrv2
>>>> on top of mesos. The solution I was discussing was obviously "ideal" and I
>>>> looked at the MRAppMaster since and it discouraged me :)
>>>>  On Jul 27, 2014 12:41 AM, "Rick Richardson" <ri...@gmail.com>
>>>> wrote:
>>>>
>>>>> FWIW I also think the fastest approach here is is porting Yarn onto
>>>>> Mesos.
>>>>>
>>>>> In a perfect world, writing an implementation layer for the Yarn
>>>>> Interface on Mesos would certainly be the optimal approach, but looking at
>>>>> the MRv2 code, it is very very coupled to many Yarn modules.
>>>>>
>>>>> If someone wanted to take on the project of making a generic resource
>>>>> scheduler Interface for MRv2, that works be amazing :)
>>>>> On Jul 26, 2014 6:19 PM, "Jie Yu" <yu...@gmail.com> wrote:
>>>>>
>>>>>> I am interested in investigating the idea of YARN on top of Mesos.
>>>>>> One of the benefits I can think of is that we can get rid of the static
>>>>>> resource allocation between YARN and Mesos clusters. In that way, Mesos can
>>>>>> allocate those resources that are not used by YARN to other Mesos
>>>>>> frameworks like Aurora, Marathon, etc, to increase the resource utilization
>>>>>> of the entire data center. Also, we could avoid running each MRv2 job as a
>>>>>> framework which I think might cause some maintenance complexity (e.g. for
>>>>>> framework rate limiting, etc). Finally, YARN currently does not have a good
>>>>>> isolation support. It only supports cpu isolation right now (using
>>>>>> cgroups). By porting YARN on top of Mesos, we might be able to leverage the
>>>>>> existing Mesos containerizer strategy to provide better isolation between
>>>>>> tasks. Maxime, I am curious why do you think it does not make sense to run
>>>>>> YARN over Mesos? Since I am not super familar with YARN, I might be missing
>>>>>> something.
>>>>>>
>>>>>> I have been thinking of making ResourceManager in YARN a Mesos
>>>>>> framework and making NodeManager a Mesos executor. The NodeManager will
>>>>>> launch containers using primitives provided by Mesos so that we have a
>>>>>> consistent containerizer layer. I haven't fully figured out how this could
>>>>>> be done yet (e.g., nested containers, communication between NodeManager and
>>>>>> ResourceManager, etc.), but I would love to explore this direction. I would
>>>>>> like to hear about any feedback/suggestions you guys have about this
>>>>>> direction.
>>>>>>
>>>>>> Thanks,
>>>>>> - Jie
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 25, 2014 at 1:39 PM, Maxime Brugidou <
>>>>>> maxime.brugidou@gmail.com> wrote:
>>>>>>
>>>>>>> We run both mesos and yarn in prod and it does not make sense to run
>>>>>>> yarn over mesos.
>>>>>>>
>>>>>>> However it would be interesting to find a way to run MRv2 jobs on
>>>>>>> mesos with some custom layer to swap yarn with mesos. Not sure how to start
>>>>>>> though... MRv2 contains a yarn application master that needs to be
>>>>>>> rewritten as a mesos framework scheduler. This is probably doable. However
>>>>>>> with MRv2 every map reduce job would be mapped as a new framework in Mesos.
>>>>>>> Not sure how many frameworks mesos can run and scale up to. Especially
>>>>>>> short lived frameworks.
>>>>>>>  On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <to...@duedil.com> wrote:
>>>>>>>
>>>>>>>> Hey Luyi,
>>>>>>>>
>>>>>>>> That's correct, the Hadoop framework currently only supports Hadoop
>>>>>>>> 2 MRv1. It also doesn't have great support for the HA jobtracker available
>>>>>>>> in newer versions of Hadoop, but I've been working on that the past few
>>>>>>>> weeks.
>>>>>>>>
>>>>>>>> I'm not sure how Hadoop 2 would play with Mesos, but very
>>>>>>>> interested to find out more. Am I correct in thinking MRv2 will only run on
>>>>>>>> top of YARN?
>>>>>>>>
>>>>>>>> I wonder if anyone else on the mailing list is running YARN on top
>>>>>>>> of Mesos...
>>>>>>>>
>>>>>>>> Tom.
>>>>>>>>
>>>>>>>> On Friday, 25 July 2014, Luyi Wang <wa...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Checked the mesos github(https://github.com/mesos/hadoop). It
>>>>>>>>> listed support for MapReduce V1
>>>>>>>>>
>>>>>>>>> How about the MR V2?
>>>>>>>>>
>>>>>>>>> Right now we are using cloudera to manage hadoop clusters where
>>>>>>>>> uses MRV2. We are planning to migrate all our services to mesos(still in
>>>>>>>>> the initial investigating stage).  Good suggestions, advice and experiences
>>>>>>>>> are welcomed.
>>>>>>>>>
>>>>>>>>> Thanks a lot!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Luyi.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>
>

Re: Does Mesos support Hadoop MR V2

Posted by Luyi Wang <wa...@gmail.com>.
I second john's opinion on the confusing part of different terminology of
hadoop v2.  That's the reason I asked the question on if mesos support mr
v2.  As maxime's concern, the decoupling part might be difficult.  After
reading the mesos mrv1's implementation, I think possibly mrv2 migration
can be done as if not touching anything related with resource manger(Yarn).
    Need more time to investigating more on this complication.



-Luyi.





On Sun, Jul 27, 2014 at 10:40 AM, Maxime Brugidou <maxime.brugidou@gmail.com
> wrote:

> John, i believe that you are 100% correct. Theoretically we should run
> MRv2 on Mesos but the current implementation of MRv2 on Yarn seem very
> complex and difficult to decouple from the resource manager/negotiator.
>
> It's still something that could be done I guess but maybe as completely
> independent Hadoop-compatible map reduce framework for Mesos. You could
> write this from scratch with a custom framework inspired by the MRv2 app
> master implementation.
>  On Jul 27, 2014 7:00 PM, "John Omernik" <jo...@omernik.com> wrote:
>
>> So excuse my naivety in this space, but my ignorance has never really
>> stopped me from asking questions:
>>
>> I see YARN (Yet another resource negotiator) as very similar to Mesos.
>> I.e. something to manage resources on a cluster of machines. So when I hear
>> talk of running "YARN" on Mesos it's seems very redundant indeed, and I ask
>> myself, what are we actually getting out of this setup?
>>
>> So, going to the mapr/reduce question, I see Mapr Reduce V1 and
>> MaprReduce V2 like this:  Map Reduce V2 is an application that runs on
>> YARN. I.e. if you run a job, it creates an application master, that
>> application master requests resources, and the job gets run.  It differs
>> from Map Reduce V1 is there is no long running Job Tracker (other than the
>> YARN Resource Manager, but that is managing resources for all applications,
>> not just Map Reduce Applications).  Ok, so Mesos, why can't there be a
>> Mesos Application that is similar to a Map Reduce V2 Application in YARN?
>>  Why do we need to run YARN on Mesos? That doesn't really make sense.
>>  Basically, for M/R V2 vs M/R V1, the only difference is to mimic M/R V1 we
>> need task trackers and job trackers running as Mesos applications (which we
>> have).  So in M/R v2, we just need the equivalent of an application master
>> running on Yarn, requesting resources across the cluster.
>>
>> Fundamentally, YARN is confusing because I think they coupled running Map
>> Reduce jobs with the resource manager and called it "Hadoop v2".  By
>> coupling the two, people look at YARN as Map Reduce V2, but it's not
>> really.  It's a way to running jobs on a cluster of machines (ala Mesos)
>> with a "application" that is the equivalent of Map Reduce V1.   The names
>> being given seem to be confusing to me, it makes people who have invested
>> in Hadoop (Map Reduce V1) be very interested in YARN because it's called
>> "Hadoop V2".  While Mesos is seen as the "Other"
>>
>>
>> Just for my sake I summarized a TL;DR form so if someone wants to correct
>> my understanding they can
>>
>> Mesos = Tool to manage resources
>>
>> YARN = Tool to manage resources it's also called Hadoopv2
>>
>> Map Reduce V1 = Job trackers/Task Trackers it's what we know. It can run
>> on Hadoop clusters, and Mesos.  It's also called Hadoopv1
>>
>> Map Reduce V2 =  Application that can run on YARN that mimics Map Reduce
>> V1 on a YARN Cluster. This + YARN has been called Hadoopv2.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Sun, Jul 27, 2014 at 4:10 AM, Maxime Brugidou <
>> maxime.brugidou@gmail.com> wrote:
>>
>>> When I said that running yarn over mesos did not make sense I meant that
>>> running a resource manager in a resource manager was very sub-optimal. You
>>> will eventually do static allocation of resources for the Yarn framework in
>>> Mesos or have complex logic to determine how much resource should be given
>>> to yarn. You will also have the same burden of managing 2 different
>>> clusters instead of one, even if yarn is sort of hidden as mesos framework.
>>>
>>> However yes I believe its easier to run yarn on mesos than to run mrv2
>>> on top of mesos. The solution I was discussing was obviously "ideal" and I
>>> looked at the MRAppMaster since and it discouraged me :)
>>>  On Jul 27, 2014 12:41 AM, "Rick Richardson" <ri...@gmail.com>
>>> wrote:
>>>
>>>> FWIW I also think the fastest approach here is is porting Yarn onto
>>>> Mesos.
>>>>
>>>> In a perfect world, writing an implementation layer for the Yarn
>>>> Interface on Mesos would certainly be the optimal approach, but looking at
>>>> the MRv2 code, it is very very coupled to many Yarn modules.
>>>>
>>>> If someone wanted to take on the project of making a generic resource
>>>> scheduler Interface for MRv2, that works be amazing :)
>>>> On Jul 26, 2014 6:19 PM, "Jie Yu" <yu...@gmail.com> wrote:
>>>>
>>>>> I am interested in investigating the idea of YARN on top of Mesos. One
>>>>> of the benefits I can think of is that we can get rid of the static
>>>>> resource allocation between YARN and Mesos clusters. In that way, Mesos can
>>>>> allocate those resources that are not used by YARN to other Mesos
>>>>> frameworks like Aurora, Marathon, etc, to increase the resource utilization
>>>>> of the entire data center. Also, we could avoid running each MRv2 job as a
>>>>> framework which I think might cause some maintenance complexity (e.g. for
>>>>> framework rate limiting, etc). Finally, YARN currently does not have a good
>>>>> isolation support. It only supports cpu isolation right now (using
>>>>> cgroups). By porting YARN on top of Mesos, we might be able to leverage the
>>>>> existing Mesos containerizer strategy to provide better isolation between
>>>>> tasks. Maxime, I am curious why do you think it does not make sense to run
>>>>> YARN over Mesos? Since I am not super familar with YARN, I might be missing
>>>>> something.
>>>>>
>>>>> I have been thinking of making ResourceManager in YARN a Mesos
>>>>> framework and making NodeManager a Mesos executor. The NodeManager will
>>>>> launch containers using primitives provided by Mesos so that we have a
>>>>> consistent containerizer layer. I haven't fully figured out how this could
>>>>> be done yet (e.g., nested containers, communication between NodeManager and
>>>>> ResourceManager, etc.), but I would love to explore this direction. I would
>>>>> like to hear about any feedback/suggestions you guys have about this
>>>>> direction.
>>>>>
>>>>> Thanks,
>>>>> - Jie
>>>>>
>>>>>
>>>>> On Fri, Jul 25, 2014 at 1:39 PM, Maxime Brugidou <
>>>>> maxime.brugidou@gmail.com> wrote:
>>>>>
>>>>>> We run both mesos and yarn in prod and it does not make sense to run
>>>>>> yarn over mesos.
>>>>>>
>>>>>> However it would be interesting to find a way to run MRv2 jobs on
>>>>>> mesos with some custom layer to swap yarn with mesos. Not sure how to start
>>>>>> though... MRv2 contains a yarn application master that needs to be
>>>>>> rewritten as a mesos framework scheduler. This is probably doable. However
>>>>>> with MRv2 every map reduce job would be mapped as a new framework in Mesos.
>>>>>> Not sure how many frameworks mesos can run and scale up to. Especially
>>>>>> short lived frameworks.
>>>>>>  On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <to...@duedil.com> wrote:
>>>>>>
>>>>>>> Hey Luyi,
>>>>>>>
>>>>>>> That's correct, the Hadoop framework currently only supports Hadoop
>>>>>>> 2 MRv1. It also doesn't have great support for the HA jobtracker available
>>>>>>> in newer versions of Hadoop, but I've been working on that the past few
>>>>>>> weeks.
>>>>>>>
>>>>>>> I'm not sure how Hadoop 2 would play with Mesos, but very interested
>>>>>>> to find out more. Am I correct in thinking MRv2 will only run on top of
>>>>>>> YARN?
>>>>>>>
>>>>>>> I wonder if anyone else on the mailing list is running YARN on top
>>>>>>> of Mesos...
>>>>>>>
>>>>>>> Tom.
>>>>>>>
>>>>>>> On Friday, 25 July 2014, Luyi Wang <wa...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Checked the mesos github(https://github.com/mesos/hadoop). It
>>>>>>>> listed support for MapReduce V1
>>>>>>>>
>>>>>>>> How about the MR V2?
>>>>>>>>
>>>>>>>> Right now we are using cloudera to manage hadoop clusters where
>>>>>>>> uses MRV2. We are planning to migrate all our services to mesos(still in
>>>>>>>> the initial investigating stage).  Good suggestions, advice and experiences
>>>>>>>> are welcomed.
>>>>>>>>
>>>>>>>> Thanks a lot!
>>>>>>>>
>>>>>>>>
>>>>>>>> -Luyi.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>
>>

Re: Does Mesos support Hadoop MR V2

Posted by Maxime Brugidou <ma...@gmail.com>.
John, i believe that you are 100% correct. Theoretically we should run MRv2
on Mesos but the current implementation of MRv2 on Yarn seem very complex
and difficult to decouple from the resource manager/negotiator.

It's still something that could be done I guess but maybe as completely
independent Hadoop-compatible map reduce framework for Mesos. You could
write this from scratch with a custom framework inspired by the MRv2 app
master implementation.
On Jul 27, 2014 7:00 PM, "John Omernik" <jo...@omernik.com> wrote:

> So excuse my naivety in this space, but my ignorance has never really
> stopped me from asking questions:
>
> I see YARN (Yet another resource negotiator) as very similar to Mesos.
> I.e. something to manage resources on a cluster of machines. So when I hear
> talk of running "YARN" on Mesos it's seems very redundant indeed, and I ask
> myself, what are we actually getting out of this setup?
>
> So, going to the mapr/reduce question, I see Mapr Reduce V1 and MaprReduce
> V2 like this:  Map Reduce V2 is an application that runs on YARN. I.e. if
> you run a job, it creates an application master, that application master
> requests resources, and the job gets run.  It differs from Map Reduce V1 is
> there is no long running Job Tracker (other than the YARN Resource Manager,
> but that is managing resources for all applications, not just Map Reduce
> Applications).  Ok, so Mesos, why can't there be a Mesos Application that
> is similar to a Map Reduce V2 Application in YARN?  Why do we need to run
> YARN on Mesos? That doesn't really make sense.  Basically, for M/R V2 vs
> M/R V1, the only difference is to mimic M/R V1 we need task trackers and
> job trackers running as Mesos applications (which we have).  So in M/R v2,
> we just need the equivalent of an application master running on Yarn,
> requesting resources across the cluster.
>
> Fundamentally, YARN is confusing because I think they coupled running Map
> Reduce jobs with the resource manager and called it "Hadoop v2".  By
> coupling the two, people look at YARN as Map Reduce V2, but it's not
> really.  It's a way to running jobs on a cluster of machines (ala Mesos)
> with a "application" that is the equivalent of Map Reduce V1.   The names
> being given seem to be confusing to me, it makes people who have invested
> in Hadoop (Map Reduce V1) be very interested in YARN because it's called
> "Hadoop V2".  While Mesos is seen as the "Other"
>
>
> Just for my sake I summarized a TL;DR form so if someone wants to correct
> my understanding they can
>
> Mesos = Tool to manage resources
>
> YARN = Tool to manage resources it's also called Hadoopv2
>
> Map Reduce V1 = Job trackers/Task Trackers it's what we know. It can run
> on Hadoop clusters, and Mesos.  It's also called Hadoopv1
>
> Map Reduce V2 =  Application that can run on YARN that mimics Map Reduce
> V1 on a YARN Cluster. This + YARN has been called Hadoopv2.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Sun, Jul 27, 2014 at 4:10 AM, Maxime Brugidou <
> maxime.brugidou@gmail.com> wrote:
>
>> When I said that running yarn over mesos did not make sense I meant that
>> running a resource manager in a resource manager was very sub-optimal. You
>> will eventually do static allocation of resources for the Yarn framework in
>> Mesos or have complex logic to determine how much resource should be given
>> to yarn. You will also have the same burden of managing 2 different
>> clusters instead of one, even if yarn is sort of hidden as mesos framework.
>>
>> However yes I believe its easier to run yarn on mesos than to run mrv2 on
>> top of mesos. The solution I was discussing was obviously "ideal" and I
>> looked at the MRAppMaster since and it discouraged me :)
>>  On Jul 27, 2014 12:41 AM, "Rick Richardson" <ri...@gmail.com>
>> wrote:
>>
>>> FWIW I also think the fastest approach here is is porting Yarn onto
>>> Mesos.
>>>
>>> In a perfect world, writing an implementation layer for the Yarn
>>> Interface on Mesos would certainly be the optimal approach, but looking at
>>> the MRv2 code, it is very very coupled to many Yarn modules.
>>>
>>> If someone wanted to take on the project of making a generic resource
>>> scheduler Interface for MRv2, that works be amazing :)
>>> On Jul 26, 2014 6:19 PM, "Jie Yu" <yu...@gmail.com> wrote:
>>>
>>>> I am interested in investigating the idea of YARN on top of Mesos. One
>>>> of the benefits I can think of is that we can get rid of the static
>>>> resource allocation between YARN and Mesos clusters. In that way, Mesos can
>>>> allocate those resources that are not used by YARN to other Mesos
>>>> frameworks like Aurora, Marathon, etc, to increase the resource utilization
>>>> of the entire data center. Also, we could avoid running each MRv2 job as a
>>>> framework which I think might cause some maintenance complexity (e.g. for
>>>> framework rate limiting, etc). Finally, YARN currently does not have a good
>>>> isolation support. It only supports cpu isolation right now (using
>>>> cgroups). By porting YARN on top of Mesos, we might be able to leverage the
>>>> existing Mesos containerizer strategy to provide better isolation between
>>>> tasks. Maxime, I am curious why do you think it does not make sense to run
>>>> YARN over Mesos? Since I am not super familar with YARN, I might be missing
>>>> something.
>>>>
>>>> I have been thinking of making ResourceManager in YARN a Mesos
>>>> framework and making NodeManager a Mesos executor. The NodeManager will
>>>> launch containers using primitives provided by Mesos so that we have a
>>>> consistent containerizer layer. I haven't fully figured out how this could
>>>> be done yet (e.g., nested containers, communication between NodeManager and
>>>> ResourceManager, etc.), but I would love to explore this direction. I would
>>>> like to hear about any feedback/suggestions you guys have about this
>>>> direction.
>>>>
>>>> Thanks,
>>>> - Jie
>>>>
>>>>
>>>> On Fri, Jul 25, 2014 at 1:39 PM, Maxime Brugidou <
>>>> maxime.brugidou@gmail.com> wrote:
>>>>
>>>>> We run both mesos and yarn in prod and it does not make sense to run
>>>>> yarn over mesos.
>>>>>
>>>>> However it would be interesting to find a way to run MRv2 jobs on
>>>>> mesos with some custom layer to swap yarn with mesos. Not sure how to start
>>>>> though... MRv2 contains a yarn application master that needs to be
>>>>> rewritten as a mesos framework scheduler. This is probably doable. However
>>>>> with MRv2 every map reduce job would be mapped as a new framework in Mesos.
>>>>> Not sure how many frameworks mesos can run and scale up to. Especially
>>>>> short lived frameworks.
>>>>>  On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <to...@duedil.com> wrote:
>>>>>
>>>>>> Hey Luyi,
>>>>>>
>>>>>> That's correct, the Hadoop framework currently only supports Hadoop 2
>>>>>> MRv1. It also doesn't have great support for the HA jobtracker available in
>>>>>> newer versions of Hadoop, but I've been working on that the past few weeks.
>>>>>>
>>>>>> I'm not sure how Hadoop 2 would play with Mesos, but very interested
>>>>>> to find out more. Am I correct in thinking MRv2 will only run on top of
>>>>>> YARN?
>>>>>>
>>>>>> I wonder if anyone else on the mailing list is running YARN on top of
>>>>>> Mesos...
>>>>>>
>>>>>> Tom.
>>>>>>
>>>>>> On Friday, 25 July 2014, Luyi Wang <wa...@gmail.com> wrote:
>>>>>>
>>>>>>> Checked the mesos github(https://github.com/mesos/hadoop). It
>>>>>>> listed support for MapReduce V1
>>>>>>>
>>>>>>> How about the MR V2?
>>>>>>>
>>>>>>> Right now we are using cloudera to manage hadoop clusters where uses
>>>>>>> MRV2. We are planning to migrate all our services to mesos(still in the
>>>>>>> initial investigating stage).  Good suggestions, advice and experiences are
>>>>>>> welcomed.
>>>>>>>
>>>>>>> Thanks a lot!
>>>>>>>
>>>>>>>
>>>>>>> -Luyi.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>
>

Re: Does Mesos support Hadoop MR V2

Posted by John Omernik <jo...@omernik.com>.
So excuse my naivety in this space, but my ignorance has never really
stopped me from asking questions:

I see YARN (Yet another resource negotiator) as very similar to Mesos. I.e.
something to manage resources on a cluster of machines. So when I hear talk
of running "YARN" on Mesos it's seems very redundant indeed, and I ask
myself, what are we actually getting out of this setup?

So, going to the mapr/reduce question, I see Mapr Reduce V1 and MaprReduce
V2 like this:  Map Reduce V2 is an application that runs on YARN. I.e. if
you run a job, it creates an application master, that application master
requests resources, and the job gets run.  It differs from Map Reduce V1 is
there is no long running Job Tracker (other than the YARN Resource Manager,
but that is managing resources for all applications, not just Map Reduce
Applications).  Ok, so Mesos, why can't there be a Mesos Application that
is similar to a Map Reduce V2 Application in YARN?  Why do we need to run
YARN on Mesos? That doesn't really make sense.  Basically, for M/R V2 vs
M/R V1, the only difference is to mimic M/R V1 we need task trackers and
job trackers running as Mesos applications (which we have).  So in M/R v2,
we just need the equivalent of an application master running on Yarn,
requesting resources across the cluster.

Fundamentally, YARN is confusing because I think they coupled running Map
Reduce jobs with the resource manager and called it "Hadoop v2".  By
coupling the two, people look at YARN as Map Reduce V2, but it's not
really.  It's a way to running jobs on a cluster of machines (ala Mesos)
with a "application" that is the equivalent of Map Reduce V1.   The names
being given seem to be confusing to me, it makes people who have invested
in Hadoop (Map Reduce V1) be very interested in YARN because it's called
"Hadoop V2".  While Mesos is seen as the "Other"


Just for my sake I summarized a TL;DR form so if someone wants to correct
my understanding they can

Mesos = Tool to manage resources

YARN = Tool to manage resources it's also called Hadoopv2

Map Reduce V1 = Job trackers/Task Trackers it's what we know. It can run on
Hadoop clusters, and Mesos.  It's also called Hadoopv1

Map Reduce V2 =  Application that can run on YARN that mimics Map Reduce V1
on a YARN Cluster. This + YARN has been called Hadoopv2.


















On Sun, Jul 27, 2014 at 4:10 AM, Maxime Brugidou <ma...@gmail.com>
wrote:

> When I said that running yarn over mesos did not make sense I meant that
> running a resource manager in a resource manager was very sub-optimal. You
> will eventually do static allocation of resources for the Yarn framework in
> Mesos or have complex logic to determine how much resource should be given
> to yarn. You will also have the same burden of managing 2 different
> clusters instead of one, even if yarn is sort of hidden as mesos framework.
>
> However yes I believe its easier to run yarn on mesos than to run mrv2 on
> top of mesos. The solution I was discussing was obviously "ideal" and I
> looked at the MRAppMaster since and it discouraged me :)
>  On Jul 27, 2014 12:41 AM, "Rick Richardson" <ri...@gmail.com>
> wrote:
>
>> FWIW I also think the fastest approach here is is porting Yarn onto
>> Mesos.
>>
>> In a perfect world, writing an implementation layer for the Yarn
>> Interface on Mesos would certainly be the optimal approach, but looking at
>> the MRv2 code, it is very very coupled to many Yarn modules.
>>
>> If someone wanted to take on the project of making a generic resource
>> scheduler Interface for MRv2, that works be amazing :)
>> On Jul 26, 2014 6:19 PM, "Jie Yu" <yu...@gmail.com> wrote:
>>
>>> I am interested in investigating the idea of YARN on top of Mesos. One
>>> of the benefits I can think of is that we can get rid of the static
>>> resource allocation between YARN and Mesos clusters. In that way, Mesos can
>>> allocate those resources that are not used by YARN to other Mesos
>>> frameworks like Aurora, Marathon, etc, to increase the resource utilization
>>> of the entire data center. Also, we could avoid running each MRv2 job as a
>>> framework which I think might cause some maintenance complexity (e.g. for
>>> framework rate limiting, etc). Finally, YARN currently does not have a good
>>> isolation support. It only supports cpu isolation right now (using
>>> cgroups). By porting YARN on top of Mesos, we might be able to leverage the
>>> existing Mesos containerizer strategy to provide better isolation between
>>> tasks. Maxime, I am curious why do you think it does not make sense to run
>>> YARN over Mesos? Since I am not super familar with YARN, I might be missing
>>> something.
>>>
>>> I have been thinking of making ResourceManager in YARN a Mesos framework
>>> and making NodeManager a Mesos executor. The NodeManager will launch
>>> containers using primitives provided by Mesos so that we have a consistent
>>> containerizer layer. I haven't fully figured out how this could be done yet
>>> (e.g., nested containers, communication between NodeManager and
>>> ResourceManager, etc.), but I would love to explore this direction. I would
>>> like to hear about any feedback/suggestions you guys have about this
>>> direction.
>>>
>>> Thanks,
>>> - Jie
>>>
>>>
>>> On Fri, Jul 25, 2014 at 1:39 PM, Maxime Brugidou <
>>> maxime.brugidou@gmail.com> wrote:
>>>
>>>> We run both mesos and yarn in prod and it does not make sense to run
>>>> yarn over mesos.
>>>>
>>>> However it would be interesting to find a way to run MRv2 jobs on mesos
>>>> with some custom layer to swap yarn with mesos. Not sure how to start
>>>> though... MRv2 contains a yarn application master that needs to be
>>>> rewritten as a mesos framework scheduler. This is probably doable. However
>>>> with MRv2 every map reduce job would be mapped as a new framework in Mesos.
>>>> Not sure how many frameworks mesos can run and scale up to. Especially
>>>> short lived frameworks.
>>>>  On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <to...@duedil.com> wrote:
>>>>
>>>>> Hey Luyi,
>>>>>
>>>>> That's correct, the Hadoop framework currently only supports Hadoop 2
>>>>> MRv1. It also doesn't have great support for the HA jobtracker available in
>>>>> newer versions of Hadoop, but I've been working on that the past few weeks.
>>>>>
>>>>> I'm not sure how Hadoop 2 would play with Mesos, but very interested
>>>>> to find out more. Am I correct in thinking MRv2 will only run on top of
>>>>> YARN?
>>>>>
>>>>> I wonder if anyone else on the mailing list is running YARN on top of
>>>>> Mesos...
>>>>>
>>>>> Tom.
>>>>>
>>>>> On Friday, 25 July 2014, Luyi Wang <wa...@gmail.com> wrote:
>>>>>
>>>>>> Checked the mesos github(https://github.com/mesos/hadoop). It listed
>>>>>> support for MapReduce V1
>>>>>>
>>>>>> How about the MR V2?
>>>>>>
>>>>>> Right now we are using cloudera to manage hadoop clusters where uses
>>>>>> MRV2. We are planning to migrate all our services to mesos(still in the
>>>>>> initial investigating stage).  Good suggestions, advice and experiences are
>>>>>> welcomed.
>>>>>>
>>>>>> Thanks a lot!
>>>>>>
>>>>>>
>>>>>> -Luyi.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>

Re: Does Mesos support Hadoop MR V2

Posted by Maxime Brugidou <ma...@gmail.com>.
When I said that running yarn over mesos did not make sense I meant that
running a resource manager in a resource manager was very sub-optimal. You
will eventually do static allocation of resources for the Yarn framework in
Mesos or have complex logic to determine how much resource should be given
to yarn. You will also have the same burden of managing 2 different
clusters instead of one, even if yarn is sort of hidden as mesos framework.

However yes I believe its easier to run yarn on mesos than to run mrv2 on
top of mesos. The solution I was discussing was obviously "ideal" and I
looked at the MRAppMaster since and it discouraged me :)
On Jul 27, 2014 12:41 AM, "Rick Richardson" <ri...@gmail.com>
wrote:

> FWIW I also think the fastest approach here is is porting Yarn onto Mesos.
>
> In a perfect world, writing an implementation layer for the Yarn Interface
> on Mesos would certainly be the optimal approach, but looking at the MRv2
> code, it is very very coupled to many Yarn modules.
>
> If someone wanted to take on the project of making a generic resource
> scheduler Interface for MRv2, that works be amazing :)
> On Jul 26, 2014 6:19 PM, "Jie Yu" <yu...@gmail.com> wrote:
>
>> I am interested in investigating the idea of YARN on top of Mesos. One of
>> the benefits I can think of is that we can get rid of the static resource
>> allocation between YARN and Mesos clusters. In that way, Mesos can allocate
>> those resources that are not used by YARN to other Mesos frameworks like
>> Aurora, Marathon, etc, to increase the resource utilization of the entire
>> data center. Also, we could avoid running each MRv2 job as a framework
>> which I think might cause some maintenance complexity (e.g. for framework
>> rate limiting, etc). Finally, YARN currently does not have a good isolation
>> support. It only supports cpu isolation right now (using cgroups). By
>> porting YARN on top of Mesos, we might be able to leverage the existing
>> Mesos containerizer strategy to provide better isolation between tasks.
>> Maxime, I am curious why do you think it does not make sense to run YARN
>> over Mesos? Since I am not super familar with YARN, I might be missing
>> something.
>>
>> I have been thinking of making ResourceManager in YARN a Mesos framework
>> and making NodeManager a Mesos executor. The NodeManager will launch
>> containers using primitives provided by Mesos so that we have a consistent
>> containerizer layer. I haven't fully figured out how this could be done yet
>> (e.g., nested containers, communication between NodeManager and
>> ResourceManager, etc.), but I would love to explore this direction. I would
>> like to hear about any feedback/suggestions you guys have about this
>> direction.
>>
>> Thanks,
>> - Jie
>>
>>
>> On Fri, Jul 25, 2014 at 1:39 PM, Maxime Brugidou <
>> maxime.brugidou@gmail.com> wrote:
>>
>>> We run both mesos and yarn in prod and it does not make sense to run
>>> yarn over mesos.
>>>
>>> However it would be interesting to find a way to run MRv2 jobs on mesos
>>> with some custom layer to swap yarn with mesos. Not sure how to start
>>> though... MRv2 contains a yarn application master that needs to be
>>> rewritten as a mesos framework scheduler. This is probably doable. However
>>> with MRv2 every map reduce job would be mapped as a new framework in Mesos.
>>> Not sure how many frameworks mesos can run and scale up to. Especially
>>> short lived frameworks.
>>>  On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <to...@duedil.com> wrote:
>>>
>>>> Hey Luyi,
>>>>
>>>> That's correct, the Hadoop framework currently only supports Hadoop 2
>>>> MRv1. It also doesn't have great support for the HA jobtracker available in
>>>> newer versions of Hadoop, but I've been working on that the past few weeks.
>>>>
>>>> I'm not sure how Hadoop 2 would play with Mesos, but very interested to
>>>> find out more. Am I correct in thinking MRv2 will only run on top of YARN?
>>>>
>>>> I wonder if anyone else on the mailing list is running YARN on top of
>>>> Mesos...
>>>>
>>>> Tom.
>>>>
>>>> On Friday, 25 July 2014, Luyi Wang <wa...@gmail.com> wrote:
>>>>
>>>>> Checked the mesos github(https://github.com/mesos/hadoop). It listed
>>>>> support for MapReduce V1
>>>>>
>>>>> How about the MR V2?
>>>>>
>>>>> Right now we are using cloudera to manage hadoop clusters where uses
>>>>> MRV2. We are planning to migrate all our services to mesos(still in the
>>>>> initial investigating stage).  Good suggestions, advice and experiences are
>>>>> welcomed.
>>>>>
>>>>> Thanks a lot!
>>>>>
>>>>>
>>>>> -Luyi.
>>>>>
>>>>>
>>>>>
>>>>>
>>

Re: Does Mesos support Hadoop MR V2

Posted by Rick Richardson <ri...@gmail.com>.
FWIW I also think the fastest approach here is is porting Yarn onto Mesos.

In a perfect world, writing an implementation layer for the Yarn Interface
on Mesos would certainly be the optimal approach, but looking at the MRv2
code, it is very very coupled to many Yarn modules.

If someone wanted to take on the project of making a generic resource
scheduler Interface for MRv2, that works be amazing :)
On Jul 26, 2014 6:19 PM, "Jie Yu" <yu...@gmail.com> wrote:

> I am interested in investigating the idea of YARN on top of Mesos. One of
> the benefits I can think of is that we can get rid of the static resource
> allocation between YARN and Mesos clusters. In that way, Mesos can allocate
> those resources that are not used by YARN to other Mesos frameworks like
> Aurora, Marathon, etc, to increase the resource utilization of the entire
> data center. Also, we could avoid running each MRv2 job as a framework
> which I think might cause some maintenance complexity (e.g. for framework
> rate limiting, etc). Finally, YARN currently does not have a good isolation
> support. It only supports cpu isolation right now (using cgroups). By
> porting YARN on top of Mesos, we might be able to leverage the existing
> Mesos containerizer strategy to provide better isolation between tasks.
> Maxime, I am curious why do you think it does not make sense to run YARN
> over Mesos? Since I am not super familar with YARN, I might be missing
> something.
>
> I have been thinking of making ResourceManager in YARN a Mesos framework
> and making NodeManager a Mesos executor. The NodeManager will launch
> containers using primitives provided by Mesos so that we have a consistent
> containerizer layer. I haven't fully figured out how this could be done yet
> (e.g., nested containers, communication between NodeManager and
> ResourceManager, etc.), but I would love to explore this direction. I would
> like to hear about any feedback/suggestions you guys have about this
> direction.
>
> Thanks,
> - Jie
>
>
> On Fri, Jul 25, 2014 at 1:39 PM, Maxime Brugidou <
> maxime.brugidou@gmail.com> wrote:
>
>> We run both mesos and yarn in prod and it does not make sense to run yarn
>> over mesos.
>>
>> However it would be interesting to find a way to run MRv2 jobs on mesos
>> with some custom layer to swap yarn with mesos. Not sure how to start
>> though... MRv2 contains a yarn application master that needs to be
>> rewritten as a mesos framework scheduler. This is probably doable. However
>> with MRv2 every map reduce job would be mapped as a new framework in Mesos.
>> Not sure how many frameworks mesos can run and scale up to. Especially
>> short lived frameworks.
>>  On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <to...@duedil.com> wrote:
>>
>>> Hey Luyi,
>>>
>>> That's correct, the Hadoop framework currently only supports Hadoop 2
>>> MRv1. It also doesn't have great support for the HA jobtracker available in
>>> newer versions of Hadoop, but I've been working on that the past few weeks.
>>>
>>> I'm not sure how Hadoop 2 would play with Mesos, but very interested to
>>> find out more. Am I correct in thinking MRv2 will only run on top of YARN?
>>>
>>> I wonder if anyone else on the mailing list is running YARN on top of
>>> Mesos...
>>>
>>> Tom.
>>>
>>> On Friday, 25 July 2014, Luyi Wang <wa...@gmail.com> wrote:
>>>
>>>> Checked the mesos github(https://github.com/mesos/hadoop). It listed
>>>> support for MapReduce V1
>>>>
>>>> How about the MR V2?
>>>>
>>>> Right now we are using cloudera to manage hadoop clusters where uses
>>>> MRV2. We are planning to migrate all our services to mesos(still in the
>>>> initial investigating stage).  Good suggestions, advice and experiences are
>>>> welcomed.
>>>>
>>>> Thanks a lot!
>>>>
>>>>
>>>> -Luyi.
>>>>
>>>>
>>>>
>>>>
>

Re: Does Mesos support Hadoop MR V2

Posted by Tim St Clair <ts...@redhat.com>.
FWIW - I've been able to setup YARN clusters in containers using bridged networking and relying on zookeeper for Namenode resolution.

The problem is - 'bridged networking' = IP-fiasco at scale. 

However, I still have hopes for MAC VLAN support for Docker: https://groups.google.com/forum/#!topic/docker-dev/6tt1y9FTWKg 

Cheers,
Tim

----- Original Message -----
> From: "Jie Yu" <yu...@gmail.com>
> To: user@mesos.apache.org
> Cc: "mesos" <de...@mesos.apache.org>
> Sent: Saturday, July 26, 2014 5:19:28 PM
> Subject: Re: Does Mesos support Hadoop MR V2
> 
> I am interested in investigating the idea of YARN on top of Mesos. One of
> the benefits I can think of is that we can get rid of the static resource
> allocation between YARN and Mesos clusters. In that way, Mesos can allocate
> those resources that are not used by YARN to other Mesos frameworks like
> Aurora, Marathon, etc, to increase the resource utilization of the entire
> data center. Also, we could avoid running each MRv2 job as a framework
> which I think might cause some maintenance complexity (e.g. for framework
> rate limiting, etc). Finally, YARN currently does not have a good isolation
> support. It only supports cpu isolation right now (using cgroups). By
> porting YARN on top of Mesos, we might be able to leverage the existing
> Mesos containerizer strategy to provide better isolation between tasks.
> Maxime, I am curious why do you think it does not make sense to run YARN
> over Mesos? Since I am not super familar with YARN, I might be missing
> something.
> 
> I have been thinking of making ResourceManager in YARN a Mesos framework
> and making NodeManager a Mesos executor. The NodeManager will launch
> containers using primitives provided by Mesos so that we have a consistent
> containerizer layer. I haven't fully figured out how this could be done yet
> (e.g., nested containers, communication between NodeManager and
> ResourceManager, etc.), but I would love to explore this direction. I would
> like to hear about any feedback/suggestions you guys have about this
> direction.
> 
> Thanks,
> - Jie
> 
> 
> On Fri, Jul 25, 2014 at 1:39 PM, Maxime Brugidou <ma...@gmail.com>
> wrote:
> 
> > We run both mesos and yarn in prod and it does not make sense to run yarn
> > over mesos.
> >
> > However it would be interesting to find a way to run MRv2 jobs on mesos
> > with some custom layer to swap yarn with mesos. Not sure how to start
> > though... MRv2 contains a yarn application master that needs to be
> > rewritten as a mesos framework scheduler. This is probably doable. However
> > with MRv2 every map reduce job would be mapped as a new framework in Mesos.
> > Not sure how many frameworks mesos can run and scale up to. Especially
> > short lived frameworks.
> > On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <to...@duedil.com> wrote:
> >
> >> Hey Luyi,
> >>
> >> That's correct, the Hadoop framework currently only supports Hadoop 2
> >> MRv1. It also doesn't have great support for the HA jobtracker available
> >> in
> >> newer versions of Hadoop, but I've been working on that the past few
> >> weeks.
> >>
> >> I'm not sure how Hadoop 2 would play with Mesos, but very interested to
> >> find out more. Am I correct in thinking MRv2 will only run on top of YARN?
> >>
> >> I wonder if anyone else on the mailing list is running YARN on top of
> >> Mesos...
> >>
> >> Tom.
> >>
> >> On Friday, 25 July 2014, Luyi Wang <wa...@gmail.com> wrote:
> >>
> >>> Checked the mesos github(https://github.com/mesos/hadoop). It listed
> >>> support for MapReduce V1
> >>>
> >>> How about the MR V2?
> >>>
> >>> Right now we are using cloudera to manage hadoop clusters where uses
> >>> MRV2. We are planning to migrate all our services to mesos(still in the
> >>> initial investigating stage).  Good suggestions, advice and experiences
> >>> are
> >>> welcomed.
> >>>
> >>> Thanks a lot!
> >>>
> >>>
> >>> -Luyi.
> >>>
> >>>
> >>>
> >>>
> 

-- 
Cheers,
Timothy St. Clair
Red Hat Inc.

Re: Does Mesos support Hadoop MR V2

Posted by Rick Richardson <ri...@gmail.com>.
FWIW I also think the fastest approach here is is porting Yarn onto Mesos.

In a perfect world, writing an implementation layer for the Yarn Interface
on Mesos would certainly be the optimal approach, but looking at the MRv2
code, it is very very coupled to many Yarn modules.

If someone wanted to take on the project of making a generic resource
scheduler Interface for MRv2, that works be amazing :)
On Jul 26, 2014 6:19 PM, "Jie Yu" <yu...@gmail.com> wrote:

> I am interested in investigating the idea of YARN on top of Mesos. One of
> the benefits I can think of is that we can get rid of the static resource
> allocation between YARN and Mesos clusters. In that way, Mesos can allocate
> those resources that are not used by YARN to other Mesos frameworks like
> Aurora, Marathon, etc, to increase the resource utilization of the entire
> data center. Also, we could avoid running each MRv2 job as a framework
> which I think might cause some maintenance complexity (e.g. for framework
> rate limiting, etc). Finally, YARN currently does not have a good isolation
> support. It only supports cpu isolation right now (using cgroups). By
> porting YARN on top of Mesos, we might be able to leverage the existing
> Mesos containerizer strategy to provide better isolation between tasks.
> Maxime, I am curious why do you think it does not make sense to run YARN
> over Mesos? Since I am not super familar with YARN, I might be missing
> something.
>
> I have been thinking of making ResourceManager in YARN a Mesos framework
> and making NodeManager a Mesos executor. The NodeManager will launch
> containers using primitives provided by Mesos so that we have a consistent
> containerizer layer. I haven't fully figured out how this could be done yet
> (e.g., nested containers, communication between NodeManager and
> ResourceManager, etc.), but I would love to explore this direction. I would
> like to hear about any feedback/suggestions you guys have about this
> direction.
>
> Thanks,
> - Jie
>
>
> On Fri, Jul 25, 2014 at 1:39 PM, Maxime Brugidou <
> maxime.brugidou@gmail.com> wrote:
>
>> We run both mesos and yarn in prod and it does not make sense to run yarn
>> over mesos.
>>
>> However it would be interesting to find a way to run MRv2 jobs on mesos
>> with some custom layer to swap yarn with mesos. Not sure how to start
>> though... MRv2 contains a yarn application master that needs to be
>> rewritten as a mesos framework scheduler. This is probably doable. However
>> with MRv2 every map reduce job would be mapped as a new framework in Mesos.
>> Not sure how many frameworks mesos can run and scale up to. Especially
>> short lived frameworks.
>>  On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <to...@duedil.com> wrote:
>>
>>> Hey Luyi,
>>>
>>> That's correct, the Hadoop framework currently only supports Hadoop 2
>>> MRv1. It also doesn't have great support for the HA jobtracker available in
>>> newer versions of Hadoop, but I've been working on that the past few weeks.
>>>
>>> I'm not sure how Hadoop 2 would play with Mesos, but very interested to
>>> find out more. Am I correct in thinking MRv2 will only run on top of YARN?
>>>
>>> I wonder if anyone else on the mailing list is running YARN on top of
>>> Mesos...
>>>
>>> Tom.
>>>
>>> On Friday, 25 July 2014, Luyi Wang <wa...@gmail.com> wrote:
>>>
>>>> Checked the mesos github(https://github.com/mesos/hadoop). It listed
>>>> support for MapReduce V1
>>>>
>>>> How about the MR V2?
>>>>
>>>> Right now we are using cloudera to manage hadoop clusters where uses
>>>> MRV2. We are planning to migrate all our services to mesos(still in the
>>>> initial investigating stage).  Good suggestions, advice and experiences are
>>>> welcomed.
>>>>
>>>> Thanks a lot!
>>>>
>>>>
>>>> -Luyi.
>>>>
>>>>
>>>>
>>>>
>

Re: Does Mesos support Hadoop MR V2

Posted by Jie Yu <yu...@gmail.com>.
I am interested in investigating the idea of YARN on top of Mesos. One of
the benefits I can think of is that we can get rid of the static resource
allocation between YARN and Mesos clusters. In that way, Mesos can allocate
those resources that are not used by YARN to other Mesos frameworks like
Aurora, Marathon, etc, to increase the resource utilization of the entire
data center. Also, we could avoid running each MRv2 job as a framework
which I think might cause some maintenance complexity (e.g. for framework
rate limiting, etc). Finally, YARN currently does not have a good isolation
support. It only supports cpu isolation right now (using cgroups). By
porting YARN on top of Mesos, we might be able to leverage the existing
Mesos containerizer strategy to provide better isolation between tasks.
Maxime, I am curious why do you think it does not make sense to run YARN
over Mesos? Since I am not super familar with YARN, I might be missing
something.

I have been thinking of making ResourceManager in YARN a Mesos framework
and making NodeManager a Mesos executor. The NodeManager will launch
containers using primitives provided by Mesos so that we have a consistent
containerizer layer. I haven't fully figured out how this could be done yet
(e.g., nested containers, communication between NodeManager and
ResourceManager, etc.), but I would love to explore this direction. I would
like to hear about any feedback/suggestions you guys have about this
direction.

Thanks,
- Jie


On Fri, Jul 25, 2014 at 1:39 PM, Maxime Brugidou <ma...@gmail.com>
wrote:

> We run both mesos and yarn in prod and it does not make sense to run yarn
> over mesos.
>
> However it would be interesting to find a way to run MRv2 jobs on mesos
> with some custom layer to swap yarn with mesos. Not sure how to start
> though... MRv2 contains a yarn application master that needs to be
> rewritten as a mesos framework scheduler. This is probably doable. However
> with MRv2 every map reduce job would be mapped as a new framework in Mesos.
> Not sure how many frameworks mesos can run and scale up to. Especially
> short lived frameworks.
> On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <to...@duedil.com> wrote:
>
>> Hey Luyi,
>>
>> That's correct, the Hadoop framework currently only supports Hadoop 2
>> MRv1. It also doesn't have great support for the HA jobtracker available in
>> newer versions of Hadoop, but I've been working on that the past few weeks.
>>
>> I'm not sure how Hadoop 2 would play with Mesos, but very interested to
>> find out more. Am I correct in thinking MRv2 will only run on top of YARN?
>>
>> I wonder if anyone else on the mailing list is running YARN on top of
>> Mesos...
>>
>> Tom.
>>
>> On Friday, 25 July 2014, Luyi Wang <wa...@gmail.com> wrote:
>>
>>> Checked the mesos github(https://github.com/mesos/hadoop). It listed
>>> support for MapReduce V1
>>>
>>> How about the MR V2?
>>>
>>> Right now we are using cloudera to manage hadoop clusters where uses
>>> MRV2. We are planning to migrate all our services to mesos(still in the
>>> initial investigating stage).  Good suggestions, advice and experiences are
>>> welcomed.
>>>
>>> Thanks a lot!
>>>
>>>
>>> -Luyi.
>>>
>>>
>>>
>>>

Re: Does Mesos support Hadoop MR V2

Posted by Jie Yu <yu...@gmail.com>.
I am interested in investigating the idea of YARN on top of Mesos. One of
the benefits I can think of is that we can get rid of the static resource
allocation between YARN and Mesos clusters. In that way, Mesos can allocate
those resources that are not used by YARN to other Mesos frameworks like
Aurora, Marathon, etc, to increase the resource utilization of the entire
data center. Also, we could avoid running each MRv2 job as a framework
which I think might cause some maintenance complexity (e.g. for framework
rate limiting, etc). Finally, YARN currently does not have a good isolation
support. It only supports cpu isolation right now (using cgroups). By
porting YARN on top of Mesos, we might be able to leverage the existing
Mesos containerizer strategy to provide better isolation between tasks.
Maxime, I am curious why do you think it does not make sense to run YARN
over Mesos? Since I am not super familar with YARN, I might be missing
something.

I have been thinking of making ResourceManager in YARN a Mesos framework
and making NodeManager a Mesos executor. The NodeManager will launch
containers using primitives provided by Mesos so that we have a consistent
containerizer layer. I haven't fully figured out how this could be done yet
(e.g., nested containers, communication between NodeManager and
ResourceManager, etc.), but I would love to explore this direction. I would
like to hear about any feedback/suggestions you guys have about this
direction.

Thanks,
- Jie


On Fri, Jul 25, 2014 at 1:39 PM, Maxime Brugidou <ma...@gmail.com>
wrote:

> We run both mesos and yarn in prod and it does not make sense to run yarn
> over mesos.
>
> However it would be interesting to find a way to run MRv2 jobs on mesos
> with some custom layer to swap yarn with mesos. Not sure how to start
> though... MRv2 contains a yarn application master that needs to be
> rewritten as a mesos framework scheduler. This is probably doable. However
> with MRv2 every map reduce job would be mapped as a new framework in Mesos.
> Not sure how many frameworks mesos can run and scale up to. Especially
> short lived frameworks.
> On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <to...@duedil.com> wrote:
>
>> Hey Luyi,
>>
>> That's correct, the Hadoop framework currently only supports Hadoop 2
>> MRv1. It also doesn't have great support for the HA jobtracker available in
>> newer versions of Hadoop, but I've been working on that the past few weeks.
>>
>> I'm not sure how Hadoop 2 would play with Mesos, but very interested to
>> find out more. Am I correct in thinking MRv2 will only run on top of YARN?
>>
>> I wonder if anyone else on the mailing list is running YARN on top of
>> Mesos...
>>
>> Tom.
>>
>> On Friday, 25 July 2014, Luyi Wang <wa...@gmail.com> wrote:
>>
>>> Checked the mesos github(https://github.com/mesos/hadoop). It listed
>>> support for MapReduce V1
>>>
>>> How about the MR V2?
>>>
>>> Right now we are using cloudera to manage hadoop clusters where uses
>>> MRV2. We are planning to migrate all our services to mesos(still in the
>>> initial investigating stage).  Good suggestions, advice and experiences are
>>> welcomed.
>>>
>>> Thanks a lot!
>>>
>>>
>>> -Luyi.
>>>
>>>
>>>
>>>

Re: Does Mesos support Hadoop MR V2

Posted by Maxime Brugidou <ma...@gmail.com>.
I haven't written yarn app masters myself and browsing the
hadoop-mapreduce-project directories is really not easy. I think it is
feasible to get a prototype to work but it would take time.

>From what I know one difference is that the app master (which is equivalent
to the mesos framework) is run by yarn itself and the client simply
communicate with the app master. So there is an additional level of
indirection which could be simplified for a prototype. There are also
multiple ways to submit a job because of compatibility layers.
On Jul 25, 2014 10:56 PM, "Tom Arnfeld" <to...@duedil.com> wrote:

> I've not seen any issues pertaining to running many short lived
> frameworks, but that's not near the number of frameworks you'd see if each
> job was a framework.
>
> We've been pushing all our work on MRv1 High Availability JT upstream on
> the github.com/mesos/hadoop repo, though there hasn't been much to it.
>
> There's some outstanding work in regards to framework failover that needs
> to be done (there's an issue on GH for this, right now if the JT fails over
> the mesos framework will re-register and all task trackers relaunched
> meaning running jobs restart from the beginning) and a couple of small bugs
> we've found in relation to memory limits that we haven't debugged.
>
> Maxime, it'd be cool to hear more about how possible it would be to do
> port the MRv2 framework equivalent to Mesos. I'm not very familiar with the
> internals of YARN itself.
>
> Tom.
>
> On Friday, 25 July 2014, Maxime Brugidou <ma...@gmail.com>
> wrote:
>
>> We run both mesos and yarn in prod and it does not make sense to run yarn
>> over mesos.
>>
>> However it would be interesting to find a way to run MRv2 jobs on mesos
>> with some custom layer to swap yarn with mesos. Not sure how to start
>> though... MRv2 contains a yarn application master that needs to be
>> rewritten as a mesos framework scheduler. This is probably doable. However
>> with MRv2 every map reduce job would be mapped as a new framework in Mesos.
>> Not sure how many frameworks mesos can run and scale up to. Especially
>> short lived frameworks.
>> On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <to...@duedil.com> wrote:
>>
>>> Hey Luyi,
>>>
>>> That's correct, the Hadoop framework currently only supports Hadoop 2
>>> MRv1. It also doesn't have great support for the HA jobtracker available in
>>> newer versions of Hadoop, but I've been working on that the past few weeks.
>>>
>>> I'm not sure how Hadoop 2 would play with Mesos, but very interested to
>>> find out more. Am I correct in thinking MRv2 will only run on top of YARN?
>>>
>>> I wonder if anyone else on the mailing list is running YARN on top of
>>> Mesos...
>>>
>>> Tom.
>>>
>>> On Friday, 25 July 2014, Luyi Wang <wa...@gmail.com> wrote:
>>>
>>>> Checked the mesos github(https://github.com/mesos/hadoop). It listed
>>>> support for MapReduce V1
>>>>
>>>> How about the MR V2?
>>>>
>>>> Right now we are using cloudera to manage hadoop clusters where uses
>>>> MRV2. We are planning to migrate all our services to mesos(still in the
>>>> initial investigating stage).  Good suggestions, advice and experiences are
>>>> welcomed.
>>>>
>>>> Thanks a lot!
>>>>
>>>>
>>>> -Luyi.
>>>>
>>>>
>>>>
>>>>

Re: Does Mesos support Hadoop MR V2

Posted by Tom Arnfeld <to...@duedil.com>.
I've not seen any issues pertaining to running many short lived frameworks,
but that's not near the number of frameworks you'd see if each job was a
framework.

We've been pushing all our work on MRv1 High Availability JT upstream on
the github.com/mesos/hadoop repo, though there hasn't been much to it.

There's some outstanding work in regards to framework failover that needs
to be done (there's an issue on GH for this, right now if the JT fails over
the mesos framework will re-register and all task trackers relaunched
meaning running jobs restart from the beginning) and a couple of small bugs
we've found in relation to memory limits that we haven't debugged.

Maxime, it'd be cool to hear more about how possible it would be to do port
the MRv2 framework equivalent to Mesos. I'm not very familiar with the
internals of YARN itself.

Tom.

On Friday, 25 July 2014, Maxime Brugidou <ma...@gmail.com> wrote:

> We run both mesos and yarn in prod and it does not make sense to run yarn
> over mesos.
>
> However it would be interesting to find a way to run MRv2 jobs on mesos
> with some custom layer to swap yarn with mesos. Not sure how to start
> though... MRv2 contains a yarn application master that needs to be
> rewritten as a mesos framework scheduler. This is probably doable. However
> with MRv2 every map reduce job would be mapped as a new framework in Mesos.
> Not sure how many frameworks mesos can run and scale up to. Especially
> short lived frameworks.
> On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <tom@duedil.com
> <javascript:_e(%7B%7D,'cvml','tom@duedil.com');>> wrote:
>
>> Hey Luyi,
>>
>> That's correct, the Hadoop framework currently only supports Hadoop 2
>> MRv1. It also doesn't have great support for the HA jobtracker available in
>> newer versions of Hadoop, but I've been working on that the past few weeks.
>>
>> I'm not sure how Hadoop 2 would play with Mesos, but very interested to
>> find out more. Am I correct in thinking MRv2 will only run on top of YARN?
>>
>> I wonder if anyone else on the mailing list is running YARN on top of
>> Mesos...
>>
>> Tom.
>>
>> On Friday, 25 July 2014, Luyi Wang <wangluyi1982@gmail.com
>> <javascript:_e(%7B%7D,'cvml','wangluyi1982@gmail.com');>> wrote:
>>
>>> Checked the mesos github(https://github.com/mesos/hadoop). It listed
>>> support for MapReduce V1
>>>
>>> How about the MR V2?
>>>
>>> Right now we are using cloudera to manage hadoop clusters where uses
>>> MRV2. We are planning to migrate all our services to mesos(still in the
>>> initial investigating stage).  Good suggestions, advice and experiences are
>>> welcomed.
>>>
>>> Thanks a lot!
>>>
>>>
>>> -Luyi.
>>>
>>>
>>>
>>>

Re: Does Mesos support Hadoop MR V2

Posted by Maxime Brugidou <ma...@gmail.com>.
We run both mesos and yarn in prod and it does not make sense to run yarn
over mesos.

However it would be interesting to find a way to run MRv2 jobs on mesos
with some custom layer to swap yarn with mesos. Not sure how to start
though... MRv2 contains a yarn application master that needs to be
rewritten as a mesos framework scheduler. This is probably doable. However
with MRv2 every map reduce job would be mapped as a new framework in Mesos.
Not sure how many frameworks mesos can run and scale up to. Especially
short lived frameworks.
On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <to...@duedil.com> wrote:

> Hey Luyi,
>
> That's correct, the Hadoop framework currently only supports Hadoop 2
> MRv1. It also doesn't have great support for the HA jobtracker available in
> newer versions of Hadoop, but I've been working on that the past few weeks.
>
> I'm not sure how Hadoop 2 would play with Mesos, but very interested to
> find out more. Am I correct in thinking MRv2 will only run on top of YARN?
>
> I wonder if anyone else on the mailing list is running YARN on top of
> Mesos...
>
> Tom.
>
> On Friday, 25 July 2014, Luyi Wang <wa...@gmail.com> wrote:
>
>> Checked the mesos github(https://github.com/mesos/hadoop). It listed
>> support for MapReduce V1
>>
>> How about the MR V2?
>>
>> Right now we are using cloudera to manage hadoop clusters where uses
>> MRV2. We are planning to migrate all our services to mesos(still in the
>> initial investigating stage).  Good suggestions, advice and experiences are
>> welcomed.
>>
>> Thanks a lot!
>>
>>
>> -Luyi.
>>
>>
>>
>>

Re: Does Mesos support Hadoop MR V2

Posted by Tom Arnfeld <to...@duedil.com>.
Hey Luyi,

That's correct, the Hadoop framework currently only supports Hadoop 2 MRv1.
It also doesn't have great support for the HA jobtracker available in newer
versions of Hadoop, but I've been working on that the past few weeks.

I'm not sure how Hadoop 2 would play with Mesos, but very interested to
find out more. Am I correct in thinking MRv2 will only run on top of YARN?

I wonder if anyone else on the mailing list is running YARN on top of
Mesos...

Tom.

On Friday, 25 July 2014, Luyi Wang <wa...@gmail.com> wrote:

> Checked the mesos github(https://github.com/mesos/hadoop). It listed
> support for MapReduce V1
>
> How about the MR V2?
>
> Right now we are using cloudera to manage hadoop clusters where uses MRV2.
> We are planning to migrate all our services to mesos(still in the initial
> investigating stage).  Good suggestions, advice and experiences are
> welcomed.
>
> Thanks a lot!
>
>
> -Luyi.
>
>
>
>