Posted to user@mesos.apache.org by Yaneeve Shekel <Ya...@sizmek.com> on 2014/10/28 13:21:46 UTC

Re: Does Mesos support Hadoop MR V2

To quote John below: "So excuse my naivety... but...". I am also confused by the version/naming conventions used by the Hadoop project.

I would like to run Hadoop on Mesos rather than on YARN. I would also like to use the "new" mapreduce packages.

https://github.com/mesos/hadoop mentions that "The pom.xml included is configured and tested against CDH5 and MRv1. Hadoop on Mesos does not currently support YARN (and MRv2)." Does this mean that the "new" mapreduce package is not available? I think it does not: I should be able to use the "new" API over any scheduling system, just as I could over plain vanilla CDH, where I could configure and use any combination from the cross product (mapred, mapreduce) x (MRv1, YARN). Could anyone verify this?
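
As a rough illustration of the cross product mentioned above, here is a minimal Python sketch that enumerates the four API/runtime pairings. The two Java package names are the real Hadoop ones (org.apache.hadoop.mapred is the "old" API, org.apache.hadoop.mapreduce the "new" one). The names APIS, RUNTIMES and combinations are hypothetical and exist only for this example. Whether every pairing actually works on a given distribution, and on Hadoop on Mesos in particular, is exactly the assumption being asked about here, not something the sketch proves.

```python
from itertools import product

# API packages a job can be written against (real Hadoop package names).
APIS = {
    "mapred": "org.apache.hadoop.mapred",        # the "old" API
    "mapreduce": "org.apache.hadoop.mapreduce",  # the "new" API
}

# Runtimes that could execute the job (labels only, for illustration).
RUNTIMES = ("MRv1", "YARN")


def combinations():
    """Return every (api, runtime) pair in the cross product."""
    return list(product(APIS, RUNTIMES))


if __name__ == "__main__":
    # Print one line per pairing, e.g. the "new" API on the MRv1 runtime.
    for api, runtime in combinations():
        print(f"{APIS[api]} on {runtime}")
```

The pairing of interest for Hadoop on Mesos would be ("mapreduce", "MRv1"): the "new" API running on the JobTracker/TaskTracker runtime.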

Second, has any work been done since the original thread on what John suggested below?



Thanks a lot,

Yaneeve



On Jul 27, 2014 7:00 PM, "John Omernik" <j....@omernik.com> wrote:



> So excuse my naivety in this space, but my ignorance has never really
> stopped me from asking questions:
>
> I see YARN (Yet Another Resource Negotiator) as very similar to Mesos,
> i.e. something to manage resources on a cluster of machines. So when I
> hear talk of running "YARN" on Mesos it seems very redundant indeed, and
> I ask myself, what are we actually getting out of this setup?
>
> So, going to the map/reduce question, I see Map Reduce V1 and Map Reduce
> V2 like this: Map Reduce V2 is an application that runs on YARN. I.e. if
> you run a job, it creates an application master, that application master
> requests resources, and the job gets run. It differs from Map Reduce V1
> in that there is no long-running Job Tracker (other than the YARN
> Resource Manager, but that is managing resources for all applications,
> not just Map Reduce applications). OK, so with Mesos, why can't there be
> a Mesos application that is similar to a Map Reduce V2 application in
> YARN? Why do we need to run YARN on Mesos? That doesn't really make
> sense. Basically, for M/R V2 vs M/R V1, the only difference is that to
> mimic M/R V1 we need task trackers and job trackers running as Mesos
> applications (which we have). So for M/R V2, we just need the equivalent
> of the application master that runs on YARN, requesting resources across
> the cluster.
>
> Fundamentally, YARN is confusing because I think they coupled running Map
> Reduce jobs with the resource manager and called it "Hadoop v2". By
> coupling the two, people look at YARN as Map Reduce V2, but it's not
> really. It's a way to run jobs on a cluster of machines (a la Mesos)
> with an "application" that is the equivalent of Map Reduce V1. The names
> being given seem confusing to me: they make people who have invested
> in Hadoop (Map Reduce V1) very interested in YARN because it's called
> "Hadoop V2", while Mesos is seen as the "Other".
>
> Just for my sake I summarized a TL;DR form so if someone wants to correct
> my understanding they can:
>
> Mesos = Tool to manage resources.
>
> YARN = Tool to manage resources; it's also called Hadoopv2.
>
> Map Reduce V1 = Job Trackers/Task Trackers; it's what we know. It can run
> on Hadoop clusters, and Mesos. It's also called Hadoopv1.
>
> Map Reduce V2 = Application that can run on YARN that mimics Map Reduce
> V1 on a YARN cluster. This + YARN has been called Hadoopv2.

>

> On Sun, Jul 27, 2014 at 4:10 AM, Maxime Brugidou <
> maxime.brugi...@gmail.com> wrote:
>
>> When I said that running YARN over Mesos did not make sense, I meant that
>> running a resource manager inside a resource manager is very sub-optimal.
>> You will eventually either do static allocation of resources for the YARN
>> framework in Mesos or need complex logic to determine how many resources
>> should be given to YARN. You will also have the same burden of managing
>> two different clusters instead of one, even if YARN is sort of hidden as
>> a Mesos framework.
>>
>> However, yes, I believe it's easier to run YARN on Mesos than to run MRv2
>> on top of Mesos. The solution I was discussing was obviously "ideal", and
>> I have looked at the MRAppMaster since and it discouraged me :)

>> On Jul 27, 2014 12:41 AM, "Rick Richardson" <ri...@gmail.com>
>> wrote:
>>
>>> FWIW I also think the fastest approach here is porting YARN onto
>>> Mesos.
>>>
>>> In a perfect world, writing an implementation layer for the YARN
>>> interface on Mesos would certainly be the optimal approach, but looking at
>>> the MRv2 code, it is very tightly coupled to many YARN modules.
>>>
>>> If someone wanted to take on the project of making a generic resource
>>> scheduler interface for MRv2, that would be amazing :)

>>> On Jul 26, 2014 6:19 PM, "Jie Yu" <yu...@gmail.com> wrote:
>>>
>>>> I am interested in investigating the idea of YARN on top of Mesos. One
>>>> of the benefits I can think of is that we can get rid of the static
>>>> resource allocation between YARN and Mesos clusters. That way, Mesos can
>>>> allocate the resources that are not used by YARN to other Mesos
>>>> frameworks like Aurora, Marathon, etc., to increase the resource
>>>> utilization of the entire data center. Also, we could avoid running each
>>>> MRv2 job as a framework, which I think might cause some maintenance
>>>> complexity (e.g. for framework rate limiting, etc.). Finally, YARN
>>>> currently does not have good isolation support. It only supports CPU
>>>> isolation right now (using cgroups). By porting YARN on top of Mesos, we
>>>> might be able to leverage the existing Mesos containerizer strategy to
>>>> provide better isolation between tasks. Maxime, I am curious why you
>>>> think it does not make sense to run YARN over Mesos? Since I am not super
>>>> familiar with YARN, I might be missing something.
>>>>
>>>> I have been thinking of making the ResourceManager in YARN a Mesos
>>>> framework and making the NodeManager a Mesos executor. The NodeManager
>>>> will launch containers using primitives provided by Mesos so that we have
>>>> a consistent containerizer layer. I haven't fully figured out how this
>>>> could be done yet (e.g., nested containers, communication between
>>>> NodeManager and ResourceManager, etc.), but I would love to explore this
>>>> direction. I would like to hear any feedback/suggestions you guys have
>>>> about this direction.
>>>>
>>>> Thanks,
>>>> - Jie
>>>>

>>>> On Fri, Jul 25, 2014 at 1:39 PM, Maxime Brugidou <
>>>> maxime.brugi...@gmail.com> wrote:
>>>>
>>>>> We run both Mesos and YARN in prod, and it does not make sense to run
>>>>> YARN over Mesos.
>>>>>
>>>>> However, it would be interesting to find a way to run MRv2 jobs on
>>>>> Mesos with some custom layer to swap YARN for Mesos. Not sure how to
>>>>> start though... MRv2 contains a YARN application master that would need
>>>>> to be rewritten as a Mesos framework scheduler. This is probably doable.
>>>>> However, with MRv2 every MapReduce job would be mapped to a new
>>>>> framework in Mesos. Not sure how many frameworks Mesos can run and scale
>>>>> up to, especially short-lived frameworks.

>>>>> On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <t....@duedil.com> wrote:
>>>>>
>>>>>> Hey Luyi,
>>>>>>
>>>>>> That's correct, the Hadoop framework currently only supports Hadoop 2
>>>>>> MRv1. It also doesn't have great support for the HA jobtracker available
>>>>>> in newer versions of Hadoop, but I've been working on that the past few
>>>>>> weeks.
>>>>>>
>>>>>> I'm not sure how Hadoop 2 would play with Mesos, but I'm very interested
>>>>>> to find out more. Am I correct in thinking MRv2 will only run on top of
>>>>>> YARN?
>>>>>>
>>>>>> I wonder if anyone else on the mailing list is running YARN on top of
>>>>>> Mesos...
>>>>>>
>>>>>> Tom.

>>>>>>

>>>>>> On Friday, 25 July 2014, Luyi Wang <wa...@gmail.com> wrote:
>>>>>>
>>>>>>> Checked the Mesos GitHub (https://github.com/mesos/hadoop). It
>>>>>>> lists support for MapReduce V1.
>>>>>>>
>>>>>>> How about MR V2?
>>>>>>>
>>>>>>> Right now we are using Cloudera to manage Hadoop clusters, which use
>>>>>>> MRv2. We are planning to migrate all our services to Mesos (still in the
>>>>>>> initial investigation stage). Suggestions, advice and experiences
>>>>>>> are welcome.
>>>>>>>
>>>>>>> Thanks a lot!
>>>>>>>
>>>>>>> -Luyi.

Re: Does Mesos support Hadoop MR V2

Posted by Brenden Matthews <br...@airbedandbreakfast.com>.
Porting YARN to run atop Mesos is quite reasonable.  Some folks at eBay
have started some work on this (https://github.com/mesos/myriad).  If
you're interested, you should check it out, and contribute to the project.
