
Posted to dev@mesos.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2011/07/01 02:16:21 UTC

Question about Mesos.

Hi,

I'm a newbie, and I wonder what the main differences are between Hadoop
nextGen and Mesos.

Thanks.
-- 
Best Regards, Edward J. Yoon
@eddieyoon

Fwd: Question about Mesos.

Posted by "Edward J. Yoon" <ed...@apache.org>.

Just FW.

---------- Forwarded message ----------
From: Ted Dunning <te...@gmail.com>
Date: Fri, Jul 1, 2011 at 10:21 AM
Subject: Re: Question about Mesos.
To: mesos-dev@incubator.apache.org

Technically speaking, Mesos has a less expressive model for describing
resource requirements.  The thesis of Mesos is that the negotiation between
application and scheduler can make up for this missing information.  Mesos
was also first to "market", but Hadoop nextGen is catching up fast.  The
MR-279 code works, albeit with some issues in production use.  From
all reports, these issues are being resolved quickly as Yahoo's considerable
QA resources come to bear.

Politically speaking, Mesos has a nearly inactive mailing list which, to
outward appearances, indicates a nearly inactive project.  There is some
evidence that considerable activity is occurring off-list, but this is a
process bug in the Apache model since "if it doesn't happen on the list, it
doesn't happen".

On the other side, Hadoop nextGen has the Hadoop community pretty much
behind it.  Since HNG has the potential to break down some of the deadlocks
that have plagued the Hadoop community release process, there is
considerable enthusiasm for it.

Combined, these factors make it much more likely that HNG will be the
dominant force in the Hadoop world.  That is, more likely in my own
estimation.  Others may differ.

-- 
Best Regards, Edward J. Yoon
@eddieyoon
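Ted's point about negotiation can be illustrated with a toy model of resource offers. His claim is that negotiation between application and scheduler can make up for Mesos's less expressive request model. This is a minimal sketch under stated assumptions. The `Offer` type, the `LocalityAwareFramework` class, and the accept/decline rule are hypothetical, and they do not reproduce the actual Mesos API.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    """Toy resource offer: free resources on one node (hypothetical type)."""
    host: str
    cpus: int
    mem_mb: int

class LocalityAwareFramework:
    """Toy framework scheduler. The master only says 'here are free
    resources on host X'; the framework alone knows where each task's
    input data lives and uses that to accept or decline the offer."""

    def __init__(self, task_inputs, cpus_per_task=1, mem_per_task_mb=1024):
        self.pending = dict(task_inputs)  # task name -> hosts holding its input
        self.cpus_per_task = cpus_per_task
        self.mem_per_task_mb = mem_per_task_mb

    def on_offer(self, offer):
        """Return the tasks launched on this offer; [] means decline it."""
        launched = []
        cpus, mem = offer.cpus, offer.mem_mb
        for task, hosts in list(self.pending.items()):
            if cpus < self.cpus_per_task or mem < self.mem_per_task_mb:
                break  # the remaining part of the offer is too small
            if offer.host in hosts:  # accept only data-local placements
                launched.append(task)
                del self.pending[task]
                cpus -= self.cpus_per_task
                mem -= self.mem_per_task_mb
        return launched

fw = LocalityAwareFramework({"map-0": {"nodeA"}, "map-1": {"nodeB"}, "map-2": {"nodeA", "nodeC"}})
print(fw.on_offer(Offer("nodeD", cpus=4, mem_mb=8192)))  # [] (no local data, declined)
print(fw.on_offer(Offer("nodeA", cpus=4, mem_mb=8192)))  # ['map-0', 'map-2']
```

In this model the master never learns which hosts hold which data. The framework expresses that preference only by declining offers, which is the kind of negotiation Ted describes. The cost is that a framework may have to wait through several offers before a local one arrives.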

Re: Question about Mesos.

Posted by Ted Dunning <te...@gmail.com>.

Come now, Matei,

This should be phrased "we will see how big the problems are with that".

Remember, the universe really is against us.  Writing compatible APIs is
just taunting Murphy and his angels.

(in case it isn't clear ... tongue is firmly in cheek here)

On Fri, Jul 1, 2011 at 10:07 AM, Matei Zaharia <ma...@eecs.berkeley.edu> wrote:

> ... We'll see if we run into any unforeseen problems with that.

Re: Question about Mesos.

Posted by Matei Zaharia <ma...@eecs.berkeley.edu>.

Yup, check out https://issues.apache.org/jira/browse/MAPREDUCE-279 for instructions on how to download it.

Matei

On Jul 5, 2011, at 9:39 PM, brisk wrote:

> Does anyone know whether the Hadoop NextGen source code is available now? Thanks!
> 
> Yizheng

Re: Question about Mesos.

Posted by brisk <my...@gmail.com>.

Does anyone know whether the Hadoop NextGen source code is available now? Thanks!

Yizheng

Re: Question about Mesos.

Posted by Ted Dunning <te...@gmail.com>.

This, btw, will be an excellent opportunity to answer key questions about
how informed the resource manager needs to be.  Mesos is making a bold claim
that negotiation is sufficient to convey the information, and that by not
conveying resource details, the resource manager is more general.  The
latter claim is certainly true, but the former doesn't yet have many years of
use in ginormous clusters to prove it (no approach to this problem has
that yet, of course).

HNG is making a much less exciting claim: that more information really will
make the scheduler better.  I think it is uncontroversial that this is
likely to be at least epsilon better for the workloads that HNG is designed
for.  As such, the HNG design point is a safer one for people with that
workload.

Having the ability to observe and compare Mesos and HNG clusters on nearly
identical workloads is one way to get closer to real answers to these
questions.  The comparisons will necessarily be indirect because no single
owner of a thousand-node cluster in heavy use is going to set up a second
cluster just for the benefit of an experiment, but even indirect information
will be exciting.

Another option with strange implications is the idea of running HNG under
Mesos as a long-lived application or vice versa.  I am not sure what that
would teach us, but thinking about it can be fun.

On Fri, Jul 1, 2011 at 10:07 AM, Matei Zaharia <ma...@eecs.berkeley.edu> wrote:

> we were planning to provide a wrapper that has the same API as the resource
> manager in HNG
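Ted's question of how informed the resource manager needs to be becomes concrete in the wrapper Matei describes: an HNG-style, request-oriented API served from Mesos resource offers. This is a toy model of that adapter idea only. The class names, request fields, and matching rule are hypothetical, and they are not the HNG/YARN protocol or the Mesos API.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    """Toy offer from an offer-based master (hypothetical type)."""
    host: str
    cpus: int

@dataclass
class ContainerRequest:
    """Toy request in a request-style API (hypothetical fields)."""
    app: str
    cpus: int
    preferred_hosts: frozenset = frozenset()  # empty means "any host"

class RequestApiOverOffers:
    """Toy adapter: applications call submit() as they would against a
    request-based resource manager; the adapter matches queued requests
    against incoming offers and leaves the rest of each offer unused."""

    def __init__(self):
        self.queue = []

    def submit(self, request):
        self.queue.append(request)

    def on_offer(self, offer):
        """Return the requests granted on this offer, in submission order."""
        granted, still_waiting = [], []
        free = offer.cpus
        for req in self.queue:
            host_ok = not req.preferred_hosts or offer.host in req.preferred_hosts
            if host_ok and req.cpus <= free:
                granted.append(req)
                free -= req.cpus
            else:
                still_waiting.append(req)
        self.queue = still_waiting
        return granted

adapter = RequestApiOverOffers()
adapter.submit(ContainerRequest("mr-job", cpus=2, preferred_hosts=frozenset({"nodeA"})))
adapter.submit(ContainerRequest("spark", cpus=1))
print([r.app for r in adapter.on_offer(Offer("nodeB", cpus=4))])  # ['spark']
print([r.app for r in adapter.on_offer(Offer("nodeA", cpus=2))])  # ['mr-job']
```

In this sketch, a request that names preferred hosts simply waits until an offer from one of those hosts arrives. A real wrapper would need some policy for relaxing that preference, for example accepting a non-preferred host after declining a few offers. How well such a policy works is one place where the comparison Ted describes could show up.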

Re: Question about Mesos.

Posted by Matei Zaharia <ma...@eecs.berkeley.edu>.

That's a good question. Right now we were planning to provide a wrapper that has the same API as the resource manager in HNG, so that not only MapReduce but other apps written against that API will work. We'll see if we run into any unforeseen problems with that.

Matei

On Jul 1, 2011, at 3:37 AM, Edward J. Yoon wrote:

> Here's another silly question.
> 
> Does Mesos plan to support HNG, or will it support only plain MapReduce?
> 
> On Fri, Jul 1, 2011 at 2:15 PM, Ted Dunning <te...@gmail.com> wrote:
>> Also, both projects are changing in terms of what they do and what they
>> intend to do.
>> 
>> For instance, support for long-running processes and alternative execution
>> models other than map-reduce is an explicit goal for YARN.
>> 
>> This illustrates how hard it is for anybody to compare systems.  Typically,
>> any given person knows much more about one system than the other, leading to
>> many comparison points that are only half true (that half being the one with
>> better information).  This isn't remediable without collaborative discussion
>> between (differently) informed speakers.
>> 
>> 
>> On Thu, Jun 30, 2011 at 10:10 PM, Edward J. Yoon <ed...@apache.org> wrote:
>> 
>>> Understood.
>>> 
>>> On Fri, Jul 1, 2011 at 1:59 PM, Matei Zaharia <ma...@eecs.berkeley.edu> wrote:
>>>> I wouldn't say it's designed for Yahoo! only, but it's definitely meant
>>>> to solve issues they saw with large Hadoop clusters (and provides a lot of
>>>> value for that).
>>>> 
>>>> Matei
>>>> 
>>>> On Jul 1, 2011, at 12:51 AM, Edward J. Yoon wrote:
>>>> 
>>>>> Hmm, HNG seems designed for Yahoo!'s own circumstances.
>>>>> 
>>>>> On Fri, Jul 1, 2011 at 12:47 PM, Matei Zaharia <ma...@eecs.berkeley.edu> wrote:
>>>>>> Ted brought up some superficial differences, but if you want to
>>>>>> understand technical differences, there are a bunch of those as well.
>>>>>> Mesos and Hadoop next-gen have similar goals (more efficient resource
>>>>>> sharing for data centers), but they are coming at it from different
>>>>>> angles -- HNG is currently mainly focusing on MapReduce and aims to
>>>>>> support other types of applications too, while Mesos was meant to
>>>>>> support a very diverse set of applications, including long-running
>>>>>> services and batch jobs (rather than only multiple instances of
>>>>>> MapReduce), and is in fact being used for that already. More
>>>>>> importantly, HNG is really two pieces -- a refactoring of MapReduce to
>>>>>> allow one instance of MR per application, and a resource manager called
>>>>>> YARN that lets these instances coordinate. We are going to support
>>>>>> having the new MR2 application masters run on top of Mesos instead of
>>>>>> YARN too (and indeed the refactoring is nice because it will enable
>>>>>> Hadoop MapReduce to run on other cluster scheduling systems in the
>>>>>> future).
>>>>>>
>>>>>> In terms of the technical differences, here are some of the main ones
>>>>>> currently:
>>>>>>
>>>>>> - Mesos is implemented in C++ rather than Java, and has APIs in C++ and
>>>>>> Python in addition to Java.
>>>>>>
>>>>>> - The resource allocation models are different: HNG has a central
>>>>>> scheduler that supports data locality constraints, while Mesos provides
>>>>>> "resource offers" to let applications pick the resources they like
>>>>>> according to other criteria in addition to requests/filters to describe
>>>>>> which resources you want to be offered. Our belief is that resource
>>>>>> offers will allow Mesos to support a wider range of application
>>>>>> scheduling needs, while simultaneously making the system more scalable
>>>>>> and highly available (minimizing the state and work required of the
>>>>>> master).
>>>>>>
>>>>>> - Mesos can enforce resource isolation through Linux Containers to
>>>>>> guard against misbehaving / greedy tasks.
>>>>>>
>>>>>> - HNG supports Kerberos authentication for users.
>>>>>>
>>>>>> - HNG can run the MR2 version of Hadoop, while Mesos can run Hadoop
>>>>>> 0.20, Spark and MPI.
>>>>>>
>>>>>> - There are some smaller architectural differences that may matter for
>>>>>> some applications, such as communication being based on message-passing
>>>>>> in Mesos vs periodic heartbeats in HNG, which allows Mesos to provide
>>>>>> lower scheduling latencies (e.g. to still be efficient if your tasks
>>>>>> take 100 ms each).
>>>>>>
>>>>>> However, overall, as Ted said, many of these differences will likely go
>>>>>> away as both projects add features. What will be interesting is whether
>>>>>> some fundamental differences in the target workloads remain, which I
>>>>>> think is likely to happen. For example, the main deployment of Mesos is
>>>>>> currently to run long-running stream processing services at Twitter,
>>>>>> which is something that typical Hadoop environments just don't do and
>>>>>> that requires different things from the cluster scheduler. I also
>>>>>> believe we're going to see a lot of other cluster scheduling systems
>>>>>> besides Mesos and HNG in the future, as people's requirements for these
>>>>>> systems grow. There are some very challenging problems in designing a
>>>>>> general cluster scheduling system that even the Google folks are still
>>>>>> working hard on.
>>>>>> 
>>>>>> Matei
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Jun 30, 2011, at 6:26 PM, Edward J. Yoon wrote:
>>>>>> 
>>>>>>> Thanks for your nice and quick explanation!
> 
> -- 
> Best Regards, Edward J. Yoon
> @eddieyoon
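Matei's remark, quoted in the message above, is that message passing lets Mesos stay efficient with 100 ms tasks where periodic heartbeats would not. A little arithmetic shows why. This sketch assumes a freed slot is only noticed at the next heartbeat, so on average it sits idle for half an interval. The 3 s heartbeat interval and the 5 ms message-passing delay are illustrative assumptions, not measured or default values for either system.

```python
HEARTBEAT_S = 3.0      # assumed heartbeat interval, for illustration only
MESSAGE_RTT_S = 0.005  # assumed event-driven notification delay

def utilization(task_s, idle_s):
    """Fraction of time a slot runs tasks if each task of length task_s
    is followed by idle_s seconds before the next one is assigned."""
    return task_s / (task_s + idle_s)

def mean_wait_heartbeat(interval_s):
    """A slot frees up at a uniformly random point in the heartbeat cycle,
    so on average it sits idle for half an interval before it is reported
    and refilled."""
    return interval_s / 2

for task_s in (0.1, 60.0):
    hb = utilization(task_s, mean_wait_heartbeat(HEARTBEAT_S))
    mp = utilization(task_s, MESSAGE_RTT_S)
    print(f"{task_s:>5} s tasks: heartbeat {hb:.1%}, message passing {mp:.1%}")
```

With these assumed numbers, 100 ms tasks keep a slot busy only about 6% of the time under heartbeats, versus about 95% with message passing. For 60 s tasks, both stay above 97%. The model ignores queueing and scheduler compute time. It only shows why the notification delay dominates when tasks are short.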

Re: Question about Mesos.

Posted by "Edward J. Yoon" <ed...@apache.org>.

Here's another silly question.

Does Mesos plan to support HNG, or will it support only plain MapReduce?

On Fri, Jul 1, 2011 at 2:15 PM, Ted Dunning <te...@gmail.com> wrote:
> Also, both projects are changing in terms of what they do and what they
> intend to do.
>
> For instance, support for long running processes and alternative execution
> models other than map-reduce is an explicit goal for Yarn.
>
> This illustrates how hard it is for anybody to compare systems.  Typically,
> any given person knows much more about one system than the other leading to
> many comparison points that are only half true (that half being the one with
> better information).  This isn't remediable without collaborative discussion
> between (differently) informed speakers.
>
>
> On Thu, Jun 30, 2011 at 10:10 PM, Edward J. Yoon <ed...@apache.org>wrote:
>
>> Understood.
>>
>> On Fri, Jul 1, 2011 at 1:59 PM, Matei Zaharia <ma...@eecs.berkeley.edu>
>> wrote:
>> > I wouldn't say it's designed for Yahoo! only, but it's definitely meant
>> to solve issues they saw with large Hadoop clusters (and provides a lot of
>> value for that).
>> >
>> > Matei
>> >
>> > On Jul 1, 2011, at 12:51 AM, Edward J. Yoon wrote:
>> >
>> >> Hmm, HNG seems designed for their (Y!) own circumstance.
>> >>
>> >> On Fri, Jul 1, 2011 at 12:47 PM, Matei Zaharia <ma...@eecs.berkeley.edu>
>> wrote:
>> >>> Ted brought up some superficial differences, but if you want to
>> understand technical differences, there are a bunch of those as well. Mesos
>> and Hadoop next-gen have similar goals (more efficient resource sharing for
>> data centers), but they are coming at it from different angles -- HNG is
>> currently mainly focusing on MapReduce and aims to support other types of
>> applications too, while Mesos was meant to support a very diverse set of
>> applications, including long-running services and batch jobs (rather than
>> only multiple instances of MapReduce), and is in fact being used for that
>> already. More importantly, HNG is really two pieces -- a refactoring of
>> MapReduce to allow one instance of MR per application, and a resource
>> manager called YARN that lets these instances coordinate. We are going to
>> support having the new MR2 application masters run on top of Mesos instead
>> of YARN too (and indeed the refactoring is nice because it will enable
>> Hadoop MapReduce to run on other cluster scheduling systems in the future).
>> >>>
>> >>> In terms of the technical differences, here are some of the main ones currently:
>> >>>
>> >>> - Mesos is implemented in C++ rather than Java, and has APIs in C++ and Python in addition to Java.
>> >>>
>> >>> - The resource allocation models are different: HNG has a central scheduler that supports data locality constraints, while Mesos provides "resource offers" to let applications pick the resources they like according to other criteria in addition to requests/filters to describe which resources you want to be offered. Our belief is that resource offers will allow Mesos to support a wider range of application scheduling needs, while simultaneously making the system more scalable and highly available (minimizing the state and work required of the master).
>> >>>
>> >>> - Mesos can enforce resource isolation through Linux Containers to guard against misbehaving / greedy tasks.
>> >>>
>> >>> - HNG supports Kerberos authentication for users.
>> >>>
>> >>> - HNG can run the MR2 version of Hadoop, while Mesos can run Hadoop 0.20, Spark and MPI.
>> >>>
>> >>> - There are some smaller architectural differences that may matter for some applications, such as communication being based on message-passing in Mesos vs periodic heartbeats in HNG, which allows Mesos to provide lower scheduling latencies (e.g. to still be efficient if your tasks take 100ms each).
>> >>>
>> >>> However, overall, as Ted said, many of these differences will likely go away as both projects add features. What will be interesting is whether some fundamental differences in the target workloads remain, which I think is likely to happen. For example, the main deployment of Mesos is currently to run long-running stream processing services at Twitter, which is something that typical Hadoop environments just don't do and that requires different things from the cluster scheduler. I also believe we're going to see a lot of other cluster scheduling systems besides Mesos and HNG in the future, as people's requirements for these systems grow. There are some very challenging problems in designing a general cluster scheduling system that even the Google folks are still working hard on.
>> >>>
>> >>> Matei
>> >>>
>> >>>
>> >>>
>> >>> On Jun 30, 2011, at 6:26 PM, Edward J. Yoon wrote:
>> >>>
>> >>>> Thanks for your nice and quick explanation!
>> >>>>
>> >>>> On Fri, Jul 1, 2011 at 10:21 AM, Ted Dunning <te...@gmail.com> wrote:
>> >>>>> Technically speaking, Mesos has a less expressive model for expressing
>> >>>>> resource requirements.  The thesis of Mesos is that the negotiation between
>> >>>>> application and scheduler can make up for this missing information.  Mesos
>> >>>>> was also first to "market", but Hadoop nextGen is catching up fast.  The
>> >>>>> MR-279 has code that works, albeit with some issues in production use.  From
>> >>>>> all reports, these issues are being resolved quickly as Yahoo's considerable
>> >>>>> QA resources come to bear.
>> >>>>>
>> >>>>> Politically speaking, Mesos has a nearly inactive mailing list which, to
>> >>>>> outward appearances, indicates a nearly inactive project.  There is some
>> >>>>> evidence that considerable activity is occurring off-list, but this is a
>> >>>>> process bug in the Apache model since "if it doesn't happen on the list, it
>> >>>>> doesn't happen".
>> >>>>>
>> >>>>> On the other side, Hadoop nextGen has the Hadoop community pretty much
>> >>>>> behind it.  Since HNG has the potential to break down some of the deadlocks
>> >>>>> that have plagued the Hadoop community release process, there is
>> >>>>> considerable enthusiasm for it.
>> >>>>>
>> >>>>> Combined, these factors make it much more likely that HNG will be the
>> >>>>> dominant force in the Hadoop world.  That is, more likely in my own
>> >>>>> estimation.  Others may differ.
>> >>>>>
>> >>>>>
>> >>>>> On Thu, Jun 30, 2011 at 5:16 PM, Edward J. Yoon <edwardyoon@apache.org>wrote:
>> >>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> I'm a newbie, and I wonder what the main differences are between Hadoop
>> >>>>>> nextGen and Mesos.
>> >>>>>>
>> >>>>>> Thanks.
>> >>>>>> --
>> >>>>>> Best Regards, Edward J. Yoon
>> >>>>>> @eddieyoon
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Best Regards, Edward J. Yoon
>> >>>> @eddieyoon
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Best Regards, Edward J. Yoon
>> >> @eddieyoon
>> >
>> >
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>
>



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Question about Mesos.

Posted by Ted Dunning <te...@gmail.com>.

Also, both projects are changing in terms of what they do and what they
intend to do.

For instance, support for long-running processes and execution models other
than map-reduce is an explicit goal for YARN.

This illustrates how hard it is for anybody to compare systems.  Typically,
any given person knows much more about one system than the other, leading to
many comparison points that are only half true (that half being the one with
better information).  This isn't remediable without collaborative discussion
between (differently) informed speakers.


On Thu, Jun 30, 2011 at 10:10 PM, Edward J. Yoon <ed...@apache.org>wrote:

> Understood.

Re: Question about Mesos.

Posted by "Edward J. Yoon" <ed...@apache.org>.

Understood.

On Fri, Jul 1, 2011 at 1:59 PM, Matei Zaharia <ma...@eecs.berkeley.edu> wrote:
> I wouldn't say it's designed for Yahoo! only, but it's definitely meant to solve issues they saw with large Hadoop clusters (and provides a lot of value for that).
>
> Matei



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Question about Mesos.

Posted by Matei Zaharia <ma...@eecs.berkeley.edu>.

I wouldn't say it's designed for Yahoo! only, but it's definitely meant to solve issues they saw with large Hadoop clusters (and provides a lot of value for that).

Matei

On Jul 1, 2011, at 12:51 AM, Edward J. Yoon wrote:

> Hmm, HNG seems designed for their (Y!) own circumstance.

Re: Question about Mesos.

Posted by "Edward J. Yoon" <ed...@apache.org>.

Hmm, HNG seems designed for their (Y!) own circumstance.

On Fri, Jul 1, 2011 at 12:47 PM, Matei Zaharia <ma...@eecs.berkeley.edu> wrote:
> Ted brought up some superficial differences, but if you want to understand technical differences, there are a bunch of those as well. Mesos and Hadoop next-gen have similar goals (more efficient resource sharing for data centers), but they are coming at it from different angles -- HNG is currently mainly focusing on MapReduce and aims to support other types of applications too, while Mesos was meant to support a very diverse set of applications, including long-running services and batch jobs (rather than only multiple instances of MapReduce), and is in fact being used for that already. More importantly, HNG is really two pieces -- a refactoring of MapReduce to allow one instance of MR per application, and a resource manager called YARN that lets these instances coordinate. We are going to support having the new MR2 application masters run on top of Mesos instead of YARN too (and indeed the refactoring is nice because it will enable Hadoop MapReduce to run on other cluster scheduling systems in the future).



-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Question about Mesos.

Posted by Matei Zaharia <ma...@eecs.berkeley.edu>.

Ted brought up some superficial differences, but if you want to understand technical differences, there are a bunch of those as well. Mesos and Hadoop next-gen have similar goals (more efficient resource sharing for data centers), but they are coming at it from different angles -- HNG is currently mainly focusing on MapReduce and aims to support other types of applications too, while Mesos was meant to support a very diverse set of applications, including long-running services and batch jobs (rather than only multiple instances of MapReduce), and is in fact being used for that already. More importantly, HNG is really two pieces -- a refactoring of MapReduce to allow one instance of MR per application, and a resource manager called YARN that lets these instances coordinate. We are going to support having the new MR2 application masters run on top of Mesos instead of YARN too (and indeed the refactoring is nice because it will enable Hadoop MapReduce to run on other cluster scheduling systems in the future).

In terms of the technical differences, here are some of the main ones currently:

- Mesos is implemented in C++ rather than Java, and has APIs in C++ and Python in addition to Java.

- The resource allocation models are different: HNG has a central scheduler that supports data locality constraints, while Mesos provides "resource offers", which let applications pick the resources they like according to other criteria, in addition to requests/filters that let an application describe which resources it wants to be offered. Our belief is that resource offers will allow Mesos to support a wider range of application scheduling needs, while simultaneously making the system more scalable and highly available (minimizing the state and work required of the master).

- Mesos can enforce resource isolation through Linux Containers to guard against misbehaving / greedy tasks.

- HNG supports Kerberos authentication for users.

- HNG can run the MR2 version of Hadoop, while Mesos can run Hadoop 0.20, Spark and MPI.

- There are some smaller architectural differences that may matter for some applications, such as communication being based on message-passing in Mesos vs periodic heartbeats in HNG, which allows Mesos to provide lower scheduling latencies (e.g. to still be efficient if your tasks take 100ms each).
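
A back-of-the-envelope model of why the heartbeat vs. message-passing difference matters for short tasks. It assumes that, with heartbeats, a slot freed by a finished task is only refilled when the next heartbeat arrives (on average half an interval later), while with message passing it is refilled after one short round trip. The 3-second heartbeat interval and the 1 ms round trip are illustrative assumptions, not figures from either project; only the 100 ms task length comes from the message above.

```python
def slot_utilization(task_s: float, idle_gap_s: float) -> float:
    """Fraction of wall-clock time a slot spends running tasks when every
    task is followed by an idle gap before the next task is launched."""
    return task_s / (task_s + idle_gap_s)


def mean_heartbeat_gap(interval_s: float) -> float:
    """Mean wait for the next heartbeat, assuming a task finishes at a
    uniformly random moment within the interval and new work is only
    handed out when that heartbeat arrives."""
    return interval_s / 2.0


TASK_S = 0.1           # 100 ms tasks, the example used in the thread
HEARTBEAT_S = 3.0      # assumed heartbeat interval (illustrative only)
MESSAGE_GAP_S = 0.001  # assumed scheduling round trip with message passing

heartbeat_util = slot_utilization(TASK_S, mean_heartbeat_gap(HEARTBEAT_S))
message_util = slot_utilization(TASK_S, MESSAGE_GAP_S)

print(f"heartbeat-driven: {heartbeat_util:.1%}")
print(f"message-driven:   {message_util:.1%}")
```

Under these assumptions a heartbeat-driven slot running 100 ms tasks is busy only about 6% of the time, versus about 99% with message passing. With multi-minute tasks the two numbers converge, which is why the difference matters mostly for short tasks.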

However, overall, as Ted said, many of these differences will likely go away as both projects add features. What will be interesting is whether some fundamental differences in the target workloads remain, which I think is likely to happen. For example, the main deployment of Mesos is currently to run long-running stream processing services at Twitter, which is something that typical Hadoop environments just don't do and that requires different things from the cluster scheduler. I also believe we're going to see a lot of other cluster scheduling systems besides Mesos and HNG in the future, as people's requirements for these systems grow. There are some very challenging problems in designing a general cluster scheduling system that even the Google folks are still working hard on.

Matei

On Jun 30, 2011, at 6:26 PM, Edward J. Yoon wrote:

> Thanks for your nice and quick explanation!

Re: Question about Mesos.

Posted by "Edward J. Yoon" <ed...@apache.org>.

Thanks for your nice and quick explanation!

-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Question about Mesos.

Posted by Ted Dunning <te...@gmail.com>.

Technically speaking, Mesos has a less expressive model for stating
resource requirements.  The thesis of Mesos is that the negotiation between
application and scheduler can make up for this missing information.  Mesos
was also first to "market", but Hadoop nextGen is catching up fast.  The
MR-279 branch has code that works, albeit with some issues in production use.
From all reports, these issues are being resolved quickly as Yahoo's
considerable QA resources come to bear.
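To make the "negotiation" point concrete, here is a minimal Python sketch of offer-based scheduling under stated assumptions. The names `Offer`, `Request`, `DataLocalFramework`, `on_offer` and `as_request` are hypothetical; they are not the Mesos or Hadoop nextGen APIs. The allocator in this sketch only knows about CPUs and memory, so the framework's data-locality preference never reaches it. The framework makes up for that by accepting or declining each offer. `as_request` shows the same needs stated up front, roughly in the spirit of a request-based model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Hypothetical: free resources on one node, offered to a framework."""
    node: str
    cpus: float
    mem_mb: int


@dataclass(frozen=True)
class Request:
    """Hypothetical: request-based model, needs and locality stated up front."""
    cpus: float
    mem_mb: int
    preferred_nodes: tuple[str, ...]


class DataLocalFramework:
    """Hypothetical framework that wants its tasks to run next to its data.

    The allocator never learns about data_nodes; the framework enforces
    locality itself by declining offers from other nodes."""

    def __init__(self, task_cpus: float, task_mem_mb: int, data_nodes):
        self.task_cpus = task_cpus
        self.task_mem_mb = task_mem_mb
        self.data_nodes = set(data_nodes)

    def on_offer(self, offer: Offer) -> bool:
        """Return True to accept the offer, False to decline it."""
        fits = offer.cpus >= self.task_cpus and offer.mem_mb >= self.task_mem_mb
        local = offer.node in self.data_nodes
        return fits and local

    def as_request(self) -> Request:
        """The same needs, expressed directly to a request-based scheduler."""
        return Request(self.task_cpus, self.task_mem_mb,
                       tuple(sorted(self.data_nodes)))


fw = DataLocalFramework(task_cpus=1.0, task_mem_mb=1024, data_nodes=["node-b"])
offers = [
    Offer("node-a", 4.0, 8192),  # fits, but not where the data lives
    Offer("node-b", 2.0, 2048),  # fits and is data-local
    Offer("node-c", 0.5, 4096),  # too few CPUs
]
accepted = [o.node for o in offers if fw.on_offer(o)]
print(accepted)  # ['node-b']
```

The trade-off the sketch illustrates: in the offer-based version the scheduler stays simple and the preference lives in application code, at the cost of declined offers and waiting; in the request-based version the scheduler must understand the richer `Request` itself.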

Politically speaking, Mesos has a nearly inactive mailing list which, to
outward appearances, indicates a nearly inactive project.  There is some
evidence that considerable activity is occurring off-list, but this is a
process bug in the Apache model since "if it doesn't happen on the list, it
doesn't happen".

On the other side, Hadoop nextGen (HNG) has the Hadoop community pretty much
behind it.  Since HNG has the potential to break down some of the deadlocks
that have plagued the Hadoop community release process, there is
considerable enthusiasm for it.

Combined, these factors make it much more likely that HNG will be the
dominant force in the Hadoop world.  That is, more likely in my own
estimation.  Others may differ.

On Thu, Jun 30, 2011 at 5:16 PM, Edward J. Yoon <ed...@apache.org> wrote:

> Hi,
>
> I'm a newbie, and I wonder what the main differences between Hadoop
> nextGen and Mesos are.
>
> Thanks.
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>