Posted to user@mesos.apache.org by Li Jin <ic...@gmail.com> on 2013/09/04 15:45:37 UTC

Some more question on resource sharing

Hello Mesosers:

Let me first make sure my understanding of how scheduling works (without
whitelisting) is correct.

Basically:
(1) Allocator gathers all available resources
(2) Allocator picks a framework with the least share, offers all resources
to that framework
(3) Framework accepts/declines offers

If the above is correct, my question is: how do we prevent a single
framework from taking more resources than its share for a long time? Let's
say we have frameworks A, B, and C, and the shared cluster is pretty empty.
A kicks in, launches a bunch of jobs, and takes the entire cluster for
2 hours...

Thanks,
Li

Re: Some more question on resource sharing

Posted by Sam Taha <ta...@gmail.com>.
I am still in the learning phase with the project :) but from what I have
gathered, Mesos makes offers to multiple frameworks; the first framework
to accept gets the resources, and the offers to the others get rescinded.

Thanks,
Sam Taha

http://grandlogic.com


On Mon, Sep 9, 2013 at 5:47 PM, Li Jin <ic...@gmail.com> wrote:

> In case this is missed, I am still interested in this question.
>
> Thanks,
> Li

Re: Some more question on resource sharing

Posted by Li Jin <ic...@gmail.com>.
In case this is missed, I am still interested in this question.

Thanks,
Li



Re: Some more question on resource sharing

Posted by Benjamin Hindman <be...@gmail.com>.
Hi Li,

Your understanding is correct. A few more thoughts:

The definition of "least share" is based on work we did at Berkeley called
dominant resource fairness (DRF, see
https://www.usenix.org/legacy/event/nsdi11/tech/full_papers/Ghodsi.pdf).
In a nutshell, DRF looks at the resource a framework cares the most about
(its dominant resource) and makes sure it has a "fair" share of that
resource. Here's the intuition: imagine if we statically partitioned the
cluster into 1/N sized smaller clusters for N frameworks. Then the amount
of work that each framework could run in its cluster would be upper
bounded by the resource it uses up first, i.e., its dominant resource!
Thus, at the very least, we should probably make sure a framework in a
Mesos cluster is able to execute that much work at any point in time ...
and of course the benefit of running in Mesos is that the framework can
also elastically scale to more than 1/N when those resources are idle.
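
To make the "least share" idea concrete, here is a minimal Python sketch of
DRF under simplifying assumptions. It is not the Mesos allocator code: the
names (dominant_share, drf_allocate) are made up for illustration, and each
framework is assumed to have a fixed per-task demand, whereas in Mesos the
framework decides what to launch from each offer. The numbers are the
running example from the DRF paper (9 CPUs, 18 GB of memory).

```python
# Cluster capacity and per-task demands (running example from the DRF paper).
TOTAL = {"cpus": 9.0, "mem": 18.0}
DEMAND = {
    "A": {"cpus": 1.0, "mem": 4.0},  # memory-heavy tasks
    "B": {"cpus": 3.0, "mem": 1.0},  # CPU-heavy tasks
}


def dominant_share(allocated, total):
    """Largest fraction of any single resource that a framework holds."""
    return max(allocated[r] / total[r] for r in total)


def drf_allocate(total, demand):
    """Repeatedly give one task to the framework with the least dominant share.

    Hypothetical simplification: frameworks whose next task no longer fits
    are skipped, and the loop stops when nothing fits.
    """
    used = {r: 0.0 for r in total}
    allocated = {f: {r: 0.0 for r in total} for f in demand}
    tasks = {f: 0 for f in demand}
    while True:
        # Frameworks whose next task still fits in the remaining capacity.
        candidates = [
            f for f in demand
            if all(used[r] + demand[f][r] <= total[r] for r in total)
        ]
        if not candidates:
            return tasks
        chosen = min(candidates, key=lambda f: dominant_share(allocated[f], total))
        for r in total:
            used[r] += demand[chosen][r]
            allocated[chosen][r] += demand[chosen][r]
        tasks[chosen] += 1


print(drf_allocate(TOTAL, DEMAND))  # {'A': 3, 'B': 2}
```

With these inputs both frameworks end up with a dominant share of 2/3:
A holds 12 of 18 GB of memory and B holds 6 of 9 CPUs.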

Note that "fair" here by default implies 1/N of your dominant resource, but
you can tweak that with the --weights flag. Think of a weight as
capturing the fact that in the real world some frameworks are more
important and wouldn't actually have been given a 1/N sized statically
allocated cluster but something bigger (or smaller depending on the weight).
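
One way to picture the effect of a weight is to divide a framework's
dominant share by its weight before comparing shares. The Python sketch
below does exactly that and nothing more. It is an illustration under
assumptions, not the Mesos implementation: the function names are
hypothetical, and weights are keyed by framework name here purely for
simplicity.

```python
def weighted_dominant_share(allocated, total, weight=1.0):
    """Dominant share scaled down by the weight (higher weight, lower share)."""
    if weight <= 0:
        raise ValueError("weight must be positive")
    return max(allocated[r] / total[r] for r in total) / weight


def next_framework(allocations, total, weights):
    """Pick the framework with the smallest weighted dominant share.

    Frameworks missing from `weights` get the default weight of 1.0.
    """
    return min(
        allocations,
        key=lambda f: weighted_dominant_share(
            allocations[f], total, weights.get(f, 1.0)
        ),
    )


# Made-up numbers: A holds 40% of the CPUs, B holds 30%.
total = {"cpus": 10.0, "mem": 20.0}
allocations = {
    "A": {"cpus": 4.0, "mem": 4.0},
    "B": {"cpus": 3.0, "mem": 2.0},
}

print(next_framework(allocations, total, {}))          # B
print(next_framework(allocations, total, {"A": 2.0}))  # A
```

With equal weights B is served next because it holds less. Giving A a
weight of 2 halves its effective share (0.4 becomes 0.2), so A is served
next even though it already holds more of the cluster.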

As you mentioned, a framework might snatch up all the idle resources
regardless of its weight. You can also use resource reservations on the
slaves to make sure that some frameworks (or better, roles) are guaranteed
to have resources. You can do this by tagging resources with role names in
the --resources flag (see mesos-slave --help). Right now reserved resources
cannot be used by other frameworks/roles, but we're working on that!
Look for it in 0.15.0.
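
As a rough picture of what role-tagged resources mean, the Python sketch
below parses a resources string and computes what a framework in a given
role could be offered: resources reserved for its own role plus unreserved
ones. Treat the details as assumptions: the "name(role):amount;..." form
and the "*" default role are written from memory, so check
mesos-slave --help for the exact syntax. The role name "prod" and the
function names are made up, and only scalar resources are handled.

```python
import re

# Assumed item form: name(role):amount, or name:amount for unreserved.
ITEM = re.compile(r"(\w+)(?:\(([^()\s]+)\))?:([\d.]+)")


def parse_resources(spec):
    """Parse 'name(role):amount;...' into a list of (name, role, amount)."""
    parsed = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        match = ITEM.fullmatch(item)
        if match is None:
            raise ValueError(f"unrecognized resource: {item!r}")
        name, role, amount = match.groups()
        # Resources without a role tag are treated as unreserved ("*").
        parsed.append((name, role or "*", float(amount)))
    return parsed


def offerable(resources, framework_role):
    """Total resources a framework in `framework_role` could be offered."""
    totals = {}
    for name, role, amount in resources:
        if role in ("*", framework_role):
            totals[name] = totals.get(name, 0.0) + amount
    return totals


# Hypothetical slave: 8 CPUs and 16 GB reserved for "prod", the rest unreserved.
SPEC = "cpus(prod):8;mem(prod):16384;cpus:4;mem:8192"
resources = parse_resources(SPEC)

print(offerable(resources, "prod"))  # {'cpus': 12.0, 'mem': 24576.0}
print(offerable(resources, "dev"))   # {'cpus': 4.0, 'mem': 8192.0}
```

In this sketch a framework in any other role only ever sees the unreserved
4 CPUs and 8 GB, so it cannot crowd "prod" out of its reserved slice.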

Hope that helps.

Ben.

