You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@mesos.apache.org by Adam Bordelon <ad...@mesosphere.io> on 2014/04/18 01:48:02 UTC

Re: DRF on Mesos

Adding user@mesos.apache.org, in case anybody else (not necessarily named
'Adam') wants to chime in.

If I understand your example correctly, ETL (e.g. Hadoop) and Analytics
(e.g. Spark) are two different Mesos frameworks that can each run multiple
jobs consisting of many tasks. Client1 and Client2 are users submitting the
job requests to the Nginx frontend, which in turn submits the jobs to the
appropriate frameworks.
Right now, there is no way to specify per-user weights in Mesos itself,
such that Client1 is guaranteed 3x the resources of Client2, no matter
which framework. So that would have to be handled by each framework's
scheduler (this is why Mesos is known as a two-level scheduler), or perhaps
even by your Nginx frontend itself.

So from Mesos' perspective, you have two frameworks ETL and Analytics, and
when the Mesos master gets some available resources, it will offer them to
whichever framework is furthest below its weighted fair share. So, if there
are 100 cpus total, and each framework starts with no resources, and let's
assume offers are batched in groups of 5 cpus, then ETL's weighted share is
(0/100)/1 and Analytics' is (0/100)/3 so the first offer could go to
either. Let's assume it's Analytics first. Then:
ETL.share = 0, and Analytics.share = (5/100)/3 = 0.0167; next offer goes to
ETL
ETL.share = (5/100) = 0.05, and Analytics.share = 0.0167; next offer goes
to Analytics
ETL.share = 0.05, and Analytics.share = (10/100)/3 = 0.033; next offer goes
to Analytics, etc.
Once an offer goes to a framework (e.g. Analytics), it's up to the
framework to decide which task(s)/job/user/client to give that offer to. We
currently don't have any base scheduler that you can inherit from to get
that kind of functionality, but it's certainly the kind of feature that
many would find useful. Perhaps a worthy addition to
Marathon/Aurora/Chronos?

Or we could make the sharing per-task/job-user first-class in Mesos as I
described in the other thread I referenced.
Anybody else have other thoughts/suggestions?


On Mon, Apr 14, 2014 at 6:32 AM, Adam Rosenberger <
adam.rosenberger1@gmail.com> wrote:

> Hey Adam,
>
> Thanks for the detailed info, that is helpful.
>
> Here is a simple use case:
>
> The application I work on is a multi tenant financial application, with a
> web based interface fronted by an nginx cluster. On the backend there is a
> large compute cluster for processing. There are several frameworks in the
> Mesos nomenclature, but for our purposes I think two would suffice.
>
> 1) ETL jobs for raw client data
> 2) Analytics jobs for generating reports on client data
>
> Both of those are run across the compute cluster due to volume of data and
> the problems ranging from embarrassingly parallel to somewhat
> parallelizable. As such the Mesos DRF makes perfect sense to declare those
> top level frameworks and provide weights (e.g. make Analytics 3:1 to ETL).
>
> However what I am struggling with is the multi tenancy of the application.
>
> Say we have 100 total CPUs on the cluster and all jobs require the same
> amount of [infinitely available] RAM, just to focus on CPU contention.
>
> If two competing requests come in for say
> - ETL: 40 CPUs
> - Analytics: 80 CPUs
>
> Then the Mesos allocator should take care of that for me.
>
> But as I understand the Executor / Scheduler architecture, in each Nginx
> server, I would have to keep a memory-based queue of jobs that come from
> the HTTP requests. Then on resource offers, I would be able to drain the
> queue in one or more shots with the jobs for Mesos.
>
> Expanding on that, now assume we have Client1 and Client2 both submitting
> ETL and Analytics jobs at the same time.
> - Client1 ETL: 50 CPUs
> - Client1 Analytics: 100 CPUs
> - Client2 ETL: 10 CPUs
> - Client2 Analytics: 20 CPUS
>
> If those requests all routed into the same Nginx server and I just keep a
> memory based queue of requests to submit to Mesos, it looks to me like
> Client1 ETL and Analytics would be subject to allocation policy which is
> great, but I would starve Client2 ... ClientN if a large job(s) arrived
> first.
>
> What I was hoping for was some way to submit say all ETL requests when a
> resource offer occurs and have some sort of second-level allocation policy
> to try some sort of fair share or priority based scheduling for the
> different clients.
>
> If I am off the mark on my understanding of the Mesos internals please do
> let me know. Thanks again.
>
> Adam
>
>
> On Mon, Apr 14, 2014 at 5:24 AM, Adam Bordelon <ad...@mesosphere.io> wrote:
>
>> Hi AdamR,
>>
>> It is certainly possible to specify resource allocation weights for
>> frameworks (or groups of frameworks), using the --weights and --roles
>> parameters to the mesos master. We currently do not have a way to specify
>> weights for users/groups for each framework, but would like to build that
>> in the near future.
>>
>> Here's how we currently use roles and weights in Mesos:
>> - Roles (on Frameworks): Each framework can register with a role set in
>> its FrameworkInfo, which is used to group frameworks for allocation
>> decisions.
>> - Weights: On master startup, a list of role/weight pairs can be provided
>> as '--weights=role=weight,role=weight'. Weights, which do not need to be
>> integers, are used to indicate forms of priority in the allocator. Make
>> sure that each of the roles provided in --weights also appear in --roles on
>> master startup.
>> When weights are specified, a client's DRF share will be divided by the
>> weight. For example, a role that has a weight of 2 will be offered twice as
>> many resources as a role with weight 1.
>> So, when a new resource becomes available, the master allocator first
>> checks all the roles to see which role is furthest below its weighted fair
>> share. Then, within that role, it selects the framework that is furthest
>> below its fair share and offers the resource to it.
>>
>> - Roles on Resources: A slave can also assign roles to some/all of its
>> resources on startup. If a resource is assigned to role A, then only
>> frameworks with role A are eligible to get an offer for that resource.
>> Resources not assigned to a role can be offered to any framework. This
>> could be used in many ways; for example, you can ensure that there's always
>> 1 cpu and 1GB of RAM available on each slave to spin up your favorite
>> service.
>>
>> - Users: We currently only associate a framework with the user who
>> registered it, but we plan to allow frameworks to run tasks as other users
>> (MESOS-1141).
>> The allocator used to sort based on users, but now uses roles instead. We
>> could build a new allocator that focuses on users, as I suggested in a
>> previous thread:
>>
>> https://mail-archives.apache.org/mod_mbox/mesos-dev/201404.mbox/%3cCAK8jAgNTZXyEJGztWjOOb4q=u-ka1U60JHf_Zmomy4jyx3vSSw@mail.gmail.com%3e
>>
>> We would definitely like to build out the multi-user-per-framework story
>> for Mesos. Could you explain more about your use case so we can get a
>> better idea what you're looking for?
>>
>> Thanks,
>> -AdamB-
>>
>>
>>
>> On Thu, Apr 10, 2014 at 2:56 PM, Adam Rosenberger <
>> adam.rosenberger1@gmail.com> wrote:
>>
>>> Hi Adam,
>>>
>>> Your name was brought up when I asked some questions about the Mesos
>>> allocator policy on the IRC channel.
>>>
>>> Someone sent me a link to the DRF whitepaper which I need to digest, but
>>> I was hoping you could fast track me with some knowledge. I've so far been
>>> largely unsuccessful in Google searching for any useful examples of
>>> specifying weights/roles and how it ties into the DRF allocator.
>>>
>>> Say at a high level I have Frameworks A & B and each framework can have
>>> a number of concurrent users. I am wondering if it is possible to both
>>> specify some sort of weighting for the frameworks and for users / groups of
>>> users for each framework.
>>>
>>> Thanks for any info.
>>>
>>> Adam
>>>
>>
>>
>

Re: DRF on Mesos

Posted by Sharma Podila <sp...@netflix.com>.
(Caveat: I'm new to Mesos, but not to scheduling)
First, focusing on AdamR's use case:
Another way to look at this would be that Mesos handles (coarse grain)
resource allocation across frameworks and then each framework does (fine
grain) job scheduling via appropriate job ordering to achieve its needs.
Allocation and job scheduling/ordering are two different things, although
related. So, for example, the ETL framework wouldn't necessarily use its
allocated offers by launching tasks in a FIFO order. Instead, it could
partition the in memory queue by user (or other criteria) in order to
launch tasks for achieving resource sharing objective across users, or
other desired attributes - round robin by user, etc.

AdamB's point about adding user info into task launching attempts to
achieve this for user based allocation. However, that is just one criteria
for sharing. Other environments may want to share differently, say, using
departments/projects such that a single user can belong to multiple
departments/projects, while submitting jobs to multiple frameworks. More on
this later, below.

Preventing job starvation due to earlier submitted long running jobs can
get non-trivial soon. Barring preemptions, one would have to prevent long
running jobs from using up the "last few" resources (which would require
knowledge of capacity, etc.). Basically some kind of a "dam limit" or
partitioning the resources with at least one partition not accepting long
running jobs. The other side to this, of course, is that having such limits
fragments the resource pool and reduces resource utilization while
unnecessarily keeping the long jobs from launching if there are no other
jobs. Preemptions and friendly job mix and arrival rates are our good
friends, although not always practical. And we would have to know how long
a job is going to run, with the job itself declaring it, or some kind of
profiling.

​Regarding ​the other thread on user based allocations...
My understanding is that Mesos keeps resource sharing at a coarse grain
level across frameworks. User based allocations attempt to make resource
sharing more fine grain. In that case, it seems to me that there is a bit
of overloading happening with frameworks. A framework seems to be used for
both allocation purposes as well as for execution purposes. A Hadoop
framework, for example, knows how to service all hadoop tasks. However, it
also inherits allocation decisions by being associated with a role.
Therefore, if you need to define resource allocation among users that run
together in multiple frameworks, my understanding is that today you would
have to run multiple instances of the frameworks. Effectively, there's a F
x A matrix of sorts, where F is the number of frameworks and A is the
number of entities (users, for example) over which to define allocation.
It almost seems that it might help to separate the two out (allocation Vs
execution) by defining a pluggable allocation engine in Mesos that extracts
the attributes from tasks (user info in the referred thread) to achieve
defined resource sharing, while continuing to use the same framework to
execute jobs. However, this isn't necessarily as simple, and may compromise
some of Mesos' goals at achieving speed with coarse grain allocation model.

Sharma



On Thu, Apr 17, 2014 at 4:48 PM, Adam Bordelon <ad...@mesosphere.io> wrote:

> Adding user@mesos.apache.org, in case anybody else (not necessarily named
> 'Adam') wants to chime in.
>
> If I understand your example correctly, ETL (e.g. Hadoop) and Analytics
> (e.g. Spark) are two different Mesos frameworks that can each run multiple
> jobs consisting of many tasks. Client1 and Client2 are users submitting the
> job requests to the Nginx frontend, which in turn submits the jobs to the
> appropriate frameworks.
> Right now, there is no way to specify per-user weights in Mesos itself,
> such that Client1 is guaranteed 3x the resources of Client2, no matter
> which framework. So that would have to be handled by each framework's
> scheduler (this is why Mesos is known as a two-level scheduler), or perhaps
> even by your Nginx frontend itself.
>
> So from Mesos' perspective, you have two frameworks ETL and Analytics, and
> when the Mesos master gets some available resources, it will offer them to
> whichever framework is furthest below its weighted fair share. So, if there
> are 100 cpus total, and each framework starts with no resources, and let's
> assume offers are batched in groups of 5 cpus, then ETL's weighted share is
> (0/100)/1 and Analytics' is (0/100)/3 so the first offer could go to
> either. Let's assume it's Analytics first. Then:
> ETL.share = 0, and Analytics.share = (5/100)/3 = 0.0167; next offer goes
> to ETL
> ETL.share = (5/100) = 0.05, and Analytics.share = 0.0167; next offer goes
> to Analytics
> ETL.share = 0.05, and Analytics.share = (10/100)/3 = 0.033; next offer
> goes to Analytics, etc.
> Once an offer goes to a framework (e.g. Analytics), it's up to the
> framework to decide which task(s)/job/user/client to give that offer to. We
> currently don't have any base scheduler that you can inherit from to get
> that kind of functionality, but it's certainly the kind of feature that
> many would find useful. Perhaps a worthy addition to
> Marathon/Aurora/Chronos?
>
> Or we could make the sharing per-task/job-user first-class in Mesos as I
> described in the other thread I referenced.
> Anybody else have other thoughts/suggestions?
>
>
> On Mon, Apr 14, 2014 at 6:32 AM, Adam Rosenberger <
> adam.rosenberger1@gmail.com> wrote:
>
>> Hey Adam,
>>
>> Thanks for the detailed info, that is helpful.
>>
>> Here is a simple use case:
>>
>> The application I work on is a multi tenant financial application, with a
>> web based interface fronted by an nginx cluster. On the backend there is a
>> large compute cluster for processing. There are several frameworks in the
>> Mesos nomenclature, but for our purposes I think two would suffice.
>>
>> 1) ETL jobs for raw client data
>> 2) Analytics jobs for generating reports on client data
>>
>> Both of those are run across the compute cluster due to volume of data
>> and the problems ranging from embarrassingly parallel to somewhat
>> parallelizable. As such the Mesos DRF makes perfect sense to declare those
>> top level frameworks and provide weights (e.g. make Analytics 3:1 to ETL).
>>
>> However what I am struggling with is the multi tenancy of the application.
>>
>> Say we have 100 total CPUs on the cluster and all jobs require the same
>> amount of [infinitely available] RAM, just to focus on CPU contention.
>>
>> If two competing requests come in for say
>> - ETL: 40 CPUs
>> - Analytics: 80 CPUs
>>
>> Then the Mesos allocator should take care of that for me.
>>
>> But as I understand the Executor / Scheduler architecture, in each Nginx
>> server, I would have to keep a memory-based queue of jobs that come from
>> the HTTP requests. Then on resource offers, I would be able to drain the
>> queue in one or more shots with the jobs for Mesos.
>>
>> Expanding on that, now assume we have Client1 and Client2 both submitting
>> ETL and Analytics jobs at the same time.
>> - Client1 ETL: 50 CPUs
>> - Client1 Analytics: 100 CPUs
>> - Client2 ETL: 10 CPUs
>> - Client2 Analytics: 20 CPUS
>>
>> If those requests all routed into the same Nginx server and I just keep a
>> memory based queue of requests to submit to Mesos, it looks to me like
>> Client1 ETL and Analytics would be subject to allocation policy which is
>> great, but I would starve Client2 ... ClientN if a large job(s) arrived
>> first.
>>
>> What I was hoping for was some way to submit say all ETL requests when a
>> resource offer occurs and have some sort of second-level allocation policy
>> to try some sort of fair share or priority based scheduling for the
>> different clients.
>>
>> If I am off the mark on my understanding of the Mesos internals please do
>> let me know. Thanks again.
>>
>> Adam
>>
>>
>> On Mon, Apr 14, 2014 at 5:24 AM, Adam Bordelon <ad...@mesosphere.io>wrote:
>>
>>> Hi AdamR,
>>>
>>> It is certainly possible to specify resource allocation weights for
>>> frameworks (or groups of frameworks), using the --weights and --roles
>>> parameters to the mesos master. We currently do not have a way to specify
>>> weights for users/groups for each framework, but would like to build that
>>> in the near future.
>>>
>>> Here's how we currently use roles and weights in Mesos:
>>> - Roles (on Frameworks): Each framework can register with a role set in
>>> its FrameworkInfo, which is used to group frameworks for allocation
>>> decisions.
>>> - Weights: On master startup, a list of role/weight pairs can be
>>> provided as ‘--weights=role=weight,role=weight’. Weights, which do not need
>>> to be integers, are used to indicate forms of priority in the allocator.
>>> Make sure that each of the roles provided in --weights also appear in
>>> --roles on master startup.
>>> When weights are specified, a client’s DRF share will be divided by the
>>> weight. For example, a role that has a weight of 2 will be offered twice as
>>> many resources as a role with weight 1.
>>> So, when a new resource becomes available, the master allocator first
>>> checks all the roles to see which role is furthest below its weighted fair
>>> share. Then, within that role, it selects the framework that is furthest
>>> below its fair share and offers the resource to it.
>>>
>>> - Roles on Resources: A slave can also assign roles to some/all of its
>>> resources on startup. If a resource is assigned to role A, then only
>>> frameworks with role A are eligible to get an offer for that resource.
>>> Resources not assigned to a role can be offered to any framework. This
>>> could be used in many ways; for example, you can ensure that there's always
>>> 1 cpu and 1GB of RAM available on each slave to spin up your favorite
>>> service.
>>>
>>> - Users: We currently only associate a framework with the user who
>>> registered it, but we plan to allow frameworks to run tasks as other users
>>> (MESOS-1141).
>>> The allocator used to sort based on users, but now uses roles instead.
>>> We could build a new allocator that focuses on users, as I suggested in a
>>> previous thread:
>>>
>>> https://mail-archives.apache.org/mod_mbox/mesos-dev/201404.mbox/%3cCAK8jAgNTZXyEJGztWjOOb4q=u-ka1U60JHf_Zmomy4jyx3vSSw@mail.gmail.com%3e
>>>
>>> We would definitely like to build out the multi-user-per-framework story
>>> for Mesos. Could you explain more about your use case so we can get a
>>> better idea what you're looking for?
>>>
>>> Thanks,
>>> -AdamB-
>>>
>>>
>>>
>>> On Thu, Apr 10, 2014 at 2:56 PM, Adam Rosenberger <
>>> adam.rosenberger1@gmail.com> wrote:
>>>
>>>> Hi Adam,
>>>>
>>>> Your name was brought up when I asked some questions about the Mesos
>>>> allocator policy on the IRC channel.
>>>>
>>>> Someone sent me a link to the DRF whitepaper which I need to digest,
>>>> but I was hoping you could fast track me with some knowledge. I've so far
>>>> been largely unsuccessful in Google searching for any useful examples of
>>>> specifying weights/roles and how it ties into the DRF allocator.
>>>>
>>>> Say at a high level I have Frameworks A & B and each framework can have
>>>> a number of concurrent users. I am wondering if it is possible to both
>>>> specify some sort of weighting for the frameworks and for users / groups of
>>>> users for each framework.
>>>>
>>>> Thanks for any info.
>>>>
>>>> Adam
>>>>
>>>
>>>
>>
>