Posted to user@mesos.apache.org by Benjamin Mahler <bm...@apache.org> on 2017/12/04 19:37:17 UTC

Re: Resource allocation cycle in DRF for multiple frameworks

I don't think I understood the questions here, but let me add some
explanation and we can go from there.

Mesos will use DRF to choose an ordering amongst the roles that are
actively interested in obtaining resources. Within a role, we currently use
DRF again to choose an ordering amongst the frameworks in that role. The
simplified pseudo-code looks something like this:

for each agent:
  for each role in drf_sorted(roles):
    for each framework subscribed to role in drf_sorted(frameworks):
      if framework already filtered these resources:
        continue
      else:
        allocate to framework
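
As a rough illustration, the loop above can be turned into a small Python model. It is a sketch and not the allocator's code. `Framework`, `dominant_share`, `role_allocation` and `allocate` are hypothetical names. Each agent's free resources are assumed to go wholesale to the first eligible framework, and a per-framework set of agent ids stands in for the real filter mechanism. Agents are visited in random order, as described later in the thread.

```python
import random
from dataclasses import dataclass, field


# Hypothetical model of the simplified loop; all names are illustrative.
@dataclass
class Framework:
    name: str
    role: str
    allocated: dict = field(default_factory=dict)  # resource -> amount held
    filtered: set = field(default_factory=set)     # agent ids this framework filters


def dominant_share(allocated, totals):
    """DRF dominant share: the largest fraction of any one resource held."""
    return max((allocated.get(r, 0.0) / t for r, t in totals.items() if t > 0),
               default=0.0)


def add_to(target, resources):
    for r, amount in resources.items():
        target[r] = target.get(r, 0.0) + amount


def role_allocation(frameworks, role):
    """Sum of what all frameworks in `role` hold."""
    total = {}
    for fw in frameworks:
        if fw.role == role:
            add_to(total, fw.allocated)
    return total


def allocate(agents, frameworks, totals, seed=None):
    """One pass of the simplified loop; returns [(agent_id, framework_name)]."""
    offers = []
    agent_ids = sorted(agents)
    random.Random(seed).shuffle(agent_ids)  # agents are visited in random order
    roles = {fw.role for fw in frameworks}
    for agent_id in agent_ids:
        # Shares change after every allocation, so sort again for each agent.
        role_order = sorted(roles, key=lambda r: (
            dominant_share(role_allocation(frameworks, r), totals), r))
        for role in role_order:
            members = sorted((fw for fw in frameworks if fw.role == role),
                             key=lambda fw: (dominant_share(fw.allocated, totals), fw.name))
            chosen = next((fw for fw in members if agent_id not in fw.filtered), None)
            if chosen is not None:
                # Offered resources count as allocated to the framework.
                add_to(chosen.allocated, agents[agent_id])
                offers.append((agent_id, chosen.name))
                break  # assumption: the agent's free resources are used up
    return offers


totals = {"cpus": 8.0, "mem": 32.0}
agents = {"a1": {"cpus": 4.0, "mem": 16.0}, "a2": {"cpus": 4.0, "mem": 16.0}}
frameworks = [Framework("f1", "*"), Framework("f2", "*")]
print(allocate(agents, frameworks, totals, seed=1))
```

With two equal agents and two frameworks in the default role, each framework ends up with one agent: whichever framework receives the first agent now has the higher share, so the second agent goes to the other one.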

There is no strong concept of a "cycle" in the sense you were referring to;
that is, Mesos does not remember which offers were sent out during which run
of this overall loop. Currently, when resources are offered, as far as the
allocator is concerned, they are considered allocated to that role and
framework.

Mesos provides an --offer_timeout flag on the master: an offer that a
framework holds for longer than this duration is rescinded.
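
The interaction between outstanding offers and the timeout can be modeled roughly as below. This is a hypothetical sketch: `Offer` and `rescind_expired` are illustrative names, and passing `offer_timeout=None` models the flag being unset (the master flag documentation states that offers then do not time out).

```python
from dataclasses import dataclass


# Hypothetical model of offer expiry; names are illustrative.
@dataclass
class Offer:
    offer_id: str
    framework: str
    created_at: float  # seconds on some monotonic clock


def rescind_expired(outstanding, now, offer_timeout=None):
    """Split outstanding offers into (kept, rescinded).

    offer_timeout=None models the flag being unset: nothing is rescinded.
    Rescinded offers' resources would go back to the allocator.
    """
    if offer_timeout is None:
        return list(outstanding), []
    kept, rescinded = [], []
    for offer in outstanding:
        if now - offer.created_at >= offer_timeout:
            rescinded.append(offer)
        else:
            kept.append(offer)
    return kept, rescinded


outstanding = [Offer("o1", "f1", created_at=0.0), Offer("o2", "f2", created_at=50.0)]
kept, rescinded = rescind_expired(outstanding, now=70.0, offer_timeout=60.0)
print([o.offer_id for o in kept], [o.offer_id for o in rescinded])  # ['o2'] ['o1']
```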

If you could share a little more about what you're trying to accomplish in
your particular use case we could advise on how best to set things up.

On Thu, Nov 30, 2017 at 1:05 PM, bigggyan <bi...@gmail.com> wrote:

> Hello,
> My understanding is that, during a single DRF cycle, the Mesos master will
> not make an offer to the same framework twice. I believe that if a
> framework rejects an offer, or leaves part of it unused, the leftover
> resources go to the next eligible framework.
> Now the question is: if one framework takes a long time to make a
> decision, will the same DRF allocation cycle stay alive to allocate the
> rest of the resources to other users, or will the master start a new cycle?
> Is there any allocation cycle expiry period? I am using multiple in-house
> frameworks with the same role and the same weight, with no quota set. I
> would appreciate your help in understanding the resource allocation.
>
> Thanks
> Bigggyan
>

Re: Resource allocation cycle in DRF for multiple frameworks

Posted by Meng Zhu <mz...@mesosphere.com>.
Hi Bigggyan:

Q1 and Q2: Keep in mind that a framework can assert some control over its
share by rejecting resource offers.

Quote from the Mesos NSDI paper:

"To maintain a thin interface and enable frameworks to evolve
independently, Mesos does not require frameworks to specify their resource
requirements or constraints. Instead, Mesos gives frameworks the ability to
reject offers. A framework can reject resources that do not satisfy its
constraints in order to wait for ones that do. Thus, the rejection
mechanism enables frameworks to support arbitrarily complex resource
constraints while keeping Mesos simple and scalable."
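
On the framework side, the rejection mechanism boils down to a decision per offer. The sketch below is hypothetical: the task description format, `should_accept`, and the attribute-equality constraint are assumptions chosen for illustration, not part of the Mesos API.

```python
# Hypothetical framework-side check; the task format is an assumption.
def should_accept(offer_resources, offer_attributes, task):
    """Decide whether an offer can run `task`; False means decline it.

    offer_resources:  e.g. {"cpus": 4.0, "mem": 8192.0}
    offer_attributes: agent attributes, e.g. {"rack": "r1"}
    task: {"resources": {...}, "constraints": {attribute: required value}}
    """
    fits = all(offer_resources.get(name, 0.0) >= amount
               for name, amount in task["resources"].items())
    satisfies_constraints = all(offer_attributes.get(key) == value
                                for key, value in task.get("constraints", {}).items())
    return fits and satisfies_constraints


task = {"resources": {"cpus": 2.0, "mem": 4096.0}, "constraints": {"rack": "r1"}}
print(should_accept({"cpus": 4.0, "mem": 8192.0}, {"rack": "r1"}, task))  # True
print(should_accept({"cpus": 4.0, "mem": 8192.0}, {"rack": "r2"}, task))  # False
```

A framework would accept offers for which this returns True and decline the rest, which is how arbitrarily complex constraints stay outside Mesos itself.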

Q3: An allocator needs to implement a set of callback functions; please take
a look at the base class in `include/mesos/allocator/allocator.hpp`.
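
To get a feel for what "implementing the callbacks" means, here is a heavily simplified Python analogue. It is not the Mesos interface: the real base class is C++, has many more callbacks (for example `addFramework`, `addSlave` and `recoverResources`) with different signatures, and is loaded as a module. The method names and the round-robin policy below are hypothetical.

```python
from abc import ABC, abstractmethod


class Allocator(ABC):
    """Hypothetical, heavily simplified stand-in for the C++ allocator interface."""

    @abstractmethod
    def add_framework(self, framework_id): ...

    @abstractmethod
    def add_agent(self, agent_id, resources): ...

    @abstractmethod
    def recover_resources(self, framework_id, agent_id, resources): ...

    @abstractmethod
    def allocate(self):
        """Return a list of (framework_id, agent_id, resources) offers."""


class RoundRobinAllocator(Allocator):
    """Toy policy: hand each agent's free resources to the next framework in turn."""

    def __init__(self):
        self.frameworks = []
        self.free = {}        # agent_id -> resources not currently offered
        self.next_index = 0

    def add_framework(self, framework_id):
        self.frameworks.append(framework_id)

    def add_agent(self, agent_id, resources):
        self.free[agent_id] = dict(resources)

    def recover_resources(self, framework_id, agent_id, resources):
        # Declined or released resources become available again.
        pool = self.free.setdefault(agent_id, {})
        for name, amount in resources.items():
            pool[name] = pool.get(name, 0.0) + amount

    def allocate(self):
        offers = []
        if not self.frameworks:
            return offers
        for agent_id in sorted(self.free):
            resources = self.free.pop(agent_id)
            framework_id = self.frameworks[self.next_index % len(self.frameworks)]
            self.next_index += 1
            offers.append((framework_id, agent_id, resources))
        return offers


allocator = RoundRobinAllocator()
for fw in ("f1", "f2"):
    allocator.add_framework(fw)
allocator.add_agent("a1", {"cpus": 4.0})
allocator.add_agent("a2", {"cpus": 4.0})
print(allocator.allocate())
# [('f1', 'a1', {'cpus': 4.0}), ('f2', 'a2', {'cpus': 4.0})]
```

Swapping the policy only means changing `allocate`; the bookkeeping callbacks stay the same, which is the idea behind a pluggable allocator.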

Hope that helps.

-Meng

On Tue, Dec 12, 2017 at 10:28 AM, bigggyan <bi...@gmail.com> wrote:

> Hi Benjamin,
> Thanks for the detailed explanation. It took me a little time to
> understand the implementation. I would like to ask a few questions about
> the DRF implementation that I could not figure out from the source code.
>
> Q1: The DRF paper talks about the demand vector of each user, and the
> allocation module's goal is to satisfy each user's demand in a fair-share
> manner. I could not find how the current allocation module takes the
> demand of each user into account. It looks like frameworks sometimes
> receive more offers than they require, because the allocation module
> cannot see how much each user actually needs.
> I tried setting the demand vector in the framework at the end of each
> "resourceOffers" callback but could not see any difference. Is it meant to
> be like this? Is the allocation module ignoring the demands on purpose?
> Please advise.
>
> Q2: During allocation, the step "a randomly picked agent is assigned to a
> DRF_sorted framework" may assign an agent to a framework that does not
> need the offered resources. Say a framework that primarily needs CPU
> receives an offer with zero CPU but a huge amount of disk. Even though it
> does not use the disk, its share goes up because of the disk assigned to
> it, so the framework may miss the next offer due to its high disk share.
>
> Why not pick a DRF_sorted framework first and then check which agent can
> best fulfill its demands? Do you think that, in a huge production
> environment, checking each agent for the best fit could cause a
> significant delay?
>
> Q3: The documentation says we can add a custom allocation module for
> custom needs. I was curious how easy it would be to tweak the code and see
> whether it makes any improvement on a small cluster with a custom
> framework.
>
> I appreciate your help, thanks a lot.
>
> Thanks

Re: Resource allocation cycle in DRF for multiple frameworks

Posted by bigggyan <bi...@gmail.com>.
Hi Benjamin,
Thanks for the detailed explanation. It took me a little time to
understand the implementation. I would like to ask a few questions about
the DRF implementation that I could not figure out from the source code.

Q1: The DRF paper talks about the demand vector of each user, and the
allocation module's goal is to satisfy each user's demand in a fair-share
manner. I could not find how the current allocation module takes the
demand of each user into account. It looks like frameworks sometimes
receive more offers than they require, because the allocation module
cannot see how much each user actually needs.
I tried setting the demand vector in the framework at the end of each
"resourceOffers" callback but could not see any difference. Is it meant to
be like this? Is the allocation module ignoring the demands on purpose?
Please advise.
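
For comparison, the demand-aware DRF described in the paper repeatedly picks the user with the lowest dominant share and launches that user's next task if it still fits. The sketch below follows that idea under simplifying assumptions (each user has an unbounded queue of identical tasks, and allocation stops as soon as the chosen user's next task does not fit). It is not how the Mesos allocator works: Mesos does not require frameworks to specify their requirements, and relies on them rejecting offers instead. `drf_progressive_filling` is a hypothetical name.

```python
# Hypothetical sketch of demand-aware DRF; not Mesos code.
def drf_progressive_filling(capacity, demands):
    """Demand-aware DRF in the spirit of the DRF paper.

    capacity: positive total per resource, e.g. (cpus, mem_gb)
    demands:  {user: per-task demand tuple}; each user is assumed to have an
              unbounded queue of identical tasks.
    Returns {user: number of tasks launched}.
    """
    if not demands:
        return {}
    for user, demand in demands.items():
        if len(demand) != len(capacity) or not any(demand):
            raise ValueError(f"bad demand vector for {user!r}")
    n = len(capacity)
    consumed = [0] * n
    used = {user: [0] * n for user in demands}
    tasks = {user: 0 for user in demands}

    def dominant_share(user):
        return max(used[user][k] / capacity[k] for k in range(n))

    while True:
        # Pick the user with the lowest dominant share (ties broken by name).
        user = min(demands, key=lambda u: (dominant_share(u), u))
        demand = demands[user]
        if any(consumed[k] + demand[k] > capacity[k] for k in range(n)):
            return tasks  # that user's next task no longer fits: stop
        for k in range(n):
            consumed[k] += demand[k]
            used[user][k] += demand[k]
        tasks[user] += 1


print(drf_progressive_filling((9, 18), {"A": (1, 4), "B": (3, 1)}))  # {'A': 3, 'B': 2}
```

In the example, A ends up with 3 CPUs and 12 GB and B with 6 CPUs and 2 GB, so both have a dominant share of 2/3; no resource is handed to a user whose demand does not include it.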

Q2: During allocation, the step "a randomly picked agent is assigned to a
DRF_sorted framework" may assign an agent to a framework that does not
need the offered resources. Say a framework that primarily needs CPU
receives an offer with zero CPU but a huge amount of disk. Even though it
does not use the disk, its share goes up because of the disk assigned to
it, so the framework may miss the next offer due to its high disk share.
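
The effect described in Q2 is easy to see numerically. Under DRF a framework's dominant share is the largest fraction of any single resource it holds, and, as noted earlier in the thread, offered resources count as allocated. In this hypothetical cluster (the numbers are made up for illustration), one disk-heavy offer lifts a CPU-oriented framework's dominant share above its peer's, so it sorts later on the next pass.

```python
# Illustrative numbers only.
def dominant_share(allocated, totals):
    """Largest fraction of any single resource that `allocated` represents."""
    return max(allocated.get(name, 0) / total for name, total in totals.items())


totals = {"cpus": 100, "disk_gb": 10_000}
cpu_fw = {"cpus": 10}    # what the CPU-oriented framework holds
other_fw = {"cpus": 30}

print(dominant_share(cpu_fw, totals), dominant_share(other_fw, totals))  # 0.1 0.3

# The CPU-oriented framework is now offered an agent with no CPUs but lots of disk.
cpu_fw["disk_gb"] = 4_000
print(dominant_share(cpu_fw, totals))  # 0.4

order = sorted({"cpu_fw": cpu_fw, "other_fw": other_fw}.items(),
               key=lambda kv: dominant_share(kv[1], totals))
print([name for name, _ in order])  # ['other_fw', 'cpu_fw']
```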

Why not pick a DRF_sorted framework first and then check which agent can
best fulfill its demands? Do you think that, in a huge production
environment, checking each agent for the best fit could cause a
significant delay?
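
The framework-first alternative from the question can be sketched as follows. This is a hypothetical policy, not what Mesos does: it assumes the allocator knows each framework's demand (which Mesos does not require frameworks to provide), and it scores fit by leftover capacity normalized by cluster totals, which is only one possible choice. Note the cost: every framework scans every agent, so one pass performs on the order of F * A fit checks for F frameworks and A agents.

```python
# Hypothetical framework-first, best-fit placement; not Mesos code.
def best_fit_agent(free, demand, totals):
    """Agent that fits `demand` with the least normalized leftover, or None."""
    best, best_slack = None, None
    for agent_id, res in sorted(free.items()):
        if any(res.get(r, 0) < need for r, need in demand.items()):
            continue  # demand does not fit on this agent
        # Leftover after placing the demand, summed as fractions of cluster totals.
        slack = sum((res.get(r, 0) - demand.get(r, 0)) / total
                    for r, total in totals.items())
        if best_slack is None or slack < best_slack:
            best, best_slack = agent_id, slack
    return best


def framework_first(free, ordered_demands, totals):
    """Place one demand per framework, visiting frameworks in the given (DRF) order.

    Each framework scans every agent: roughly F * A fit checks per pass.
    `free` is updated in place.
    """
    placements = []
    for framework, demand in ordered_demands:
        agent_id = best_fit_agent(free, demand, totals)
        if agent_id is None:
            continue  # nothing fits; the framework waits
        for r, need in demand.items():
            free[agent_id][r] = free[agent_id].get(r, 0) - need
        placements.append((framework, agent_id))
    return placements


totals = {"cpus": 16, "disk_gb": 10_000}
free = {"big_disk": {"cpus": 4, "disk_gb": 8_000}, "lean": {"cpus": 4, "disk_gb": 100}}
print(framework_first(free, [("cpu_fw", {"cpus": 2}), ("disk_fw", {"disk_gb": 5_000})], totals))
# [('cpu_fw', 'lean'), ('disk_fw', 'big_disk')]
```

The CPU-oriented framework lands on the lean agent instead of soaking up the disk-heavy one, at the price of a full scan of agents per framework.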

Q3: The documentation says we can add a custom allocation module for
custom needs. I was curious how easy it would be to tweak the code and see
whether it makes any improvement on a small cluster with a custom
framework.

I appreciate your help, thanks a lot.

Thanks

On Tue, Dec 5, 2017 at 9:28 PM, Benjamin Mahler <bm...@apache.org> wrote:

> Q1: we randomly sort the agents, so the pseudo-code I showed is:
>
> - for each agent:
> + for each agent in random_sort(agents):
>
> Q2: It depends on which version you're running. We used to immediately
> re-offer, but this was problematic since it kept going back to the same
> framework when using a low timeout. Now, the current implementation won't
> immediately re-offer it in an attempt to let it go to another framework
> during the next allocation "cycle":
>
> https://github.com/apache/mesos/blob/1.4.0/src/master/allocator/mesos/hierarchical.cpp#L1202-L1213
>
> Q3: We implement best-effort DRF to improve utilization. That is, we let a
> role go above its fair share if the lower share roles do not want the
> resources, and a role may have to wait for the resources to be released
> before it can get its fair share (since we cannot revoke resources). So, we
> increase utilization at the cost of no longer providing a guarantee that a
> role can get its fair share without waiting! In the future, we will use
> revocation to ensure a user is guaranteed to get their fair share without
> having to wait.

Re: Resource allocation cycle in DRF for multiple frameworks

Posted by Benjamin Mahler <bm...@apache.org>.
Q1: we randomly sort the agents, so the pseudo-code I showed is:

- for each agent:
+ for each agent in random_sort(agents):

Q2: It depends on which version you're running. We used to immediately
re-offer, but this was problematic since it kept going back to the same
framework when using a low timeout. Now, the current implementation won't
immediately re-offer it in an attempt to let it go to another framework
during the next allocation "cycle":

https://github.com/apache/mesos/blob/1.4.0/src/master/allocator/mesos/hierarchical.cpp#L1202-L1213
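
One way to model the behaviour described here is a table of decline filters keyed by framework and agent. This is a hypothetical sketch and not the code at the linked lines: `DeclineFilters` and its methods are illustrative, and the `min_duration` floor is an assumed stand-in for "won't immediately re-offer". `refuse_seconds` plays the role of the filter duration a framework passes when declining.

```python
# Hypothetical model of decline filters; not the linked Mesos code.
class DeclineFilters:
    """Per-(framework, agent) decline filters.

    min_duration is an assumed floor: with min_duration > 0, even a 0-second
    refusal keeps the agent away from the decliner for a short while, giving
    other frameworks a chance to receive it first.
    """

    def __init__(self, min_duration=0.0):
        self.min_duration = min_duration
        self.expiry = {}  # (framework, agent) -> time at which the filter lapses

    def decline(self, framework, agent, now, refuse_seconds):
        duration = max(refuse_seconds, self.min_duration)
        self.expiry[(framework, agent)] = now + duration

    def is_filtered(self, framework, agent, now):
        expires = self.expiry.get((framework, agent))
        if expires is None:
            return False
        if now >= expires:
            del self.expiry[(framework, agent)]  # drop lapsed filters
            return False
        return True


filters = DeclineFilters(min_duration=1.0)  # assumed floor, e.g. one allocation interval
filters.decline("B", "agent-1", now=0.0, refuse_seconds=0.0)
print(filters.is_filtered("B", "agent-1", now=0.5))  # True: skip B, try others
print(filters.is_filtered("A", "agent-1", now=0.5))  # False
print(filters.is_filtered("B", "agent-1", now=1.0))  # False: B may see it again
```

Without the floor, a 0-second refusal lapses immediately and the lowest-share framework is offered the same resources again, which is the ping-pong the older behaviour suffered from.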

Q3: We implement best-effort DRF to improve utilization. That is, we let a
role go above its fair share if the lower share roles do not want the
resources, and a role may have to wait for the resources to be released
before it can get its fair share (since we cannot revoke resources). So, we
increase utilization at the cost of no longer providing a guarantee that a
role can get its fair share without waiting! In the future, we will use
revocation to ensure a user is guaranteed to get their fair share without
having to wait.
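
The trade-off can be illustrated with a single resource split between two equally weighted roles, where each role's fair share is half the cluster. In this hypothetical sketch (`best_effort_round` is an illustrative name, and units are handed out one at a time), a role that does not want resources is skipped, so the other role can exceed its fair share; and because nothing is revoked, the skipped role has to wait until resources are released.

```python
# Hypothetical single-resource model of best-effort (work-conserving) DRF.
def best_effort_round(free_units, held, wants):
    """Hand out free units one at a time, lowest holding first.

    held:  {role: units currently held}; wants: {role: bool}.
    Roles that do not want resources are skipped. Nothing is taken back.
    """
    held = dict(held)
    for _ in range(free_units):
        candidates = [r for r in held if wants[r]]
        if not candidates:
            break
        role = min(candidates, key=lambda r: (held[r], r))
        held[role] += 1
    return held


held = best_effort_round(100, {"A": 0, "B": 0}, {"A": True, "B": False})
print(held)  # {'A': 100, 'B': 0}: A exceeds its fair share of 50
print(best_effort_round(0, held, {"A": True, "B": True}))  # {'A': 100, 'B': 0}: B waits
print(best_effort_round(40, {"A": 60, "B": 0}, {"A": True, "B": True}))  # {'A': 60, 'B': 40}
```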

On Tue, Dec 5, 2017 at 9:04 AM, bigggyan <bi...@gmail.com> wrote:

> Hi Benjamin,
> Thanks for the clear explanation. This loop structure makes it clear how
> resource allocation actually happens inside the Mesos master's allocation
> module. However, I have a few queries, and I will try to ask questions to
> clarify them. My goal is to understand how the DRF algorithm from the
> paper is implemented in Apache Mesos. I am doing this for an academic
> project to develop a custom framework.
> I am using a few in-house frameworks along with Mesosphere Marathon and
> Chronos. I am using the default role, with no weights on any framework and
> no constraints, so the loop becomes simpler.
>
> I understand that no such cycle exists, but what I meant was the end of
> the outer loop, when all the agents have been allocated to frameworks.
>
> Q1: In the loop "for each agent": how is one agent picked over the other
> agents to be assigned to a framework?
> Q2: After all the agents are allocated to the available frameworks, each
> framework can decide whether to use them or not. So the question is: if a
> framework rejects an offer with a 0-second filter duration, can the same
> resources be offered to that framework again because of its low dominant
> share? Or is there any penalty, so that a rejected offer cannot be
> immediately offered to the same framework?
>
> Let me explain why this is important to know:
> User A may be using 80% of the resources, while user B, because of its low
> share, receives the rest of the offers first but rejects them because it
> has no pending tasks to launch. According to DRF, the master will then
> always pick user B first, and user A will not receive anything even though
> it has many tasks in its waiting queue.
>
> Q3: My observation is that once an offer is declined or partially used by
> a framework, it immediately goes to the next available framework, even
> though that framework's share is higher than the previous one's. Is that by
> design, or am I getting something wrong here?
>
> Thanks

Re: Resource allocation cycle in DRF for multiple frameworks

Posted by bigggyan <bi...@gmail.com>.
Hi Benjamin,
Thanks for the clear explanation. This loop structure makes it clear how
resource allocation actually happens inside the Mesos master's allocation
module. However, I have a few queries, and I will try to ask questions to
clarify them. My goal is to understand how the DRF algorithm from the
paper is implemented in Apache Mesos. I am doing this for an academic
project to develop a custom framework.
I am using a few in-house frameworks along with Mesosphere Marathon and
Chronos. I am using the default role, with no weights on any framework and
no constraints, so the loop becomes simpler.

I understand that no such cycle exists, but what I meant was the end of
the outer loop, when all the agents have been allocated to frameworks.

Q1: In the loop "for each agent": how is one agent picked over the other
agents to be assigned to a framework?
Q2: After all the agents are allocated to the available frameworks, each
framework can decide whether to use them or not. So the question is: if a
framework rejects an offer with a 0-second filter duration, can the same
resources be offered to that framework again because of its low dominant
share? Or is there any penalty, so that a rejected offer cannot be
immediately offered to the same framework?

Let me explain why this is important to know:
User A may be using 80% of the resources, while user B, because of its low
share, receives the rest of the offers first but rejects them because it
has no pending tasks to launch. According to DRF, the master will then
always pick user B first, and user A will not receive anything even though
it has many tasks in its waiting queue.
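
The scenario can be reduced to a toy simulation: one free agent is offered per allocation pass to the eligible framework with the lowest share. `simulate` and `filter_passes` are hypothetical names; `filter_passes=0` models a 0-second filter with immediate re-offer, and shares are held fixed for simplicity.

```python
# Hypothetical toy model of the A/B scenario; not Mesos code.
def simulate(passes, shares, declines, filter_passes):
    """Offer one free agent per allocation pass.

    shares:        {framework: dominant share}, held fixed for simplicity
    declines:      frameworks that always decline (no pending tasks)
    filter_passes: extra passes a decline keeps the agent away from the
                   decliner; 0 models a 0-second filter with immediate re-offer
    Returns the framework that finally uses the agent, or None.
    """
    eligible_from = {}  # framework -> first pass in which it may be offered again
    for p in range(passes):
        eligible = [f for f in shares if eligible_from.get(f, 0) <= p]
        if not eligible:
            continue
        chosen = min(eligible, key=lambda f: (shares[f], f))  # lowest share first
        if chosen in declines:
            eligible_from[chosen] = p + 1 + filter_passes
        else:
            return chosen
    return None


shares = {"A": 0.8, "B": 0.0}
print(simulate(100, shares, declines={"B"}, filter_passes=0))  # None: A never gets it
print(simulate(100, shares, declines={"B"}, filter_passes=1))  # A
```

With no lasting filter, B's declines never let the agent reach A; keeping the agent away from B for even one pass is enough for A to receive it.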

Q3: My observation is that once an offer is declined or partially used by
a framework, it immediately goes to the next available framework, even
though that framework's share is higher than the previous one's. Is that by
design, or am I getting something wrong here?

Thanks


On Mon, Dec 4, 2017 at 2:37 PM, Benjamin Mahler <bm...@apache.org> wrote:

> I don't think I understood the questions here, but let me add some
> explanation and we can go from there.
>
> Mesos will use DRF to choose an ordering amongst the roles that are
> actively interested in obtaining resources. Within a role, we currently use
> DRF again to choose an ordering amongst the frameworks in that role. The
> simplified pseudo-code looks something like this:
>
> for each agent:
>   for each role in drf_sorted(roles):
>     for each framework subscribed to role in drf_sorted(frameworks):
>       if framework already filtered these resources:
>         continue
>       else:
>         allocate to framework
>
> There is no strong concept of a "cycle" in the sense you were referring to;
> that is, Mesos does not remember which offers were sent out during which run
> of this overall loop. Currently, when resources are offered, as far as the
> allocator is concerned, they are considered allocated to that role and
> framework.
>
> Mesos provides an --offer_timeout flag on the master: an offer that a
> framework holds for longer than this duration is rescinded.
>
> If you could share a little more about what you're trying to accomplish in
> your particular use case we could advise on how best to set things up.