Posted to issues@mesos.apache.org by "Kevin Klues (JIRA)" <ji...@apache.org> on 2017/04/25 21:24:04 UTC

[jira] [Comment Edited] (MESOS-7375) provide additional insight for framework developers re: GPU_RESOURCES capability

    [ https://issues.apache.org/jira/browse/MESOS-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983645#comment-15983645 ] 

Kevin Klues edited comment on MESOS-7375 at 4/25/17 9:23 PM:
-------------------------------------------------------------

The flag you are thinking of is {{\-\-allocator_fairness_excluded_resource_names}} (i.e. you can set it as {{\-\-allocator_fairness_excluded_resource_names=gpus}}).
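
For reference, this is a master-side allocator flag, so you would pass it when starting the Mesos master. A rough sketch of the invocation (adapt paths and other flags to your deployment):

{noformat}
# Illustrative invocation only; adjust to how the master is launched in your environment.
mesos-master --work_dir=/var/lib/mesos \
             --allocator_fairness_excluded_resource_names=gpus
{noformat}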

Regarding the motivation for the {{GPU_RESOURCES}} capability -- here is an excerpt from an email I sent out recently:

"""
Ideally, Marathon (and any other frameworks -- SDKs included) should do some sort of preferential scheduling when they opt in to use GPUs. That is, they should *prefer* to run GPU jobs on GPU machines and non-GPU jobs on non-GPU machines (falling back to running them on GPU machines only if that is all that is available).

Additionally, we need a way for an operator to indicate whether or not GPUs are a scarce resource in their cluster. We have a flag in Mesos that allows us to set this (`--allocator_fairness_excluded_resource_names=gpus`), but we don't yet have a way of setting it through DC/OS. If we don't set this flag, we run the risk of Mesos's DRF algorithm very rarely sending out offers from GPU machines once the first GPU job has been launched on them.

As a concrete example, imagine you have a machine with only 1 GPU and you launch a task that consumes it -- from DRF's perspective that node now has 100% usage of one of its resources. Even if you have 2 GPUs and one gets consumed, DRF still thinks you have consumed 50% of one of its resources. Out of fairness, DRF will choose not to send offers from that node until some other resource on *all* other nodes approaches 50% as well (which may take a while if you are allocating CPUs, memory, and disk in small increments); see the sketch after this excerpt.

Right now we don't set `--allocator_fairness_excluded_resource_names=gpus` in DC/OS (but maybe we should?). Is it the case that most DC/OS users only install GPUs on a small number of nodes in their cluster? If so, we should consider GPUs a scarce resource and set this flag by default. If not, then GPUs aren't actually a scarce resource and we shouldn't be setting this flag -- DRF will perform as expected without it.
"""


> provide additional insight for framework developers re: GPU_RESOURCES capability
> --------------------------------------------------------------------------------
>
>                 Key: MESOS-7375
>                 URL: https://issues.apache.org/jira/browse/MESOS-7375
>             Project: Mesos
>          Issue Type: Bug
>          Components: allocation
>            Reporter: James DeFelice
>              Labels: mesosphere
>
> On clusters where all nodes are equal and every node has a GPU, frameworks that **don't** opt in to the `GPU_RESOURCES` capability won't get any offers. This is surprising for operators.
> Even when a framework doesn't **need** GPU resources, it may make sense for a framework scheduler to provide a `--gpu-cluster-compat` (or similar) flag that results in the framework advertising the `GPU_RESOURCES` capability even though it does not intend to consume any GPU. The effect is that such a framework will then receive offers on clusters where all nodes have GPU resources.
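
As a rough sketch of what that opt-in looks like on the framework side (assuming the old Python bindings with {{mesos_pb2}}; the {{--gpu-cluster-compat}} style option is hypothetical, as suggested above):

{code:python}
# Sketch only: advertise GPU_RESOURCES at (re-)registration so agents with GPUs
# keep sending this framework offers, even if it never launches GPU tasks.
from mesos.interface import mesos_pb2

gpu_cluster_compat = True    # hypothetical, e.g. driven by a --gpu-cluster-compat option

framework = mesos_pb2.FrameworkInfo()
framework.user = ""          # let Mesos fill in the current user
framework.name = "my-framework"

if gpu_cluster_compat:
    capability = framework.capabilities.add()
    capability.type = mesos_pb2.FrameworkInfo.Capability.GPU_RESOURCES
{code}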



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)