Posted to issues@mesos.apache.org by "Charles (Jira)" <ji...@apache.org> on 2020/01/20 20:53:00 UTC

[jira] [Comment Edited] (MESOS-1807) Disallow executors with cpu only or memory only resources

    [ https://issues.apache.org/jira/browse/MESOS-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019733#comment-17019733 ] 

Charles edited comment on MESOS-1807 at 1/20/20 8:52 PM:
---------------------------------------------------------

Is there any way I could help this move forward?

I just got bitten by this: my custom executor led to random errors, which [~vinodkone] describes as happening "when the last task on the executor finishes and Containerizer::update() is called with 0 cpus or 0 mem." See, for example, https://github.com/mesos/chronos/issues/428

{noformat}
ec2-__-___-___-___.compute-1.amazonaws.com E0414 00:41:50.864876 29069 slave.cpp:2344] Failed to update resources for container 867bfec1-ac28-4a4f-8904-3404e6d1e3e9 of executor shell-wrapper-executor running task ct:1428972109061:0:my-chronos-job on status update for terminal task, destroying container: Collect failed: No cpus resource given
{noformat}


In the meantime, what's the proper workaround? Always define CPU and memory resources for the executor? That's a bit annoying because it effectively means arbitrarily limiting the task's CPU usage (e.g. if there's 1 core and we allocate 0.01 CPU to the executor, only 0.99 is left for the task), but I guess there isn't really any way around that.

In any case, returning an error before accepting the tasks is better than accepting them with a warning and then failing at a random later point, when the last task on the executor finishes.

Maybe [~bmahler] has an idea?





> Disallow executors with cpu only or memory only resources
> ---------------------------------------------------------
>
>                 Key: MESOS-1807
>                 URL: https://issues.apache.org/jira/browse/MESOS-1807
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Vinod Kone
>            Priority: Major
>         Attachments: Screenshot 2015-07-28 14.40.35.png
>
>
> Currently the master allows executors to be launched with only cpus or only memory, but we shouldn't allow that.
> This is because an executor is an actual Unix process launched by the slave. If an executor doesn't specify cpus, what should its cpu limits be when no tasks are running on it? If no cpu limits are set, it might starve other executors/tasks on the slave, violating isolation guarantees. The same goes for memory. Moreover, the current containerizer/isolator code will throw failures when using such an executor, e.g., when the last task on the executor finishes and Containerizer::update() is called with 0 cpus or 0 mem.
> According to a [TODO | https://github.com/apache/mesos/blob/0226620747e1769434a1a83da547bfc3470a9549/src/master/validation.cpp#L400] in the source code, this validation should also check whether the requested resources are greater than MIN_CPUS/MIN_BYTES.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)