Posted to issues@aurora.apache.org by "John Sirois (JIRA)" <ji...@apache.org> on 2016/04/13 00:26:25 UTC

[jira] [Commented] (AURORA-1662) No Memory and CPU Enforcement

    [ https://issues.apache.org/jira/browse/AURORA-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15238115#comment-15238115 ] 

John Sirois commented on AURORA-1662:
-------------------------------------

See the relevant mesos-slave (0.26.0) flags below.  If your kernels support cgroups, you want the cgroups isolators for {{--isolation}} (e.g. 'cgroups/cpu,cgroups/mem') rather than the posix defaults.

{noformat}
Usage: mesos-slave [options]

...
  --[no-]cgroups_cpu_enable_pids_and_tids_count     Cgroups feature flag to enable counting of processes and threads
                                                    inside a container.
                                                    (default: false)
  --[no-]cgroups_enable_cfs                         Cgroups feature flag to enable hard limits on CPU resources
                                                    via the CFS bandwidth limiting subfeature.
                                                    (default: false)
  --cgroups_hierarchy=VALUE                         The path to the cgroups hierarchy root
                                                    (default: /sys/fs/cgroup)
  --[no-]cgroups_limit_swap                         Cgroups feature flag to enable memory limits on both memory and
                                                    swap instead of just memory.
                                                    (default: false)
  --cgroups_root=VALUE                              Name of the root cgroup
                                                    (default: mesos)
  --container_disk_watch_interval=VALUE             The interval between disk quota checks for containers. This flag is
                                                    used for the 'posix/disk' isolator. (default: 15secs)
  --containerizer_path=VALUE                        The path to the external containerizer executable used when
                                                    external isolation is activated (--isolation=external).
  --containerizers=VALUE                            Comma-separated list of containerizer implementations
                                                    to compose in order to provide containerization.
                                                    Available options are 'mesos', 'external', and
                                                    'docker' (on Linux). The order the containerizers
                                                    are specified is the order they are tried
                                                    (--containerizers=mesos).
                                                    (default: mesos)
...
  --isolation=VALUE                                 Isolation mechanisms to use, e.g., 'posix/cpu,posix/mem', or
                                                    'cgroups/cpu,cgroups/mem', or network/port_mapping
                                                    (configure with flag: --with-network-isolator to enable),
                                                    or 'external', or load an alternate isolator module using
                                                    the --modules flag. Note that this flag is only relevant
                                                    for the Mesos Containerizer. (default: posix/cpu,posix/mem)
...
{noformat}
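
As a minimal sketch, a slave invocation that turns on cgroups-based enforcement might look like the following. The master address is a placeholder (an assumption, not taken from this issue); the other flags come from the help output above, so adjust for your deployment.

{noformat}
# <master-host>:5050 is a placeholder for your Mesos master address
mesos-slave \
  --master=<master-host>:5050 \
  --containerizers=mesos \
  --isolation=cgroups/cpu,cgroups/mem \
  --cgroups_enable_cfs
{noformat}

Per the help text, {{--cgroups_enable_cfs}} is what makes the CPU limit a hard limit; add {{--cgroups_limit_swap}} if the memory limit should cover swap as well as memory.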


> No Memory and CPU Enforcement
> -----------------------------
>
>                 Key: AURORA-1662
>                 URL: https://issues.apache.org/jira/browse/AURORA-1662
>             Project: Aurora
>          Issue Type: Bug
>          Components: Executor
>    Affects Versions: 0.11.0
>            Reporter: zane silver
>
> I'm running a job that is consuming more memory (RAM) than I've been allocated. The Mesos and Aurora UIs properly display the memory utilization vs the allocated/requested amount. However, the executor is not stopped once the job exceeds its limit. There appears to be no enforcement.
> Looking at the source, it also seems that enforcement exists only for disk usage: in the ResourceManager status() method (src/main/python/apache/aurora/executor/common/resource_manager.py), only disk is explicitly checked.
> I feel like I must be missing something and that the enforcement for CPU and memory is actually elsewhere. If not, this is an easy fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)