Posted to issues@mesos.apache.org by "Pierre Cheynier (JIRA)" <ji...@apache.org> on 2017/11/16 18:42:00 UTC

[jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota

    [ https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16255758#comment-16255758 ] 

Pierre Cheynier commented on MESOS-6575:
----------------------------------------

We may also be interested in this feature.

XFS does offer real enforcement, and that is what's nice about it: it prevents anyone from using {{fallocate}} to grab the whole disk.
However, many applications are not written to handle {{EDQUOT}} correctly (consider how they behave in a non-containerized environment), or cannot react preemptively because they are not directly aware of what is happening (e.g. a companion process is filling up the disk with logs). So it is better to actually kill the task, as the OOM killer does when using {{cgroups/mem}}.
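To make "handling {{EDQUOT}} correctly" concrete, here is a minimal Python sketch of an application-side write that recognizes a quota failure instead of treating it as a generic I/O error. The function name and the injectable {{write}} parameter are hypothetical, added only so the behavior can be exercised without a real quota.

```python
import errno
import os
from typing import Callable


def write_or_report_quota(fd: int, data: bytes,
                          write: Callable[[int, memoryview], int] = os.write) -> str:
    """Write all of `data` to `fd` and classify the outcome.

    Returns "ok" once every byte is written, or "quota" if the filesystem
    rejects a write with EDQUOT. Any other OSError is re-raised unchanged.
    """
    view = memoryview(data)
    try:
        # os.write may write fewer bytes than requested, so loop until done.
        while view:
            written = write(fd, view)
            view = view[written:]
        return "ok"
    except OSError as exc:
        if exc.errno == errno.EDQUOT:
            # Quota exhausted: the case many applications never check for.
            return "quota"
        raise
```

Even with this kind of handling, the application still has to notice and react to the error itself, which is the gap that terminating the task would close.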

So our idea is to leverage the XFS soft limit, and possibly its grace-period timer, to introduce more modularity:
* enforcement would have to be enabled explicitly at the agent level (probably by reusing {{enforce_container_disk_quota}}, as suggested here)
* the soft limit would be configurable (e.g. soft limit = hard limit - 2%)
* a collector would watch for the container reaching the soft limit and then kill it, as {{cgroups/mem}} does indirectly by relying on the Linux OOM killer (and as {{disk/du}} does for disk usage).
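The collector's decision in the proposal above can be sketched as follows. This is a minimal illustration, not Mesos code: {{DiskQuotaPolicy}}, {{soft_margin_percent}} and {{should_terminate}} are hypothetical names, and {{enforce}} merely stands in for the agent-level flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskQuotaPolicy:
    """Hypothetical agent-level settings for the soft-limit proposal."""
    hard_limit_bytes: int         # XFS hard limit for the container's project
    soft_margin_percent: int = 2  # soft limit = hard limit - 2% (illustrative)
    enforce: bool = True          # stand-in for enforce_container_disk_quota

    @property
    def soft_limit_bytes(self) -> int:
        # Integer arithmetic avoids float rounding on large byte counts.
        margin = self.hard_limit_bytes * self.soft_margin_percent // 100
        return self.hard_limit_bytes - margin


def should_terminate(usage_bytes: int, policy: DiskQuotaPolicy) -> bool:
    """Kill the container once its usage reaches the soft limit."""
    return policy.enforce and usage_bytes >= policy.soft_limit_bytes
```

For a 100 MiB hard limit (104857600 bytes), the soft limit works out to 102760448 bytes. Because XFS still enforces the hard limit, the 2% band acts as headroom: the collector has roughly one watch interval to notice before writes start failing with {{EDQUOT}}.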

What do you think?

> Change `disk/xfs` isolator to terminate executor when it exceeds quota
> ----------------------------------------------------------------------
>
>                 Key: MESOS-6575
>                 URL: https://issues.apache.org/jira/browse/MESOS-6575
>             Project: Mesos
>          Issue Type: Task
>          Components: agent, containerization
>            Reporter: Santhosh Kumar Shanmugham
>
> Unlike the {{disk/du}} isolator, which sends a {{ContainerLimitation}} protobuf when the executor exceeds the quota, the {{disk/xfs}} isolator relies on XFS's internal quota enforcement: it silently fails the {{write}} operation that would cause the quota to be exceeded, without surfacing the quota breach.
> This task is to change the `disk/xfs` isolator so that a {{ContainerLimitation}} message is triggered when the quota is exceeded.
> This feature will rely on the underlying filesystem being mounted with {{pqnoenforce}} (accounting-only mode), so that XFS does not raise an {{EDQUOT}} error on writes that would exceed the quota. The isolator can then track disk usage via {{xfs_quota}}, much as {{disk/du}} does using {{du}}, every {{container_disk_watch_interval}}, and surface a quota breach via a {{ContainerLimitation}} protobuf, causing the executor to be terminated. This feature can then be turned on or off via the existing {{enforce_container_disk_quota}} option.
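A minimal sketch of the polling approach described in the issue, under stated assumptions: usage samples come from a hypothetical {{read_usage}} callable (a real isolator would query the XFS project quota, e.g. via {{xfs_quota}}), {{Limitation}} is a stand-in for the {{ContainerLimitation}} protobuf rather than its actual definition, and {{watch_interval_s}} plays the role of {{container_disk_watch_interval}}.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Limitation:
    """Hypothetical stand-in for the ContainerLimitation protobuf."""
    container_id: str
    usage_bytes: int
    limit_bytes: int
    message: str


def watch_disk_quota(container_id: str,
                     limit_bytes: int,
                     read_usage: Callable[[str], int],
                     watch_interval_s: float,
                     max_checks: Optional[int] = None,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[Limitation]:
    """Poll usage every interval; return a Limitation on the first breach.

    In accounting-only mode writes are not blocked, so usage can grow past
    the quota; the watcher is what turns that into a termination signal.
    `max_checks` bounds the loop (None polls forever) so it can be tested.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        usage = read_usage(container_id)
        if usage > limit_bytes:
            return Limitation(
                container_id, usage, limit_bytes,
                f"Disk usage ({usage} bytes) exceeds quota ({limit_bytes} bytes)")
        checks += 1
        sleep(watch_interval_s)
    return None
```

Injecting {{read_usage}} and {{sleep}} keeps the decision logic independent of how usage is actually collected, which is the same separation {{disk/du}} has between running {{du}} and comparing its result.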



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)