Posted to user@mesos.apache.org by Grégoire Seux <g....@criteo.com> on 2020/07/07 10:14:12 UTC

Re: [BULK]Re: cgroup CPUSET for mesos agent

Hello,

I'd like to share our experience, since we worked on this last year.
We used CFS bandwidth isolation for several years and ran into many issues (lack of predictability, bugs in old Linux kernels, and lack of cache/memory locality). At some point we implemented a custom isolator to manage cpusets (using https://github.com/criteo/mesos-command-modules/ as a base to write an isolator in a scripting language).

The isolator had a very simple behavior: when a new task starts, look at which CPUs are not already in a cpuset cgroup, select (if possible) CPUs from the same NUMA node, and create a cpuset cgroup for the starting task.
In practice, it provided a general decrease in CPU consumption (up to 8% for some CPU-intensive applications) and made the CPU isolation model easier to reason about.
The allocation is optimistic: it tries to use CPUs from the same NUMA node, but if that is not possible, the task is spread across nodes. In practice this happens very rarely, thanks to one small optimization: CPUs are assigned from the most loaded NUMA node, which reduces fragmentation of the available CPUs across NUMA nodes.
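
For readers who want something concrete, here is a minimal sketch of the selection step described above. It is not the actual isolator code: the function name pick_cpus, the free_by_node mapping, and the reading of "most loaded" as "fewest free CPUs" are all assumptions made for illustration, as is the order used when a task has to be spread across nodes.

```python
from typing import Dict, List, Set


def pick_cpus(free_by_node: Dict[int, Set[int]], wanted: int) -> List[int]:
    """Pick `wanted` free CPU ids, preferring a single NUMA node.

    `free_by_node` maps a NUMA node id to the CPU ids on that node that
    are not yet part of any task's cpuset cgroup (hypothetical input).
    """
    if wanted <= 0:
        raise ValueError("wanted must be positive")
    total_free = sum(len(cpus) for cpus in free_by_node.values())
    if wanted > total_free:
        raise RuntimeError("not enough free CPUs on this agent")

    # Nodes that can host the whole task on their own.
    fitting = [n for n, cpus in free_by_node.items() if len(cpus) >= wanted]
    if fitting:
        # Assumption: "most loaded" means "fewest free CPUs left".
        # Ties are broken by node id so the result is deterministic.
        node = min(fitting, key=lambda n: (len(free_by_node[n]), n))
        return sorted(free_by_node[node])[:wanted]

    # Fallback (assumption): spread across nodes, largest free pool first,
    # so the task spans as few nodes as possible.
    picked: List[int] = []
    for node in sorted(free_by_node, key=lambda n: (-len(free_by_node[n]), n)):
        picked.extend(sorted(free_by_node[node])[: wanted - len(picked)])
        if len(picked) == wanted:
            break
    return picked
```

With free CPUs {0: {0, 1, 2, 3}, 1: {6, 7}}, a 2-CPU task lands on node 1 (the fuller node), which keeps node 0 intact for a later 4-CPU task. On cgroup v1 the chosen ids would then be written to the task cgroup's cpuset.cpus file, with the matching node ids in cpuset.mems.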

I'd be glad to give more details if you are interested.

--
Grégoire

Re: [BULK]Re: cgroup CPUSET for mesos agent

Posted by Vinod Kone <vi...@apache.org>.

Great to hear! Thanks for the update.

On Thu, Jan 14, 2021 at 5:18 PM Charles-François Natali <cf...@gmail.com>
wrote:

> It's a bit old but in case it could help, we recently implemented this
> at work - here's how we did it:
> - the NUMA topology is exposed via agent custom resources
> - the framework does the allocation of the corresponding resources to
> the tasks according to the NUMA topology: e.g. if the task requests 2
> CPUs within the same NUMA node, the framework would allocate them
> - a custom executor then implements the CPU affinity/cpuset using the
> resources provided by the framework
>
> It works really nicely.
>
> Cheers,
>
> Charles

Re: [BULK]Re: cgroup CPUSET for mesos agent

Posted by Charles-François Natali <cf...@gmail.com>.

It's a bit old but in case it could help, we recently implemented this
at work - here's how we did it:
- the NUMA topology is exposed via agent custom resources
- the framework does the allocation of the corresponding resources to
the tasks according to the NUMA topology: e.g. if the task requests 2
CPUs within the same NUMA node, the framework would allocate them
- a custom executor then implements the CPU affinity/cpuset using the
resources provided by the framework

It works really nicely.
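
The executor side of the steps above could look roughly like the sketch below. The thread does not say how the allocated CPUs are encoded in the resources, so the "0-3,8" string format (borrowed from the Linux cpuset list syntax) and the helper names parse_cpu_list and pin_current_process are assumptions for illustration only. os.sched_setaffinity is a real Python call, available on Linux.

```python
import os
from typing import Set


def parse_cpu_list(spec: str) -> Set[int]:
    """Parse a Linux-style CPU list such as "0-3,8" into a set of CPU ids."""
    cpus: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Inclusive range, e.g. "0-3" means CPUs 0, 1, 2 and 3.
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_current_process(spec: str) -> None:
    """Restrict the calling process to the CPUs named in `spec` (Linux only)."""
    cpus = parse_cpu_list(spec)
    if not cpus:
        raise ValueError("empty CPU list")
    # pid 0 means "the calling process"; processes forked afterwards
    # inherit the same affinity mask.
    os.sched_setaffinity(0, cpus)
```

An executor would call pin_current_process(...) with the CPUs it was given before launching the task, so the task inherits the mask. Note that affinity alone does not keep other processes off those CPUs; a cpuset cgroup is needed for that.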

Cheers,

Charles


On Tue, Jul 7, 2020 at 18:12, Milind Chabbi <mi...@uber.com> wrote:
>
> Grégoire, thanks for your reply. This is super helpful to make a stronger case around the affinity benefits.
> Would you be able to share the additional details you mentioned? I am definitely interested.
> Is your isolator source code publicly available?
>
> -Milind

Re: [BULK]Re: cgroup CPUSET for mesos agent

Posted by Milind Chabbi <mi...@uber.com>.

Grégoire, thanks for your reply. This is super helpful to make a
stronger case around the affinity benefits.
Would you be able to share the additional details you mentioned? I am
definitely interested.
Is your isolator source code publicly available?

-Milind

On Tue, Jul 7, 2020 at 3:14 AM Grégoire Seux <g....@criteo.com> wrote:

> Hello,
>
> I'd like to share our experience, since we worked on this last year.
> We used CFS bandwidth isolation for several years and ran into many
> issues (lack of predictability, bugs in old Linux kernels, and lack of
> cache/memory locality). At some point we implemented a custom isolator
> to manage cpusets (using
> https://github.com/criteo/mesos-command-modules/ as a base to write an
> isolator in a scripting language).
>
> The isolator had a very simple behavior: when a new task starts, look at
> which CPUs are not already in a cpuset cgroup, select (if possible) CPUs
> from the same NUMA node, and create a cpuset cgroup for the starting task.
> In practice, it provided a general decrease in CPU consumption (up to 8%
> for some CPU-intensive applications) and made the CPU isolation model
> easier to reason about.
> The allocation is optimistic: it tries to use CPUs from the same NUMA
> node, but if that is not possible, the task is spread across nodes. In
> practice this happens very rarely, thanks to one small optimization: CPUs
> are assigned from the most loaded NUMA node, which reduces fragmentation
> of the available CPUs across NUMA nodes.
>
> I'd be glad to give more details if you are interested.
>
> --
> Grégoire
>