Posted to dev@mesos.apache.org by "Nielsen, Niklas" <ni...@intel.com> on 2016/01/22 13:53:10 UTC

Core affinity in Mesos

Hi everyone,

We have been talking about core affinity in Mesos for a while, and Ian D. has recently been giving this topic thought in his ‘exclusive resources’ proposal [1].
Without it, latency-critical workloads are at risk unless we fall back to overly conservative placements.
We are interested in the topic through our work on oversubscription in Serenity [2], as the whole point of oversubscription is to be able to colocate latency-critical and best-effort batch jobs.
We had an informal meeting yesterday, going over the proposal and trying to build some momentum behind the capability.

It is a tricky but exciting topic:
 - How do we avoid making task launch even more complex? How do we express the topology and acquire parts of it? Do we use hints on the affinity properties instead?
 - How do we mix pinned tasks with normal ‘floating’ tasks?
 - How do we convey information about task sensitivity to the resource estimator?

Note: the list above is not meant for inline discussion or answers. Let’s collect feedback on the proposals themselves.

Here are our proposed next steps:
 - We are going to use the ‘Isolation Working Group’ as an umbrella for this. I will fill in details and members.
 - We will schedule an online meeting next week, on Wednesday at 9AM PST, to discuss next steps. I will share a hangout link when we get closer.
 - The plan is to arrive at designs (maybe more than one) that we agree on, and then scope out and distribute the work that needs to be done.

Whoever is interested, join us. The use cases for this work are critical. Maybe we can even work on some representative workloads to verify our proposal against.

Cheers,
Niklas

PS For comments on the proposal itself, please refer to Ian’s thread on the dev list [3].

[1] https://issues.apache.org/jira/browse/MESOS-4138
[2] https://github.com/mesosphere/serenity
[3] https://www.mail-archive.com/dev%40mesos.apache.org/msg33892.html

Re: Core affinity in Mesos

Posted by Niklas Nielsen <ni...@qni.dk>.
Ben,

I agree that isolation encompasses more than performance isolation, but
instead of inflating the number of working groups with overly granular ones,
I thought we could start the work under the 'isolation' working group. The
group was inactive before but already had an entry in the document. I have
no real preference and can rename it to 'performance isolation'.


Deepak,

We are very interested in that area as well: placement biases based on
interference/sensitivity profiles, balancing power and load, etc. I hope
that we can get to a nicely decoupled way of doing this, so that those
details (analysis, objectives, etc.) don't leak into the allocator.

Cheers,
Niklas

On Fri, Jan 29, 2016 at 8:48 PM, Deepak Vij (A) <de...@huawei.com>
wrote:

> Along similar lines, interference-aware scheduling could be one of the
> desired capabilities of a resource manager like Mesos. This is essentially
> tied to the fact that data centers/nodes are not really homogeneous.
> Typically, it is assumed that all placement choices are equally good.
> However, different types of machines are mixed within the same cluster,
> and co-located tasks compete for resources, which leads to negative
> interference.
>
> To solve the interference-aware scheduling problem, one might have to
> periodically monitor the performance of running tasks and use the
> information collected to make better future scheduling decisions. Having
> explicit information about the environment helps make optimal choices for
> co-scheduling and workload partitioning, and may yield superior performance
> on many common workloads. The detailed resource utilization and
> performance profiles collected from running tasks could include
> measurements such as CPU and memory usage, cache misses, etc.
>
> My question is whether such an interference-aware scheduling capability
> would fit into the same category or should be something separate
> altogether. Thanks.
>
> Regards,
> Deepak Vij
> (Huawei Software Lab., Santa Clara)
>
> -----Original Message-----
> From: Kevin Klues [mailto:klueska@gmail.com]
> Sent: Friday, January 29, 2016 11:28 AM
> To: dev@mesos.apache.org
> Subject: Re: Core affinity in Mesos
>
> I agree. "Isolation" on its own is too broad a term. However, since
> we are talking mostly about reducing interference, which typically
> implies performance isolation, my vote for the group name is the
> "Performance Isolation Working Group".
>
> On Fri, Jan 29, 2016 at 11:22 AM, Benjamin Mahler <bm...@apache.org>
> wrote:
> > Since "Isolation" applies broadly outside the context of addressing
> > latency-sensitive workloads (e.g. user/pid/network namespacing, and
> > resource limitations such as cpu quota, memory limits, and gpu device
> > visibility), it would be great to choose a more specific name. Some
> > suggestions: interference, performance-related isolation, colocation,
> > latency sensitivity.
> >
> > Thoughts?
> >
> > Looking forward to seeing the discussions here!
> >
> > Ben
>
>
>
> --
> ~Kevin
>



-- 
Niklas

RE: Core affinity in Mesos

Posted by "Deepak Vij (A)" <de...@huawei.com>.
Along similar lines, interference-aware scheduling could be one of the desired capabilities of a resource manager like Mesos. This is essentially tied to the fact that data centers/nodes are not really homogeneous. Typically, it is assumed that all placement choices are equally good. However, different types of machines are mixed within the same cluster, and co-located tasks compete for resources, which leads to negative interference.

To solve the interference-aware scheduling problem, one might have to periodically monitor the performance of running tasks and use the information collected to make better future scheduling decisions. Having explicit information about the environment helps make optimal choices for co-scheduling and workload partitioning, and may yield superior performance on many common workloads. The detailed resource utilization and performance profiles collected from running tasks could include measurements such as CPU and memory usage, cache misses, etc.

My question is whether such an interference-aware scheduling capability would fit into the same category or should be something separate altogether. Thanks.

Regards,
Deepak Vij
(Huawei Software Lab., Santa Clara)


Re: Core affinity in Mesos

Posted by Kevin Klues <kl...@gmail.com>.
I agree. "Isolation" on its own is too broad a term. However, since
we are talking mostly about reducing interference, which typically
implies performance isolation, my vote for the group name is the
"Performance Isolation Working Group".

-- 
~Kevin

Re: Core affinity in Mesos

Posted by Benjamin Mahler <bm...@apache.org>.
Since "Isolation" applies broadly outside the context of addressing
latency-sensitive workloads (e.g. user/pid/network namespacing, and
resource limitations such as cpu quota, memory limits, and gpu device
visibility), it would be great to choose a more specific name. Some
suggestions: interference, performance-related isolation, colocation,
latency sensitivity.

Thoughts?

Looking forward to seeing the discussions here!

Ben
