Posted to dev@mesos.apache.org by Dmitry Zhuk <dz...@twopensource.com> on 2017/03/22 13:48:50 UTC

CPU affinity

Hi

Is anyone working on MESOS-314
<https://issues.apache.org/jira/browse/MESOS-314> “Support the cgroups
'cpusets' subsystem” or related functionality? I found other related
tickets in JIRA, but there seems to be no recent progress on them:
MESOS-5342 <https://issues.apache.org/jira/browse/MESOS-5342> and MESOS-5358
<https://issues.apache.org/jira/browse/MESOS-5358>. There’s also a mention
of the idea of exposing cpusets in a way similar to network ports.

I’d like to propose an alternative approach for adding CPU affinity support
and would be interested in any feedback on it. If the community is
interested in this approach, I can work on a design document and an
implementation.

The basic idea is to let frameworks specify affinity requirements in
ContainerInfo using the following structure:
message AffinityInfo {
  enum ProcessingUnit {
    THREAD = 1;
    CORE = 2;
    SOCKET = 3;
    NUMA_NODE = 4;
  }

  // Indicates that the container should be bound to units of the
  // specified type. For example, bind = NUMA_NODE indicates that the
  // process can run on any thread of some NUMA node.
  required ProcessingUnit bind = 1;

  // Indicates that assigned processing units must not be shared with
  // other containers.
  optional bool exclusive = 2 [default = false];
}


message ContainerInfo {
  …
  optional AffinityInfo affinity_info = …;
}
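To make the proposal concrete, here is a sketch of how a framework might populate the proposed fields. The message is rendered as a plain Python dict rather than a generated protobuf class, and the helper name `make_affinity_info` is illustrative only:

```python
# Illustrative sketch only: a dict mirroring the proposed AffinityInfo
# message. The field names follow the proposal above; how the message
# would actually be serialized is an open design question.

def make_affinity_info(bind, exclusive=False):
    """Build a dict shaped like the proposed AffinityInfo message."""
    valid_units = {"THREAD", "CORE", "SOCKET", "NUMA_NODE"}
    if bind not in valid_units:
        raise ValueError(f"unknown processing unit: {bind}")
    return {"bind": bind, "exclusive": exclusive}

container_info = {
    # ... other ContainerInfo fields ...
    "affinity_info": make_affinity_info("NUMA_NODE", exclusive=True),
}
```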

In the future, this can be extended to require exclusive NUMA node memory
access, proximity to devices, etc.
This also requires exposing hardware topology information (such as the
number of CPUs per node) to frameworks so they can evaluate offer
suitability, and providing frameworks visibility into failures to assign
CPUs per the requirements, but this can be left out of scope for the MVP.
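For the topology side, the kind of per-node CPU information mentioned above can be read from Linux sysfs. A minimal sketch, assuming the standard /sys/devices/system/node layout:

```python
import glob
import os

def parse_cpulist(s):
    """Parse a Linux cpulist string such as '0-3,8-11' into a list of ints."""
    cpus = []
    for part in s.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        elif part:
            cpus.append(int(part))
    return cpus

def cpus_per_numa_node():
    """Map NUMA node id -> list of CPUs on that node, read from sysfs."""
    topology = {}
    for node in glob.glob("/sys/devices/system/node/node[0-9]*"):
        node_id = int(os.path.basename(node)[len("node"):])
        with open(os.path.join(node, "cpulist")) as f:
            topology[node_id] = parse_cpulist(f.read().strip())
    return topology
```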

Thanks

Re: CPU affinity

Posted by Benjamin Mahler <bm...@apache.org>.
Thanks for bringing this up, Dmitry; I would also be happy to participate.

Off the top of my head, the following high-level approaches have been
brought up:

(1) Categorize jobs into classes, e.g. LATENCY_SENSITIVE,
THROUGHPUT_ORIENTED, etc., essentially capturing the distinction between
online and batch jobs. With these classifications, Mesos can do the best it
can in terms of assigning affinity.

(2) Expose the topology information to schedulers, let schedulers choose
the exact devices they want.

(3) Let schedulers specify an affinity "request" specifying at what
topology level they want to be confined to, and we try to satisfy it (might
fail).

It seemed to me like (1) was the simplest conceptually and gave us the most
flexibility in how we accomplish it (letting us improve or change our
technique over time without breaking the API contract).
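To make the contrast with Dmitry's AffinityInfo concrete, approach (1) could be sketched roughly as follows. This is hypothetical: the message, enum, and field names here are invented for illustration, not a proposed API:

```protobuf
// Hypothetical sketch of approach (1); names are invented for illustration.
message JobClassInfo {
  enum JobClass {
    LATENCY_SENSITIVE = 1;
    THROUGHPUT_ORIENTED = 2;
  }

  // The framework only declares the class of workload; Mesos picks the
  // concrete CPU assignment and may change its technique over time.
  optional JobClass job_class = 1 [default = THROUGHPUT_ORIENTED];
}
```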

On Thu, Apr 6, 2017 at 4:55 PM, Jie Yu <yu...@gmail.com> wrote:


Re: CPU affinity

Posted by Jie Yu <yu...@gmail.com>.
Hi Dmitry,

Thank you for sending this out for comments! I'd love to participate in the
discussion!

Are you available for some f2f meetings (or a remote hangout) for this? I
thought it would be more efficient to discuss that way than over email. We
can invite others in the community who are interested in this to the
meeting.

Also, I'd suggest you do some related-work research (e.g., how k8s or
Docker solve this problem). It's always useful to learn from what other
people are doing.

Let me know!
- Jie

On Thu, Apr 6, 2017 at 3:23 AM, Dmitry Zhuk <dz...@twopensource.com> wrote:


Re: CPU affinity

Posted by Dmitry Zhuk <dz...@twopensource.com>.
Hi Vikram,

Thank you for the reply.
I understand that hardware information is required for frameworks to make a
decision in some cases, but for the case I'm interested in, hardware
information is not that important. In particular, I have a memory-bound
task, which achieves a 10-20% performance gain if it's pinned to a
single NUMA node due to eliminated foreign memory access. So basically I
need to be able to specify that a task needs 4 CPUs, all on one NUMA node.
This needs some basic knowledge about the hardware, like the number of CPUs
per node. So to me it makes sense to start with the affinity support
design, and then make sure that the hardware topology information provides
enough details for frameworks to specify affinity constraints.
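For what it's worth, the pinning described above can be approximated from inside a task on Linux via sched_setaffinity(2), exposed in Python as os.sched_setaffinity. This is only a sketch: it pins CPUs but not memory, so by itself it does not eliminate foreign memory access the way cpuset.mems would:

```python
import os

def pin_to_cpus(cpus):
    """Pin the calling process (and its future children) to the given CPUs."""
    os.sched_setaffinity(0, set(cpus))
    return os.sched_getaffinity(0)

# Demonstration: pin to a single CPU from the current mask, then restore.
original = os.sched_getaffinity(0)
one_cpu = min(original)
assert pin_to_cpus([one_cpu]) == {one_cpu}
os.sched_setaffinity(0, original)
```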

---
Dmitry

On Thu, Mar 23, 2017 at 12:28 AM, Vikrama Ditya <vd...@nvidia.com> wrote:


RE: CPU affinity

Posted by Vikrama Ditya <vd...@nvidia.com>.
Hi Dmitry

This problem needs to be addressed with topology information, so that a scheduler framework can utilize it and request affinity constraints.

We started to look into this when we were required to expose GPU hardware information. It would be good to introduce a generic topology structure so that generic interconnects and the associated resource topology can be expressed.

Please have a look at https://issues.apache.org/jira/browse/MESOS-7080

--
Vikram

-----Original Message-----
From: Dmitry Zhuk [mailto:dzhuk@twopensource.com] 
Sent: Wednesday, March 22, 2017 6:49 AM
To: dev@mesos.apache.org
Subject: CPU affinity
