Posted to common-user@hadoop.apache.org by George Ioannidis <gi...@gmail.com> on 2015/04/06 22:25:24 UTC

Pin Map/Reduce tasks to specific cores

Hello. My question, which can also be found on *Stack Overflow
<http://stackoverflow.com/questions/29283213/core-affinity-of-map-tasks-in-hadoop>*,
concerns pinning map/reduce tasks to specific cores, on either Hadoop
v1.2.1 or Hadoop v2.
Specifically, I would like to know whether the end user has any control
over which core executes a given map/reduce task.

On Linux there is the "taskset" command for pinning an application to
cores, but does Hadoop provide anything similar? If not, is the Linux
scheduler in charge of assigning tasks to specific cores?
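For context, a minimal sketch of what taskset does outside Hadoop — the core
numbers below are illustrative, not a recommendation:

```shell
# Pin a new command to cores 0-3 at launch (the command is a placeholder):
taskset -c 0-3 sleep 1 &

# Pin an already-running process by PID (here, the current shell):
taskset -cp 0 $$

# Inspect the current CPU affinity mask of a process:
taskset -p $$
```

Nothing equivalent is exposed per map/reduce task by Hadoop itself; this is
what a per-process pin would look like at the OS level.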

------------------

Below are two cases that better illustrate my question:

*Case #1:* 2 GiB input size, HDFS block size of 64 MiB, and 2 compute nodes
available, each with 32 cores.
Consequently, 32 map tasks will be launched; let's suppose that
mapred.tasktracker.map.tasks.maximum
= 16, so 16 map tasks will be allocated to each node.

Can I guarantee that each map task will run on a specific core, or is it up
to the Linux scheduler?

------------------

*Case #2:* The same as case #1, but the input size is now 8 GiB, so there
are not enough slots for all 128 map tasks and multiple tasks will share
the same cores.
Can I control how much time each task will spend on a specific core, and
whether it will be reassigned to the same core in the future?

Any information on the above would be highly appreciated.

Kind Regards,
George

Re: Pin Map/Reduce tasks to specific cores

Posted by George Ioannidis <gi...@gmail.com>.
Dear Rohith and Naga,

Thank you very much for your quick responses; your information has proven
very useful.

Cheers,
George

On 7 April 2015 at 07:08, Naganarasimha G R (Naga) <
garlanaganarasimha@huawei.com> wrote:

>  Hi George,
>
>  The current implementation present in YARN using Cgroups supports CPU
> isolation but not by pinning to specific cores (Cgroup CPUsets) but based
> on cpu cycles (quota & Period).
> Admin is provided with an option of specifying how much percentage of CPU
> can be used by YARN containers. And Yarn will take care of configuring
> Cgroup Quota and Period files and
> ensures only configured CPU percentage is only used by YARN containers
>
>  Is there any particular need to pin the MR tasks to the specific cores ?
> or you just want to ensure YARN is not using more than the specified
> percentage of CPU in a give node ?
>
>  Regards,
> Naga
>
>  ------------------------------
> *From:* Rohith Sharma K S [rohithsharmaks@huawei.com]
> *Sent:* Tuesday, April 07, 2015 09:23
> *To:* user@hadoop.apache.org
> *Subject:* RE: Pin Map/Reduce tasks to specific cores
>
>   Hi George
>
>
>
> In MRV2, YARN supports CGroups implementation.  Using CGroup it is
> possible to run containers in specific cores.
>
>
>
> For your detailed reference, some of the useful links
>
>
> http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP2/HDP-2-trunk/bk_system-admin-guide/content/ch_cgroups.html
>
>
> http://blog.cloudera.com/blog/2013/12/managing-multiple-resources-in-hadoop-2-with-yarn/
>
> http://riccomini.name/posts/hadoop/2013-06-14-yarn-with-cgroups/
>
>
>
> P.S : I could not find any related document in Hadoop Yarn docs. I will
> raise ticket for the same  in community.
>
>
>
> Hope the above information will help your use case!!!
>
>
>
> Thanks & Regards
>
> Rohith Sharma K S
>
>
>
> *From:* George Ioannidis [mailto:giorgioath@gmail.com]
> *Sent:* 07 April 2015 01:55
> *To:* user@hadoop.apache.org
> *Subject:* Pin Map/Reduce tasks to specific cores
>
>
>
> Hello. My question, which can be found on *Stack Overflow
> <http://stackoverflow.com/questions/29283213/core-affinity-of-map-tasks-in-hadoop>*
> as well, regards pinning map/reduce tasks to specific cores, either on
> hadoop v.1.2.1 or hadoop v.2.
>
> In specific, I would like to know if the end-user can have any control on
> which core executes a specific map/reduce task.
>
> To pin an application on linux, there's the "taskset" command, but is
> anything similar provided by hadoop? If not, is the Linux Scheduler in
> charge of allocating tasks to specific cores?
>
>
>
> ------------------
>
> Below I am providing two cases to better illustrate my question:
>
> *Case #1:* 2 GiB input size, HDFS block size of 64 MiB and 2 compute
> nodes available, with 32 cores each.
>
> As follows, 32 map tasks will be called; let's suppose that mapred.tasktracker.map.tasks.maximum
> = 16, so 16 map tasks will be allocated to each node.
>
> Can I guarantee that each Map Task will run on a specific core, or is it
> up to the Linux Scheduler?
>
> ------------------
>
> *Case #2:* The same as case #1, but now the input size is 8 GiB, so there
> are not enough slots for all map tasks (128), so multiple tasks will share
> the same cores.
>
> Can I control how much "time" each task will spend on a specific core and
> if it will be reassigned to the same core in the future?
>
> Any information on the above would be highly appreciated.
>
> Kind Regards,
>
> George
>

RE: Pin Map/Reduce tasks to specific cores

Posted by "Naganarasimha G R (Naga)" <ga...@huawei.com>.
Hi George,

The current cgroups implementation in YARN supports CPU isolation, but not by pinning to specific cores (cgroup cpusets); isolation is based on CPU cycles (quota & period).
The admin is given an option to specify what percentage of the CPU can be used by YARN containers; YARN takes care of configuring the cgroup quota and period files
and ensures that only the configured CPU percentage is used by YARN containers.
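To make the quota/period mechanism concrete, here is a hedged sketch of the
arithmetic YARN effectively performs; the cgroup path and the 50% figure are
illustrative only:

```shell
# cgroup (v1) CPU controller: enforcement is by quota/period, not core
# pinning. Granting YARN 50% of a 32-core node means its containers may
# consume 50% * 32 cores of CPU time within each scheduling period.
period_us=100000                                 # standard CFS period: 100 ms
cores=32
percent=50
quota_us=$((period_us * cores * percent / 100))
echo "$quota_us"                                 # 1600000 us per 100 ms period

# What YARN's cgroups resources handler effectively writes (illustrative path):
#   echo $period_us > /sys/fs/cgroup/cpu/hadoop-yarn/cpu.cfs_period_us
#   echo $quota_us  > /sys/fs/cgroup/cpu/hadoop-yarn/cpu.cfs_quota_us
```

Note that a quota caps total CPU time; which physical cores the cycles run on
remains up to the Linux scheduler.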

Is there any particular need to pin the MR tasks to specific cores, or do you just want to ensure YARN does not use more than the specified percentage of CPU on a given node?

Regards,
Naga

________________________________
From: Rohith Sharma K S [rohithsharmaks@huawei.com]
Sent: Tuesday, April 07, 2015 09:23
To: user@hadoop.apache.org
Subject: RE: Pin Map/Reduce tasks to specific cores

Hi George

In MRV2, YARN supports CGroups implementation.  Using CGroup it is possible to run containers in specific cores.

For your detailed reference, some of the useful links
http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP2/HDP-2-trunk/bk_system-admin-guide/content/ch_cgroups.html
http://blog.cloudera.com/blog/2013/12/managing-multiple-resources-in-hadoop-2-with-yarn/
http://riccomini.name/posts/hadoop/2013-06-14-yarn-with-cgroups/

P.S : I could not find any related document in Hadoop Yarn docs. I will raise ticket for the same  in community.

Hope the above information will help your use case!!!

Thanks & Regards
Rohith Sharma K S

From: George Ioannidis [mailto:giorgioath@gmail.com]
Sent: 07 April 2015 01:55
To: user@hadoop.apache.org
Subject: Pin Map/Reduce tasks to specific cores

Hello. My question, which can be found on Stack Overflow<http://stackoverflow.com/questions/29283213/core-affinity-of-map-tasks-in-hadoop> as well, regards pinning map/reduce tasks to specific cores, either on hadoop v.1.2.1 or hadoop v.2.
In specific, I would like to know if the end-user can have any control on which core executes a specific map/reduce task.

To pin an application on linux, there's the "taskset" command, but is anything similar provided by hadoop? If not, is the Linux Scheduler in charge of allocating tasks to specific cores?

------------------
Below I am providing two cases to better illustrate my question:
Case #1: 2 GiB input size, HDFS block size of 64 MiB and 2 compute nodes available, with 32 cores each.
As follows, 32 map tasks will be called; let's suppose that mapred.tasktracker.map.tasks.maximum = 16, so 16 map tasks will be allocated to each node.
Can I guarantee that each Map Task will run on a specific core, or is it up to the Linux Scheduler?

------------------

Case #2: The same as case #1, but now the input size is 8 GiB, so there are not enough slots for all map tasks (128), so multiple tasks will share the same cores.
Can I control how much "time" each task will spend on a specific core and if it will be reassigned to the same core in the future?
Any information on the above would be highly appreciated.
Kind Regards,
George

RE: Pin Map/Reduce tasks to specific cores

Posted by Rohith Sharma K S <ro...@huawei.com>.
Hi George

In MRv2, YARN supports a cgroups implementation. Using cgroups, it is possible to run containers on specific cores.
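As a sketch, enabling cgroups for the NodeManager looks roughly like the
following yarn-site.xml fragment (Hadoop 2.x property names; the hierarchy
path is illustrative and exact values depend on the cluster):

```xml
<!-- yarn-site.xml: use the Linux container executor with cgroups -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <!-- cgroup hierarchy under which YARN container groups are created -->
  <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
  <value>/hadoop-yarn</value>
</property>
```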

For detailed reference, here are some useful links:
http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP2/HDP-2-trunk/bk_system-admin-guide/content/ch_cgroups.html
http://blog.cloudera.com/blog/2013/12/managing-multiple-resources-in-hadoop-2-with-yarn/
http://riccomini.name/posts/hadoop/2013-06-14-yarn-with-cgroups/

P.S.: I could not find a related document in the Hadoop YARN docs; I will raise a ticket for it in the community.

Hope the above information helps your use case!

Thanks & Regards
Rohith Sharma K S

From: George Ioannidis [mailto:giorgioath@gmail.com]
Sent: 07 April 2015 01:55
To: user@hadoop.apache.org
Subject: Pin Map/Reduce tasks to specific cores

Hello. My question, which can be found on Stack Overflow<http://stackoverflow.com/questions/29283213/core-affinity-of-map-tasks-in-hadoop> as well, regards pinning map/reduce tasks to specific cores, either on hadoop v.1.2.1 or hadoop v.2.
In specific, I would like to know if the end-user can have any control on which core executes a specific map/reduce task.

To pin an application on linux, there's the "taskset" command, but is anything similar provided by hadoop? If not, is the Linux Scheduler in charge of allocating tasks to specific cores?

------------------
Below I am providing two cases to better illustrate my question:
Case #1: 2 GiB input size, HDFS block size of 64 MiB and 2 compute nodes available, with 32 cores each.
As follows, 32 map tasks will be called; let's suppose that mapred.tasktracker.map.tasks.maximum = 16, so 16 map tasks will be allocated to each node.
Can I guarantee that each Map Task will run on a specific core, or is it up to the Linux Scheduler?

------------------

Case #2: The same as case #1, but now the input size is 8 GiB, so there are not enough slots for all map tasks (128), so multiple tasks will share the same cores.
Can I control how much "time" each task will spend on a specific core and if it will be reassigned to the same core in the future?
Any information on the above would be highly appreciated.
Kind Regards,
George

RE: Pin Map/Reduce tasks to specific cores

Posted by Rohith Sharma K S <ro...@huawei.com>.
Hi George

In MRV2, YARN supports CGroups implementation.  Using CGroup it is possible to run containers in specific cores.

For your detailed reference, some of the useful links
http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP2/HDP-2-trunk/bk_system-admin-guide/content/ch_cgroups.html
http://blog.cloudera.com/blog/2013/12/managing-multiple-resources-in-hadoop-2-with-yarn/
http://riccomini.name/posts/hadoop/2013-06-14-yarn-with-cgroups/

P.S : I could not find any related document in Hadoop Yarn docs. I will raise ticket for the same  in community.

Hope the above information will help your use case!!!

Thanks & Regards
Rohith Sharma K S

