Posted to common-dev@hadoop.apache.org by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org> on 2009/04/23 01:51:47 UTC

[jira] Created: (HADOOP-5725) TaskTracker should run user tasks nicely in the local machine

TaskTracker should run user tasks nicely in the local machine
-------------------------------------------------------------

                 Key: HADOOP-5725
                 URL: https://issues.apache.org/jira/browse/HADOOP-5725
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
            Reporter: Tsz Wo (Nicholas), SZE


If one task tries to use all the CPUs on the local machine, all other tasks and processes on it (including the TaskTracker and DataNode daemons) may hardly get a chance to run.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5725) TaskTracker should run user tasks nicely in the local machine

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702016#action_12702016 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-5725:
------------------------------------------------

Here is my experience:
I had jobs with tasks expected to run for about 30 minutes.  However, some CPU-intensive tasks from other jobs were running in the cluster.  When one of my tasks got scheduled on the same machine as these CPU-intensive tasks, it could run for 2 hours or more.  In some cases, the task timed out or the TaskTracker was lost.



[jira] Commented: (HADOOP-5725) TaskTracker should run user tasks nicely in the local machine

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702049#action_12702049 ] 

Hong Tang commented on HADOOP-5725:
-----------------------------------

@Nicholas,

The issue may be more complicated than you describe.

- The other jobs happen to be multi-threaded, so each task uses more than one core, leading to even more severe CPU contention.
- The other jobs also use memory more aggressively. They were supposed to run 3 threads per instance with one instance per node, which was doable under HOD. But now they are configured with 4 threads and two instances per node, so the memory requirement increases by about 2.7x (8 threads instead of 3).

The combined effect of both is a non-linear slowdown of everything running on that node. In my opinion, the root cause of this problem is that our per-node resource management is static. The ultimate solution is to let users specify more precisely how many resources their jobs/tasks need, and to schedule tasks based on the resources available at run time.
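A minimal sketch of scheduling by declared resource needs, assuming each task declares a CPU-core and memory requirement and the node tracks its free capacity. `Resources`, `fits`, and `admit_tasks` are hypothetical names for illustration, not Hadoop APIs, and first-fit admission is just one possible policy:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical per-task request or per-node capacity."""
    cores: float
    memory_mb: int

    def minus(self, other: Resources) -> Resources:
        return Resources(self.cores - other.cores, self.memory_mb - other.memory_mb)


def fits(request: Resources, free: Resources) -> bool:
    """True if the request can run within the currently free resources."""
    return request.cores <= free.cores and request.memory_mb <= free.memory_mb


def admit_tasks(requests: list[Resources],
                capacity: Resources) -> tuple[list[int], Resources]:
    """First-fit admission: start each queued task only if it fits in what is left.

    Returns the indices of the admitted tasks and the resources still free.
    """
    free = capacity
    admitted: list[int] = []
    for i, request in enumerate(requests):
        if fits(request, free):
            admitted.append(i)
            free = free.minus(request)
    return admitted, free


if __name__ == "__main__":
    node = Resources(cores=8, memory_mb=16_000)
    queue = [Resources(4, 6_000), Resources(4, 6_000), Resources(1, 1_000)]
    print(admit_tasks(queue, node))
```

With a static slot count, all three tasks could be started at once if three slots were configured; here the third task waits because the first two already claim every core.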





[jira] Commented: (HADOOP-5725) TaskTracker should run user tasks nicely in the local machine

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702443#action_12702443 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-5725:
------------------------------------------------

> Nicholas - did you require higher memory for your tasks too? If so, you could use the memory-requirement control knob...

No, my jobs did not require high memory.



[jira] Commented: (HADOOP-5725) TaskTracker should run user tasks nicely in the local machine

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702017#action_12702017 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-5725:
------------------------------------------------

I think we should at least set the nice level of tasks to 19, so that tasks would not starve the TaskTracker and DataNode daemons, although this may not help against tasks from other jobs.  This problem may be hard to solve in general.
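A minimal sketch of launching a task process at nice level 19, assuming a POSIX system and using Python's `subprocess` purely for illustration (the TaskTracker itself is Java and would need its own mechanism, e.g. prefixing the task command with `nice -n 19`). `launch_task` and `TASK_NICENESS` are hypothetical names. `os.setpriority` sets the absolute niceness in the child between `fork()` and `exec()`; raising one's own niceness needs no special privileges:

```python
from __future__ import annotations

import os
import subprocess
import sys

# Highest nice value (lowest priority) on Linux; the level proposed above.
TASK_NICENESS = 19


def _lower_priority() -> None:
    """Runs in the child after fork() and before exec(): set absolute niceness."""
    os.setpriority(os.PRIO_PROCESS, 0, TASK_NICENESS)


def launch_task(cmd: list[str]) -> subprocess.Popen:
    """Hypothetical helper: start a user task so it yields CPU to the daemons."""
    return subprocess.Popen(cmd, preexec_fn=_lower_priority,
                            stdout=subprocess.PIPE, text=True)


if __name__ == "__main__":
    # The child reports its own niceness; the parent's niceness is unchanged.
    probe = [sys.executable, "-c",
             "import os; print(os.getpriority(os.PRIO_PROCESS, 0))"]
    out, _ = launch_task(probe).communicate()
    print("child niceness:", out.strip())
```

As the comment notes, this only protects the daemons: every task would run at the same nice level, so tasks from different jobs still compete equally with each other.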



[jira] Commented: (HADOOP-5725) TaskTracker should run user tasks nicely in the local machine

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702254#action_12702254 ] 

Arun C Murthy commented on HADOOP-5725:
---------------------------------------

> I had jobs with tasks expected to run ~30 minutes. However, there were some CPU intensive tasks from other jobs running in the cluster. If my task got scheduled to the same machine with these CPU intensive tasks, my tasks might run 2 or more hours. In some cases, the task got timeout or lost TaskTracker.

Nicholas - did you require higher memory for your tasks too? If so, you could use the memory-requirement control knob...

----

Overall, I agree with Hong: the better solution is for the JobTracker to react dynamically to the load on the TaskTrackers and throttle slots, etc.
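A minimal sketch of load-based slot throttling, assuming the node's 1-minute load average and CPU count are available (here via `os.getloadavg()` and `os.cpu_count()` on Unix). `slots_to_offer` and `target_load_per_cpu` are hypothetical names, not an existing Hadoop API, and capping by whole CPUs of headroom is only one possible policy:

```python
from __future__ import annotations

import math
import os


def slots_to_offer(max_slots: int, running_tasks: int, load_avg: float,
                   num_cpus: int, target_load_per_cpu: float = 1.0) -> int:
    """Hypothetical policy: offer a free slot only if there is spare CPU capacity.

    The static limit is max_slots - running_tasks; it is further capped by the
    whole number of CPUs' worth of headroom below the target load.
    """
    free_slots = max(0, max_slots - running_tasks)
    cpu_headroom = num_cpus * target_load_per_cpu - load_avg
    return max(0, min(free_slots, math.floor(cpu_headroom)))


if __name__ == "__main__":
    one_minute_load = os.getloadavg()[0]  # Unix only
    cpus = os.cpu_count() or 1
    print(slots_to_offer(max_slots=4, running_tasks=2,
                         load_avg=one_minute_load, num_cpus=cpus))
```

A node saturated by multi-threaded tasks, as in the case described earlier, would report a high load average and stop receiving new tasks even though its static slot count says it has room.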
