
Posted to common-dev@hadoop.apache.org by "Vinod K V (JIRA)" <ji...@apache.org> on 2009/03/25 09:04:51 UTC

[jira] Created: (HADOOP-5568) TaskMemoryManager not enforcing memory limits in the presence of rogue tasks

TaskMemoryManager not enforcing memory limits in the presence of rogue tasks
----------------------------------------------------------------------------

                 Key: HADOOP-5568
                 URL: https://issues.apache.org/jira/browse/HADOOP-5568
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
            Reporter: Vinod K V




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-5568) TaskMemoryManager not enforcing memory limits in the presence of rogue tasks

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689061#action_12689061 ] 

Vinod K V commented on HADOOP-5568:
-----------------------------------

The primary observation while testing TaskMemoryManager is that it is not able to prevent nodes from going down when rogue tasks start consuming memory. It currently does the following:
 - It monitors the memory usage of each task (the task JVM and its descendant processes) and makes sure that the task is failed if it goes beyond its memory requirements (specified via mapred.task.maxvmem).
 - It also monitors the cumulative memory usage of all tasks running on a TT and makes sure that it doesn't cross a specific limit (total TT Vmem less mapred.tasktracker.vmem.reserved), killing the tasks that have made the least progress to bring the memory usage down.
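The two checks above can be sketched roughly as follows. This is a minimal, hypothetical Java sketch, not the actual TaskMemoryManager code: the TaskUsage class and selectTasksToKill method are invented for illustration, and excluding tasks that already violate their per-task limit from the cumulative total is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of one monitoring pass; not Hadoop's TaskMemoryManager.
final class MemoryMonitorSketch {

    // Hypothetical per-task snapshot (not a Hadoop class).
    static final class TaskUsage {
        final String taskId;
        final long vmemBytes;    // current vmem of the task JVM plus its descendants
        final long maxVmemBytes; // per-task limit, i.e. mapred.task.maxvmem
        final float progress;    // 0.0 .. 1.0

        TaskUsage(String taskId, long vmemBytes, long maxVmemBytes, float progress) {
            this.taskId = taskId;
            this.vmemBytes = vmemBytes;
            this.maxVmemBytes = maxVmemBytes;
            this.progress = progress;
        }
    }

    // Returns the ids of tasks to kill in one monitoring pass.
    static List<String> selectTasksToKill(List<TaskUsage> tasks, long totalVmem, long reservedVmem) {
        List<String> toKill = new ArrayList<>();
        List<TaskUsage> survivors = new ArrayList<>();

        // Check 1: fail any task that exceeds its own limit.
        for (TaskUsage t : tasks) {
            if (t.vmemBytes > t.maxVmemBytes) {
                toKill.add(t.taskId);
            } else {
                survivors.add(t);
            }
        }

        // Check 2: cumulative limit = total TT vmem less the reserved amount.
        long limit = totalVmem - reservedVmem;
        long cumulative = 0;
        for (TaskUsage t : survivors) {
            cumulative += t.vmemBytes;
        }

        // Kill least-progress tasks first until usage drops under the limit.
        survivors.sort(Comparator.comparingDouble(t -> t.progress));
        for (TaskUsage t : survivors) {
            if (cumulative <= limit) {
                break;
            }
            toKill.add(t.taskId);
            cumulative -= t.vmemBytes;
        }
        return toKill;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        List<TaskUsage> tasks = List.of(
            new TaskUsage("task_a", 3 * gb, 2 * gb, 0.9f), // over its own 2 GB limit
            new TaskUsage("task_b", 2 * gb, 4 * gb, 0.1f),
            new TaskUsage("task_c", 2 * gb, 4 * gb, 0.5f),
            new TaskUsage("task_d", 2 * gb, 4 * gb, 0.8f));
        // 8 GB total, 3 GB reserved: the cumulative limit is 5 GB.
        System.out.println(selectTasksToKill(tasks, 8 * gb, 3 * gb)); // [task_a, task_b]
    }
}
```

Both checks only act on what is observed at each pass, which is why neither one helps when a single task is allowed to request nearly all of the TT's usable Vmem in the first place.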

The per-task monitoring works fine for tasks whose memory grows at moderate rates, up to around 100MB/sec. The problems are with the cumulative-usage monitoring:
 - The limit mapred.task.limit.maxvmem was originally intended to prevent jobs from asking for too much memory. If a single task asks for memory nearing the total usable Vmem on the TT, we don't prevent the task from running; currently we only log a warning in the TT if the task crosses mapred.task.limit.maxvmem. Without any support for memory-based scheduling this is very problematic, because such tasks can bring down nodes, and we have seen instances of this.
 - Even if each task stays within its limits, because mapred.task.limit.maxvmem is not actually enforced, cumulative usage near the total usable Vmem on the TT can bring down the node, and we have seen instances of this too.


[jira] Commented: (HADOOP-5568) TaskMemoryManager not enforcing memory limits in the presence of rogue tasks

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689062#action_12689062 ] 

Vinod K V commented on HADOOP-5568:
-----------------------------------

The fix for this is to enforce mapred.task.limit.maxvmem on either the JT or the TT. Any thoughts on these options?
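The two options could look roughly like this. This is a hypothetical Java sketch, not Hadoop code: the class, the method names, and the use of -1 as a "limit not configured" sentinel are assumptions for illustration. A JT-side check rejects the job at submission, before any task runs; a TT-side check refuses to launch an oversized task instead of only logging a warning.

```java
// Hypothetical sketch of the two enforcement points; not Hadoop code.
final class MaxVmemEnforcementSketch {

    // Assumed sentinel meaning "mapred.task.limit.maxvmem is not configured".
    static final long DISABLED = -1L;

    // Shared check: a task's requested vmem must not exceed the cluster-wide limit.
    static boolean withinLimit(long taskMaxVmem, long limitMaxVmem) {
        return limitMaxVmem == DISABLED || taskMaxVmem <= limitMaxVmem;
    }

    // JT-side option: fail the job at submission time.
    static void checkJobAtSubmission(String jobId, long taskMaxVmem, long limitMaxVmem) {
        if (!withinLimit(taskMaxVmem, limitMaxVmem)) {
            throw new IllegalArgumentException("Job " + jobId + " asks for " + taskMaxVmem
                + " bytes of vmem per task, above the limit of " + limitMaxVmem + " bytes");
        }
    }

    // TT-side option: refuse to launch the task instead of only logging a warning.
    static boolean canLaunchTask(long taskMaxVmem, long limitMaxVmem) {
        return withinLimit(taskMaxVmem, limitMaxVmem);
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        long limit = 4 * gb;
        System.out.println(canLaunchTask(2 * gb, limit)); // true
        System.out.println(canLaunchTask(6 * gb, limit)); // false
        try {
            checkJobAtSubmission("job_1", 6 * gb, limit);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting at the JT fails the job early and cluster-wide; checking at the TT keeps the decision local to the node but only catches the problem once a task is about to start.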
