Posted to mapreduce-issues@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2009/11/19 01:49:39 UTC

[jira] Created: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
---------------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-1221
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: tasktracker
            Reporter: dhruba borthakur
            Assignee: dhruba borthakur


The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.

On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits feature (HADOOP-5883) was designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc.), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835966#action_12835966 ] 

Zheng Shao commented on MAPREDUCE-1221:
---------------------------------------

Scott, can you replace TAB with 2 spaces in your code?


> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792736#action_12792736 ] 

Scott Chen commented on MAPREDUCE-1221:
---------------------------------------

About virtual memory limiting: we have tried to use it in our cluster. Our experience is that even if we set the total memory threshold to a high enough value, the TaskTracker would still kill a considerable number of tasks when there was nothing wrong with their RSS memory. So we decided to extend the virtual memory limiting to physical memory. Anyway, it doesn't hurt to have more options. It will not be turned on if the configuration is not set.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>         Attachments: MAPREDUCE-1221-v1.patch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838475#action_12838475 ] 

Scott Chen commented on MAPREDUCE-1221:
---------------------------------------

+1 on having a JIRA for per-task physical-memory limiting.

+1 on Hong's idea of fine-tuning the policy. We can make this part configurable.



> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851891#action_12851891 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1221:
----------------------------------------------------

bq. First up: Do you agree that we need to fail the task and not just kill it, so that the job fails fast?
One more thing to add here: if a task is failed, the JobTracker will try to schedule it on a different machine. If the task is killed, there is a greater chance that it will be re-executed on the same tracker.

bq. If the total limit is violated, the TaskTracker will kill the task using the highest amount of memory to relieve the memory pressure.
So, if we kill the task using the highest amount of memory, there is a greater chance that it will be re-executed on the same machine. Doesn't this trouble the tracker again?


> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857532#action_12857532 ] 

Hadoop QA commented on MAPREDUCE-1221:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12441860/MAPREDUCE-1221-v5.txt
  against trunk revision 933441.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/113/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/113/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/113/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/113/console

This message is automatically generated.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.txt

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Open  (was: Patch Available)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779840#action_12779840 ] 

dhruba borthakur commented on MAPREDUCE-1221:
---------------------------------------------

> In any case, it currently works only on Linux

It is, of course, for Linux :-)

> The TaskMemoryManager indeed looks at the memory usage of all the processes in the task's process-tree irr

That's right. But my point was that each python/php process forked by the mapper actually uses about 500MB of virtual memory (at the minimum). And each python library that the process includes adds yet another chunk of virtual memory. Most of this virtual memory is unused by the python/php interpreter; i.e., no physical pages are allocated for it. But since the existing MR framework looks at the total virtual memory of the process subtree, it starts killing tasks even though a large chunk of the physical RAM on the machine is unused.

The proposal to add the ability to kill tasks based on physical memory usage should improve system utilization considerably; do you agree?
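
To make the VSZ-vs-RSS gap concrete, here is an illustrative standalone probe (my own sketch, not part of any patch on this issue) that reads both numbers from /proc on Linux:

{noformat}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Prints VmSize (total virtual memory) and VmRSS (resident physical
// memory) for a pid on Linux. For an idle python/php interpreter,
// VmSize can dwarf VmRSS.
public class ProcMemProbe {
  public static void main(String[] args) throws IOException {
    String pid = args.length > 0 ? args[0] : "self";
    BufferedReader in =
        new BufferedReader(new FileReader("/proc/" + pid + "/status"));
    String line;
    while ((line = in.readLine()) != null) {
      if (line.startsWith("VmSize:") || line.startsWith("VmRSS:")) {
        System.out.println(line);
      }
    }
    in.close();
  }
}
{noformat}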

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846541#action_12846541 ] 

Scott Chen commented on MAPREDUCE-1221:
---------------------------------------

The patch has changed over time, so here is a quick overall summary of what it does.

The purpose of this patch is to allow the TaskTracker to kill or fail tasks based on their RSS memory usage.

We can set the following three parameters:
mapreduce.tasktracker.reserved.physicalmemory.mb
mapreduce.map.memory.physical.mb 
mapreduce.reduce.memory.physical.mb

They determine the total allowed RSS memory across tasks and the limit for each individual task.
If the total limit is violated, the TaskTracker will kill the task using the highest amount of memory to relieve the memory pressure.
If the per-task limit is violated, the TaskTracker will *fail* the task that violates the limit.
If the parameters are not set, there is no limit.
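
As a concrete illustration (example values are made up; leaving a key unset disables that particular check), the knobs can be set through the standard Configuration API:

{noformat}
import org.apache.hadoop.conf.Configuration;

public class PhysicalMemoryLimitsExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Keep at least 2GB of the node's RAM free of task usage.
    conf.setLong("mapreduce.tasktracker.reserved.physicalmemory.mb", 2048);
    // Fail any individual map/reduce task whose RSS exceeds these limits.
    conf.setLong("mapreduce.map.memory.physical.mb", 1024);
    conf.setLong("mapreduce.reduce.memory.physical.mb", 2048);
    System.out.println(
        conf.get("mapreduce.tasktracker.reserved.physicalmemory.mb"));
  }
}
{noformat}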

The implementation mostly follows the virtual memory limiting logic that we already have.
ProcfsBasedProcessTree also allows us to obtain the physical memory usage of tasks.

The tests added are the following:
TestTaskTrackerMemoryManager.testTasksCumulativelyExceedingTTPhysicalLimits()
TestTaskTrackerMemoryManager.testTasksBeyondPhysicalLimits()
They verify the behavior when the total memory limit and the per-task limit are triggered.
There is also a slight modification to TestTaskTrackerMemoryManager.testTasksWithinLimits() to make sure tasks within the physical memory limit run correctly.


> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857468#action_12857468 ] 

Scott Chen commented on MAPREDUCE-1221:
---------------------------------------

Arun, thanks for the review.
bq. We shouldn't be using the TT.fConf to read config values each time, please save it and re-use:
Good catch. I made the change: a member called reservedPhysicalMemoryOnTT now saves this value.

bq. I don't understand why 'limitPhysical > 0' is necessary. If 'doCheckPhysicalMemory' returns true, shouldn't limitPhysical always be positive? Please help me understand, thanks.
You are right about this one. That check is redundant. If doCheckPhysicalMemory returns true, it means the user has configured a physical memory limit (including the per-task limit), and this value should not be negative.
The logic should be the same as for the virtual memory limit. I have removed the redundant check.
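
A minimal sketch of the caching change, with hypothetical surrounding code (only the field name and configuration key come from this discussion):

{noformat}
import org.apache.hadoop.conf.Configuration;

class ReservedMemorySketch {
  // Read once during initialization instead of on every monitoring pass.
  private long reservedPhysicalMemoryOnTT;

  void initialize(Configuration fConf) {
    // -1 (unset) leaves the physical memory check disabled.
    this.reservedPhysicalMemoryOnTT = fConf.getLong(
        "mapreduce.tasktracker.reserved.physicalmemory.mb", -1L);
  }
}
{noformat}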

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.txt

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828289#action_12828289 ] 

Hadoop QA commented on MAPREDUCE-1221:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12431623/MAPREDUCE-1221-v2.patch
  against trunk revision 905008.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/422/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/422/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/422/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/422/console

This message is automatically generated.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838011#action_12838011 ] 

Arun C Murthy commented on MAPREDUCE-1221:
------------------------------------------

Scott: setrlimit works for a process and all its children. So, if the TT sets it for each of the map/reduce tasks, we are good, no?

{noformat}
NAME
     getrlimit, setrlimit -- control maximum system resource consumption

SYNOPSIS
     #include <sys/resource.h>

     int
     getrlimit(int resource, struct rlimit *rlp);

     int
     setrlimit(int resource, const struct rlimit *rlp);

DESCRIPTION
     Limits on the consumption of system resources by the current process and each
     process it creates may be obtained with the getrlimit() call, and set with
     the setrlimit() call.
{noformat}

The reason I'm pushing for this is that we already have (or had) a feature to set ulimit on the children. This seems like a natural extension of that feature.
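
If I remember correctly, that knob is mapred.child.ulimit (value in KB, applied via ulimit to the child the TT launches and inherited by the child's descendants); treat the exact key and units here as my assumption to verify:

{noformat}
import org.apache.hadoop.conf.Configuration;

public class ChildUlimitExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Assumed pre-existing knob: cap each child's virtual memory
    // at roughly 2GB (the value is in kilobytes, as I recall).
    conf.setLong("mapred.child.ulimit", 2L * 1024 * 1024);
    System.out.println(conf.get("mapred.child.ulimit"));
  }
}
{noformat}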


> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Attachment: MAPREDUCE-1221-v1.patch

The patch allows us to reserve an amount of memory that will not be used to run tasks. If this limit is violated, the task using the highest amount of memory will be killed.

Ex: Configure mapreduce.tasktracker.reserved.physicalmemory.mb=3072
If there's a TaskTracker machine with 16GB of memory and the tasks are currently using 14GB, then we have 16GB - 14GB = 2GB < 3GB. In this case, TaskMemoryManagerThread will kill the task that uses the highest amount of memory. Note that if the value is not configured, this policy is not triggered.
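
A standalone sketch of the policy with these numbers (all names and values here are illustrative, not the patch code):

{noformat}
// 16GB machine, 3GB reservation, three tasks using 14GB of RSS in total:
// free = 16GB - 14GB = 2GB < 3GB, so the largest task is the victim.
public class ReservationPolicySketch {
  public static void main(String[] args) {
    long totalRamMb = 16 * 1024;
    long reservedMb = 3072;
    long[] taskRssMb = {6144, 5120, 3072};

    long usedMb = 0;
    for (long rss : taskRssMb) {
      usedMb += rss;
    }
    if (totalRamMb - usedMb < reservedMb) {
      int victim = 0;
      for (int i = 1; i < taskRssMb.length; i++) {
        if (taskRssMb[i] > taskRssMb[victim]) {
          victim = i;
        }
      }
      System.out.println("Would kill task " + victim + " using "
          + taskRssMb[victim] + "MB of RSS");
    }
  }
}
{noformat}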

Killing a task slows down its job because the task has to be scheduled again. But if the task is not killed in this case, it is very likely to fail on this node anyway, because the node has no memory left to run it; the node might even crash. We choose the highest memory-consuming task to kill because it most likely belongs to the bad job that is causing the problem.

Part of this patch was done by Dhruba. He sent me his half-finished patch and I continued from there.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>         Attachments: MAPREDUCE-1221-v1.patch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Open  (was: Patch Available)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-1221:
-------------------------------------

           Status: Resolved  (was: Patch Available)
     Hadoop Flags: [Reviewed]
    Fix Version/s: 0.22.0
                       (was: 0.21.0)
       Resolution: Fixed

I just committed this. Thanks Scott!

I've committed this to trunk; the patch doesn't apply to branch-0.21.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.1.txt, MAPREDUCE-1221-v5.txt, MAPREDUCE-1221-v6.txt

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

           Status: Patch Available  (was: Open)
    Fix Version/s: 0.21.0
                       (was: 0.22.0)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.txt

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846506#action_12846506 ] 

Scott Chen commented on MAPREDUCE-1221:
---------------------------------------

I have run TestJobClient on two different dev boxes. Both worked. I will submit this to Hudson again.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858749#action_12858749 ] 

Scott Chen commented on MAPREDUCE-1221:
---------------------------------------

Hey Amareshwari, thanks for the comments.

bq. In TaskTracker.addToMemoryManager() method, shall we move the calculation of physicalMemoryLimit and virtualMemoryLimit into the check isTaskMemoryManagerEnabled()?
That would be more efficient. But unfortunately these limits need to be set independently for each job, so we have to keep them in addToMemoryManager(). The method isTaskMemoryManagerEnabled() is called only once, to determine whether to start the manager thread at the beginning.

bq. Shouldn't we call isKillable() in TaskMemoryManagerThread.killTasksWithLeastProgress() also ?
killTasksWithLeastProgress() calls taskTracker.findTaskToKill(), which implements logic similar to isKillable(). I think this part of the code is not very clear, so I will add a one-line comment explaining why it does not need to call isKillable().
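
As a sketch of why the per-task limits live in addToMemoryManager() (the configuration key is from this patch; the surrounding code is hypothetical):

{noformat}
import org.apache.hadoop.conf.Configuration;

class PerJobLimitSketch {
  // Each job can carry its own per-task physical limit, so the value
  // must be resolved from the job's configuration when the task is
  // registered with the memory manager, not once per TaskTracker.
  static long mapPhysicalLimitBytes(Configuration jobConf) {
    long mb = jobConf.getLong("mapreduce.map.memory.physical.mb", -1L);
    return mb < 0 ? -1L : mb * 1024L * 1024L; // -1 keeps the check disabled
  }
}
{noformat}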

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.txt

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838403#action_12838403 ] 

Arun C Murthy commented on MAPREDUCE-1221:
------------------------------------------

Dhruba, this is not about making users write good code. This is about penalizing poorly behaved applications. 

If a MR job consumes too much memory, you want to fail its component tasks and eventually fail the job.

My concern is that the _current implementation_ of this feature is not doing that.

To be clear, I'm not against tracking the physical memory used by the process. I'm only proposing that we penalize applications that consume too much memory. To that effect, I'm proposing we *fail* the task if it exceeds the limit.

I prefer the per-task limit since it has served us well with virtual memory. Maybe it is a bad idea to use the same model for physical memory; maybe someone can help me understand why that is so. I'm happy to reconsider it then. I've asked Allen Wittnauer for his thoughts on this too.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Patch Available  (was: Open)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Attachment: MAPREDUCE-1221-v5.1.txt

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.1.txt, MAPREDUCE-1221-v5.txt

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795002#action_12795002 ] 

Hadoop QA commented on MAPREDUCE-1221:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428563/MAPREDUCE-1221-v1.patch
  against trunk revision 894165.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/243/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/243/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/243/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/243/console

This message is automatically generated.



[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Open  (was: Patch Available)



[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855839#action_12855839 ] 

dhruba borthakur commented on MAPREDUCE-1221:
---------------------------------------------

Thanks Scott for updating the patch with the three configuration variables. Code looks good to me. I would like to get this into the 0.21 release, so if somebody wants to review it again, please do so now. Thanks.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846431#action_12846431 ] 

Hadoop QA commented on MAPREDUCE-1221:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12438470/MAPREDUCE-1221-v4.patch
  against trunk revision 923907.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/527/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/527/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/527/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/527/console

This message is automatically generated.



[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859209#action_12859209 ] 

Hadoop QA commented on MAPREDUCE-1221:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12442340/MAPREDUCE-1221-v6.txt
  against trunk revision 936121.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/121/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/121/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/121/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/121/console

This message is automatically generated.



[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838546#action_12838546 ] 

Allen Wittenauer commented on MAPREDUCE-1221:
---------------------------------------------

I've read through this JIRA a few times, and looked at some of the previously mentioned JIRAs around memory limits.  I think I see where the issue actually lies.

Arun started to fill in the historical background, but I think he may have missed a significant point.  Let's retell the story, so that we can get to the crux of the ops requirement here....

Under HOD with Torque, we configured Torque so that it would limit the virtual memory size to total VM - 4GB.  [This left plenty of RAM for Linux, our monitoring software, etc.  So on a machine with 4x4GB swap partitions and 16GB of RAM, the VM limit would be set to 28GB.]  Now the thing about HOD is that it allocates the -entire- node to an -entire- job.... which means there is a subtle point here, easily missed:  the VM limit under Torque was the aggregate for *all* of the tasks on the node, not just a single task.  So if you had a badly behaving task/job, it would kill all the tasks running on that node.

To simulate this ops requirement, Hadoop should be taking the memory used by *all* the tasks and then performing some action.  While I realize there is a desire to only punish 'bad tasks', I'm not sure there is an easy way to do that.  Putting my jack boots on, my answer is Kill Them All and Let The Users Sort Themselves Out.  If I have to pick between killing the system (we're talking a *hard hang* here, not a happy little panic, in my experience) and punishing potentially innocent users, the answer is easy.

Now here is where things get more complex, and there is a very good chance I've gotten this wrong.  [Hopefully I have, because it sounds to me like a feature was added in the wrong spot.]

It sounds like the capacity scheduler has the ability to kill tasks based upon VM per node.  It has this idea of a max VM size and how much memory each task is asking for.  It then schedules based upon a weird slot+memory ratio.

While this is a fine and dandy feature that would likely fix the requestor's problem, I think it is a bit short-sighted not to have the kill feature at the TaskTracker level.  The TaskTracker, regardless of scheduler, should still be able to keep track of all the memory used on the box and kill as necessary.  If a scheduler wants to provide alternative logic, more power to it.  But tying this to a scheduler just seems a bit ridiculous.
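
For concreteness, here is a minimal sketch of the node-level policy described above: sum physical memory (RSS) across all running task trees and, when the node exceeds its budget (total minus a reserve), kill the single largest consumer. The class name, method names, and attempt ids are illustrative assumptions; this is not the actual TaskTracker code.

{code}
import java.util.HashMap;
import java.util.Map;

public class NodeMemoryPolicy {
  /** Returns the task attempt to kill, or null if the node is within budget. */
  public static String chooseVictim(Map<String, Long> rssBytesByTask,
                                    long totalPhysicalBytes,
                                    long reservedBytes) {
    long budget = totalPhysicalBytes - reservedBytes;
    long used = 0;
    String victim = null;
    long victimRss = -1;
    for (Map.Entry<String, Long> e : rssBytesByTask.entrySet()) {
      used += e.getValue();
      if (e.getValue() > victimRss) {
        victimRss = e.getValue();
        victim = e.getKey();
      }
    }
    // Only kill when the node as a whole is over budget; pick the single
    // largest consumer and let the caller repeat on the next monitoring tick.
    return used > budget ? victim : null;
  }

  public static void main(String[] args) {
    Map<String, Long> rss = new HashMap<String, Long>();
    rss.put("attempt_201003190001_m_000001_0", 6L << 30);  // 6 GB
    rss.put("attempt_201003190001_m_000002_0", 9L << 30);  // 9 GB
    // 16 GB node, 2 GB reserved: 15 GB used > 14 GB budget, kill the 9 GB task.
    System.out.println(chooseVictim(rss, 16L << 30, 2L << 30));
  }
}
{code}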



[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838074#action_12838074 ] 

Arun C Murthy commented on MAPREDUCE-1221:
------------------------------------------

Scott, as I said, it is reasonable to track either virtual memory or physical memory or both.

bq. The other main reason that we want to do this is that per task virtual-memory-limit is an API change for our users.

A new feature might imply change, no? 

OTOH you could get away with simply setting the default values to be reasonable for a wide variety of uses, so users do not have to do anything.

bq. I think it may not be that bad that we kill the task.

Like I said, the problem is that there is no predictability. What if a job gets unlucky and its 4th attempt gets killed because it happened to run on a node where a rogue task of some other job ... again, predictability is very important. Penalizing the right task is equally important.



[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Open  (was: Patch Available)



[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858816#action_12858816 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1221:
----------------------------------------------------

Scott,
bq.     In TaskTracker.addToMemoryManager() method, shall we move the calculation of physicalMemoryLimit and virtualMemoryLimit into the check isTaskMemoryManagerEnabled()?
My comment there was to move the calculation into the 'if check' in the same method.
Your patch has:
{code}
+    // Obtain physical memory limits from the job configuration
+    long physicalMemoryLimit =
+      conf.getLong(isMap ? JobContext.MAP_MEMORY_PHYSICAL_MB :
+                   JobContext.REDUCE_MEMORY_PHYSICAL_MB,
+                   JobConf.DISABLED_MEMORY_LIMIT);
+    if (physicalMemoryLimit > 0) {
+      physicalMemoryLimit *= 1024L * 1024L;
+    }
+
+    // Obtain virtual memory limits from the job configuration
+    long virtualMemoryLimit = isMap ?
+      conf.getMemoryForMapTask() * 1024 * 1024 :
+      conf.getMemoryForReduceTask() * 1024 * 1024;
+
     if (isTaskMemoryManagerEnabled()) {
-      taskMemoryManager.addTask(attemptId, isMap ? conf
-          .getMemoryForMapTask() * 1024 * 1024L : conf
-          .getMemoryForReduceTask() * 1024 * 1024L);
+      taskMemoryManager.addTask(attemptId, virtualMemoryLimit,
+                                physicalMemoryLimit);
{code}

I was expecting it to be:
{code}
     if (isTaskMemoryManagerEnabled()) {
+      // Obtain physical memory limits from the job configuration
+      long physicalMemoryLimit =
+        conf.getLong(isMap ? JobContext.MAP_MEMORY_PHYSICAL_MB :
+                     JobContext.REDUCE_MEMORY_PHYSICAL_MB,
+                     JobConf.DISABLED_MEMORY_LIMIT);
+      if (physicalMemoryLimit > 0) {
+        physicalMemoryLimit *= 1024L * 1024L;
+      }
+
+      // Obtain virtual memory limits from the job configuration
+      long virtualMemoryLimit = isMap ?
+        conf.getMemoryForMapTask() * 1024 * 1024 :
+        conf.getMemoryForReduceTask() * 1024 * 1024;
+
-      taskMemoryManager.addTask(attemptId, isMap ? conf
-          .getMemoryForMapTask() * 1024 * 1024L : conf
-          .getMemoryForReduceTask() * 1024 * 1024L);
+      taskMemoryManager.addTask(attemptId, virtualMemoryLimit,
+                                physicalMemoryLimit);
{code}

Does that make sense now?



[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Attachment: MAPREDUCE-1221-v4.patch

Added the per-task physical memory limit. If the RSS memory usage of the task's process tree is higher than the configured value, the task will fail. A minimal sketch of these semantics follows.
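
For clarity, this sketch assumes (per the surrounding discussion) that a non-positive configured value means the limit is disabled; the class and method names are illustrative, not the patch's code.

{code}
/**
 * Illustrative only: fail a task whose process tree's RSS exceeds its
 * configured physical memory limit; a non-positive limit means disabled
 * (cf. JobConf.DISABLED_MEMORY_LIMIT).
 */
public class PerTaskPhysicalLimit {
  public static boolean shouldFail(long taskTreeRssBytes, long limitBytes) {
    return limitBytes > 0 && taskTreeRssBytes > limitBytes;
  }

  public static void main(String[] args) {
    long limitBytes = 1024L * 1024 * 1024;  // e.g. mapreduce.map.memory.physical.mb = 1024
    System.out.println(shouldFail(1200L * 1024 * 1024, limitBytes)); // true: task fails
    System.out.println(shouldFail(1200L * 1024 * 1024, -1));         // false: check disabled
  }
}
{code}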



[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

        Fix Version/s: 0.22.0
    Affects Version/s: 0.22.0
               Status: Patch Available  (was: Open)



[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838610#action_12838610 ] 

dhruba borthakur commented on MAPREDUCE-1221:
---------------------------------------------

Please allow me to present my use case.

I have users submitting their own jobs to the cluster. These jobs are neither audited nor vetted by any authority before being deployed on the cluster. The mappers for most of these jobs are written in Python or PHP. In these languages, it is easy for code writers to mistakenly use excessive amounts of memory (via a Python dictionary or some such thing). We have seen about one such case per month in our cluster. The thing to note is that in 100% of these jobs, the user had a coding error that erroneously kept inserting elements into his/her dictionary. These are not "valid" jobs, and they are usually killed by the user when he/she realises the coding mistake.

The problem we are encountering is that when such a job is let loose in our cluster, many tasks start eating lots of memory, causing excessive swapping and finally making the OS on those machines hang. This JIRA attempts to prevent that scenario. Once properly configured, it will make it very hard for a user job to bring down nodes in the Hadoop cluster, which increases the stability and uptime of our cluster to a great extent. I would ask all concerned reviewers to look at this JIRA from that perspective.



[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Tomer Shiran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838717#action_12838717 ] 

Tomer Shiran commented on MAPREDUCE-1221:
-----------------------------------------

@Dhruba,

How do you ensure that the physical memory limit is only hit when there is a misbehaving task (which uses unbounded memory), and not during normal operation (without having a per-task limit)? Are you making assumptions about the general expected memory usage of a task, and setting the # of slots accordingly? 



[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-1221:
-------------------------------------

    Status: Open  (was: Patch Available)



[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858751#action_12858751 ] 

Scott Chen commented on MAPREDUCE-1221:
---------------------------------------

I only made the following change to the comments, addressing Amareshwari's review comment.

{code}
 --- src/java/org/apache/hadoop/mapred/TaskMemoryManagerThread.java    (revision 934199)
 @@ -332,8 +411,9 @@
      List<TaskAttemptID> tasksToExclude = new ArrayList<TaskAttemptID>();
      // Find tasks to kill so as to get memory usage under limits.
      while (memoryStillInUsage > maxMemoryAllowedForAllTasks) {
 -      // Exclude tasks that are already marked for
 -      // killing.
 +      // Exclude tasks that are already marked for killing.
 +      // Note that we do not need to call isKillable() here because the logic
 +      // is contained in taskTracker.findTaskToKill()
        TaskInProgress task = taskTracker.findTaskToKill(tasksToExclude);
        if (task == null) {
         break; // couldn't find any more tasks to kill.
{code}



[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846613#action_12846613 ] 

Scott Chen commented on MAPREDUCE-1221:
---------------------------------------

The definitions of the parameters in the previous comment were not very clear.
Here are the definitions, with an illustrative configuration fragment after them.

*mapreduce.tasktracker.reserved.physicalmemory.mb*
Specifies how much memory on the TT will not be used by tasks.
For example, if we configure this to be 2G and the machine has 16G of memory,
the tasks are only allowed to use 14G of memory. If the limit is exceeded, the TT will kill the task that consumes the most memory.
We use reserved memory instead of total memory because the cluster may be nonuniform.

*mapreduce.map.memory.physical.mb*
Memory limit for one mapper. If the mapper uses more memory than this, it will fail.

*mapreduce.reduce.memory.physical.mb*
Memory limit for one reducer. If the reducer uses more memory than this, it will fail.
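
For concreteness, a mapred-site.xml fragment using the three parameters might look like this (the values are illustrative assumptions, not recommendations):

{code}
<!-- Illustrative only: reserve 2 GB for the OS and daemons, and cap
     individual map/reduce task trees by physical memory (RSS). -->
<property>
  <name>mapreduce.tasktracker.reserved.physicalmemory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.map.memory.physical.mb</name>
  <value>1536</value>
</property>
<property>
  <name>mapreduce.reduce.memory.physical.mb</name>
  <value>2048</value>
</property>
{code}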



[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837931#action_12837931 ] 

Arun C Murthy commented on MAPREDUCE-1221:
------------------------------------------

Sorry for coming in late, but isn't using (get|set)rlimit significantly simpler?
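
For context on the suggestion: Java cannot call setrlimit directly, so in practice this would mean wrapping the child command in a shell that applies ulimit before exec. A hedged sketch follows; note that ulimit -v caps virtual address space and Linux does not enforce an RSS rlimit, which is one argument against this route for the physical-memory use case in this JIRA.

{code}
import java.io.IOException;

// Sketch only: launch a task command under a virtual-memory ulimit (in KB).
// Not what the patch does; the patch polls process-tree memory instead.
public class RlimitLaunchSketch {
  public static Process launchWithVmemLimit(String command, long limitKb)
      throws IOException {
    String wrapped = "ulimit -v " + limitKb + "; exec " + command;
    return new ProcessBuilder("bash", "-c", wrapped).start();
  }
}
{code}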



[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Attachment: MAPREDUCE-1221-v1.patch



[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844161#action_12844161 ] 

Scott Chen commented on MAPREDUCE-1221:
---------------------------------------

The following two new parameters set the per-task physical memory limit (a usage sketch follows below):
mapreduce.map.memory.physical.mb   
mapreduce.reduce.memory.physical.mb
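
A minimal sketch of how a job might set these; the property names are the ones above, while the values, units (MB), and the use of JobConf.setLong are illustrative assumptions:

{code}
import org.apache.hadoop.mapred.JobConf;

public class PhysicalMemoryLimitExample {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Fail a map task whose process tree exceeds 1 GB of RSS.
    conf.setLong("mapreduce.map.memory.physical.mb", 1024);
    // Reducers typically buffer more, so allow 2 GB here.
    conf.setLong("mapreduce.reduce.memory.physical.mb", 2048);
  }
}
{code}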


> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793455#action_12793455 ] 

Hadoop QA commented on MAPREDUCE-1221:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428563/MAPREDUCE-1221-v1.patch
  against trunk revision 892893.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/232/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/232/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/232/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/232/console

This message is automatically generated.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859437#action_12859437 ] 

Scott Chen commented on MAPREDUCE-1221:
---------------------------------------

Thanks for the help, Arun and Amareshwari :)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.1.txt, MAPREDUCE-1221-v5.txt, MAPREDUCE-1221-v6.txt
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841653#action_12841653 ] 

Scott Chen commented on MAPREDUCE-1221:
---------------------------------------

@Arun: Sorry for the very late reply. Dhruba and I have been trying to call you, but it seems you are busy as well.

I think I got your point. The problem is that the bad job never fails: its tasks get killed and rescheduled again and again, which keeps hurting the cluster. So we should add a per-task RSS limit in this patch so that we can fail the bad job (see the sketch below). This is just like what we currently do in the trunk for virtual memory, but here we offer RSS memory limiting as an option (a trade-off between memory utilization and stability).

I will make the change and resubmit the patch soon. Thanks again for the help.
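
A rough sketch of the fail-versus-kill policy described above; all types and helpers are hypothetical stand-ins for the TaskTracker internals, not code from the patch:

{code}
/** Hypothetical stand-in for the TT's view of a running attempt. */
interface TaskInfo {
  long getProcessTreeRssBytes();   // RSS summed over the process tree
  boolean hasPhysicalMemoryLimit();
  long getPhysicalMemoryLimitBytes();
}

class MemoryPolicySketch {
  void check(TaskInfo task, long nodeFreeBytes, long nodeThresholdBytes) {
    long rss = task.getProcessTreeRssBytes();
    if (task.hasPhysicalMemoryLimit()
        && rss > task.getPhysicalMemoryLimitBytes()) {
      // Over its own declared limit: FAIL the attempt, so a job that
      // blows its limit on every attempt eventually fails outright.
      fail(task);
    } else if (nodeFreeBytes < nodeThresholdBytes) {
      // Node-wide pressure only: KILL, which does not count against
      // the job's attempt limit; the JT just reschedules the task.
      kill(task);
    }
  }

  void fail(TaskInfo t) { /* report FAILED to the JobTracker */ }
  void kill(TaskInfo t) { /* report KILLED to the JobTracker */ }
}
{code}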


> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838107#action_12838107 ] 

Hong Tang commented on MAPREDUCE-1221:
--------------------------------------

Chip in my 2 cents. I think the goal of this jira is to prevent a node from swapping (same as in MAPREDUCE-257). In this regard, containing memory usage for individual tasks may not be sufficient to protect the system from swapping - for instance, there could be foreign processes, or bugs in the TT and DN, that "eat up" RAM on the same node and lead to swapping. Or there could be faulty RAM modules that the OS does not recognize at boot time, leaving the actual amount of RAM less than configured. In my view, this idea is similar to the OOM-killer mechanism in the Linux kernel and serves as low-level protection against faults not easily preventable from upper layers.

In terms of which tasks to shoot down: when a node goes into swap, there is a high chance that all tasks would fail (time out). Even worse, when the TT is swapping, it may not even respond to kill commands from the JT. So killing some tasks and letting others proceed may still be better than not killing. That said, I agree that we can fine-tune the policy - e.g. avoid killing the 4th task attempt, or bias against killing tasks that have already done lots of work (input bytes consumed, slot secs used) - as sketched below.
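
A sketch of a victim-selection policy along these lines; the Candidate fields and the 4-attempt constant are illustrative assumptions, not the patch's actual logic:

{code}
import java.util.List;

/** Hypothetical per-attempt bookkeeping for victim selection. */
class Candidate {
  String attemptId;
  long rssBytes;     // physical memory of the process tree
  int attemptNumber; // 0-based; 3 means the 4th attempt
  long slotSeconds;  // work already invested in this attempt
}

class VictimSelector {
  static final int MAX_ATTEMPTS = 4;

  /** Pick the task to kill when the node runs short of memory. */
  Candidate pick(List<Candidate> candidates) {
    Candidate victim = null;
    for (Candidate c : candidates) {
      if (c.attemptNumber >= MAX_ATTEMPTS - 1) {
        continue; // per the comment above, spare the 4th attempt
      }
      // Otherwise prefer the biggest consumer; break ties by the
      // attempt that has done the least work so far.
      if (victim == null
          || c.rssBytes > victim.rssBytes
          || (c.rssBytes == victim.rssBytes
              && c.slotSeconds < victim.slotSeconds)) {
        victim = c;
      }
    }
    return victim;
  }
}
{code}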

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838554#action_12838554 ] 

Allen Wittenauer commented on MAPREDUCE-1221:
---------------------------------------------

Bah. I wish I had edit rights to fix the strikeouts.  :(

One thing I didn't say that I meant to:

I don't think a phys mem limit is necessarily harmful, but I don't think it is particularly useful.  I'm still left with the thought that the vm limit wasn't implemented properly, all in a futile attempt to protect innocent users.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837755#action_12837755 ] 

Hadoop QA commented on MAPREDUCE-1221:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12436385/MAPREDUCE-1221-v3.patch
  against trunk revision 915223.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/479/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/479/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/479/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/479/console

This message is automatically generated.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846498#action_12846498 ] 

Scott Chen commented on MAPREDUCE-1221:
---------------------------------------

I am checking the failed tests.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858045#action_12858045 ] 

dhruba borthakur commented on MAPREDUCE-1221:
---------------------------------------------

Hi Arun/Amareshwari, would you please review this one?

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.txt
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Open  (was: Patch Available)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858862#action_12858862 ] 

Hadoop QA commented on MAPREDUCE-1221:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12442255/MAPREDUCE-1221-v5.1.txt
  against trunk revision 935427.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/119/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/119/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/119/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/119/console

This message is automatically generated.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.1.txt, MAPREDUCE-1221-v5.txt
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838066#action_12838066 ] 

Scott Chen commented on MAPREDUCE-1221:
---------------------------------------

@Arun: Thanks again for the comment. I appreciate the help.

I agree that the virtual-memory limit has lots of benefits.
It kills tasks quickly, before a task starts doing bad things, and it is more predictable.

But the virtual-memory limit also wastes some memory, because processes do not actually use that much physical memory.
For example, there are tasks that acquire 1.6GB of virtual memory but only ever use 600MB.
If you accumulate the total, the amount of memory that is not being used is large (see the sketch below for where this gap comes from).

The other main reason we want to do this is that a per-task virtual-memory limit is an API change for our users.
The jobs they submit every day may start failing, and they would have to learn to set the per-task limit.
The good thing about this patch is that it is backward compatible.

I think it may not be that bad to kill the task.
It will be scheduled again on another machine and the job can still finish; it only slows the job down.
But if we don't kill it, it is very likely to cause the machine to fail, and then we fail the task anyway.
Worse, all the other tasks on the machine fail too.
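
For concreteness, a Linux-only sketch of reading both numbers for one pid from /proc/<pid>/stat, which is where this kind of vsize-versus-RSS gap shows up; the helper is hypothetical and the parse deliberately naive:

{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

class ProcStatSketch {
  /**
   * Returns {vsizeBytes, rssBytes} for one pid, Linux only.
   * Naive parse: a comm field containing spaces would break it.
   */
  static long[] readMemory(int pid) throws IOException {
    BufferedReader in =
        new BufferedReader(new FileReader("/proc/" + pid + "/stat"));
    try {
      String[] f = in.readLine().split("\\s+");
      long vsizeBytes = Long.parseLong(f[22]); // field 23: vsize, bytes
      long rssPages = Long.parseLong(f[23]);   // field 24: rss, pages
      long pageSize = 4096;                    // assume 4 KB pages
      return new long[] { vsizeBytes, rssPages * pageSize };
    } finally {
      in.close();
    }
  }
}
{code}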


> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837974#action_12837974 ] 

Scott Chen commented on MAPREDUCE-1221:
---------------------------------------

@Zheng: The failed test is because of MAPREDUCE-1520. It is not related to this patch.

@Arun: Thank you very much for the comment. It's not late at all:) Using setrlimit() is good for limiting the RSS memory of a single task, but here we want to limit the total RSS memory on the TT, and in that case kill the task with the highest memory usage (sketched below). That is why we do it this way.
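
A minimal sketch of that node-level policy (types hypothetical, not from the patch): sum RSS over the running attempts and, past a configured budget, kill the single largest consumer:

{code}
import java.util.List;

/** Hypothetical snapshot of one running attempt. */
class RunningTask {
  String attemptId;
  long rssBytes; // physical memory of the whole process tree
}

class NodeMemoryWatcherSketch {
  private final long maxTotalRssBytes; // configured TT-wide budget

  NodeMemoryWatcherSketch(long maxTotalRssBytes) {
    this.maxTotalRssBytes = maxTotalRssBytes;
  }

  void check(List<RunningTask> tasks) {
    long total = 0;
    RunningTask biggest = null;
    for (RunningTask t : tasks) {
      total += t.rssBytes;
      if (biggest == null || t.rssBytes > biggest.rssBytes) {
        biggest = t;
      }
    }
    if (total > maxTotalRssBytes && biggest != null) {
      kill(biggest); // kill (not fail) the top consumer
    }
  }

  void kill(RunningTask t) { /* ask the TT to kill this attempt */ }
}
{code}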

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Patch Available  (was: Open)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.1.txt, MAPREDUCE-1221-v5.txt
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Patch Available  (was: Open)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779821#action_12779821 ] 

Vinod K V commented on MAPREDUCE-1221:
--------------------------------------

bq. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly.
This is not quite right. The TaskMemoryManager does look at the memory usage of all the processes in the task's process-tree, irrespective of whether each of them is a Java program or otherwise. If one wishes to limit physical memory usage, the tasks can be configured accordingly.

In any case, it currently works only on Linux; on any other OS, support needs to be implemented there (see the sketch of the procfs dependence below).
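
For illustration, the Linux dependence boils down to reading procfs; a naive sketch of pulling MemFree from /proc/meminfo (hypothetical helper, not the actual TaskMemoryManager code):

{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

class MemInfoSketch {
  /** Returns MemFree in bytes, or -1 if it cannot be read. */
  static long freePhysicalMemory() {
    try {
      BufferedReader in = new BufferedReader(new FileReader("/proc/meminfo"));
      try {
        String line;
        while ((line = in.readLine()) != null) {
          if (line.startsWith("MemFree:")) {
            // Format: "MemFree:  123456 kB"
            String[] parts = line.trim().split("\\s+");
            return Long.parseLong(parts[1]) * 1024L;
          }
        }
      } finally {
        in.close();
      }
    } catch (IOException e) {
      // fall through: not Linux, or /proc unavailable
    }
    return -1;
  }
}
{code}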

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838023#action_12838023 ] 

Zheng Shao commented on MAPREDUCE-1221:
---------------------------------------

Arun, do the explanations from Scott and Matei make sense to you?
If it looks good to you, I would like to commit it.


> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Attachment: MAPREDUCE-1221-v5.txt

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.txt
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Patch Available  (was: Open)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838089#action_12838089 ] 

Scott Chen commented on MAPREDUCE-1221:
---------------------------------------

@Arun: I agree. It is reasonable to track these two.

We have tried the per-task limit once. We tried to set the threshold high, but it cannot be too high because that hurts memory utilization, and even then the threshold still caused some of our pipeline and user jobs to fail. So we switched it off quickly. But we really need some memory-protection feature; that is why we started working on this patch.

Correct me if I am wrong: killing a task will not make the job fail. It is different from failing a task, so a 4th attempt getting killed will not fail the job (the accounting is sketched below). Also, if a task gets killed, it is likely the rogue one, since it is using the highest amount of memory.

But in general, I agree with you. The total-RSS-based, kill-the-highest-memory-task approach is not as predictable as the virtual-memory-limit approach. But it still has the benefits we discussed above, and it is optional to use: this patch does nothing unless it is configured.

Again, thank you for the comment. I think the discussions are very helpful.
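
A tiny sketch of that accounting, assuming Hadoop-style attempt counting where only FAILED attempts count toward mapred.map.max.attempts (default 4); the class and method names are hypothetical:

{code}
class AttemptAccountingSketch {
  static final int MAX_ATTEMPTS = 4; // mapred.map.max.attempts default

  int failedAttempts = 0;

  /** @return true if the whole job should now fail. */
  boolean onAttemptEnd(boolean wasFailed) {
    if (wasFailed) {
      failedAttempts++;
      return failedAttempts >= MAX_ATTEMPTS;
    }
    // KILLED (e.g. by the memory watcher): reschedule, no penalty.
    return false;
  }
}
{code}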

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846632#action_12846632 ] 

Hadoop QA commented on MAPREDUCE-1221:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12438470/MAPREDUCE-1221-v4.patch
  against trunk revision 923907.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/528/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/528/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/528/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/528/console

This message is automatically generated.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859014#action_12859014 ] 

Scott Chen commented on MAPREDUCE-1221:
---------------------------------------

I made the following change based on Amareshwari's suggestion. Thanks.

{code}
   void addToMemoryManager(TaskAttemptID attemptId, boolean isMap, 
                           JobConf conf) {
+    if (!isTaskMemoryManagerEnabled()) {
+      return; // Skip this if TaskMemoryManager is not enabled.
     }
{code}

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.1.txt, MAPREDUCE-1221-v5.txt, MAPREDUCE-1221-v6.txt
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur reassigned MAPREDUCE-1221:
-------------------------------------------

    Assignee: Scott Chen  (was: dhruba borthakur)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Attachment:     (was: MAPREDUCE-1221-v1.patch)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>         Attachments: MAPREDUCE-1221-v1.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838185#action_12838185 ] 

Arun C Murthy commented on MAPREDUCE-1221:
------------------------------------------

I think Hong is right: this is a duplicate of MAPREDUCE-257.

Anyway,

bq. Correct me if I am wrong. Kill a task will not make the job fail. It is different from failing a task. So the 4th attempt gets killed will not fail the job. Also, if it gets killed, it is likely that itself is the rough task because it uses the highest amount of memory.

Now I'm really worried. If tasks, and hence jobs, are not penalized, i.e. killed rather than failed, what is the incentive for the authors of the *bad* applications to fix them? If they are just killed, the same tasks will be run over and over again without any penalty - there are no disincentives!

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838675#action_12838675 ] 

Arun C Murthy commented on MAPREDUCE-1221:
------------------------------------------

Dhruba, thanks for explaining your use case.

First up: Do you agree that we need to *fail* the task and not just *kill* it, so that the job fails fast?

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Patch Available  (was: Open)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Luke Lu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838625#action_12838625 ] 

Luke Lu commented on MAPREDUCE-1221:
------------------------------------

@dhruba: Have you guys tried lxc-execute (needs a 2.6.27+ kernel)? It gives you fairly complete control of the hardware resources for a group of processes (an arbitrary container of processes and their children) without having to write any code.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Open  (was: Patch Available)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857405#action_12857405 ] 

Arun C Murthy commented on MAPREDUCE-1221:
------------------------------------------

Scott, I had a question:

Comparing the following snippets in src/java/org/apache/hadoop/mapred/TaskMemoryManagerThread.java
{noformat}
+          if (doCheckVirtualMemory() &&
+              isProcessTreeOverLimit(tid.toString(), currentMemUsage,
                                       curMemUsageOfAgedProcesses, limit)) {
{noformat}

{noformat}
+          } else if (doCheckPhysicalMemory() && limitPhysical > 0 &&
+              isProcessTreeOverLimit(tid.toString(), currentRssMemUsage,
+                                curRssMemUsageOfAgedProcesses, limitPhysical)) {
{noformat}

I don't understand why 'limitPhysical > 0' is necessary. If 'doCheckPhysicalMemory' returns true, shouldn't limitPhysical always be positive? Please help me understand, thanks.
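
Is the guard there for a case like the following? A minimal sketch, with hypothetical names, under the assumption that the per-task physical limit defaults to -1 when a job does not configure it while the TaskTracker-level check is enabled independently:

{code}
// Hypothetical sketch; not the actual patch code.
public class GuardSketch {
  static final long DISABLED = -1L;

  // Assumption: the node-level switch depends on TaskTracker configuration,
  // not on whether this particular task declared a physical limit.
  static boolean doCheckPhysicalMemory(long reservedPhysicalMemoryOnTT) {
    return reservedPhysicalMemoryOnTT != DISABLED;
  }

  public static void main(String[] args) {
    long reservedPhysicalMemoryOnTT = 1L << 30; // node-level check enabled
    long limitPhysical = DISABLED;              // this task set no per-task limit
    long currentRssMemUsage = 100L << 20;       // 100 MB in use

    // Without the 'limitPhysical > 0' guard, comparing usage against -1
    // would flag every task as over its (nonexistent) per-task limit.
    if (doCheckPhysicalMemory(reservedPhysicalMemoryOnTT)
        && limitPhysical > 0
        && currentRssMemUsage > limitPhysical) {
      System.out.println("task over per-task physical limit");
    } else {
      System.out.println("no per-task physical limit to enforce");
    }
  }
}
{code}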

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838636#action_12838636 ] 

dhruba borthakur commented on MAPREDUCE-1221:
---------------------------------------------

@Luke: Thanks for the suggestion. I am pretty sure that we would like to move to a Container model (where each task runs inside its own Container, e.g. lxc-execute) in the slightly longer term.


> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838237#action_12838237 ] 

dhruba borthakur commented on MAPREDUCE-1221:
---------------------------------------------

> If tasks, and hence jobs, are not penalized i.e. killed rather than failed, what is the incentive for the authors of the bad

From my understanding, this JIRA is not about making users write good code. It is about keeping nodes in the cluster from dying and rotting (via excessive swapping, aka MAPREDUCE-257) when a huge-memory-consuming job comes along. It is also about letting other good jobs continue to run peacefully as much as possible. In a true utility model, the service provider provides the service, and it is up to the customer to write/deploy whatever code he wants to run on it (as long as he pays for it, of course :-))

> So killing some tasks and letting others proceed may still be better than not killing. That said, I agree that we can fine tune the policy

+1 for Hong's proposal. If somebody has a better policy for which tasks to kill when this situation arises, that would be great. The default policy is to kill the tasks using the most memory, and I think that is a very practical one, isn't it?


> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859220#action_12859220 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1221:
----------------------------------------------------

Changes look fine to me.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.1.txt, MAPREDUCE-1221-v5.txt, MAPREDUCE-1221-v6.txt
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Open  (was: Patch Available)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795025#action_12795025 ] 

dhruba borthakur commented on MAPREDUCE-1221:
---------------------------------------------

Hi Vinod, will it be possible for you to review this one?

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799985#action_12799985 ] 

Hadoop QA commented on MAPREDUCE-1221:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428563/MAPREDUCE-1221-v1.patch
  against trunk revision 898486.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/383/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/383/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/383/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/383/console

This message is automatically generated.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Patch Available  (was: Open)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Matei Zaharia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838018#action_12838018 ] 

Matei Zaharia commented on MAPREDUCE-1221:
------------------------------------------

Setrlimit does *not* set a cumulative limit for the children. If you setrlimit to 1 GB for a process, and it then spawns a child, the child gets a separate 1 GB limit, and together they have 2 GB. Some newer Linux features (cgroups and containers) can be used to set a limit on a whole process tree. However, I think the goal of this patch is different - it's to let jobs use however much memory they want without declaring it in advance, but to fix things when we do overcommit.
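
To illustrate the difference, here is a minimal sketch of the cumulative accounting (the ProcessRecord type and the sample tree are hypothetical illustrations, not the actual monitoring code): the monitor sums RSS over the whole process subtree, so forked children cannot escape the budget the way they escape a per-process setrlimit.

{code}
// Hypothetical sketch of subtree RSS accounting; not the actual monitoring code.
import java.util.ArrayList;
import java.util.List;

public class SubtreeRssSketch {
  static class ProcessRecord {
    final int pid;
    final long rssBytes;
    final List<ProcessRecord> children = new ArrayList<>();
    ProcessRecord(int pid, long rssBytes) {
      this.pid = pid;
      this.rssBytes = rssBytes;
    }
  }

  /** Cumulative RSS of a process and all of its descendants. */
  static long subtreeRss(ProcessRecord p) {
    long total = p.rssBytes;
    for (ProcessRecord child : p.children) {
      total += subtreeRss(child);
    }
    return total;
  }

  public static void main(String[] args) {
    // A Java task that forked a python mapper, which forked a helper.
    ProcessRecord task = new ProcessRecord(100, 512L << 20);
    ProcessRecord mapper = new ProcessRecord(200, 512L << 20);
    ProcessRecord helper = new ProcessRecord(300, 512L << 20);
    mapper.children.add(helper);
    task.children.add(mapper);

    // A per-process setrlimit of 1 GB would never fire here, yet the
    // subtree already uses 1.5 GB against a hypothetical 1 GB budget.
    System.out.println("subtree RSS bytes: " + subtreeRss(task));
  }
}
{code}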

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Open  (was: Patch Available)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.1.txt, MAPREDUCE-1221-v5.txt
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Patch Available  (was: Open)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.1.txt, MAPREDUCE-1221-v5.txt, MAPREDUCE-1221-v6.txt
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Open  (was: Patch Available)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.1.txt, MAPREDUCE-1221-v5.txt, MAPREDUCE-1221-v6.txt
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838470#action_12838470 ] 

Hong Tang commented on MAPREDUCE-1221:
--------------------------------------

@arun, I think we need both - per-task resource containment at the high level, and OS-level protection at the low level. In this jira, let's focus on the low-level protection, and we can have a separate jira to track the per-task resource containment.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Attachment: MAPREDUCE-1221-v2.patch

I have made some changes based on Zheng's comments:
1. In TaskMemoryManagerThread, line 222: skip tasks that have already been killed when checking the memory of each task (see the sketch after this list).
2. In TestTaskTrackerMemoryManager, line 545: add an assertFalse so the test fails immediately if the job finishes successfully.
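
For (1), the intent is roughly the following simplified sketch (hypothetical names, not the patch code): once an attempt has been scheduled for killing, later passes of the check skip it, so its still-attributed memory does not trigger a second kill decision.

{code}
// Hypothetical sketch of skipping already-killed tasks; not the patch code.
import java.util.HashSet;
import java.util.Set;

public class SkipKilledSketch {
  private final Set<String> tasksBeingKilled = new HashSet<>();

  void checkTask(String attemptId, long rssBytes, long limitBytes) {
    if (tasksBeingKilled.contains(attemptId)) {
      return; // already scheduled for killing; do not count it again
    }
    if (rssBytes > limitBytes) {
      tasksBeingKilled.add(attemptId);
      System.out.println("killing " + attemptId);
    }
  }

  public static void main(String[] args) {
    SkipKilledSketch m = new SkipKilledSketch();
    m.checkTask("attempt_0001_r_000001_0", 2L << 30, 1L << 30); // kills
    m.checkTask("attempt_0001_r_000001_0", 2L << 30, 1L << 30); // skipped
  }
}
{code}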

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Attachment: MAPREDUCE-1221-v6.txt

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.1.txt, MAPREDUCE-1221-v5.txt, MAPREDUCE-1221-v6.txt
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779992#action_12779992 ] 

Hong Tang commented on MAPREDUCE-1221:
--------------------------------------

Is it a duplicate of MAPREDUCE-257?

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838039#action_12838039 ] 

Arun C Murthy commented on MAPREDUCE-1221:
------------------------------------------

Ah, fair point - I missed that detail about rlimit, my bad.

----

bq. However, I think the goal of this patch is different - it's to let the jobs use however much memory they want without declaring it in advance, but fix things when we do overcommit.

I'm trying to parse things chronologically. Please help me understand this.

The original description says: "virtual-memory limits are inconvenient in some cases, so we'll do the same for physical-memory". Fair enough.

However, the current patch seems to reserve *some* physical memory for the TaskTracker... and is the plan to just kill whichever task is the _highest_ at a given instant?

If so, speaking from experience with HoD, which had pretty much the same feature (albeit for different reasons, i.e. to protect the Linux kernel), this is a bad idea. The problem is that there is completely *no predictability*: tasks get randomly shot down because *someone* tipped things over the edge.

I'd rather see a simpler per-task limit which the user sets per-task - much like the virtual-memory-limit.
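
A standalone sketch of the policy being questioned above, again with hypothetical names: when free physical memory dips below the reserved threshold, whichever task has the largest RSS at that instant becomes the victim.

{noformat}
import java.util.List;

// Sketch of the contested kill policy: if free physical memory on the node
// falls below the configured reservation, pick the task with the largest
// RSS as the victim. Names are illustrative, not the actual patch's.
class VictimSelectionSketch {
  static class Task {
    final String id;
    final long rssBytes;
    Task(String id, long rssBytes) { this.id = id; this.rssBytes = rssBytes; }
  }

  static Task pickVictim(List<Task> running, long freeBytes, long reservedBytes) {
    if (freeBytes >= reservedBytes || running.isEmpty()) {
      return null;  // node is healthy; nothing to kill
    }
    // The unpredictability objected to above: whichever task happens to be
    // largest right now gets killed, regardless of who caused the spike.
    Task victim = running.get(0);
    for (Task t : running) {
      if (t.rssBytes > victim.rssBytes) {
        victim = t;
      }
    }
    return victim;
  }
}
{noformat}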

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Attachment: MAPREDUCE-1221-v3.patch

Sorry about the tabs, my bad. They are removed in this file.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1221:
----------------------------------

    Status: Patch Available  (was: Open)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-1221:
-------------------------------------

    Status: Open  (was: Patch Available)

I just started looking at this, some minor comments:

We shouldn't read config values from TT.fConf each time; please save them and re-use:

{noformat}
Index: src/java/org/apache/hadoop/mapred/TaskTracker.java
===================================================================
--- src/java/org/apache/hadoop/mapred/TaskTracker.java	(revision 921667)
+++ src/java/org/apache/hadoop/mapred/TaskTracker.java	(working copy)
   
+    if (fConf.get(TTConfig.TT_RESERVED_PHYSCIALMEMORY_MB) == null
+        && totalMemoryAllottedForTasks == JobConf.DISABLED_MEMORY_LIMIT) {

Index: src/java/org/apache/hadoop/mapred/TaskMemoryManagerThread.java
===================================================================
--- src/java/org/apache/hadoop/mapred/TaskMemoryManagerThread.java	(revision 921667)
+++ src/java/org/apache/hadoop/mapred/TaskMemoryManagerThread.java	(working copy)

+    long reservedRssMemory = taskTracker.getJobConf().
+            getLong(TTConfig.TT_RESERVED_PHYSCIALMEMORY_MB,
+                    JobConf.DISABLED_MEMORY_LIMIT);
{noformat}
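
One way to do that, sketched with an illustrative key name and a plain Properties object standing in for the Hadoop Configuration: read the value once in the constructor and keep it in a field, so the periodic check never touches the configuration again.

{noformat}
import java.util.Properties;

// Sketch: cache a config value in a field at construction time instead of
// re-reading it from the configuration on every memory check.
// The key name and the Properties stand-in are illustrative only.
class TaskMemoryManagerSketch {
  private final long reservedRssMemoryMb;

  TaskMemoryManagerSketch(Properties conf) {
    // Read once here; the periodic check below never consults conf again.
    this.reservedRssMemoryMb = Long.parseLong(
        conf.getProperty("tt.reserved.physicalmemory.mb", "-1"));
  }

  boolean physicalMemoryLimitEnabled() {
    return reservedRssMemoryMb != -1;  // -1 mimics a disabled limit
  }
}
{noformat}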

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858546#action_12858546 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1221:
----------------------------------------------------

Looked at the attached patch. Some minor comments:
* In the TaskTracker.addToMemoryManager() method, shall we move the calculation of physicalMemoryLimit and virtualMemoryLimit inside the isTaskMemoryManagerEnabled() check?
* Shouldn't we call isKillable() in TaskMemoryManagerThread.killTasksWithLeastProgress() as well? (See the sketch after this list.)
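
A self-contained sketch of the second point, with hypothetical names: the least-progress victim search should skip tasks that are not killable (for example, tasks already being killed).

{noformat}
import java.util.List;

// Sketch: when killing the task with the least progress, only consider
// tasks that are actually killable. Names are illustrative, not the patch's.
class LeastProgressKillSketch {
  static class Task {
    final String id;
    final float progress;    // 0.0f .. 1.0f
    final boolean killable;  // false if already being killed, not yet running, etc.
    Task(String id, float progress, boolean killable) {
      this.id = id; this.progress = progress; this.killable = killable;
    }
  }

  static Task pickLeastProgress(List<Task> running) {
    Task victim = null;
    for (Task t : running) {
      if (!t.killable) {
        continue;  // mirror the isKillable() guard
      }
      if (victim == null || t.progress < victim.progress) {
        victim = t;
      }
    }
    return victim;  // null if nothing is killable
  }
}
{noformat}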

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.txt
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851932#action_12851932 ] 

dhruba borthakur commented on MAPREDUCE-1221:
---------------------------------------------

Hi Amareshwari, you bring up a good point that a failed/killed task might get re-executed on the same machine. But this actually depends on the scheduler one uses. I agree that schedulers should be intelligent enough not to schedule a task on the same machine where it previously ran (if other equivalent resources are available). That issue does not seem to be directly related to this JIRA, though, does it?

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858989#action_12858989 ] 

Scott Chen commented on MAPREDUCE-1221:
---------------------------------------

Thanks Amareshwari,
That totally makes sense! I will upload the patch soon.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch, MAPREDUCE-1221-v4.patch, MAPREDUCE-1221-v5.1.txt, MAPREDUCE-1221-v5.txt
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1221:
----------------------------------

    Status: Patch Available  (was: Open)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838187#action_12838187 ] 

Arun C Murthy commented on MAPREDUCE-1221:
------------------------------------------

bq. We have tried once with the per-task limit. We tried to set the threshold higher but it cannot be too high because it hurts the memory utilization.

Yes, we will need to make trade-offs. And making the right ones is very important for Hadoop Map-Reduce.

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1221:
----------------------------------

    Status: Open  (was: Patch Available)

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838605#action_12838605 ] 

Arun C Murthy commented on MAPREDUCE-1221:
------------------------------------------

Allen, thanks for your inputs.

To be clear, an admin can set a per-task vmem limit on the TaskTracker. This works independently of the scheduler.

OTOH, since this is per-task, users who need more than the limit have to use multiple slots for their tasks (jobs) to survive. This is where the scheduler comes in; afaik only the CapacityScheduler supports this.

Also, 'per-task' is a misnomer: the limit actually applies to the task and all its children - we enforce this for vmem by walking /proc.
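
For readers unfamiliar with that mechanism, here is a rough, Linux-only sketch of what walking /proc involves: build a parent-to-child map from /proc/<pid>/stat and sum vsize over the task's subtree. Field positions follow proc(5); parsing and error handling are simplified, and none of this is the actual Hadoop code.

{noformat}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Standalone sketch of "walking /proc": sum vsize (bytes) over the process
// subtree rooted at a task's pid. Linux-only; see proc(5).
class ProcTreeSketch {
  public static void main(String[] args) throws IOException {
    int rootPid = Integer.parseInt(args[0]);
    Map<Integer, Long> vsize = new HashMap<Integer, Long>();
    Map<Integer, List<Integer>> children = new HashMap<Integer, List<Integer>>();

    DirectoryStream<Path> proc = Files.newDirectoryStream(Paths.get("/proc"), "[0-9]*");
    for (Path p : proc) {
      try {
        String stat = new String(Files.readAllBytes(p.resolve("stat")));
        int pid = Integer.parseInt(p.getFileName().toString());
        // comm (field 2) may contain spaces, so split after the closing ')'.
        String[] rest = stat.substring(stat.lastIndexOf(')') + 2).split(" ");
        int ppid = Integer.parseInt(rest[1]);      // field 4: parent pid
        vsize.put(pid, Long.parseLong(rest[20]));  // field 23: vsize in bytes
        if (!children.containsKey(ppid)) {
          children.put(ppid, new ArrayList<Integer>());
        }
        children.get(ppid).add(pid);
      } catch (IOException e) {
        // process exited mid-scan or entry unreadable: skip it
      } catch (RuntimeException e) {
        // malformed entry: skip it
      }
    }
    proc.close();

    // Depth-first sum over the subtree rooted at the task's pid.
    long total = 0;
    Deque<Integer> stack = new ArrayDeque<Integer>();
    stack.push(rootPid);
    while (!stack.isEmpty()) {
      int pid = stack.pop();
      total += vsize.containsKey(pid) ? vsize.get(pid) : 0L;
      List<Integer> kids = children.containsKey(pid)
          ? children.get(pid) : Collections.<Integer>emptyList();
      for (int c : kids) {
        stack.push(c);
      }
    }
    System.out.println("vsize of subtree at " + rootPid + ": " + total + " bytes");
  }
}
{noformat}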

> Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, MAPREDUCE-1221-v3.patch
>
>
> The TaskTracker currently supports killing tasks if the virtual memory of a task exceeds a set of configured thresholds. I would like to extend this feature to enable killing tasks if the physical memory used by that task exceeds a certain threshold.
> On a certain operating system (guess?), if user space processes start using lots of memory, the machine hangs and dies quickly. This means that we would like to prevent map-reduce jobs from triggering this condition. From my understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were designed to address this problem. This works well when most map-reduce jobs are Java jobs and have well-defined -Xmx parameters that specify the max virtual memory for each task. On the other hand, if each task forks off mappers/reducers written in other languages (python/php, etc), the total virtual memory usage of the process-subtree varies greatly. In these cases, it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.