Posted to user@hadoop.apache.org by Eric Jacobson <ej...@gmail.com> on 2014/11/06 19:49:08 UTC

Why won't Hadoop 2.4.0 kill my containers using more memory than allocated?

I'm running a single-node Apache Hadoop 2.4.0 "cluster" and trying to test
my application's behavior when it exceeds the memory allocated to its
containers. No matter what I do, I can't get the containers killed once
they exceed their allocated physical memory.

Any suggestions on what I'm doing wrong, or how to debug further, would be
appreciated.

Some additional information...
Memory requested for the container, as shown in the NodeManager console:
*TotalMemoryNeeded 32*

Memory settings for the node, as shown in the NodeManager console:
Total Vmem allocated for Containers  16.80 GB
Vmem enforcement enabled  true
Total Pmem allocated for Container  8 GB
*Pmem enforcement enabled  true *
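
As a sanity check on the figures above, here is a minimal sketch of how the
virtual-memory total relates to the physical-memory total. It assumes the
node uses the default yarn.nodemanager.vmem-pmem-ratio of 2.1; the actual
configuration is not shown in this message, and the function name is
hypothetical.

```python
def vmem_limit_gb(pmem_gb: float, vmem_pmem_ratio: float = 2.1) -> float:
    """Virtual-memory ceiling implied by a physical-memory allocation.

    vmem_pmem_ratio defaults to 2.1, the stock value of
    yarn.nodemanager.vmem-pmem-ratio (assumed here, not confirmed
    from this node's configuration).
    """
    return round(pmem_gb * vmem_pmem_ratio, 2)


# 8 GB of Pmem with the assumed ratio
print(vmem_limit_gb(8))
```

With 8 GB of Pmem this gives 16.8, which matches the 16.80 GB of Vmem
reported by the console, so the two totals are at least consistent with
each other.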

top output for the container's child process, which is using ~111 MB of
resident memory:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
*12013* xxxxxxxx  20   0  243m *111m*  10m R 100.0  1.4  19:52.55 osh

PIDs of the NodeManager (*4229*), the container launch process (*12008*),
and its child (*12013*):
xxxxxxxx  *4229*     1  2 12:53 ?        00:01:04
/apt/isinstall/yli/soft/jdk/linux64/jdk/bin/java -Dproc_nodemanager
-Xmx2000m <classpath removed for brevity>
org.apache.hadoop.yarn.server.nodemanager.NodeManager
xxxxxxxx *12008  4229*  0 13:00 ?        00:00:00 /bin/bash -c
/sandbox/xxxxxxxx/is_11_3_2_hdfs/ORCHESTRATE/orch_master/apt/bin/osh
-APT_PMplayerFlag isblade1.swg.usma.ibm.com 1  1 30
isblade1.swg.usma.ibm.com isblade1.swg.usma.ibm.com 1415296853.252169.2c47
12000 12001 12002 /tmp/APTps120026233e6d8_20141106130053.532726 -os_charset
ISO-8859-1 --------------------------------------------------?
1>/tmp/logs/application_1415296425234_0003/container_1415296425234_0003_01_000003/stdout
2>/tmp/logs/application_1415296425234_0003/container_1415296425234_0003_01_000003/stderr
xxxxxxxx *12013 12008* 99 13:00 ?        00:35:23
/sandbox/xxxxxxxx/is_11_3_2_hdfs/ORCHESTRATE/orch_master/apt/bin/osh
-APT_PMplayerFlag isblade1.swg.usma.ibm.com 1 1 30 isblade1.swg.usma.ibm.com
isblade1.swg.usma.ibm.com 1415296853.252169.2c47 12000 12001 12002
/tmp/APTps120026233e6d8_20141106130053.532726 -os_charset ISO-8859-1
parallel APT_JoinSubOperatorNC in fullouterjoin
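
Since the memory check applies to the container's whole process tree rather
than to a single PID, the parent/child listing above can be totalled as in
the rough sketch below. The function name and the data layout are
hypothetical, the 1 MB figure for the bash launcher is a placeholder (its
real usage is not shown above), and only the 111 MB for the osh child comes
from the top output.

```python
from collections import defaultdict


def tree_rss_mb(procs, root_pid):
    """Sum resident memory (MB) for root_pid and all of its descendants.

    procs maps pid -> (ppid, rss_mb).
    """
    # Build a parent -> children index from the flat process table.
    children = defaultdict(list)
    for pid, (ppid, _rss) in procs.items():
        children[ppid].append(pid)

    # Walk the tree iteratively, starting at the container's launch process.
    total, stack = 0, [root_pid]
    while stack:
        pid = stack.pop()
        total += procs[pid][1]
        stack.extend(children[pid])
    return total


procs = {
    12008: (4229, 1),      # bash launcher; 1 MB is a placeholder value
    12013: (12008, 111),   # osh child; RES as reported by top
}
total = tree_rss_mb(procs, 12008)
print(total)
```

Whatever the launcher's true footprint is, the tree total is at least the
111 MB of the child, and that is the number to compare against the
container's limit.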

What am I missing?

Thanks,
Eric