Posted to common-user@hadoop.apache.org by Krishna Kishore Bonagiri <wr...@gmail.com> on 2013/03/08 08:42:00 UTC

Application Master getting killed randomly reporting excess usage of memory

Hi,
  I am running an application on YARN in a loop, 500 times. It ran correctly
321 times, but on the 322nd run the AM container was killed for exceeding its
memory limit. I doubt it really exceeded the limit, because the same
application ran fine the previous 321 times, and my earlier runs of this kind
of loop never reported this error. Could this error be caused by something
other than actual memory usage? I am using hadoop-2.0.0-alpha. Please
help.

2013-03-07 10:55:35,853 INFO  Client (Client.java:main(143)) - Initializing
Client
2013-03-07 10:55:35,867 INFO  Client (Client.java:launchAndMonitorAM(463))
- Starting Client
2013-03-07 10:55:35,957 INFO  Client (Client.java:connectToASM(564)) -
Connecting to ResourceManager at isredeng/127.0.1.1:8032
2013-03-07 10:55:36,540 INFO  Client (Client.java:dumpClusterInfo(246)) -
Got Cluster metric info from ASM, numNodeManagers=1
2013-03-07 10:55:36,561 INFO  Client (Client.java:dumpClusterInfo(251)) -
Got Cluster node info from ASM
2013-03-07 10:55:36,738 INFO  Client (Client.java:dumpClusterInfo(253)) -
Got node report from ASM for, nodeId=isredeng:33967,
nodeAddress=isredeng:8042, nodeRackName=/default-rack,
nodeNumContainers=14, nodeHealthStatus=is_node_healthy: true,
health_report: "", last_health_report_time: 1362671618339,
2013-03-07 10:55:36,746 INFO  Client (Client.java:dumpClusterInfo(263)) -
Queue info, queueName=default, queueCurrentCapacity=0.21875,
queueMaxCapacity=1.0, queueApplicationCount=0, queueChildQueueCount=0
2013-03-07 10:55:36,755 INFO  Client (Client.java:dumpClusterInfo(275)) -
User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS
2013-03-07 10:55:36,755 INFO  Client (Client.java:dumpClusterInfo(275)) -
User ACL Info for Queue, queueName=default, userAcl=ADMINISTER_QUEUE
2013-03-07 10:55:36,763 INFO  Client (Client.java:getApplication(577)) -
Got new application id=application_1362668734615_0322
2013-03-07 10:55:36,763 INFO  Client (Client.java:launchAndMonitorAM(476))
- Min mem capabililty of resources in this cluster 128
2013-03-07 10:55:36,764 INFO  Client (Client.java:launchAndMonitorAM(477))
- Max mem capabililty of resources in this cluster 10240
2013-03-07 10:55:36,764 INFO  Client (Client.java:launchAndMonitorAM(484))
- Setting up application submission context for ASM
2013-03-07 10:55:37,117 INFO  Client (Client.java:prepareJarResource(288))
- Copy App Master jar from local filesystem and add to local environment
2013-03-07 10:55:37,390 INFO  Client (Client.java:launchAndMonitorAM(519))
- Set the environment for the application master
2013-03-07 10:55:37,391 INFO  Client
(Client.java:getTestRuntimeClasspath(592)) - Trying to generate classpath
for app master from current thread's classpath
2013-03-07 10:55:37,392 INFO  Client
(Client.java:getTestRuntimeClasspath(604)) - Readable bytes from stream :
8559
2013-03-07 10:55:37,394 INFO  Client (Client.java:prepareCommand(346)) -
Setting up app master command
2013-03-07 10:55:37,395 INFO  Client (Client.java:prepareCommand(364)) -
Completed setting up app master command ${JAVA_HOME}/bin/java
ApplicationMaster --osh_am_port 10011 --osh_env
LD_LIBRARY_PATH=/home_/dsadm/kishore/yarn_feb14/orch_master/apt/lib::/home_/dsadm/kishore/yarn_feb14/orch_master/apt/lib:
--osh_env APT_ORCHHOME=/home_/dsadm/kishore/yarn_feb14/orch_master/apt
1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr
2013-03-07 10:55:37,397 INFO  Client
(Client.java:submitAndMonitorApplication(385)) - Submitting application to
ASM
2013-03-07 10:55:38,458 INFO  Client (Client.java:monitorApplication(413))
- Got application report from ASM for, appId=322, appDiagnostics=,
appMasterHost=N/A, clientToken=null, appQueue=default, appMasterRpcPort=0,
appStartTime=1362671737443, yarnAppState=SUBMITTED,
distributedFinalState=UNDEFINED, appTrackingUrl=
isredeng.swg.usma.ibm.com:8088/proxy/application_1362668734615_0322/,
appUser=dsadm
2013-03-07 10:55:39,460 INFO  Client (Client.java:monitorApplication(413))
- Got application report from ASM for, appId=322, appDiagnostics=,
appMasterHost=N/A, clientToken=null, appQueue=default, appMasterRpcPort=0,
appStartTime=1362671737443, yarnAppState=SUBMITTED,
distributedFinalState=UNDEFINED, appTrackingUrl=
isredeng.swg.usma.ibm.com:8088/proxy/application_1362668734615_0322/,
appUser=dsadm
2013-03-07 10:55:40,463 INFO  Client (Client.java:monitorApplication(413))
- Got application report from ASM for, appId=322, appDiagnostics=,
appMasterHost=N/A, clientToken=null, appQueue=default, appMasterRpcPort=0,
appStartTime=1362671737443, yarnAppState=SUBMITTED,
distributedFinalState=UNDEFINED, appTrackingUrl=
isredeng.swg.usma.ibm.com:8088/proxy/application_1362668734615_0322/,
appUser=dsadm
2013-03-07 10:55:41,467 INFO  Client (Client.java:monitorApplication(413))
- Got application report from ASM for, appId=322,
appDiagnostics=Application application_1362668734615_0322 failed 1 times
due to AM Container for appattempt_1362668734615_0322_000001 exited with
 exitCode: 143 due to: Container
[pid=3606,containerID=container_1362668734615_0322_01_000001] is running
beyond virtual memory limits. Current usage: 37.0mb of 128.0mb physical
memory used; 998.4mb of 268.8mb virtual memory used. Killing container.
Dump of the process-tree for container_1362668734615_0322_01_000001 :
        |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
        |- 3612 3606 3606 3606 (java) 150 13 938192896 9164
/home/kbonagir/yarn/jdk//bin/java ApplicationMaster --osh_am_port 10011
--osh_env
LD_LIBRARY_PATH=/home_/dsadm/kishore/yarn_feb14/orch_master/apt/lib::/home_/dsadm/kishore/yarn_feb14/orch_master/apt/lib:
--osh_env APT_ORCHHOME=/home_/dsadm/kishore/yarn_feb14/orch_master/apt
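
For reference, the numbers in the diagnostics above can be checked with a small sketch. It assumes the NodeManager computes the virtual memory limit as the physical limit multiplied by yarn.nodemanager.vmem-pmem-ratio, and that the ratio is at its default of 2.1; that is consistent with the 128.0mb and 268.8mb figures in the log, but I have not confirmed how this cluster is configured. The class and method names here (VmemLimitCheck, vmemLimitMb, and so on) are made up for illustration and are not part of Hadoop.

```java
import java.util.Locale;

public class VmemLimitCheck {
    // Assumption: virtual limit = physical limit * yarn.nodemanager.vmem-pmem-ratio,
    // with the ratio left at its default value of 2.1.
    static final double ASSUMED_VMEM_PMEM_RATIO = 2.1;
    static final double BYTES_PER_MB = 1024.0 * 1024.0;

    // Virtual memory limit in MB for a container with the given physical limit.
    static double vmemLimitMb(double pmemLimitMb, double ratio) {
        return pmemLimitMb * ratio;
    }

    // Converts the VMEM_USAGE(BYTES) column of the process-tree dump to MB.
    static double bytesToMb(long bytes) {
        return bytes / BYTES_PER_MB;
    }

    // True when the reported virtual memory usage is above the derived limit.
    static boolean exceedsVmemLimit(double vmemUsedMb, double pmemLimitMb, double ratio) {
        return vmemUsedMb > vmemLimitMb(pmemLimitMb, ratio);
    }

    public static void main(String[] args) {
        // Values copied from the diagnostics message in the log.
        double pmemLimitMb = 128.0;
        double reportedVmemMb = 998.4;
        long javaVmemBytes = 938192896L;

        System.out.println(String.format(Locale.ROOT, "derived vmem limit: %.1f MB",
                vmemLimitMb(pmemLimitMb, ASSUMED_VMEM_PMEM_RATIO)));
        System.out.println(String.format(Locale.ROOT, "java process vmem: %.1f MB",
                bytesToMb(javaVmemBytes)));
        System.out.println("over limit: "
                + exceedsVmemLimit(reportedVmemMb, pmemLimitMb, ASSUMED_VMEM_PMEM_RATIO));
    }
}
```

Under that assumption the derived limit works out to 268.8 MB, and the java process listed in the dump (938192896 bytes, about 894.7 MB) is already above it on its own, even though physical usage is only 37.0mb. So the kill is about virtual address space, not resident memory.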


Thanks,
Kishore