Posted to yarn-issues@hadoop.apache.org by "Krishna Kishore Bonagiri (JIRA)" <ji...@apache.org> on 2013/04/15 17:26:17 UTC
[jira] [Commented] (YARN-501) Application Master getting killed
randomly reporting excess usage of memory
[ https://issues.apache.org/jira/browse/YARN-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631798#comment-13631798 ]
Krishna Kishore Bonagiri commented on YARN-501:
-----------------------------------------------
What I observed today is that this error occurs at regular intervals of
50 minutes. I think the node manager being busy with some other task, such
as this container monitoring, is causing the virtual memory error for the
AM's container. At each of those intervals, I see messages like the
following in the node manager's log:
2013-04-12 15:51:02,048 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(346)) - Starting resource-monitoring for container_1365688251527_6643_01_000003
2013-04-12 15:51:02,048 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(346)) - Starting resource-monitoring for container_1365688251527_6642_01_000004
2013-04-12 15:51:02,049 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(346)) - Starting resource-monitoring for container_1365688251527_6641_01_000005
2013-04-12 15:51:02,049 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(346)) - Starting resource-monitoring for container_1365688251527_6640_01_000006
2013-04-12 15:51:02,049 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for container_1365688251527_6524_01_000001
2013-04-12 15:51:02,049 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for container_1365688251527_6525_01_000002
2013-04-12 15:51:02,049 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for container_1365688251527_6525_01_000003
2013-04-12 15:51:02,049 INFO [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for container_1365688251527_6525_01_000004
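To check whether these bursts really line up with the 50-minute failures, the monitoring events could be counted per minute across the whole node manager log. The following is only a rough sketch: it assumes each log entry sits on a single line in the actual log file (the wrapping above comes from the mail client), and the function name monitoring_events_per_minute is made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Matches single-line entries of the form shown above, e.g.
# "2013-04-12 15:51:02,048 INFO [Container Monitor] ... - Starting
#  resource-monitoring for container_..." (assumed to be on one line in the file).
EVENT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3} INFO .*"
    r"(Starting|Stopping) resource-monitoring for (container_\S+)"
)


def monitoring_events_per_minute(lines):
    """Count Starting/Stopping resource-monitoring events per (minute, kind)."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            minute = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M")
            counts[(minute, match.group(2))] += 1
    return counts


# Two entries copied from the excerpt above, joined onto single lines.
SAMPLE_LINES = [
    "2013-04-12 15:51:02,048 INFO [Container Monitor] "
    "monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(346)) - "
    "Starting resource-monitoring for container_1365688251527_6643_01_000003",
    "2013-04-12 15:51:02,049 INFO [Container Monitor] "
    "monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) - "
    "Stopping resource-monitoring for container_1365688251527_6524_01_000001",
]

for (minute, kind), count in sorted(monitoring_events_per_minute(SAMPLE_LINES).items()):
    print(minute, kind, count)
```

Running this over the full log and comparing the busy minutes with the timestamps of the killed AM containers would show whether the two actually coincide.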
On Sun, Mar 24, 2013 at 3:54 PM, Krishna Kishore Bonagiri <
> Application Master getting killed randomly reporting excess usage of memory
> ---------------------------------------------------------------------------
>
> Key: YARN-501
> URL: https://issues.apache.org/jira/browse/YARN-501
> Project: Hadoop YARN
> Issue Type: Bug
> Components: applications/distributed-shell, nodemanager
> Affects Versions: 2.0.3-alpha
> Reporter: Krishna Kishore Bonagiri
>
> I am running the date command using the Distributed Shell example in a loop of 500 iterations. It ran successfully every time except once, when it gave the following error.
> 2013-03-22 04:33:25,280 INFO [main] distributedshell.Client (Client.java:monitorApplication(605)) - Got application report from ASM for, appId=222, clientToken=null, appDiagnostics=Application application_1363938200742_0222 failed 1 times due to AM Container for appattempt_1363938200742_0222_000001 exited with exitCode: 143 due to: Container [pid=21141,containerID=container_1363938200742_0222_01_000001] is running beyond virtual memory limits. Current usage: 47.3 Mb of 128 Mb physical memory used; 611.6 Mb of 268.8 Mb virtual memory used. Killing container.
> Dump of the process-tree for container_1363938200742_0222_01_000001 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 21147 21141 21141 21141 (java) 244 12 532643840 11802 /home_/dsadm/yarn/jdk//bin/java -Xmx128m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --num_containers 2 --priority 0 --shell_command date
> |- 21141 8433 21141 21141 (bash) 0 0 108642304 298 /bin/bash -c /home_/dsadm/yarn/jdk//bin/java -Xmx128m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --num_containers 2 --priority 0 --shell_command date 1>/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_000001/AppMaster.stdout 2>/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_000001/AppMaster.stderr
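
For reference, the figures in the error message can be reproduced from the process-tree dump quoted above. This is only a sketch of the arithmetic, not the node manager's code. It assumes a 4096-byte page size and a virtual-to-physical ratio of 2.1 (the default value of yarn.nodemanager.vmem-pmem-ratio, which is consistent with the 268.8 Mb limit in the report); the names used are made up for illustration.

```python
# Values copied from the process-tree dump: (pid, vmem_bytes, rss_pages)
PROCESS_TREE = [
    (21147, 532643840, 11802),  # java (ApplicationMaster)
    (21141, 108642304, 298),    # bash wrapper that launched the JVM
]

PMEM_LIMIT_MB = 128      # physical memory limit from the error message
VMEM_PMEM_RATIO = 2.1    # assumption: default yarn.nodemanager.vmem-pmem-ratio
PAGE_SIZE_BYTES = 4096   # assumption: typical Linux page size

MB = 1024 * 1024


def tree_usage_mb(tree):
    """Return (virtual MB, resident MB) summed over every process in the tree."""
    vmem_mb = sum(vmem for _, vmem, _ in tree) / MB
    rss_mb = sum(pages for _, _, pages in tree) * PAGE_SIZE_BYTES / MB
    return vmem_mb, rss_mb


vmem_mb, rss_mb = tree_usage_mb(PROCESS_TREE)
vmem_limit_mb = PMEM_LIMIT_MB * VMEM_PMEM_RATIO

print(f"physical: {rss_mb:.1f} of {PMEM_LIMIT_MB} MB")
print(f"virtual:  {vmem_mb:.1f} of {vmem_limit_mb:.1f} MB")
print("over virtual limit:", vmem_mb > vmem_limit_mb)
```

Under these assumptions the sums match the reported 47.3 Mb physical and 611.6 Mb virtual usage. The java process alone accounts for roughly 508 MB of virtual memory, which is already above the 268.8 Mb limit even though physical usage is well under 128 Mb.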
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira