Posted to yarn-issues@hadoop.apache.org by "Krishna Kishore Bonagiri (JIRA)" <ji...@apache.org> on 2013/04/15 17:26:17 UTC

[jira] [Commented] (YARN-501) Application Master getting killed randomly reporting excess usage of memory

    [ https://issues.apache.org/jira/browse/YARN-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631798#comment-13631798 ] 

Krishna Kishore Bonagiri commented on YARN-501:
-----------------------------------------------

I noticed today that this error recurs at regular intervals of 50
minutes. I think the node manager being busy with some other task, such
as the monitoring shown below, is what causes the virtual memory error
for the AM's container. Each time the error occurs, I see messages like
the following in the node manager's log:

2013-04-12 15:51:02,048 INFO  [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(346)) - Starting resource-monitoring for container_1365688251527_6643_01_000003
2013-04-12 15:51:02,048 INFO  [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(346)) - Starting resource-monitoring for container_1365688251527_6642_01_000004
2013-04-12 15:51:02,049 INFO  [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(346)) - Starting resource-monitoring for container_1365688251527_6641_01_000005
2013-04-12 15:51:02,049 INFO  [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(346)) - Starting resource-monitoring for container_1365688251527_6640_01_000006
2013-04-12 15:51:02,049 INFO  [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for container_1365688251527_6524_01_000001
2013-04-12 15:51:02,049 INFO  [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for container_1365688251527_6525_01_000002
2013-04-12 15:51:02,049 INFO  [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for container_1365688251527_6525_01_000003
2013-04-12 15:51:02,049 INFO  [Container Monitor] monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(356)) - Stopping resource-monitoring for container_1365688251527_6525_01_000004
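
To check whether bursts like this line up with the failures, it may help to tally the monitor events by timestamp. The sketch below parses the eight entries from the excerpt above (with the mail wrapping undone, one entry per line) and counts Starting and Stopping events per second. The regular expression and the name `tally` are illustrative only and are not part of Hadoop.

```python
import re
from collections import Counter

PREFIX = ("2013-04-12 15:51:02,{ms} INFO  [Container Monitor] "
          "monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run({ln})) - "
          "{action} resource-monitoring for {cid}")

# (milliseconds, source line, action, container id) copied from the excerpt above
ENTRIES = [
    ("048", 346, "Starting", "container_1365688251527_6643_01_000003"),
    ("048", 346, "Starting", "container_1365688251527_6642_01_000004"),
    ("049", 346, "Starting", "container_1365688251527_6641_01_000005"),
    ("049", 346, "Starting", "container_1365688251527_6640_01_000006"),
    ("049", 356, "Stopping", "container_1365688251527_6524_01_000001"),
    ("049", 356, "Stopping", "container_1365688251527_6525_01_000002"),
    ("049", 356, "Stopping", "container_1365688251527_6525_01_000003"),
    ("049", 356, "Stopping", "container_1365688251527_6525_01_000004"),
]
LOG = "\n".join(PREFIX.format(ms=ms, ln=ln, action=a, cid=c)
                for ms, ln, a, c in ENTRIES)

# Matches one unwrapped ContainersMonitorImpl entry (illustrative pattern).
EVENT = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} .*"
    r"(?P<action>Starting|Stopping) resource-monitoring for "
    r"(?P<cid>container_\S+)$"
)


def tally(log_text):
    """Count Starting/Stopping events per (second, action) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EVENT.match(line.strip())
        if match:
            counts[(match.group("ts"), match.group("action"))] += 1
    return counts


if __name__ == "__main__":
    for (timestamp, action), n in sorted(tally(LOG).items()):
        print(timestamp, action, n)
```

Run over a full node manager log, the same tally could be compared against the times at which the AM container was killed.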

> Application Master getting killed randomly reporting excess usage of memory
> ---------------------------------------------------------------------------
>
>                 Key: YARN-501
>                 URL: https://issues.apache.org/jira/browse/YARN-501
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: applications/distributed-shell, nodemanager
>    Affects Versions: 2.0.3-alpha
>            Reporter: Krishna Kishore Bonagiri
>
> I am running a date command using the Distributed Shell example in a loop of 500 iterations. It ran successfully every time except once, when it gave the following error.
> 2013-03-22 04:33:25,280 INFO  [main] distributedshell.Client (Client.java:monitorApplication(605)) - Got application report from ASM for, appId=222, clientToken=null, appDiagnostics=Application application_1363938200742_0222 failed 1 times due to AM Container for appattempt_1363938200742_0222_000001 exited with  exitCode: 143 due to: Container [pid=21141,containerID=container_1363938200742_0222_01_000001] is running beyond virtual memory limits. Current usage: 47.3 Mb of 128 Mb physical memory used; 611.6 Mb of 268.8 Mb virtual memory used. Killing container.
> Dump of the process-tree for container_1363938200742_0222_01_000001 :
>         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>         |- 21147 21141 21141 21141 (java) 244 12 532643840 11802 /home_/dsadm/yarn/jdk//bin/java -Xmx128m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --num_containers 2 --priority 0 --shell_command date
>         |- 21141 8433 21141 21141 (bash) 0 0 108642304 298 /bin/bash -c /home_/dsadm/yarn/jdk//bin/java -Xmx128m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --num_containers 2 --priority 0 --shell_command date 1>/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_000001/AppMaster.stdout 2>/tmp/logs/application_1363938200742_0222/container_1363938200742_0222_01_000001/AppMaster.stderr
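
The figures in the report above can be cross-checked with a little arithmetic. The sketch below sums the VMEM_USAGE(BYTES) and RSSMEM_USAGE(PAGES) columns for the two processes in the dump and compares the totals with the reported limits. It assumes a 4096-byte page size and a virtual-to-physical ratio of 2.1 (inferred from 268.8 / 128); neither value is stated in the report, and the function names are illustrative only.

```python
MIB = 1024 * 1024

# (pid, vmem_bytes, rss_pages) copied from the process-tree dump above
PROCESS_TREE = [
    (21147, 532643840, 11802),  # java ApplicationMaster
    (21141, 108642304, 298),    # bash wrapper
]

PAGE_SIZE = 4096       # assumption: typical Linux page size
PMEM_LIMIT_MIB = 128   # physical memory limit quoted in the report
VMEM_RATIO = 2.1       # assumption: inferred from 268.8 / 128


def tree_usage_mib(tree, page_size=PAGE_SIZE):
    """Return (virtual, resident) memory of the whole tree in MiB."""
    vmem = sum(vmem_bytes for _, vmem_bytes, _ in tree) / MIB
    rss = sum(pages for _, _, pages in tree) * page_size / MIB
    return vmem, rss


def over_vmem_limit(tree, pmem_limit_mib=PMEM_LIMIT_MIB, ratio=VMEM_RATIO):
    """True when the tree's virtual memory exceeds pmem limit * ratio."""
    vmem, _ = tree_usage_mib(tree)
    return vmem > pmem_limit_mib * ratio


if __name__ == "__main__":
    vmem, rss = tree_usage_mib(PROCESS_TREE)
    limit = PMEM_LIMIT_MIB * VMEM_RATIO
    print(f"vmem {vmem:.1f} MiB, rss {rss:.1f} MiB, vmem limit {limit:.1f} MiB")
    print("over limit:", over_vmem_limit(PROCESS_TREE))
```

Under these assumptions the totals come out at 611.6 and 47.3, matching the "611.6 Mb of 268.8 Mb virtual memory" and "47.3 Mb of 128 Mb physical memory" figures in the diagnostics.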

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira