Posted to dev@myriad.apache.org by "Sarjeet Singh (JIRA)" <ji...@apache.org> on 2015/10/17 02:06:05 UTC
[jira] [Commented] (MYRIAD-153) Placeholder tasks yarn_container_* are not cleaned up after the yarn job is complete.
[ https://issues.apache.org/jira/browse/MYRIAD-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961560#comment-14961560 ]
Sarjeet Singh commented on MYRIAD-153:
--------------------------------------
More details... Here are the NM, RM, and Mesos master logs for container "container_1442507909665_0002_01_000012":
[node-1]# grep container_1442507909665_0002_01_000012 task-nm.zero.28b0d3b6-79eb-44c7-be99-aa7157568d8e.stderr
15/09/17 10:04:40 INFO containermanager.ContainerManagerImpl: Start request for container_1442507909665_0002_01_000012 by user mapr
15/09/17 10:04:40 INFO application.ApplicationImpl: Adding container_1442507909665_0002_01_000012 to application application_1442507909665_0002
15/09/17 10:04:40 INFO nodemanager.NMAuditLogger: USER=mapr IP=10.10.101.116 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1442507909665_0002 CONTAINERID=container_1442507909665_0002_01_000012
15/09/17 10:04:40 INFO container.ContainerImpl: Container container_1442507909665_0002_01_000012 transitioned from NEW to LOCALIZING
15/09/17 10:04:40 INFO container.ContainerImpl: Container container_1442507909665_0002_01_000012 transitioned from LOCALIZING to LOCALIZED
15/09/17 10:04:40 INFO container.ContainerImpl: Container container_1442507909665_0002_01_000012 transitioned from LOCALIZED to RUNNING
15/09/17 10:04:40 INFO monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1442507909665_0002_01_000012
15/09/17 10:04:40 INFO monitor.ContainersMonitorImpl: Memory usage of ProcessTree 26207 for container-id container_1442507909665_0002_01_000012: 62.0 MB of 1 GB physical memory used; 1.7 GB of 2.1 GB virtual memory used
15/09/17 10:04:43 INFO monitor.ContainersMonitorImpl: Memory usage of ProcessTree 26207 for container-id container_1442507909665_0002_01_000012: 271.5 MB of 1 GB physical memory used; 1.8 GB of 2.1 GB virtual memory used
15/09/17 10:04:46 INFO monitor.ContainersMonitorImpl: Memory usage of ProcessTree 26207 for container-id container_1442507909665_0002_01_000012: 343.3 MB of 1 GB physical memory used; 1.8 GB of 2.1 GB virtual memory used
15/09/17 10:04:48 INFO containermanager.ContainerManagerImpl: Stopping container with container Id: container_1442507909665_0002_01_000012
15/09/17 10:04:48 INFO nodemanager.NMAuditLogger: USER=mapr IP=10.10.101.116 OPERATION=Stop Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1442507909665_0002 CONTAINERID=container_1442507909665_0002_01_000012
15/09/17 10:04:48 INFO container.ContainerImpl: Container container_1442507909665_0002_01_000012 transitioned from RUNNING to KILLING
15/09/17 10:04:48 INFO launcher.ContainerLaunch: Cleaning up container container_1442507909665_0002_01_000012
15/09/17 10:04:48 WARN nodemanager.LinuxContainerExecutor: Exit code from container container_1442507909665_0002_01_000012 is : 143
15/09/17 10:04:48 INFO container.ContainerImpl: Container container_1442507909665_0002_01_000012 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
15/09/17 10:04:48 INFO nodemanager.LinuxContainerExecutor: Deleting absolute path : /tmp/hadoop-mapr/nm-local-dir/usercache/mapr/appcache/application_1442507909665_0002/container_1442507909665_0002_01_000012
15/09/17 10:04:48 INFO nodemanager.NMAuditLogger: USER=mapr OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1442507909665_0002 CONTAINERID=container_1442507909665_0002_01_000012
15/09/17 10:04:48 INFO container.ContainerImpl: Container container_1442507909665_0002_01_000012 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE
15/09/17 10:04:48 INFO application.ApplicationImpl: Removing container_1442507909665_0002_01_000012 from application application_1442507909665_0002
15/09/17 10:04:49 INFO monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1442507909665_0002_01_000012
15/09/17 10:05:31 INFO nodemanager.NodeStatusUpdaterImpl: Removed completed containers from NM context: [container_1442507909665_0002_01_000009, container_1442507909665_0002_01_000012]
[node-1]# grep container_1442507909665_0002_01_000012 testrm.646ddf2c-5d5a-11e5-9651-0cc47a587d16.stderr
15/09/17 10:04:11 INFO rmcontainer.RMContainerImpl: container_1442507909665_0002_01_000012 Container Transitioned from NEW to RESERVED
15/09/17 10:04:11 INFO fair.FSSchedulerNode: Reserved container container_1442507909665_0002_01_000012 on node host: qa101-117.qa.lab:31004 #containers=3 available=<memory:-2048, vCores:-2, disks:2.5> used=<memory:3072, vCores:3, disks:1.5> for application org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt@18a02c01
15/09/17 10:04:17 INFO rmcontainer.RMContainerImpl: container_1442507909665_0002_01_000012 Container Transitioned from NEW to ALLOCATED
15/09/17 10:04:17 INFO resourcemanager.RMAuditLogger: USER=mapr OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1442507909665_0002 CONTAINERID=container_1442507909665_0002_01_000012
15/09/17 10:04:17 INFO scheduler.SchedulerNode: Assigned container container_1442507909665_0002_01_000012 of capacity <memory:1024, vCores:1, disks:0.5> on host qa101-117.qa.lab:31004, which has 4 containers, <memory:4096, vCores:4, disks:2.0> used and <memory:86765, vCores:21, disks:2.0> available after allocation
15/09/17 10:04:17 WARN handlers.StatusUpdateEventHandler: Task: value: "yarn_container_1442507909665_0002_01_000012"
15/09/17 10:04:30 INFO rmcontainer.RMContainerImpl: container_1442507909665_0002_01_000012 Container Transitioned from ALLOCATED to ACQUIRED
15/09/17 10:04:40 WARN handlers.StatusUpdateEventHandler: Task: value: "yarn_container_1442507909665_0002_01_000012"
15/09/17 10:04:48 WARN handlers.StatusUpdateEventHandler: Task: value: "yarn_container_1442507909665_0002_01_000012"
15/09/17 10:04:50 INFO rmcontainer.RMContainerImpl: container_1442507909665_0002_01_000012 Container Transitioned from ACQUIRED to RUNNING
15/09/17 10:04:50 WARN handlers.StatusUpdateEventHandler: Task: value: "yarn_container_1442507909665_0002_01_000012"
15/09/17 10:04:51 INFO rmcontainer.RMContainerImpl: container_1442507909665_0002_01_000012 Container Transitioned from RUNNING to COMPLETED
15/09/17 10:04:51 INFO fair.FSAppAttempt: Completed container: container_1442507909665_0002_01_000012 in state: COMPLETED event:FINISHED
15/09/17 10:04:51 INFO resourcemanager.RMAuditLogger: USER=mapr OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1442507909665_0002 CONTAINERID=container_1442507909665_0002_01_000012
15/09/17 10:04:51 INFO scheduler.SchedulerNode: Released container container_1442507909665_0002_01_000012 of capacity <memory:1024, vCores:1, disks:0.5> on host qa101-117.qa.lab:31004, which currently has 2 containers, <memory:2048, vCores:2, disks:1.0> used and <memory:-2048, vCores:-2, disks:3.0> available, release resources=true
15/09/17 10:04:51 INFO fair.FairScheduler: Application attempt appattempt_1442507909665_0002_000001 released container container_1442507909665_0002_01_000012 on node: host: qa101-117.qa.lab:31004 #containers=2 available=<memory:-2048, vCores:-2, disks:3.0> used=<memory:2048, vCores:2, disks:1.0> with event: FINISHED
[root@qa101-116 bug20530]# grep container_1442507909665_0002_01_000012 mesos-master.INFO
I0917 10:04:17.006140 5563 master.hpp:159] Adding task yarn_container_1442507909665_0002_01_000012 with resources cpus(*):1; mem(*):1024 on slave 20150916-104543-1969555978-5050-5493-S0 (qa101-117.qa.lab)
I0917 10:04:17.006294 5563 master.cpp:2835] Launching task yarn_container_1442507909665_0002_01_000012 of framework 20150916-104543-1969555978-5050-5493-0001 (MyriadAlpha) at scheduler-8622685d-0ed6-4f01-906d-5a847a787888@10.10.101.116:38037 with resources cpus(*):1; mem(*):1024 on slave 20150916-104543-1969555978-5050-5493-S0 at slave(1)@10.10.101.117:5051 (qa101-117.qa.lab)
I0917 10:04:17.013324 5563 master.cpp:3758] Status update TASK_RUNNING (UUID: 11034a25-de90-4950-9d39-0a775280dd01) for task yarn_container_1442507909665_0002_01_000012 of framework 20150916-104543-1969555978-5050-5493-0001 from slave 20150916-104543-1969555978-5050-5493-S0 at slave(1)@10.10.101.117:5051 (qa101-117.qa.lab)
I0917 10:04:17.013465 5563 master.cpp:3797] Forwarding status update TASK_RUNNING (UUID: 11034a25-de90-4950-9d39-0a775280dd01) for task yarn_container_1442507909665_0002_01_000012 of framework 20150916-104543-1969555978-5050-5493-0001
I0917 10:04:17.013741 5563 master.cpp:5178] Updating the latest state of task yarn_container_1442507909665_0002_01_000012 of framework 20150916-104543-1969555978-5050-5493-0001 to TASK_RUNNING
I0917 10:04:17.014492 5562 master.cpp:3158] Processing ACKNOWLEDGE call 11034a25-de90-4950-9d39-0a775280dd01 for task yarn_container_1442507909665_0002_01_000012 of framework 20150916-104543-1969555978-5050-5493-0001 (MyriadAlpha) at scheduler-8622685d-0ed6-4f01-906d-5a847a787888@10.10.101.116:38037 on slave 20150916-104543-1969555978-5050-5493-S0
I0917 10:04:40.035140 5538 master.cpp:3758] Status update TASK_RUNNING (UUID: 77348e87-d3cd-4416-b15a-72fe79a8a3f3) for task yarn_container_1442507909665_0002_01_000012 of framework 20150916-104543-1969555978-5050-5493-0001 from slave 20150916-104543-1969555978-5050-5493-S0 at slave(1)@10.10.101.117:5051 (qa101-117.qa.lab)
I0917 10:04:40.035399 5538 master.cpp:3797] Forwarding status update TASK_RUNNING (UUID: 77348e87-d3cd-4416-b15a-72fe79a8a3f3) for task yarn_container_1442507909665_0002_01_000012 of framework 20150916-104543-1969555978-5050-5493-0001
I0917 10:04:40.035606 5538 master.cpp:5178] Updating the latest state of task yarn_container_1442507909665_0002_01_000012 of framework 20150916-104543-1969555978-5050-5493-0001 to TASK_RUNNING
I0917 10:04:40.036656 5538 master.cpp:3158] Processing ACKNOWLEDGE call 77348e87-d3cd-4416-b15a-72fe79a8a3f3 for task yarn_container_1442507909665_0002_01_000012 of framework 20150916-104543-1969555978-5050-5493-0001 (MyriadAlpha) at scheduler-8622685d-0ed6-4f01-906d-5a847a787888@10.10.101.116:38037 on slave 20150916-104543-1969555978-5050-5493-S0
I0917 10:04:48.146806 5537 master.cpp:3758] Status update TASK_FINISHED (UUID: f2e39e4b-6678-479b-861c-fd58b88e8e30) for task yarn_container_1442507909665_0002_01_000012 of framework 20150916-104543-1969555978-5050-5493-0001 from slave 20150916-104543-1969555978-5050-5493-S0 at slave(1)@10.10.101.117:5051 (qa101-117.qa.lab)
I0917 10:04:48.147330 5537 master.cpp:3797] Forwarding status update TASK_FINISHED (UUID: f2e39e4b-6678-479b-861c-fd58b88e8e30) for task yarn_container_1442507909665_0002_01_000012 of framework 20150916-104543-1969555978-5050-5493-0001
I0917 10:04:48.147701 5537 master.cpp:5178] Updating the latest state of task yarn_container_1442507909665_0002_01_000012 of framework 20150916-104543-1969555978-5050-5493-0001 to TASK_FINISHED
I0917 10:04:48.149219 5537 master.cpp:5246] Removing task yarn_container_1442507909665_0002_01_000012 with resources cpus(*):1; mem(*):1024 of framework 20150916-104543-1969555978-5050-5493-0001 on slave 20150916-104543-1969555978-5050-5493-S0 at slave(1)@10.10.101.117:5051 (qa101-117.qa.lab)
I0917 10:04:48.149515 5537 master.cpp:3158] Processing ACKNOWLEDGE call f2e39e4b-6678-479b-861c-fd58b88e8e30 for task yarn_container_1442507909665_0002_01_000012 of framework 20150916-104543-1969555978-5050-5493-0001 (MyriadAlpha) at scheduler-8622685d-0ed6-4f01-906d-5a847a787888@10.10.101.116:38037 on slave 20150916-104543-1969555978-5050-5493-S0
I0917 10:04:50.011129 5559 master.hpp:159] Adding task yarn_container_1442507909665_0002_01_000012 with resources cpus(*):1; mem(*):1024 on slave 20150916-104543-1969555978-5050-5493-S0 (qa101-117.qa.lab)
I0917 10:04:50.011324 5559 master.cpp:2835] Launching task yarn_container_1442507909665_0002_01_000012 of framework 20150916-104543-1969555978-5050-5493-0001 (MyriadAlpha) at scheduler-8622685d-0ed6-4f01-906d-5a847a787888@10.10.101.116:38037 with resources cpus(*):1; mem(*):1024 on slave 20150916-104543-1969555978-5050-5493-S0 at slave(1)@10.10.101.117:5051 (qa101-117.qa.lab)
I0917 10:04:50.019701 5554 master.cpp:3758] Status update TASK_RUNNING (UUID: cd834102-6d12-46e9-be42-b76b3341ad36) for task yarn_container_1442507909665_0002_01_000012 of framework 20150916-104543-1969555978-5050-5493-0001 from slave 20150916-104543-1969555978-5050-5493-S0 at slave(1)@10.10.101.117:5051 (qa101-117.qa.lab)
I0917 10:04:50.019907 5554 master.cpp:3797] Forwarding status update TASK_RUNNING (UUID: cd834102-6d12-46e9-be42-b76b3341ad36) for task yarn_container_1442507909665_0002_01_000012 of framework 20150916-104543-1969555978-5050-5493-0001
I0917 10:04:50.020102 5554 master.cpp:5178] Updating the latest state of task yarn_container_1442507909665_0002_01_000012 of framework 20150916-104543-1969555978-5050-5493-0001 to TASK_RUNNING
I0917 10:04:50.020922 5554 master.cpp:3158] Processing ACKNOWLEDGE call cd834102-6d12-46e9-be42-b76b3341ad36 for task yarn_container_1442507909665_0002_01_000012 of framework 20150916-104543-1969555978-5050-5493-0001 (MyriadAlpha) at scheduler-8622685d-0ed6-4f01-906d-5a847a787888@10.10.101.116:38037 on slave 20150916-104543-1969555978-5050-5493-S0
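Putting the three logs together: the Mesos master removes the first placeholder task at 10:04:48 (TASK_FINISHED), but the RM only transitions the container from ACQUIRED to RUNNING at 10:04:50, at which point a second placeholder task with the same ID is launched. The container completes one second later, and the grep shows no further TASK_FINISHED for the relaunched task, which matches the placeholder tasks stuck in RUNNING on the Mesos UI. One way this could be guarded against, sketched here with hypothetical names (the actual Myriad handler classes differ), is to remember container IDs whose placeholder task has already finished and suppress the relaunch:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only, not Myriad's actual implementation: remember
// container IDs whose placeholder task already reached TASK_FINISHED, so a
// late RM ACQUIRED -> RUNNING transition cannot relaunch the same task.
public class PlaceholderTaskGuard {
    private final Set<String> finishedContainers = ConcurrentHashMap.newKeySet();

    // Invoked when Mesos reports TASK_FINISHED for yarn_container_<id>.
    public void onTaskFinished(String containerId) {
        finishedContainers.add(containerId);
    }

    // Invoked before launching a placeholder task for a container the RM just
    // transitioned to RUNNING; returns false if that container already finished.
    public boolean shouldLaunch(String containerId) {
        return !finishedContainers.contains(containerId);
    }

    public static void main(String[] args) {
        PlaceholderTaskGuard guard = new PlaceholderTaskGuard();
        String id = "container_1442507909665_0002_01_000012";
        // 10:04:48 - Mesos reports the placeholder task finished.
        guard.onTaskFinished(id);
        // 10:04:50 - RM transitions ACQUIRED -> RUNNING; relaunch is suppressed.
        System.out.println(guard.shouldLaunch(id)); // prints "false"
    }
}
```

The set would also need an eviction path (e.g. on application completion) so it does not grow without bound; that is omitted here for brevity.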
> Placeholder tasks yarn_container_* are not cleaned up after the yarn job is complete.
> --------------------------------------------------------------------------------
>
> Key: MYRIAD-153
> URL: https://issues.apache.org/jira/browse/MYRIAD-153
> Project: Myriad
> Issue Type: Bug
> Reporter: Sarjeet Singh
> Attachments: Mesos_UI_screeshot_placeholder_tasks_running.png
>
>
> Observed that the placeholder tasks for containers launched via fine-grained scaling (FGS) are still in the RUNNING state on Mesos. These container tasks are not cleaned up properly after the job has finished completely.
> See the attached screenshot of the Mesos UI with the placeholder tasks still running.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)