Posted to issues@mesos.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/07/13 01:10:21 UTC
[jira] [Commented] (MESOS-3808) slave/containerizer/docker leaves
orphan containers on restart of mesos-slave
[ https://issues.apache.org/jira/browse/MESOS-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374119#comment-15374119 ]
ASF GitHub Bot commented on MESOS-3808:
---------------------------------------
Github user jfarrell closed the pull request at:
https://github.com/apache/mesos/pull/79
> slave/containerizer/docker leaves orphan containers on restart of mesos-slave
> -----------------------------------------------------------------------------
>
> Key: MESOS-3808
> URL: https://issues.apache.org/jira/browse/MESOS-3808
> Project: Mesos
> Issue Type: Bug
> Components: containerization, docker, slave
> Affects Versions: 0.25.0
> Environment: CoreOS. Running mesos-slave in a container.
> Reporter: Chris Fortier
> Assignee: Gilbert Song
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> We attempted to upgrade from Mesos 0.23 to 0.25 but noticed that Docker containers launched by Mesos were being orphaned and not destroyed when the Mesos agent was restarted.
> Relevant log output:
> {noformat}
> I1027 20:36:22.343880 23004 docker.cpp:535] Recovering Docker containers
> I1027 20:36:22.517032 23008 docker.cpp:639] Recovering container 'a2308dfc-ec2f-4687-ae92-f045dd2d3614' for executor 'ubuntu.059ced51-7cea-11e5-a442-1ac2f22f38db' of framework 20151016-161150-1902412554-5050-1-0000
> I1027 20:36:22.517467 23008 docker.cpp:639] Recovering container '77b1748e-f295-4eb5-9966-d7a3bba2fc31' for executor 'ubuntu.059d1462-7cea-11e5-a442-1ac2f22f38db' of framework 20151016-161150-1902412554-5050-1-0000
> I1027 20:36:22.517817 23007 slave.cpp:4051] Sending reconnect request to executor ubuntu.059d1462-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000 at executor(1)@10.131.100.57:40596
> I1027 20:36:22.518033 23007 slave.cpp:4051] Sending reconnect request to executor ubuntu.059ced51-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000 at executor(1)@10.131.100.57:57469
> I1027 20:36:22.518038 23008 docker.cpp:1592] Executor for container 'a2308dfc-ec2f-4687-ae92-f045dd2d3614' has exited
> E1027 20:36:22.518070 23010 socket.hpp:174] Shutdown failed on fd=13: Transport endpoint is not connected [107]
> I1027 20:36:22.518084 23008 docker.cpp:1390] Destroying container 'a2308dfc-ec2f-4687-ae92-f045dd2d3614'
> I1027 20:36:22.518282 23008 docker.cpp:1592] Executor for container '77b1748e-f295-4eb5-9966-d7a3bba2fc31' has exited
> I1027 20:36:22.518324 23008 docker.cpp:1390] Destroying container '77b1748e-f295-4eb5-9966-d7a3bba2fc31'
> E1027 20:36:22.518357 23010 socket.hpp:174] Shutdown failed on fd=13: Transport endpoint is not connected [107]
> I1027 20:36:22.518360 23008 docker.cpp:1494] Running docker stop on container 'a2308dfc-ec2f-4687-ae92-f045dd2d3614'
> I1027 20:36:22.518489 23008 docker.cpp:1494] Running docker stop on container '77b1748e-f295-4eb5-9966-d7a3bba2fc31'
> I1027 20:36:22.518592 23005 slave.cpp:3433] Executor 'ubuntu.059ced51-7cea-11e5-a442-1ac2f22f38db' of framework 20151016-161150-1902412554-5050-1-0000 has terminated with unknown status
> I1027 20:36:22.519127 23005 slave.cpp:2717] Handling status update TASK_LOST (UUID: b07be363-433f-4a11-8c81-1f5787debc76) for task ubuntu.059ced51-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000 from @0.0.0.0:0
> I1027 20:36:22.519263 23005 slave.cpp:3433] Executor 'ubuntu.059d1462-7cea-11e5-a442-1ac2f22f38db' of framework 20151016-161150-1902412554-5050-1-0000 has terminated with unknown status
> I1027 20:36:22.519300 23005 slave.cpp:2717] Handling status update TASK_LOST (UUID: 6a687305-78fc-48ec-b49a-8aeb4b42b3ac) for task ubuntu.059d1462-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000 from @0.0.0.0:0
> W1027 20:36:22.519498 23003 docker.cpp:1002] Ignoring updating unknown container: a2308dfc-ec2f-4687-ae92-f045dd2d3614
> W1027 20:36:22.519611 23003 docker.cpp:1002] Ignoring updating unknown container: 77b1748e-f295-4eb5-9966-d7a3bba2fc31
> I1027 20:36:22.519691 23003 status_update_manager.cpp:322] Received status update TASK_LOST (UUID: b07be363-433f-4a11-8c81-1f5787debc76) for task ubuntu.059ced51-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000
> I1027 20:36:22.519755 23003 status_update_manager.cpp:826] Checkpointing UPDATE for status update TASK_LOST (UUID: b07be363-433f-4a11-8c81-1f5787debc76) for task ubuntu.059ced51-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000
> I1027 20:36:22.525867 23003 status_update_manager.cpp:322] Received status update TASK_LOST (UUID: 6a687305-78fc-48ec-b49a-8aeb4b42b3ac) for task ubuntu.059d1462-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000
> I1027 20:36:22.525907 23003 status_update_manager.cpp:826] Checkpointing UPDATE for status update TASK_LOST (UUID: 6a687305-78fc-48ec-b49a-8aeb4b42b3ac) for task ubuntu.059d1462-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000
> W1027 20:36:22.526645 23009 slave.cpp:2968] Dropping status update TASK_LOST (UUID: b07be363-433f-4a11-8c81-1f5787debc76) for task ubuntu.059ced51-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000 sent by status update manager because the slave is in RECOVERING state
> W1027 20:36:22.529747 23007 slave.cpp:2968] Dropping status update TASK_LOST (UUID: 6a687305-78fc-48ec-b49a-8aeb4b42b3ac) for task ubuntu.059d1462-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000 sent by status update manager because the slave is in RECOVERING state
> I1027 20:36:24.518846 23004 slave.cpp:2666] Cleaning up un-reregistered executors
> I1027 20:36:24.519011 23004 slave.cpp:4110] Finished recovery
> {noformat}
> Docker output:
> {noformat}
> CONTAINER ID   IMAGE             COMMAND                CREATED              STATUS              PORTS   NAMES
> 8d0d69fe34d7   libmesos/ubuntu   "/bin/sh -c 'while s   About a minute ago   Up About a minute           mesos-bc7d28c1-81cd-4dfe-8c53-afa8fdfeb472-S14.a1492e45-2fce-4ca4-bd16-edcef439ca31
> e4344cfbcc6d   libmesos/ubuntu   "/bin/sh -c 'while s   About a minute ago   Up About a minute           mesos-bc7d28c1-81cd-4dfe-8c53-afa8fdfeb472-S14.c3624e67-7a27-4309-8aa4-365d3fd1bfe2
> 3ce690f3b872   libmesos/ubuntu   "/bin/sh -c 'while s   4 minutes ago        Up 4 minutes                mesos-bc7d28c1-81cd-4dfe-8c53-afa8fdfeb472-S14.a2308dfc-ec2f-4687-ae92-f045dd2d3614
> 5b4546d3087a   libmesos/ubuntu   "/bin/sh -c 'while s   4 minutes ago        Up 4 minutes                mesos-bc7d28c1-81cd-4dfe-8c53-afa8fdfeb472-S14.77b1748e-f295-4eb5-9966-d7a3bba2fc31
> {noformat}
> After digging into the issue, it seems the code comment linked below might point to the problem.
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L97
> It appears that the recovery command is still only sending the containerId and not the frameworkId + containerId.
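
The log and Docker output quoted above can be cross-checked mechanically. The following is a minimal diagnostic sketch, not part of Mesos: it assumes the container name format seen in the Docker output (mesos-<agent id>.<container id>) and the "Running docker stop on container" log line seen in the agent log. The function names are hypothetical, and the list of names is assumed to come from something like `docker ps --format '{{.Names}}'`.

```python
import re

# Name format as it appears in the Docker output above:
#   mesos-<agent id>.<container id>   (container id is a 36-character UUID)
NAME_RE = re.compile(r"^mesos-(?P<agent_id>.+)\.(?P<container_id>[0-9a-f-]{36})$")

# Log line format as it appears in the agent log above.
STOP_RE = re.compile(r"Running docker stop on container '([0-9a-f-]{36})'")


def stopped_container_ids(log_text):
    """Return the container IDs the agent log says `docker stop` was run on."""
    return set(STOP_RE.findall(log_text))


def orphaned_names(docker_names, log_text):
    """Return Docker container names that are still listed even though
    the agent log reports a `docker stop` for their container ID."""
    stopped = stopped_container_ids(log_text)
    orphans = []
    for name in docker_names:
        match = NAME_RE.match(name)
        if match and match.group("container_id") in stopped:
            orphans.append(name)
    return orphans
```

Fed the log and the four container names from this report, such a check would flag the two containers created "4 minutes ago", since the log reports a stop for both of their IDs while Docker still lists them as up.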
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)