Posted to issues@mesos.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/07/13 01:10:21 UTC

[jira] [Commented] (MESOS-3808) slave/containerizer/docker leaves orphan containers on restart of mesos-slave

    [ https://issues.apache.org/jira/browse/MESOS-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374119#comment-15374119 ] 

ASF GitHub Bot commented on MESOS-3808:
---------------------------------------

GitHub user jfarrell closed the pull request at:

    https://github.com/apache/mesos/pull/79


> slave/containerizer/docker leaves orphan containers on restart of mesos-slave
> -----------------------------------------------------------------------------
>
>                 Key: MESOS-3808
>                 URL: https://issues.apache.org/jira/browse/MESOS-3808
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker, slave
>    Affects Versions: 0.25.0
>         Environment: CoreOS. Running mesos-slave in a container.
>            Reporter: Chris Fortier
>            Assignee: Gilbert Song
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> We attempted to upgrade from Mesos 0.23 to 0.25 but noticed that Docker containers launched by Mesos were being orphaned and not destroyed when the Mesos agent was restarted.
> Relevant log output:
> {noformat}
> I1027 20:36:22.343880 23004 docker.cpp:535] Recovering Docker containers
> I1027 20:36:22.517032 23008 docker.cpp:639] Recovering container 'a2308dfc-ec2f-4687-ae92-f045dd2d3614' for executor 'ubuntu.059ced51-7cea-11e5-a442-1ac2f22f38db' of framework 20151016-161150-1902412554-5050-1-0000
> I1027 20:36:22.517467 23008 docker.cpp:639] Recovering container '77b1748e-f295-4eb5-9966-d7a3bba2fc31' for executor 'ubuntu.059d1462-7cea-11e5-a442-1ac2f22f38db' of framework 20151016-161150-1902412554-5050-1-0000
> I1027 20:36:22.517817 23007 slave.cpp:4051] Sending reconnect request to executor ubuntu.059d1462-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000 at executor(1)@10.131.100.57:40596
> I1027 20:36:22.518033 23007 slave.cpp:4051] Sending reconnect request to executor ubuntu.059ced51-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000 at executor(1)@10.131.100.57:57469
> I1027 20:36:22.518038 23008 docker.cpp:1592] Executor for container 'a2308dfc-ec2f-4687-ae92-f045dd2d3614' has exited
> E1027 20:36:22.518070 23010 socket.hpp:174] Shutdown failed on fd=13: Transport endpoint is not connected [107]
> I1027 20:36:22.518084 23008 docker.cpp:1390] Destroying container 'a2308dfc-ec2f-4687-ae92-f045dd2d3614'
> I1027 20:36:22.518282 23008 docker.cpp:1592] Executor for container '77b1748e-f295-4eb5-9966-d7a3bba2fc31' has exited
> I1027 20:36:22.518324 23008 docker.cpp:1390] Destroying container '77b1748e-f295-4eb5-9966-d7a3bba2fc31'
> E1027 20:36:22.518357 23010 socket.hpp:174] Shutdown failed on fd=13: Transport endpoint is not connected [107]
> I1027 20:36:22.518360 23008 docker.cpp:1494] Running docker stop on container 'a2308dfc-ec2f-4687-ae92-f045dd2d3614'
> I1027 20:36:22.518489 23008 docker.cpp:1494] Running docker stop on container '77b1748e-f295-4eb5-9966-d7a3bba2fc31'
> I1027 20:36:22.518592 23005 slave.cpp:3433] Executor 'ubuntu.059ced51-7cea-11e5-a442-1ac2f22f38db' of framework 20151016-161150-1902412554-5050-1-0000 has terminated with unknown status
> I1027 20:36:22.519127 23005 slave.cpp:2717] Handling status update TASK_LOST (UUID: b07be363-433f-4a11-8c81-1f5787debc76) for task ubuntu.059ced51-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000 from @0.0.0.0:0
> I1027 20:36:22.519263 23005 slave.cpp:3433] Executor 'ubuntu.059d1462-7cea-11e5-a442-1ac2f22f38db' of framework 20151016-161150-1902412554-5050-1-0000 has terminated with unknown status
> I1027 20:36:22.519300 23005 slave.cpp:2717] Handling status update TASK_LOST (UUID: 6a687305-78fc-48ec-b49a-8aeb4b42b3ac) for task ubuntu.059d1462-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000 from @0.0.0.0:0
> W1027 20:36:22.519498 23003 docker.cpp:1002] Ignoring updating unknown container: a2308dfc-ec2f-4687-ae92-f045dd2d3614
> W1027 20:36:22.519611 23003 docker.cpp:1002] Ignoring updating unknown container: 77b1748e-f295-4eb5-9966-d7a3bba2fc31
> I1027 20:36:22.519691 23003 status_update_manager.cpp:322] Received status update TASK_LOST (UUID: b07be363-433f-4a11-8c81-1f5787debc76) for task ubuntu.059ced51-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000
> I1027 20:36:22.519755 23003 status_update_manager.cpp:826] Checkpointing UPDATE for status update TASK_LOST (UUID: b07be363-433f-4a11-8c81-1f5787debc76) for task ubuntu.059ced51-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000
> I1027 20:36:22.525867 23003 status_update_manager.cpp:322] Received status update TASK_LOST (UUID: 6a687305-78fc-48ec-b49a-8aeb4b42b3ac) for task ubuntu.059d1462-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000
> I1027 20:36:22.525907 23003 status_update_manager.cpp:826] Checkpointing UPDATE for status update TASK_LOST (UUID: 6a687305-78fc-48ec-b49a-8aeb4b42b3ac) for task ubuntu.059d1462-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000
> W1027 20:36:22.526645 23009 slave.cpp:2968] Dropping status update TASK_LOST (UUID: b07be363-433f-4a11-8c81-1f5787debc76) for task ubuntu.059ced51-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000 sent by status update manager because the slave is in RECOVERING state
> W1027 20:36:22.529747 23007 slave.cpp:2968] Dropping status update TASK_LOST (UUID: 6a687305-78fc-48ec-b49a-8aeb4b42b3ac) for task ubuntu.059d1462-7cea-11e5-a442-1ac2f22f38db of framework 20151016-161150-1902412554-5050-1-0000 sent by status update manager because the slave is in RECOVERING state
> I1027 20:36:24.518846 23004 slave.cpp:2666] Cleaning up un-reregistered executors
> I1027 20:36:24.519011 23004 slave.cpp:4110] Finished recovery
> {noformat}
> Docker output:
> {noformat}
> CONTAINER ID        IMAGE                             COMMAND                CREATED              STATUS              PORTS               NAMES
> 8d0d69fe34d7        libmesos/ubuntu                   "/bin/sh -c 'while s   About a minute ago   Up About a minute                       mesos-bc7d28c1-81cd-4dfe-8c53-afa8fdfeb472-S14.a1492e45-2fce-4ca4-bd16-edcef439ca31
> e4344cfbcc6d        libmesos/ubuntu                   "/bin/sh -c 'while s   About a minute ago   Up About a minute                       mesos-bc7d28c1-81cd-4dfe-8c53-afa8fdfeb472-S14.c3624e67-7a27-4309-8aa4-365d3fd1bfe2
> 3ce690f3b872        libmesos/ubuntu                   "/bin/sh -c 'while s   4 minutes ago        Up 4 minutes                            mesos-bc7d28c1-81cd-4dfe-8c53-afa8fdfeb472-S14.a2308dfc-ec2f-4687-ae92-f045dd2d3614
> 5b4546d3087a        libmesos/ubuntu                   "/bin/sh -c 'while s   4 minutes ago        Up 4 minutes                            mesos-bc7d28c1-81cd-4dfe-8c53-afa8fdfeb472-S14.77b1748e-f295-4eb5-9966-d7a3bba2fc31
> {noformat}
> After digging into the issue, it seems the comment linked below might point to the problem:
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L97
> It appears that the recovery command is still sending only the containerId rather than the frameworkId + containerId.
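
To make the naming mismatch concrete: the Docker output above shows names of the form `mesos-<prefix>.<containerId>`, where the prefix ends in `-S14` and looks like an agent (slave) ID, while the log lines refer only to the bare container ID. The sketch below splits such a name into its two parts. It is an illustration only, not the Mesos implementation: `ParsedName` and `parseDockerName` are hypothetical names, and the `mesos-` prefix and `.` separator are assumptions inferred from the `docker ps` output in this report.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type, not part of Mesos.
struct ParsedName {
  std::string idPrefix;     // e.g. "bc7d28c1-...-S14" (assumed to be the agent ID)
  std::string containerId;  // e.g. "a2308dfc-ec2f-4687-ae92-f045dd2d3614"
};

// Splits "mesos-<prefix>.<containerId>" into its two parts.
// Returns false if the name does not match that assumed layout.
bool parseDockerName(const std::string& name, ParsedName* out) {
  const std::string kPrefix = "mesos-";  // assumed from the docker output above

  // The name must start with the assumed prefix.
  if (name.compare(0, kPrefix.size(), kPrefix) != 0) {
    return false;
  }

  const std::string rest = name.substr(kPrefix.size());

  // The container ID is taken to be everything after the last '.'.
  const std::size_t dot = rest.rfind('.');
  if (dot == std::string::npos || dot == 0 || dot + 1 == rest.size()) {
    return false;
  }

  out->idPrefix = rest.substr(0, dot);
  out->containerId = rest.substr(dot + 1);
  return true;
}
```

Applied to the third row of the Docker output, this would yield the container ID `a2308dfc-ec2f-4687-ae92-f045dd2d3614`, which is the same ID the log reports running `docker stop` on. If the stop were issued with only that bare ID instead of the full container name, Docker would not find a container by that name, which would be consistent with the orphaned containers described here.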



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)