Posted to issues@mesos.apache.org by "haosdent (JIRA)" <ji...@apache.org> on 2017/03/07 03:00:37 UTC
[jira] [Comment Edited] (MESOS-6480) Support for docker live-restore option in Mesos

    [ https://issues.apache.org/jira/browse/MESOS-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898649#comment-15898649 ] 

haosdent edited comment on MESOS-6480 at 3/7/17 2:59 AM:
---------------------------------------------------------

As far as I checked, every docker command fails when the daemon runs with {{--live-restore}} and is then stopped via {{service docker stop}}. This includes {{docker logs}}, no matter which log driver we use. After chatting with [~jieyu], the possible ways to resolve this are:

1. 
* {{docker run -d}} to start the program
* {{docker logs --since xxx --follow}} to read the log
* If {{docker logs}} fails, check whether {{/proc/$taskPid}} exists. If the task process still exists, keep retrying {{docker logs}} until {{/proc/$taskPid}} disappears or {{docker logs}} succeeds again.

The problem with this approach is that it is a bit tricky to find the right timestamp to pass to {{docker logs --since}}, and some log lines may be missed.
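
The retry loop in approach 1 could look roughly like the sketch below. It is only an illustration under stated assumptions, not what Mesos does: the function names are hypothetical, {{--timestamps}} is added so that the newest timestamp seen can be fed back into {{--since}} on the next attempt, and it assumes {{docker logs}} exits non-zero while the daemon is down. Lines at the resume boundary may still be repeated or lost.

```python
import os
import subprocess
import time


def split_stamp(line):
    """Split '<RFC 3339 timestamp> <message>'; return (None, line) if no timestamp."""
    stamp, _, message = line.partition(" ")
    if len(stamp) >= 20 and stamp[4] == "-" and stamp[10] == "T":
        return stamp, message
    return None, line


def task_alive(task_pid):
    """True while /proc/<pid> exists (Linux only)."""
    return os.path.exists("/proc/%d" % task_pid)


def follow_once(container_id, since, emit):
    """Run `docker logs --follow` once; return (exit code, newest timestamp seen)."""
    cmd = ["docker", "logs", "--timestamps", "--follow"]
    if since is not None:
        cmd += ["--since", since]
    cmd.append(container_id)
    # Assumption: stderr is merged into stdout to keep the sketch short, so
    # error messages printed by the docker CLI are emitted like log lines.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    for line in proc.stdout:
        stamp, message = split_stamp(line.rstrip("\n"))
        if stamp is not None:
            since = stamp  # remember where to resume after a failure
        emit(message)
    return proc.wait(), since


def follow_with_retry(container_id, task_pid, emit, follow=follow_once,
                      alive=task_alive, sleep=time.sleep, delay=1.0):
    """Retry the follow while it fails and the task process still exists."""
    since = None
    while True:
        code, since = follow(container_id, since, emit)
        if code == 0 or not alive(task_pid):
            return code
        sleep(delay)
```

The {{follow}}, {{alive}} and {{sleep}} parameters exist only so the loop can be exercised without a docker daemon.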

2. 
* Read {{/run/docker/libcontainerd/$container_id/init-stdout}} and {{/run/docker/libcontainerd/$container_id/init-stderr}} directly. This is tricky as well, because it depends on docker's implementation, which may differ across versions. It also does not allow multiple consumers: if we read these files directly, other consumers of {{docker logs}} would not see the log output we read from them.
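
A minimal sketch of approach 2, assuming the {{init-stdout}} and {{init-stderr}} paths behave like named pipes (FIFOs), which the single-consumer behaviour suggests. The helper names are hypothetical, and {{demo}} uses a temporary FIFO as a stand-in for the real path so it runs without docker. Whatever one reader takes from the pipe is consumed and is not delivered to any other reader.

```python
import os
import tempfile
import threading


def read_stream(path, emit):
    """Read lines from a stdio path until the writing side closes it."""
    with open(path, "r") as stream:  # blocks until a writer opens the FIFO
        for line in stream:
            emit(line.rstrip("\n"))


def demo():
    """Stand-in for init-stdout: a temporary FIFO with a single writer."""
    with tempfile.TemporaryDirectory() as workdir:
        fifo = os.path.join(workdir, "init-stdout")
        os.mkfifo(fifo)

        def writer():
            # Plays the role of the container process writing its stdout.
            with open(fifo, "w") as out:
                out.write("line 1\nline 2\n")

        thread = threading.Thread(target=writer)
        thread.start()
        seen = []
        read_stream(fifo, seen.append)  # these bytes are now gone from the pipe
        thread.join()
    return seen
```

Against a real container, {{read_stream}} would be pointed at {{/run/docker/libcontainerd/$container_id/init-stdout}}, with the caveats listed above.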

In short, I don't think we have a perfect solution for this problem unless we accept that some log output may be lost.


> Support for docker live-restore option in Mesos
> -----------------------------------------------
>
>                 Key: MESOS-6480
>                 URL: https://issues.apache.org/jira/browse/MESOS-6480
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Milind Chawre
>
> Docker 1.12 supports the live-restore option, which keeps containers alive during docker daemon downtime: https://docs.docker.com/engine/admin/live-restore/
> I tried to use this option in my Mesos setup and observed the following:
> 1. On a Mesos worker node, stop the docker daemon.
> 2. After some time, start the docker daemon. All the containers running on that node are still visible using "docker ps". This is the expected behaviour of the live-restore option.
> 3. When I check the Mesos and Marathon UIs, they show no active tasks running on that node. The containers that are still running on that node have been rescheduled on different Mesos nodes, which is not right, since I can still see the containers in the "docker ps" output because of the live-restore option.


