Posted to user@mesos.apache.org by 琪 冯 <At...@outlook.com> on 2016/03/23 04:13:51 UTC

About executor failover

What happens if the executor process goes down while its Docker container is still alive?

As a test, I killed an executor process on one of my Mesos slave machines. The process details looked like this:


root     17166  9569  0 Mar22 ?        00:01:39 mesos-docker-executor --container=mesos-0d58cb85-e726-479a-a57a-83405e3ae580-S3.b995031b-9c46-4713-9050-518aa306c6aa --docker=docker --docker_socket=/var/run/docker.sock --help=false --mapped_directory=/mnt/mesos/sandbox --sandbox_directory=/data/mesos/slaves/0d58cb85-e726-479a-a57a-83405e3ae580-S3/frameworks/5cfc9845-05c0-45b1-acc0-595ab92075d2-0000/executors/archtools_hearthstone.eless_eless.uwsgi.353f920b-eff6-11e5-97d3-aeb4726ea116/runs/b995031b-9c46-4713-9050-518aa306c6aa --stop_timeout=0ns



Then I checked and found that the container named "mesos-0d58cb85-e726-479a-a57a-83405e3ae580-S3.b995031b-9c46-4713-9050-518aa306c6aa" was still alive.

My Mesos version is 0.25.0, and the kernel version of the Mesos slave machine is Linux 3.10.0-229.11.1.el7.x86_64.

What I mean is: if the executor process crashes or is killed for whatever reason (while the container stays alive), a new container is launched in response to the TASK_LOST event. The container created by the dead executor process then becomes undiscoverable to my framework.

I would like to know whether I am wrong, or whether there is a way to handle this scenario.

I hope my question is clear; if not, please let me know.

Any feedback would be appreciated. 😊


Re: About executor failover

Posted by 琪 冯 <At...@outlook.com>.
I should say that I finally found a way to clean up orphan containers.
I learned that the executor does not remove its container when the task completes. The executor stops the container and exits. The container then stays on the slave machine in the exited state until --docker_remove_delay has elapsed.
I set --docker_remove_delay="1mins", restarted the slave, and killed an executor process. After 1 minute, the container left behind by the killed executor was removed.
This may not be a good way to solve my problem, but it works.
Thank you, haosdent. Thank you for your help. 😊
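
To make the timing above concrete, here is a minimal Python sketch of the "removed once the delay has passed" idea. It is only an illustration and not how Mesos implements --docker_remove_delay. The function names, the input dictionary, and the unit table are hypothetical; only the "ns" and "mins" suffixes appear in this thread, and the other units are assumptions.

```python
import re

# Seconds per unit for a few duration suffixes. "ns" and "mins" appear in this
# thread; the remaining entries are assumptions added for illustration.
UNIT_SECONDS = {
    "ns": 1e-9,
    "ms": 1e-3,
    "secs": 1.0,
    "mins": 60.0,
    "hrs": 3600.0,
}

# A number followed directly by a lowercase unit suffix, e.g. "1mins" or "0ns".
DURATION = re.compile(r"^(\d+(?:\.\d+)?)([a-z]+)$")


def parse_duration(text):
    """Convert a string such as "1mins" into a number of seconds."""
    match = DURATION.match(text.strip())
    if not match or match.group(2) not in UNIT_SECONDS:
        raise ValueError("unsupported duration: %r" % text)
    return float(match.group(1)) * UNIT_SECONDS[match.group(2)]


def due_for_removal(exited_at, now, remove_delay):
    """Return the names of exited containers whose delay has elapsed.

    exited_at maps a container name to the time (in seconds) at which the
    container exited; now is the current time on the same clock.
    """
    delay = parse_duration(remove_delay)
    return sorted(name for name, ts in exited_at.items() if now - ts >= delay)
```

With remove_delay set to "1mins", a container that exited 60 seconds ago or earlier is reported, while one that exited 10 seconds ago is not.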


________________________________
From: haosdent <ha...@gmail.com>
Sent: Wednesday, March 23, 2016 11:17 AM
To: user
Subject: Re: About executor failover

But I think we could make sure the Docker container exits when the executor is killed. If you have clear requirements, could you file them at https://issues.apache.org/jira/browse/MESOS so that other folks can help check whether the request should be accepted?



On Wed, Mar 23, 2016 at 7:14 PM, haosdent <ha...@gmail.com> wrote:
As far as I know, a framework currently has no way to learn about orphan containers.

On Wed, Mar 23, 2016 at 6:50 PM, 琪 冯 <At...@outlook.com> wrote:

Many thanks for the reply!
I learned that the orphan containers were removed by slave recovery. What I mean is: is there anything I can do from the framework, or with some other monitor, to detect or remove them automatically?

Thanks for your help.


________________________________
From: haosdent <ha...@gmail.com>
Sent: Wednesday, March 23, 2016 3:22 AM
To: user
Subject: Re: About executor failover

Yes, in that case, these orphan containers would be recovered or killed when you restart the slave.


Re: About executor failover

Posted by haosdent <ha...@gmail.com>.
But I think we could make sure the Docker container exits when the executor
is killed. If you have clear requirements, could you file them at
https://issues.apache.org/jira/browse/MESOS so that other folks can help
check whether the request should be accepted?




-- 
Best Regards,
Haosdent Huang

Re: About executor failover

Posted by haosdent <ha...@gmail.com>.
As far as I know, a framework currently has no way to learn about orphan containers.




-- 
Best Regards,
Haosdent Huang

Re: About executor failover

Posted by 琪 冯 <At...@outlook.com>.
Many thanks for the reply!
I learned that the orphan containers were removed by slave recovery. What I mean is: is there anything I can do from the framework, or with some other monitor, to detect or remove them automatically?

Thanks for your help.
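
One way such an external monitor might detect orphans is to compare the containers that are running against the --container= arguments of the mesos-docker-executor processes that are still alive. The Python sketch below is a rough illustration of that idea, not something Mesos provides. The function names and the sample data are hypothetical, and the "mesos-" name prefix is taken from the container name shown earlier in this thread.

```python
import re

# Matches the --container=<name> argument seen in the mesos-docker-executor
# process listing quoted in this thread.
CONTAINER_FLAG = re.compile(r"--container=(\S+)")


def executor_containers(ps_lines):
    """Return the container names that still have a live executor process."""
    names = set()
    for line in ps_lines:
        # Ignore processes that are not Docker executors.
        if "mesos-docker-executor" not in line:
            continue
        match = CONTAINER_FLAG.search(line)
        if match:
            names.add(match.group(1))
    return names


def find_orphans(ps_lines, running_containers):
    """Return Mesos-named containers that have no matching executor process."""
    owned = executor_containers(ps_lines)
    return sorted(
        name
        for name in running_containers
        if name.startswith("mesos-") and name not in owned
    )


if __name__ == "__main__":
    # Hypothetical, shortened sample data. In practice the inputs could come
    # from `ps -ef` and `docker ps --format '{{.Names}}'`.
    ps_lines = [
        "root 17166 9569 0 Mar22 ? 00:01:39 mesos-docker-executor "
        "--container=mesos-S3.aaaa --docker=docker",
    ]
    running = ["mesos-S3.aaaa", "mesos-S3.bbbb", "registry"]
    print(find_orphans(ps_lines, running))
```

Containers without the "mesos-" prefix are skipped so that unrelated containers on the same machine are never reported. What to do with a reported orphan (stop it, remove it, or just raise an alert) is left to the operator.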



Re: About executor failover

Posted by haosdent <ha...@gmail.com>.
Yes, in that case, these orphan containers would be recovered or killed
when you restart the slave.



-- 
Best Regards,
Haosdent Huang