Posted to dev@mesos.apache.org by Vinod Kone <vi...@mesosphere.io> on 2018/05/17 01:18:20 UTC

Re: [jira] [Commented] (MESOS-8927) Default executor cannot kill tasks if `LAUNCH_NESTED_CONTAINER` is stuck.

Can you paste some logs here too, if you have any?

On Wed, May 16, 2018 at 5:53 PM, Chun-Hung Hsiao (JIRA) <ji...@apache.org>
wrote:

>
>     [ https://issues.apache.org/jira/browse/MESOS-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478318#comment-16478318 ]
>
> Chun-Hung Hsiao commented on MESOS-8927:
> ----------------------------------------
>
> I'd like to add some notes here. This problem is actually nontrivial,
> because AFAIK we don't have a reliable way to kill a container at any state.
>
> > Default executor cannot kill tasks if `LAUNCH_NESTED_CONTAINER` is stuck.
> > -------------------------------------------------------------------------
> >
> >                 Key: MESOS-8927
> >                 URL: https://issues.apache.org/jira/browse/MESOS-8927
> >             Project: Mesos
> >          Issue Type: Bug
> >          Components: executor
> >    Affects Versions: 1.5.1, 1.6.0
> >            Reporter: Chun-Hung Hsiao
> >            Priority: Critical
> >              Labels: default-executor, mesosphere
> >
> > In the default executor, if the {{LAUNCH_NESTED_CONTAINER}} call never
> > returns, {{container->launched}} won't be set, so a follow-up {{KILL}}
> > event will be ignored:
> >  [https://github.com/apache/mesos/blob/40b40d9b73221388e583fc140280f1eb2b48b832/src/launcher/default_executor.cpp#L1091]
> > This could lead to tasks stuck in {{TASK_STARTING}}.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>

Re: [jira] [Commented] (MESOS-8927) Default executor cannot kill tasks if `LAUNCH_NESTED_CONTAINER` is stuck.

Posted by Chun-Hung Hsiao <ch...@mesosphere.io>.
I'm sorry for the duplicate messages. I accidentally pressed the wrong
keyboard shortcut twice :(

Unfortunately I don't have the log right now. IIRC the executor received
the `KILL` event because the log I saw contained this line:
https://github.com/apache/mesos/blob/7e11a2d39cc642944897d2480105dbfd860fa601/src/launcher/default_executor.cpp#L1236
But it didn't contain this line:
https://github.com/apache/mesos/blob/7e11a2d39cc642944897d2480105dbfd860fa601/src/launcher/default_executor.cpp#L1101

The log entries that would explain why `LAUNCH_NESTED_CONTAINER` got stuck
had already been rotated out of the log file by the time I examined it.
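
For readers who don't have the source open, here is a minimal sketch of the
behaviour described in the issue. It is not the actual default executor code:
only the `launched` flag comes from the issue description, and everything
else (`SketchExecutor`, `launchReturned`, `kill`, the `killing` flag) is a
hypothetical name made up for illustration. The sketch assumes the kill path
simply returns when `launched` is false and that nothing retries it later.

```cpp
#include <map>
#include <string>

// Hypothetical, heavily simplified model of the behaviour described in
// MESOS-8927. None of these types exist in Mesos under these names.
struct Container {
  // Set only once the LAUNCH_NESTED_CONTAINER call has returned.
  bool launched = false;
  // Set once a kill has actually been issued for this container.
  bool killing = false;
};

class SketchExecutor {
public:
  // Starts tracking a task; the launch call is considered "in flight".
  void launch(const std::string& taskId) {
    containers[taskId] = Container{};
  }

  // Models the launch call returning. If the call is stuck, this is
  // never invoked and `launched` stays false forever.
  void launchReturned(const std::string& taskId) {
    containers.at(taskId).launched = true;
  }

  // Models the handling of a KILL event. Returns true if a kill was
  // actually issued, false if the event was ignored.
  bool kill(const std::string& taskId) {
    auto it = containers.find(taskId);
    if (it == containers.end()) {
      return false;  // Unknown task.
    }

    if (!it->second.launched) {
      // The KILL event was received, but the container is not marked
      // as launched, so the event is dropped and never retried. With a
      // stuck launch, the task stays in TASK_STARTING.
      return false;
    }

    it->second.killing = true;
    return true;
  }

private:
  std::map<std::string, Container> containers;
};
```

In this model, calling `kill("task-1")` right after `launch("task-1")` returns
false and has no lasting effect, which matches what I saw: the executor logged
that it received the `KILL` event but never got as far as killing the task.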



Re: [jira] [Commented] (MESOS-8927) Default executor cannot kill tasks if `LAUNCH_NESTED_CONTAINER` is stuck.

Posted by Chun-Hung Hsiao <ch...@mesosphere.io>.
Unfortunately I don't have the log right now. IIRC the executor received
the `KILL` event because the log I saw contained this line:
https://github.com/apache/mesos/blob/7e11a2d39cc642944897d2480105dbfd860fa601/src/launcher/default_executor.cpp#L1236
But it didn't contain this line:


Re: [jira] [Commented] (MESOS-8927) Default executor cannot kill tasks if `LAUNCH_NESTED_CONTAINER` is stuck.

Posted by Chun-Hung Hsiao <ch...@mesosphere.io>.
Unfortunately I don't have any log. IIRC the executor received the
`KILL` event because this is printed:
