Posted to user@mesos.apache.org by James Vanns <jv...@gmail.com> on 2015/11/24 11:16:54 UTC

Task still 'active' after TASK_FINISHED status

Hi again list.

Mesos 0.24
C++ Framework (still using the Protobufs based comms, not REST)

My framework appears to be holding onto offers (somehow) from tasks that
are finished!? I don't understand why. The task consists of a shell
command that executes within a Docker container.
The shell command does return zero (success) to the OS, which Mesos
honours by transitioning the task to the TASK_FINISHED state.
However, in the UI these tasks still register as 'active' (though
acknowledged as FINISHED), and thus their resources are not yet freed.

Any pointers appreciated!

Cheers,

Jim

--
Senior Code Pig
Industrial Light & Magic

Re: Task still 'active' after TASK_FINISHED status

Posted by Vinod Kone <vi...@apache.org>.
The fact that the slave is retrying means that either the TASK_FAILED status
update hasn't reached the master or the scheduler, or the scheduler's
acknowledgement hasn't reached the slave. Since the master hasn't released
the resources for the task (from what you say), I imagine it's the former.

What do master logs say?
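
A minimal sketch of the inference above, written as a hypothetical C++ model rather than Mesos code. It assumes the slave keeps resending a status update until an acknowledgement comes back, and that the master frees a task's resources as soon as it sees a terminal update. The names TaskState, Network, Observation, simulate and diagnose are all made up for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified model of status-update delivery; not Mesos source.
enum class TaskState { Running, Finished, Failed };

bool isTerminal(TaskState s) {
  return s == TaskState::Finished || s == TaskState::Failed;
}

// Which legs of the round trip deliver their messages in this model.
struct Network {
  bool updateReachesMaster;  // slave -> master (-> scheduler)
  bool ackReachesSlave;      // scheduler -> master -> slave
};

// What an operator can observe from the outside.
struct Observation {
  bool masterReleasedResources;
  bool slaveStillRetrying;
};

// The slave resends `update` up to `attempts` times, stopping once acknowledged.
Observation simulate(TaskState update, Network net, int attempts) {
  assert(attempts >= 0);
  bool released = false;
  bool acked = false;
  for (int i = 0; i < attempts && !acked; ++i) {
    if (!net.updateReachesMaster) continue;   // lost before the master saw it
    if (isTerminal(update)) released = true;  // master frees resources on a terminal state
    if (net.ackReachesSlave) acked = true;    // acknowledgement stops the retries
  }
  return {released, !acked};
}

// The inference from the reply: a slave that keeps retrying while the master
// still holds the resources points at the update, not the acknowledgement.
std::string diagnose(const Observation& o) {
  if (!o.slaveStillRetrying) return "update delivered and acknowledged";
  if (!o.masterReleasedResources) return "update not reaching master";
  return "acknowledgement not reaching slave";
}
```

For example, simulate(TaskState::Failed, {false, true}, 5) leaves the resources held and the slave retrying, which diagnose() reports as the update not reaching the master; dropping only the acknowledgement instead releases the resources but still leaves the slave retrying.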

On Wed, Nov 25, 2015 at 2:01 AM, James Vanns <jv...@gmail.com> wrote:

> Er, I could. At the moment it's pretty huge, so maybe I'll just try to
> trim it down a bit. I've noticed that Chronos does the same, actually.
> There is a task that is 'active' and still holding onto resources, yet it
> already completed unsuccessfully with state TASK_FAILED (16 hrs ago!).
> Attached is a log of the events from the Mesos slave that executed this
> particular Chronos task (up to the point where it starts forwarding the
> same state over and over). Note that the last pair of lines is repeated
> ad infinitum. I can confirm that this Chronos framework with the same ID
> is still running.
>
> Sorry to switch frameworks suddenly - this was simpler because it was one
> task instead of 100s.
>
> Jim

Re: Task still 'active' after TASK_FINISHED status

Posted by David Greenberg <ds...@gmail.com>.
Yes, those'll be CommandExecutors; this is probably not the issue I
suggested it might be.
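
A small sketch of the executor-selection rule being discussed, assuming (as the replies suggest) that a task which carries a command but no ExecutorInfo is run by a Mesos-provided executor rather than a custom one. TaskInfoModel, ExecutorInfoModel, CommandInfoModel and whichExecutor are hypothetical stand-ins for the real TaskInfo, ExecutorInfo and CommandInfo protobufs, and how the Docker containerizer actually launches such a task is deliberately not modeled.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-ins for the protobuf messages; only the two fields that
// matter for executor selection are modeled.
struct ExecutorInfoModel { std::string executorId; };
struct CommandInfoModel { std::string value; };

struct TaskInfoModel {
  std::optional<ExecutorInfoModel> executor;  // set => framework's own executor
  std::optional<CommandInfoModel> command;    // set => Mesos-provided executor
};

enum class ExecutorKind { Custom, MesosProvided };

// Assumes exactly one of `executor` / `command` is set, as Mesos expects.
ExecutorKind whichExecutor(const TaskInfoModel& task) {
  assert(task.executor.has_value() != task.command.has_value());
  return task.executor ? ExecutorKind::Custom : ExecutorKind::MesosProvided;
}
```

Under this model, a task built like the one in the original post (a command plus ContainerInfo::DockerInfo, no executor) falls into the MesosProvided branch, which is why a custom executor that never exits is an unlikely explanation here.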

On Wed, Nov 25, 2015 at 11:02 AM James Vanns <jv...@gmail.com> wrote:

> I don't know what the Chronos default is, but in the recent case I posted
> about, we use whatever the Chronos default is. I just checked their
> documentation and it states they use the Mesos command executor.
>
> As for our own framework, which exhibits similar behaviour, we don't
> explicitly specify one (but we do use ContainerInfo::DockerInfo). We do set
> a command for the task to run, so I guess that implies a CommandExecutor?
>
> Cheers,
>
> Jim

Re: Task still 'active' after TASK_FINISHED status

Posted by James Vanns <jv...@gmail.com>.
I don't know what the Chronos default is, but in the recent case I posted
about, we use whatever the Chronos default is. I just checked their
documentation and it states they use the Mesos command executor.

As for our own framework, which exhibits similar behaviour, we don't
explicitly specify one (but we do use ContainerInfo::DockerInfo). We do set
a command for the task to run, so I guess that implies a CommandExecutor?

Cheers,

Jim


On 25 November 2015 at 15:51, David Greenberg <ds...@gmail.com>
wrote:

> If you're using a custom executor, this could happen if you don't actually
> exit the executor process. Is this using CommandExecutor or a custom one?

Re: Task still 'active' after TASK_FINISHED status

Posted by David Greenberg <ds...@gmail.com>.
If you're using a custom executor, this could happen if you don't actually
exit the executor process. Is this using CommandExecutor or a custom one?
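
To illustrate the custom-executor case, here is a hypothetical sketch of a single-task executor that reports a terminal state and then stops its driver so the process can exit. RecordingDriver and runSingleTask are made-up names so the code compiles without libmesos; the sendStatusUpdate/stop method names mirror the Mesos C++ ExecutorDriver, but the signatures here are simplified.

```cpp
#include <cassert>
#include <vector>

enum class TaskState { Running, Finished, Failed };

// Hypothetical stand-in for an executor driver: it just records what was sent.
struct RecordingDriver {
  std::vector<TaskState> sent;
  bool stopped = false;
  void sendStatusUpdate(TaskState s) {
    assert(!stopped);  // nothing should be sent after stopping
    sent.push_back(s);
  }
  void stop() { stopped = true; }
};

// Hypothetical single-task executor body: run the work, report a terminal
// state, then stop the driver so the executor process does not linger after
// its only task is done.
template <typename Driver, typename Work>
void runSingleTask(Driver& driver, Work work) {
  driver.sendStatusUpdate(TaskState::Running);
  const bool ok = work();
  driver.sendStatusUpdate(ok ? TaskState::Finished : TaskState::Failed);
  driver.stop();
}
```

One caveat this sketch does not model: stopping immediately after sending the terminal update may race with its delivery, so a real executor might want some confirmation that the update went out before exiting.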

On Wed, Nov 25, 2015 at 5:01 AM James Vanns <jv...@gmail.com> wrote:

> Er, I could. At the moment it's pretty huge, so maybe I'll just try to
> trim it down a bit. I've noticed that Chronos does the same, actually.
> There is a task that is 'active' and still holding onto resources, yet it
> already completed unsuccessfully with state TASK_FAILED (16 hrs ago!).
> Attached is a log of the events from the Mesos slave that executed this
> particular Chronos task (up to the point where it starts forwarding the
> same state over and over). Note that the last pair of lines is repeated
> ad infinitum. I can confirm that this Chronos framework with the same ID
> is still running.
>
> Sorry to switch frameworks suddenly - this was simpler because it was one
> task instead of 100s.
>
> Jim

Re: Task still 'active' after TASK_FINISHED status

Posted by James Vanns <jv...@gmail.com>.
Er, I could. At the moment it's pretty huge, so maybe I'll just try to trim
it down a bit. I've noticed that Chronos does the same, actually. There is
a task that is 'active' and still holding onto resources, yet it already
completed unsuccessfully with state TASK_FAILED (16 hrs ago!). Attached is a
log of the events from the Mesos slave that executed this particular
Chronos task (up to the point where it starts forwarding the same state over
and over). Note that the last pair of lines is repeated ad infinitum. I can confirm
that this Chronos framework with the same ID is still running.
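
The endlessly repeated log lines are consistent with the slave retrying an unacknowledged update. As a hypothetical sketch (the doubling and the 10 s / 600 s bounds used below are illustrative assumptions, not values taken from the Mesos source), a capped exponential backoff schedule looks like this. Nothing in it ever gives up; in this model only an acknowledgement, handled by the caller, would stop the retries.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical retry schedule for an unacknowledged status update: the
// interval doubles after each attempt and is capped at `maxInterval`.
std::vector<std::chrono::seconds> retrySchedule(std::chrono::seconds minInterval,
                                                std::chrono::seconds maxInterval,
                                                int attempts) {
  assert(minInterval.count() > 0 && minInterval <= maxInterval && attempts >= 0);
  std::vector<std::chrono::seconds> out;
  std::chrono::seconds next = minInterval;
  for (int i = 0; i < attempts; ++i) {
    out.push_back(next);
    next = std::min(next * 2, maxInterval);  // double, but never exceed the cap
  }
  return out;
}
```

With retrySchedule(std::chrono::seconds(10), std::chrono::seconds(600), 8) the intervals are 10, 20, 40, 80, 160, 320, 600 and 600 seconds: once the cap is reached, the same update keeps going out at a fixed interval, which would look like the same pair of lines repeating forever.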

Sorry to switch frameworks suddenly - this was simpler because it was one
task instead of 100s.

Jim

On 24 November 2015 at 17:57, Vinod Kone <vi...@gmail.com> wrote:

> Can you paste the logs?

Re: Task still 'active' after TASK_FINISHED status

Posted by Vinod Kone <vi...@gmail.com>.
Can you paste the logs?

On Tue, Nov 24, 2015 at 2:16 AM, James Vanns <jv...@gmail.com> wrote:

> Hi again list.
>
> Mesos 0.24
> C++ Framework (still using the Protobufs based comms, not REST)
>
> My framework appears to be holding onto offers (somehow) from tasks that
> are finished!? I don't understand why. The task consists of a shell
> command that executes within a Docker container.
> The shell command does return zero (success) to the OS, which Mesos
> honours by transitioning the task to the TASK_FINISHED state.
> However, in the UI these tasks still register as 'active' (though
> acknowledged as FINISHED), and thus their resources are not yet freed.
>
> Any pointers appreciated!
>
> Cheers,
>
> Jim
>
> --
> Senior Code Pig
> Industrial Light & Magic
>