Posted to dev@mesos.apache.org by Olivier Sallou <ol...@irisa.fr> on 2017/09/19 05:47:49 UTC

strange behaviour: Task status -> error-> finished

Hi 
I found a strange behaviour on a cluster that I do not understand. I do not have access to the Mesos logs (the cluster is not mine), but has anyone faced this before?
My framework uses the Docker containerizer. We had a task for which Mesos sent TASK_ERROR to the framework (which could happen), but in reality the Docker container executed correctly on the Mesos slave, and we then received a TASK_FINISHED.
So Mesos detected an error with the task, but it still detected the end of the task and sent the finished event afterwards.

How can Mesos detect an error but keep watching the task and detect its end?

Here are my framework logs: 
2017-09-17 01:06:35,447 DEBUG [godocker-scheduler][Thread-1] Task 17820-0 is in state TASK_RUNNING 
2017-09-17 01:06:46,286 DEBUG [godocker-scheduler][Thread-1] Task 17820-0 is in state TASK_ERROR 
2017-09-17 02:13:44,537 DEBUG [godocker-scheduler][Thread-1] Task 17820-0 is in state TASK_FINISHED 

Unfortunately I did not log the "reason" of the TASK_ERROR, so I do not know what occurred, and cannot at this stage reproduce the use case manually.
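For reference, the "reason", "source" and "message" fields on the status update are exactly what would tell us which component produced the TASK_ERROR and why. Below is a minimal sketch (not go-docker's actual code) of a statusUpdate callback that records them, assuming the legacy mesos.interface Python bindings; the class and logger names are placeholders:

# Minimal sketch: log state, source, reason and message of every update.
import logging

import mesos.interface
from mesos.interface import mesos_pb2

logger = logging.getLogger('example-scheduler')

class ExampleScheduler(mesos.interface.Scheduler):
    def statusUpdate(self, driver, update):
        # 'update' is a TaskStatus protobuf; besides the state it carries an
        # optional 'source' (which component sent it) and 'reason' (why).
        state = mesos_pb2.TaskState.Name(update.state)
        source = (mesos_pb2.TaskStatus.Source.Name(update.source)
                  if update.HasField('source') else 'n/a')
        reason = (mesos_pb2.TaskStatus.Reason.Name(update.reason)
                  if update.HasField('reason') else 'n/a')
        logger.debug('Task %s is in state %s (source=%s, reason=%s, message=%r)',
                     update.task_id.value, state, source, reason, update.message)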

Can we have "non-terminal" errors, from Mesos's point of view, where the task should not be considered over?

Thanks 

Olivier 

Re: strange behaviour: Task status -> error-> finished

Posted by Olivier Sallou <ol...@irisa.fr>.

On 09/19/2017 11:22 AM, Benno Evers wrote:
> Hi Olivier,
>
>> Can we have "non-terminal" errors, from Mesos's point of view, where the
>> task should not be considered over?
>
> Not really, what you're seeing certainly looks like a bug, terminal updates
> should be terminal. It'll probably be hard to debug it without more data ;)
indeed...
>
> As a wild guess, since you seem to be using custom task ids, maybe you
> tried to start a task twice, and the TASK_ERROR was generated on the master
> in response to the duplicate task id or some other validation issue, and
> the TASK_FINISHED was generated on the slave when the first task finished?
> Although I'm not sure off the top of my head if there are checks in Mesos
> that would catch this.
Nope, the task was not started twice (we got only one TASK_RUNNING event).
When a task is resubmitted, its task id is modified.
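(Purely as an illustration of that point, and not go-docker's actual code: assuming the numeric suffix is what changes between attempts, a resubmission-safe id can be derived from the job id plus an attempt counter, so a retry never reuses a previous attempt's id.)

# Illustrative only: a fresh task id per attempt.
def make_task_id(job_id, attempt):
    return '{0}-{1}'.format(job_id, attempt)

assert make_task_id(17820, 0) != make_task_id(17820, 1)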
Thanks anyway.
>
> Best regards,
>
> On Tue, Sep 19, 2017 at 7:47 AM, Olivier Sallou <ol...@irisa.fr>
> wrote:
>
>> Hi
>> I found a strange behaviour on a cluster that I do not understand. I do
>> not have access to the Mesos logs (the cluster is not mine), but has
>> anyone faced this before?
>> My framework uses the Docker containerizer. We had a task for which Mesos
>> sent TASK_ERROR to the framework (which could happen), but in reality the
>> Docker container executed correctly on the Mesos slave, and we then
>> received a TASK_FINISHED.
>> So Mesos detected an error with the task, but it still detected the end of
>> the task and sent the finished event afterwards.
>>
>> How can Mesos detect an error but keep watching the task and detect its
>> end?
>>
>> Here are my framework logs:
>> 2017-09-17 01:06:35,447 DEBUG [godocker-scheduler][Thread-1] Task 17820-0
>> is in state TASK_RUNNING
>> 2017-09-17 01:06:46,286 DEBUG [godocker-scheduler][Thread-1] Task 17820-0
>> is in state TASK_ERROR
>> 2017-09-17 02:13:44,537 DEBUG [godocker-scheduler][Thread-1] Task 17820-0
>> is in state TASK_FINISHED
>>
>> Unfortunately I did not log the "reason" of the TASK_ERROR, so I do not
>> know what occurred, and cannot at this stage reproduce the use case
>> manually.
>>
>> Can we have "non-terminal" errors, from Mesos's point of view, where the
>> task should not be considered over?
>>
>> Thanks
>>
>> Olivier
>>
>
>

-- 
Olivier Sallou
IRISA / University of Rennes 1
Campus de Beaulieu, 35000 RENNES - FRANCE
Tel: 02.99.84.71.95

gpg key id: 4096R/326D8438  (keyring.debian.org)
Key fingerprint = 5FB4 6F83 D3B9 5204 6335  D26D 78DC 68DB 326D 8438


Re: strange behaviour: Task status -> error-> finished

Posted by Benno Evers <be...@mesosphere.com>.
Hi Olivier,

> Can we have "non-terminal" errors, from Mesos's point of view, where the
> task should not be considered over?

Not really, what you're seeing certainly looks like a bug, terminal updates
should be terminal. It'll probably be hard to debug it without more data ;)
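For frameworks that want to detect this kind of anomaly, here is a small defensive sketch (purely illustrative, not part of Mesos or go-docker): remember the first terminal state seen per task and flag any update that arrives afterwards. The terminal-state list below is the classic one; newer Mesos releases add further terminal states.

# Illustrative only: track the first terminal state per task so that a later
# update for an already-terminated task can be logged as an anomaly.
TERMINAL_STATES = frozenset([
    'TASK_FINISHED', 'TASK_FAILED', 'TASK_KILLED', 'TASK_LOST', 'TASK_ERROR',
])

class TerminalStateTracker(object):
    def __init__(self):
        self._terminated = {}  # task id -> first terminal state seen

    def observe(self, task_id, state_name):
        """Record terminal states; return the earlier terminal state if this
        update arrived after the task had already terminated."""
        previous = self._terminated.get(task_id)
        if previous is None and state_name in TERMINAL_STATES:
            self._terminated[task_id] = state_name
        return previous

# Inside a statusUpdate callback one could then do something like:
#   prior = tracker.observe(update.task_id.value, state)
#   if prior is not None:
#       logger.warning('Task %s got %s after already reaching terminal %s',
#                      update.task_id.value, state, prior)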

As a wild guess, since you seem to be using custom task ids, maybe you
tried to start a task twice, and the TASK_ERROR was generated on the master
in response to the duplicate task id or some other validation issue, and
the TASK_FINISHED was generated on the slave when the first task finished?
Although I'm not sure off the top of my head if there are checks in Mesos
that would catch this.

Best regards,

On Tue, Sep 19, 2017 at 7:47 AM, Olivier Sallou <ol...@irisa.fr>
wrote:

> Hi
> I found a strange behaviour on a cluster that I do not understand. I do
> not have access to the Mesos logs (the cluster is not mine), but has
> anyone faced this before?
> My framework uses the Docker containerizer. We had a task for which Mesos
> sent TASK_ERROR to the framework (which could happen), but in reality the
> Docker container executed correctly on the Mesos slave, and we then
> received a TASK_FINISHED.
> So Mesos detected an error with the task, but it still detected the end of
> the task and sent the finished event afterwards.
>
> How can Mesos detect an error but keep watching the task and detect its
> end?
>
> Here are my framework logs:
> 2017-09-17 01:06:35,447 DEBUG [godocker-scheduler][Thread-1] Task 17820-0
> is in state TASK_RUNNING
> 2017-09-17 01:06:46,286 DEBUG [godocker-scheduler][Thread-1] Task 17820-0
> is in state TASK_ERROR
> 2017-09-17 02:13:44,537 DEBUG [godocker-scheduler][Thread-1] Task 17820-0
> is in state TASK_FINISHED
>
> Unfortunately I did not log the "reason" of the TASK_ERROR, so I do not
> know what occurred, and cannot at this stage reproduce the use case
> manually.
>
> Can we have "non-terminal" errors, from Mesos's point of view, where the
> task should not be considered over?
>
> Thanks
>
> Olivier
>



-- 
Benno Evers
Software Engineer, Mesosphere