Posted to user@mesos.apache.org by Adam Cecile <ad...@hitec.lu> on 2018/03/14 08:59:58 UTC

Mesos master endlessly attempts to kill nonexistent task

Hello,

I see two old tasks stuck in Mesos. These tasks haven't existed for
ages, but Mesos still tries to kill them:


Mar 14 09:56:49 mario mesos-master[23570]: I0314 09:56:49.441572 23602 
master.cpp:5297] Processing KILL call for task 
'pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef' 
of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon) at 
scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487

Mar 14 09:56:49 mario mesos-master[23570]: I0314 09:56:49.441658 23602 
master.cpp:5371] Telling agent 2215ab84-177b-478b-ab62-4453803fde6c-S6 
at slave(1)@10.99.50.3:5051 (zelda.service.domain.com) to kill task 
pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef 
of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon) at 
scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487

Mar 14 09:57:09 mario mesos-master[23570]: I0314 09:57:09.441529 23607 
master.cpp:5297] Processing KILL call for task 
'pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef' 
of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon) at 
scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487

Mar 14 09:57:09 mario mesos-master[23570]: I0314 09:57:09.441617 23607 
master.cpp:5371] Telling agent 2215ab84-177b-478b-ab62-4453803fde6c-S6 
at slave(1)@10.99.50.3:5051 (zelda.service.domain.com) to kill task 
pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef 
of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon) at 
scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487


Could you please tell me how to "purge" them from the Mesos master?

Thanks in advance,

Adam.

Re: Mesos master endlessly attempts to kill nonexistent task

Posted by Greg Mann <gr...@mesosphere.io>.
Hi Adam,
The fact that the task does not show up in the Mesos UI doesn't make sense
to me, in light of the log excerpts you included. The line:

Mar 14 09:56:49 mario mesos-master[23570]: I0314 09:56:49.441658 23602
master.cpp:5371] Telling agent 2215ab84-177b-478b-ab62-4453803fde6c-S6 at
slave(1)@10.99.50.3:5051 (zelda.service.domain.com) to kill task
pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef of
framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon) at
scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487

indicates that the Mesos master was able to locate this task in its
internal state. So, I would expect the task to show up in the Mesos UI. You
could also look for the task in the output of the GET_TASKS operator API
call for the master
<http://mesos.apache.org/documentation/latest/operator-http-api/#get_tasks>
and the agent
<http://mesos.apache.org/documentation/latest/operator-http-api/#get_tasks-1>.
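
If it's easier to script than to click around the UI, here is a minimal
sketch of that check; it assumes the leading master answers on mario:5050 and
the agent on zelda.service.domain.com:5051 (hosts taken from your logs), and
that no authentication is required:

    import json
    import urllib.request

    # A sketch only: hosts/ports are assumptions taken from the log excerpts
    # above, and the endpoints are assumed to accept unauthenticated calls.
    TASK_FRAGMENT = "196f414a-f61f-11e7-856c-f6e84742f1ef"

    def get_tasks(endpoint):
        """POST a v1 GET_TASKS operator call and return the decoded JSON body."""
        request = urllib.request.Request(
            endpoint,
            data=json.dumps({"type": "GET_TASKS"}).encode(),
            headers={"Content-Type": "application/json",
                     "Accept": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    for endpoint in ("http://mario:5050/api/v1",
                     "http://zelda.service.domain.com:5051/api/v1"):
        task_lists = get_tasks(endpoint).get("get_tasks", {})
        # The response groups tasks into several lists (tasks, completed_tasks,
        # and so on); scan them all for the stuck task ID.
        for list_name, tasks in task_lists.items():
            for task in tasks:
                task_id = task.get("task_id", {}).get("value", "")
                if TASK_FRAGMENT in task_id:
                    print(endpoint, list_name, task_id, task.get("state"))

If the master lists the task but the agent doesn't, that mismatch alone would
be useful to know.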

Have you looked at the Mesos agent logs to see how the agent is responding
to the KILL calls?

Mesos doesn't store any state in ZK (it's only used for leader election),
so clearing the task there is not an option. It's possible that forcing a
leader election by restarting the current Mesos master may help, but I'm
uncertain what state the master is in currently, given the inconsistency
noted above.
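
If you do try that, here is a rough sketch for confirming which master is
currently leading before you restart it (the hostnames are placeholders, and
it assumes /state is reachable on port 5050 without authentication):

    import json
    import urllib.request

    # Placeholder hostnames; substitute your actual masters.
    MASTERS = ["mario", "master-2", "master-3"]

    for host in MASTERS:
        url = "http://{}:5050/state".format(host)
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                state = json.load(response)
            # Each master reports the PID of the current leader in its state.
            print("{}: leader is {}".format(host, state.get("leader")))
        except OSError as error:
            print("{}: unreachable ({})".format(host, error))

Restarting whichever master that points at should be enough to trigger a new
election.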

Cheers,
Greg


On Wed, Apr 4, 2018 at 1:09 AM, Adam Cecile <ad...@hitec.lu> wrote:

> For instance,
>
> No kill ack received for instance [pub_api_oecd-rest-api-on-
> port-20015.marathon-196f414a-f61f-11e7-856c-f6e84742f1ef], retrying
> (73402 attempts so far)
>
> I'd say after 73402 attempts, it's time to let it go :D
>
> On 04/04/2018 10:07 AM, Adam Cecile wrote:
>
> Hello list !
>
> The problem is still ongoing. Any hint on how to fix it? Like removing the
> broken app from ZooKeeper by hand?
>
> Regards, Adam.
>
> On 03/20/2018 06:04 PM, daemeon reiydelle wrote:
>
> I ran across a situation with the same symptoms last year (with Mesos &
> Marathon) when we had network problems. The Mesos task did exit normally
> (we eventually found this in the logs), so the UUID had aged out.
>
>
> <======>
> "Who do you think made the first stone spear? The Asperger guy.
> If you get rid of the autism genetics, there would be no Silicon Valley"
> Temple Grandin
>
>
> Daemeon C.M. Reiydelle | San Francisco 1.415.501.0198 | London 44 020 8144 9872
>
>
> On Tue, Mar 20, 2018 at 1:34 AM, Adam Cecile <ad...@hitec.lu> wrote:
>
>> Hi Greg,
>>
>> Yes, I can confirm. Marathon keeps logging "No kill ack received for instance
>> [pub_api_oecd-rest-api-on-port-20015.marathon-196f414a-f61f-11e7-856c-f6e84742f1ef],
>> retrying (73402 attempts so far)" and I cannot find this UUID in the Mesos interface.
>>
>> Regards, Adam.
>>
>> On 03/15/2018 05:47 PM, Greg Mann wrote:
>>
>> Hi Adam,
>> The KILL calls are being sent to Mesos by Marathon. Since the KILL call
>> is being forwarded to the agent, it seems that the Mesos master is aware of
>> the task. Could you verify that the tasks show up as running in the Mesos
>> UI? You say that the tasks don't exist anymore - how did you verify this?
>> If the tasks show up as running in the Mesos state, but the actual task
>> processes are not running on the agent, then it could indicate an issue
>> with the Mesos agent or executor.
>>
>> Cheers,
>> Greg
>>
>>
>> On Wed, Mar 14, 2018 at 1:59 AM, Adam Cecile <ad...@hitec.lu>
>> wrote:
>>
>>> Hello,
>>>
>>> I see two old tasks stuck in Mesos. These tasks haven't existed
>>> for ages, but Mesos still tries to kill them:
>>>
>>>
>>> Mar 14 09:56:49 mario mesos-master[23570]: I0314 09:56:49.441572 23602
>>> master.cpp:5297] Processing KILL call for task
>>> 'pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef'
>>> of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon) at
>>> scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>>>
>>> Mar 14 09:56:49 mario mesos-master[23570]: I0314 09:56:49.441658 23602
>>> master.cpp:5371] Telling agent 2215ab84-177b-478b-ab62-4453803fde6c-S6
>>> at slave(1)@10.99.50.3:5051 (zelda.service.domain.com) to kill task
>>> pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef
>>> of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon) at
>>> scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>>>
>>> Mar 14 09:57:09 mario mesos-master[23570]: I0314 09:57:09.441529 23607
>>> master.cpp:5297] Processing KILL call for task
>>> 'pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef'
>>> of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon) at
>>> scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>>>
>>> Mar 14 09:57:09 mario mesos-master[23570]: I0314 09:57:09.441617 23607
>>> master.cpp:5371] Telling agent 2215ab84-177b-478b-ab62-4453803fde6c-S6
>>> at slave(1)@10.99.50.3:5051 (zelda.service.domain.com) to kill task
>>> pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef
>>> of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon) at
>>> scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>>>
>>>
>>> Could you please tell me how to "purge" them from the Mesos master?
>>>
>>> Thanks in advance,
>>>
>>> Adam.
>>>
>>
>>
>>
>
>
>

Re: Mesos master endlessly attempts to kill nonexistent task

Posted by Adam Cecile <ad...@hitec.lu>.
For instance,

No kill ack received for instance 
[pub_api_oecd-rest-api-on-port-20015.marathon-196f414a-f61f-11e7-856c-f6e84742f1ef], 
retrying (73402 attempts so far)

I'd say after 73402 attempts, it's time to let it go :D

On 04/04/2018 10:07 AM, Adam Cecile wrote:
> Hello list !
>
> The problem is still ongoing. Any hint on how to fix it? Like removing the
> broken app from ZooKeeper by hand?
>
> Regards, Adam.
>
> On 03/20/2018 06:04 PM, daemeon reiydelle wrote:
>> I ran across a situation with the same symptoms last year (with Mesos
>> & Marathon) when we had network problems. The Mesos task did exit
>> normally (we eventually found this in the logs), so the UUID had
>> aged out.
>>
>>
>> <======>
>> "Who do you think made the first stone spear? The Asperger guy.
>> If you get rid of the autism genetics, there would be no Silicon Valley"
>> Temple Grandin
>> Daemeon C.M. Reiydelle
>> San Francisco 1.415.501.0198
>> London 44 020 8144 9872
>>
>> On Tue, Mar 20, 2018 at 1:34 AM, Adam Cecile <adam.cecile@hitec.lu 
>> <ma...@hitec.lu>> wrote:
>>
>>     Hi Greg,
>>
>>     Yes, I can confirm. Marathon keeps logging "No kill ack received for instance
>>     [pub_api_oecd-rest-api-on-port-20015.marathon-196f414a-f61f-11e7-856c-f6e84742f1ef],
>>     retrying (73402 attempts so far)" and I cannot find this UUID in the Mesos
>>     interface.
>>
>>     Regards, Adam.
>>
>>     On 03/15/2018 05:47 PM, Greg Mann wrote:
>>>     Hi Adam,
>>>     The KILL calls are being sent to Mesos by Marathon. Since the
>>>     KILL call is being forwarded to the agent, it seems that the
>>>     Mesos master is aware of the task. Could you verify that the
>>>     tasks show up as running in the Mesos UI? You say that the tasks
>>>     don't exist anymore - how did you verify this? If the tasks show
>>>     up as running in the Mesos state, but the actual task processes
>>>     are not running on the agent, then it could indicate an issue
>>>     with the Mesos agent or executor.
>>>
>>>     Cheers,
>>>     Greg
>>>
>>>
>>>     On Wed, Mar 14, 2018 at 1:59 AM, Adam Cecile
>>>     <adam.cecile@hitec.lu <ma...@hitec.lu>> wrote:
>>>
>>>         Hello,
>>>
>>>         I see two old tasks stuck in Mesos. These tasks haven't
>>>         existed for ages, but Mesos still tries to kill them:
>>>
>>>
>>>         Mar 14 09:56:49 mario mesos-master[23570]: I0314
>>>         09:56:49.441572 23602 master.cpp:5297] Processing KILL call
>>>         for task
>>>         'pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef'
>>>         of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000
>>>         (marathon) at
>>>         scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>>>         <http://scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487>
>>>
>>>         Mar 14 09:56:49 mario mesos-master[23570]: I0314
>>>         09:56:49.441658 23602 master.cpp:5371] Telling agent
>>>         2215ab84-177b-478b-ab62-4453803fde6c-S6 at
>>>         slave(1)@10.99.50.3:5051 <http://10.99.50.3:5051>
>>>         (zelda.service.domain.com <http://zelda.service.domain.com>)
>>>         to kill task
>>>         pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef
>>>         of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000
>>>         (marathon) at
>>>         scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>>>         <http://scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487>
>>>
>>>         Mar 14 09:57:09 mario mesos-master[23570]: I0314
>>>         09:57:09.441529 23607 master.cpp:5297] Processing KILL call
>>>         for task
>>>         'pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef'
>>>         of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000
>>>         (marathon) at
>>>         scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>>>         <http://scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487>
>>>
>>>         Mar 14 09:57:09 mario mesos-master[23570]: I0314
>>>         09:57:09.441617 23607 master.cpp:5371] Telling agent
>>>         2215ab84-177b-478b-ab62-4453803fde6c-S6 at
>>>         slave(1)@10.99.50.3:5051 <http://10.99.50.3:5051>
>>>         (zelda.service.domain.com <http://zelda.service.domain.com>)
>>>         to kill task
>>>         pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef
>>>         of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000
>>>         (marathon) at
>>>         scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>>>         <http://scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487>
>>>
>>>
>>>         Could you please tell me how to "purge" them from the Mesos master?
>>>
>>>         Thanks in advance,
>>>
>>>         Adam.
>>>
>>>
>>
>>
>


Re: Mesos master endlessly attempts to kill nonexistent task

Posted by Adam Cecile <ad...@hitec.lu>.
Hello list !

The problem is still ongoing. Any hint on how to fix it? Like removing the
broken app from ZooKeeper by hand?

Regards, Adam.

On 03/20/2018 06:04 PM, daemeon reiydelle wrote:
> I ran across a situation with the same symptoms last year (with Mesos
> & Marathon) when we had network problems. The Mesos task did exit
> normally (we eventually found this in the logs), so the UUID had
> aged out.
>
>
> <======>
> "Who do you think made the first stone spear? The Asperger guy.
> If you get rid of the autism genetics, there would be no Silicon Valley"
> Temple Grandin
> Daemeon C.M. Reiydelle
> San Francisco 1.415.501.0198
> London 44 020 8144 9872
>
> On Tue, Mar 20, 2018 at 1:34 AM, Adam Cecile <adam.cecile@hitec.lu 
> <ma...@hitec.lu>> wrote:
>
>     Hi Greg,
>
>     Yes, I can confirm I cannot find this UUID in the Mesos interface.
>
>     Regards, Adam.
>
>     On 03/15/2018 05:47 PM, Greg Mann wrote:
>>     Hi Adam,
>>     The KILL calls are being sent to Mesos by Marathon. Since the
>>     KILL call is being forwarded to the agent, it seems that the
>>     Mesos master is aware of the task. Could you verify that the
>>     tasks show up as running in the Mesos UI? You say that the tasks
>>     don't exist anymore - how did you verify this? If the tasks show
>>     up as running in the Mesos state, but the actual task processes
>>     are not running on the agent, then it could indicate an issue
>>     with the Mesos agent or executor.
>>
>>     Cheers,
>>     Greg
>>
>>
>>     On Wed, Mar 14, 2018 at 1:59 AM, Adam Cecile
>>     <adam.cecile@hitec.lu <ma...@hitec.lu>> wrote:
>>
>>         Hello,
>>
>>         I see two old tasks stuck in Mesos. These tasks haven't
>>         existed for ages, but Mesos still tries to kill them:
>>
>>
>>         Mar 14 09:56:49 mario mesos-master[23570]: I0314
>>         09:56:49.441572 23602 master.cpp:5297] Processing KILL call
>>         for task
>>         'pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef'
>>         of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000
>>         (marathon) at
>>         scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>>         <http://scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487>
>>
>>         Mar 14 09:56:49 mario mesos-master[23570]: I0314
>>         09:56:49.441658 23602 master.cpp:5371] Telling agent
>>         2215ab84-177b-478b-ab62-4453803fde6c-S6 at
>>         slave(1)@10.99.50.3:5051 <http://10.99.50.3:5051>
>>         (zelda.service.domain.com <http://zelda.service.domain.com>)
>>         to kill task
>>         pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef
>>         of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000
>>         (marathon) at
>>         scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>>         <http://scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487>
>>
>>         Mar 14 09:57:09 mario mesos-master[23570]: I0314
>>         09:57:09.441529 23607 master.cpp:5297] Processing KILL call
>>         for task
>>         'pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef'
>>         of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000
>>         (marathon) at
>>         scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>>         <http://scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487>
>>
>>         Mar 14 09:57:09 mario mesos-master[23570]: I0314
>>         09:57:09.441617 23607 master.cpp:5371] Telling agent
>>         2215ab84-177b-478b-ab62-4453803fde6c-S6 at
>>         slave(1)@10.99.50.3:5051 <http://10.99.50.3:5051>
>>         (zelda.service.domain.com <http://zelda.service.domain.com>)
>>         to kill task
>>         pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef
>>         of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000
>>         (marathon) at
>>         scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>>         <http://scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487>
>>
>>
>>         Could you please tell me how to "purge" them from the Mesos master?
>>
>>         Thanks in advance,
>>
>>         Adam.
>>
>>
>
>


Re: Mesos master endlessly attempts to kill nonexistent task

Posted by daemeon reiydelle <da...@gmail.com>.
I ran across a situation with the same symptoms last year (with Mesos &
Marathon) when we had network problems. The Mesos task did exit normally
(we eventually found this in the logs), so the UUID had aged out.


<======>
"Who do you think made the first stone spear? The Asperger guy.
If you get rid of the autism genetics, there would be no Silicon Valley"
Temple Grandin


Daemeon C.M. Reiydelle | San Francisco 1.415.501.0198 | London 44 020 8144 9872


On Tue, Mar 20, 2018 at 1:34 AM, Adam Cecile <ad...@hitec.lu> wrote:

> Hi Greg,
>
> Yes, I can confirm I cannot find this UUID in the Mesos interface.
>
> Regards, Adam.
>
> On 03/15/2018 05:47 PM, Greg Mann wrote:
>
> Hi Adam,
> The KILL calls are being sent to Mesos by Marathon. Since the KILL call is
> being forwarded to the agent, it seems that the Mesos master is aware of
> the task. Could you verify that the tasks show up as running in the Mesos
> UI? You say that the tasks don't exist anymore - how did you verify this?
> If the tasks show up as running in the Mesos state, but the actual task
> processes are not running on the agent, then it could indicate an issue
> with the Mesos agent or executor.
>
> Cheers,
> Greg
>
>
> On Wed, Mar 14, 2018 at 1:59 AM, Adam Cecile <ad...@hitec.lu> wrote:
>
>> Hello,
>>
>> I see two old tasks stuck in Mesos. These tasks haven't existed for ages,
>> but Mesos still tries to kill them:
>>
>>
>> Mar 14 09:56:49 mario mesos-master[23570]: I0314 09:56:49.441572 23602
>> master.cpp:5297] Processing KILL call for task
>> 'pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef'
>> of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon) at
>> scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>>
>> Mar 14 09:56:49 mario mesos-master[23570]: I0314 09:56:49.441658 23602
>> master.cpp:5371] Telling agent 2215ab84-177b-478b-ab62-4453803fde6c-S6
>> at slave(1)@10.99.50.3:5051 (zelda.service.domain.com) to kill task
>> pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef
>> of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon) at
>> scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>>
>> Mar 14 09:57:09 mario mesos-master[23570]: I0314 09:57:09.441529 23607
>> master.cpp:5297] Processing KILL call for task
>> 'pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef'
>> of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon) at
>> scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>>
>> Mar 14 09:57:09 mario mesos-master[23570]: I0314 09:57:09.441617 23607
>> master.cpp:5371] Telling agent 2215ab84-177b-478b-ab62-4453803fde6c-S6
>> at slave(1)@10.99.50.3:5051 (zelda.service.domain.com) to kill task
>> pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef
>> of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon) at
>> scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>>
>>
>> Could you please tell me how to "purge" them from the Mesos master?
>>
>> Thanks in advance,
>>
>> Adam.
>>
>
>
>

Re: Mesos master endlessly attempts to kill nonexistent task

Posted by Adam Cecile <ad...@hitec.lu>.
Hi Greg,

Yes, I can confirm I cannot find this UUID in the Mesos interface.

Regards, Adam.

On 03/15/2018 05:47 PM, Greg Mann wrote:
> Hi Adam,
> The KILL calls are being sent to Mesos by Marathon. Since the KILL 
> call is being forwarded to the agent, it seems that the Mesos master 
> is aware of the task. Could you verify that the tasks show up as 
> running in the Mesos UI? You say that the tasks don't exist anymore - 
> how did you verify this? If the tasks show up as running in the Mesos 
> state, but the actual task processes are not running on the agent, 
> then it could indicate an issue with the Mesos agent or executor.
>
> Cheers,
> Greg
>
>
> On Wed, Mar 14, 2018 at 1:59 AM, Adam Cecile <adam.cecile@hitec.lu 
> <ma...@hitec.lu>> wrote:
>
>     Hello,
>
>     I see two old tasks stuck in Mesos. These tasks haven't existed
>     for ages, but Mesos still tries to kill them:
>
>
>     Mar 14 09:56:49 mario mesos-master[23570]: I0314 09:56:49.441572
>     23602 master.cpp:5297] Processing KILL call for task
>     'pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef'
>     of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon)
>     at scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>     <http://scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487>
>
>     Mar 14 09:56:49 mario mesos-master[23570]: I0314 09:56:49.441658
>     23602 master.cpp:5371] Telling agent
>     2215ab84-177b-478b-ab62-4453803fde6c-S6 at
>     slave(1)@10.99.50.3:5051 <http://10.99.50.3:5051>
>     (zelda.service.domain.com <http://zelda.service.domain.com>) to
>     kill task
>     pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef
>     of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon)
>     at scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>     <http://scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487>
>
>     Mar 14 09:57:09 mario mesos-master[23570]: I0314 09:57:09.441529
>     23607 master.cpp:5297] Processing KILL call for task
>     'pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef'
>     of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon)
>     at scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>     <http://scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487>
>
>     Mar 14 09:57:09 mario mesos-master[23570]: I0314 09:57:09.441617
>     23607 master.cpp:5371] Telling agent
>     2215ab84-177b-478b-ab62-4453803fde6c-S6 at
>     slave(1)@10.99.50.3:5051 <http://10.99.50.3:5051>
>     (zelda.service.domain.com <http://zelda.service.domain.com>) to
>     kill task
>     pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef
>     of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon)
>     at scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>     <http://scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487>
>
>
>     Could you please tell me how to "purge" them from the Mesos master?
>
>     Thanks in advance,
>
>     Adam.
>
>


Re: Mesos master endlessly attempts to kill nonexistent task

Posted by Greg Mann <gr...@mesosphere.io>.
Hi Adam,
The KILL calls are being sent to Mesos by Marathon. Since the KILL call is
being forwarded to the agent, it seems that the Mesos master is aware of
the task. Could you verify that the tasks show up as running in the Mesos
UI? You say that the tasks don't exist anymore - how did you verify this?
If the tasks show up as running in the Mesos state, but the actual task
processes are not running on the agent, then it could indicate an issue
with the Mesos agent or executor.
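
One quick way to cross-check the UI is to pull the master's task list
directly. A rough sketch, assuming the master's /tasks endpoint on mario:5050
(host taken from your logs) is reachable without authentication:

    import json
    import urllib.request

    # Framework ID for Marathon, taken from the log lines above.
    FRAMEWORK_ID = "346d7333-a980-43a8-93ab-343ea12d77d7-0000"

    # The master's /tasks endpoint returns {"tasks": [...]}; it paginates,
    # so a large limit is passed in case you run many tasks.
    with urllib.request.urlopen("http://mario:5050/tasks?limit=10000") as response:
        tasks = json.load(response).get("tasks", [])

    for task in tasks:
        if task.get("framework_id") == FRAMEWORK_ID:
            print(task.get("id"), task.get("state"), "on agent", task.get("slave_id"))

Any entry for the stuck task should show which state the master believes it is
in, and on which agent.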

Cheers,
Greg


On Wed, Mar 14, 2018 at 1:59 AM, Adam Cecile <ad...@hitec.lu> wrote:

> Hello,
>
> I see two old tasks stuck in Mesos. These tasks haven't existed for ages,
> but Mesos still tries to kill them:
>
>
> Mar 14 09:56:49 mario mesos-master[23570]: I0314 09:56:49.441572 23602
> master.cpp:5297] Processing KILL call for task
> 'pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef'
> of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon) at
> scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>
> Mar 14 09:56:49 mario mesos-master[23570]: I0314 09:56:49.441658 23602
> master.cpp:5371] Telling agent 2215ab84-177b-478b-ab62-4453803fde6c-S6 at
> slave(1)@10.99.50.3:5051 (zelda.service.domain.com) to kill task
> pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef
> of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon) at
> scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>
> Mar 14 09:57:09 mario mesos-master[23570]: I0314 09:57:09.441529 23607
> master.cpp:5297] Processing KILL call for task
> 'pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef'
> of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon) at
> scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>
> Mar 14 09:57:09 mario mesos-master[23570]: I0314 09:57:09.441617 23607
> master.cpp:5371] Telling agent 2215ab84-177b-478b-ab62-4453803fde6c-S6 at
> slave(1)@10.99.50.3:5051 (zelda.service.domain.com) to kill task
> pub_api_oecd-rest-api-on-port-20015.196f414a-f61f-11e7-856c-f6e84742f1ef
> of framework 346d7333-a980-43a8-93ab-343ea12d77d7-0000 (marathon) at
> scheduler-66a67553-0692-40b0-b29e-e7f342b6a241@10.99.50.2:40487
>
>
> Could you please tell me how to "purge" them from the Mesos master?
>
> Thanks in advance,
>
> Adam.
>