Posted to user@mesos.apache.org by Marc Roos <M....@f1-outsourcing.eu> on 2020/08/25 13:07:10 UTC

Suddenly all tasks gone, framework at completed, cannot start framework


Today all my tasks are down and the Marathon framework is listed as 
completed. Any idea how this can happen?



ed.cpp:520] Successfully authenticated with master 
master@192.168.10.151:5050
I0825 13:03:27.961248   108 sched.cpp:1188] Got error 'Framework has 
been removed'


RE: Suddenly all tasks gone, framework at completed, cannot start framework

Posted by Marc Roos <M....@f1-outsourcing.eu>.
Is there a way to change this failover_timeout while the framework is 
running, for example via the API? I see that it changes when the leader 
changes.


-----Original Message-----
To: user
Cc: cf.natali; janiszt
Subject: RE: Suddenly all tasks gone, framework at completed, cannot 
start framework


Thanks Tomek, Charles. I increased my MARATHON_FAILOVER_TIMEOUT from a 
day to a week. I can hardly believe that something that happened 
yesterday made everything go down today. However, I have recently been 
testing with JAVA_OPTS to prevent OOMs in the Marathon tasks.




RE: Suddenly all tasks gone, framework at completed, cannot start framework

Posted by Marc Roos <M....@f1-outsourcing.eu>.
Thanks Tomek, Charles. I increased my MARATHON_FAILOVER_TIMEOUT from a 
day to a week. I can hardly believe that something that happened 
yesterday made everything go down today. However, I have recently been 
testing with JAVA_OPTS to prevent OOMs in the Marathon tasks.
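
For reference, a minimal sketch of the arithmetic behind that change. It 
assumes MARATHON_FAILOVER_TIMEOUT is given in seconds and maps to 
Marathon's --failover_timeout flag. The helper framework_survives and the 
30-hour outage are hypothetical, used only to illustrate why a one-day 
timeout can let the master remove the framework about a day after the 
scheduler lost contact, while a one-week timeout would not.

```python
from datetime import timedelta

# Assumption: the timeout is expressed in seconds.
ONE_DAY = int(timedelta(days=1).total_seconds())    # 86400
ONE_WEEK = int(timedelta(weeks=1).total_seconds())  # 604800


def framework_survives(outage: timedelta, failover_timeout_s: int) -> bool:
    """Hypothetical helper, not part of Mesos or Marathon.

    Models the rule that the master tears down a framework (killing its
    tasks) once the scheduler has been disconnected for longer than
    failover_timeout.
    """
    return outage.total_seconds() < failover_timeout_s


if __name__ == "__main__":
    outage = timedelta(hours=30)  # hypothetical outage length
    print(framework_survives(outage, ONE_DAY))   # False
    print(framework_survives(outage, ONE_WEEK))  # True
    print(f"MARATHON_FAILOVER_TIMEOUT={ONE_WEEK}")
```

With the old one-day value, a scheduler that stays disconnected for 30 
hours comes back to 'Framework has been removed'. With a week, the same 
outage is still inside the window.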




-----Original Message-----
From: Tomek Janiszewski [mailto:janiszt@gmail.com] 
Sent: Tuesday, 25 August 2020 16:55
To: user
Subject: Re: Suddenly all tasks gone, framework at completed, cannot 
start framework

See: https://stackoverflow.com/a/42544023/1387612

On Tue, 25 Aug 2020 at 15:07, Marc Roos <M....@f1-outsourcing.eu> wrote:




	Today all my tasks are down and the Marathon framework is listed as 
	completed. Any idea how this can happen?
	
	
	
	ed.cpp:520] Successfully authenticated with master 
	master@192.168.10.151:5050
	I0825 13:03:27.961248   108 sched.cpp:1188] Got error 'Framework has 
	been removed'
	
	



Re: Suddenly all tasks gone, framework at completed, cannot start framework

Posted by Tomek Janiszewski <ja...@gmail.com>.
See: https://stackoverflow.com/a/42544023/1387612

On Tue, 25 Aug 2020 at 15:07, Marc Roos <M....@f1-outsourcing.eu> wrote:

>
>
> Today all my tasks are down and the Marathon framework is listed as
> completed. Any idea how this can happen?
>
>
>
> ed.cpp:520] Successfully authenticated with master
> master@192.168.10.151:5050
> I0825 13:03:27.961248   108 sched.cpp:1188] Got error 'Framework has
> been removed'
>
>

Re: Suddenly all tasks gone, framework at completed, cannot start framework

Posted by Charles-François Natali <cf...@gmail.com>.
Marc,

Have you read
https://mesos.readthedocs.io/en/1.1.0/high-availability-framework-guide/ in
particular the section about the FrameworkInfo failover_timeout?

Cheers,

Charles



On Tue, 25 Aug 2020, 16:01 Marc Roos, <M....@f1-outsourcing.eu> wrote:

>
>
>
> I assume this happened because something went wrong with ZooKeeper: it
> restarted and loaded the wrong configuration file, one without quorum=1,
> because I was testing with different ZooKeeper RPMs (the conf from the
> Mesos RPM is not in the standard location).
>
> Question: is it by design that all tasks are terminated when ZooKeeper
> is gone? Is there a timeout setting that allows tasks to keep running
> for a day without ZooKeeper?
>
>
>
>
>
> -----Original Message-----
> To: user
> Subject: Suddenly all tasks gone, framework at completed, cannot start
> framework
>
>
>
> Today all my tasks are down and the Marathon framework is listed as
> completed. Any idea how this can happen?
>
>
>
> ed.cpp:520] Successfully authenticated with master
> master@192.168.10.151:5050
> I0825 13:03:27.961248   108 sched.cpp:1188] Got error 'Framework has
> been removed'
>
>
>
>

RE: Suddenly all tasks gone, framework at completed, cannot start framework

Posted by Marc Roos <M....@f1-outsourcing.eu>.


I assume this happened because something went wrong with ZooKeeper: it 
restarted and loaded the wrong configuration file, one without quorum=1, 
because I was testing with different ZooKeeper RPMs (the conf from the 
Mesos RPM is not in the standard location).

Question: is it by design that all tasks are terminated when ZooKeeper 
is gone? Is there a timeout setting that allows tasks to keep running 
for a day without ZooKeeper?





-----Original Message-----
To: user
Subject: Suddenly all tasks gone, framework at completed, cannot start 
framework



Today all my tasks are down and the Marathon framework is listed as 
completed. Any idea how this can happen?



ed.cpp:520] Successfully authenticated with master 
master@192.168.10.151:5050
I0825 13:03:27.961248   108 sched.cpp:1188] Got error 'Framework has 
been removed'