Posted to user@mesos.apache.org by Marc Roos <M....@f1-outsourcing.eu> on 2020/08/25 13:07:10 UTC
Suddenly all tasks gone, framework at completed, cannot start framework
Today all my tasks are down and the Marathon framework is listed as
completed. Any idea how this can happen?
ed.cpp:520] Successfully authenticated with master master@192.168.10.151:5050
I0825 13:03:27.961248 108 sched.cpp:1188] Got error 'Framework has been removed'
RE: Suddenly all tasks gone, framework at completed, cannot start framework
Posted by Marc Roos <M....@f1-outsourcing.eu>.
Is there a way to change this failover_timeout after the framework is
running, via the API for instance? I see that it changes when the leader
changes.
-----Original Message-----
To: user
Cc: cf.natali; janiszt
Subject: RE: Suddenly all tasks gone, framework at completed, cannot
start framework
Thanks Tomek, Charles. I increased my MARATHON_FAILOVER_TIMEOUT from a
day to a week. I can hardly believe that something that happened
yesterday made everything go down today. However, I have recently been
testing with JAVA_OPTS to prevent OOMs in the Marathon tasks.
RE: Suddenly all tasks gone, framework at completed, cannot start framework
Posted by Marc Roos <M....@f1-outsourcing.eu>.
Thanks Tomek, Charles. I increased my MARATHON_FAILOVER_TIMEOUT from a
day to a week. I can hardly believe that something that happened
yesterday made everything go down today. However, I have recently been
testing with JAVA_OPTS to prevent OOMs in the Marathon tasks.
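For anyone making the same change: as far as I know, Marathon reads this
setting as a number of seconds, so "a day" and "a week" have to be
converted. A minimal sketch, assuming that unit; the helper name is made
up for illustration and is not part of Marathon or Mesos.
```python
from datetime import timedelta


def failover_timeout_seconds(duration: timedelta) -> int:
    """Convert a duration to whole seconds (hypothetical helper)."""
    return int(duration.total_seconds())


# The old and new values mentioned above, expressed in seconds.
one_day = failover_timeout_seconds(timedelta(days=1))
one_week = failover_timeout_seconds(timedelta(weeks=1))

print(f"MARATHON_FAILOVER_TIMEOUT={one_day}")   # old value: one day
print(f"MARATHON_FAILOVER_TIMEOUT={one_week}")  # new value: one week
```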
-----Original Message-----
From: Tomek Janiszewski [mailto:janiszt@gmail.com]
Sent: Tuesday, 25 August 2020 16:55
To: user
Subject: Re: Suddenly all tasks gone, framework at completed, cannot
start framework
See: https://stackoverflow.com/a/42544023/1387612
Re: Suddenly all tasks gone, framework at completed, cannot start framework
Posted by Tomek Janiszewski <ja...@gmail.com>.
See: https://stackoverflow.com/a/42544023/1387612
On Tue, 25 Aug 2020 at 15:07, Marc Roos <M....@f1-outsourcing.eu> wrote:
>
> Today all my tasks are down and the Marathon framework is listed as
> completed. Any idea how this can happen?
>
> ed.cpp:520] Successfully authenticated with master master@192.168.10.151:5050
> I0825 13:03:27.961248 108 sched.cpp:1188] Got error 'Framework has been removed'
>
Re: Suddenly all tasks gone, framework at completed, cannot start framework
Posted by Charles-François Natali <cf...@gmail.com>.
Marc,
Have you read
https://mesos.readthedocs.io/en/1.1.0/high-availability-framework-guide/,
in particular the section about the FrameworkInfo failover_timeout?
Cheers,
Charles
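To illustrate the rule that section describes, as I understand it: after
the scheduler disconnects, the master waits failover_timeout for it to
re-register, and otherwise tears the framework down and kills its tasks.
A rough Python sketch of that rule, not Mesos code; the function name and
the timeline are hypothetical.
```python
from datetime import datetime, timedelta


def framework_removed(disconnected_at: datetime, now: datetime,
                      failover_timeout: timedelta) -> bool:
    """Return True once the scheduler has been gone longer than the timeout."""
    return now - disconnected_at > failover_timeout


# Hypothetical timeline: the scheduler has been disconnected for 26 hours.
disconnected_at = datetime(2020, 8, 24, 11, 0)
now = disconnected_at + timedelta(hours=26)

removed_with_one_day = framework_removed(disconnected_at, now, timedelta(days=1))
removed_with_one_week = framework_removed(disconnected_at, now, timedelta(weeks=1))

print(removed_with_one_day, removed_with_one_week)  # prints: True False
```
With a one-day timeout, an outage of just over a day is enough to lose
the framework; with a one-week timeout the same outage is survivable.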
On Tue, 25 Aug 2020, 16:01 Marc Roos, <M....@f1-outsourcing.eu> wrote:
>
> I assume this happened because something went wrong with ZooKeeper: it
> restarted and loaded the wrong configuration file, one without the
> quorum=1, because I was testing with different ZooKeeper RPMs (the Mesos
> RPM conf is not in the standard location).
>
> Question: Is it by design that all tasks are terminated when ZooKeeper
> is gone? Is there some timeout setting that allows tasks to keep running
> for a day without ZooKeeper?
RE: Suddenly all tasks gone, framework at completed, cannot start framework
Posted by Marc Roos <M....@f1-outsourcing.eu>.
I assume this happened because something went wrong with ZooKeeper: it
restarted and loaded the wrong configuration file, one without the
quorum=1, because I was testing with different ZooKeeper RPMs (the Mesos
RPM conf is not in the standard location).
Question: Is it by design that all tasks are terminated when ZooKeeper
is gone? Is there some timeout setting that allows tasks to keep running
for a day without ZooKeeper?
-----Original Message-----
To: user
Subject: Suddenly all tasks gone, framework at completed, cannot start
framework
Today all my tasks are down and the Marathon framework is listed as
completed. Any idea how this can happen?
ed.cpp:520] Successfully authenticated with master master@192.168.10.151:5050
I0825 13:03:27.961248 108 sched.cpp:1188] Got error 'Framework has been removed'