
Posted to dev@flink.apache.org by Gyula Fóra <gy...@gmail.com> on 2016/06/01 11:42:35 UTC

HA Cluster restart behaviour

Hi,

I have noticed some strange behaviour on a streaming cluster running in HA
mode.

I have stopped a cluster with some deployed jobs (stop-cluster.sh without
cancelling the jobs), and when I bring the cluster back up, the jobs that
were running before are restarted.

Is this the expected behaviour? It feels strange that jobs will be
automatically redeployed after specifically calling stop-cluster.

Regards,
Gyula

Re: HA Cluster restart behaviour

Posted by Maximilian Michels <mx...@apache.org>.

At the moment, the stop-cluster script simply sends a TERM signal to
all processes using "kill". Shutting down the cluster cleanly is a bit
more complicated and would block for a longer time. I think the
current approach is the safest for most users. I agree that it would
be nice to have an option to shut down cleanly.
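
As a rough illustration of what a TERM signal does to a process, here is a
minimal Python sketch. It is not the stop-cluster script itself: the sleeping
child is a hypothetical stand-in for a cluster process, and the sketch assumes
a POSIX system. The signal only ends the process; nothing here touches any job
state kept outside of it.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a cluster process: a child that just sleeps.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Same effect as running `kill <pid>`: deliver SIGTERM to the process.
child.send_signal(signal.SIGTERM)
exit_code = child.wait(timeout=10)

# On POSIX, a negative return code means the process was ended by that signal.
terminated_by_term = (exit_code == -signal.SIGTERM)
print("terminated by SIGTERM:", terminated_by_term)
```

The child never gets to run any cleanup of its own here, which is why a clean
shutdown would need more than a plain "kill".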


Re: HA Cluster restart behaviour

Posted by Gyula Fóra <gy...@gmail.com>.

So you mean that you don't want people to accidentally remove all jobs by
shutting down the cluster? I think it is a bigger risk that people will
actually click the cancel button on the website by accident :D

For me it would seem intuitive that when I stop the cluster it stops the
jobs. Definitely a flag would help to make sure the jobs are cleared but I
am not sure what the default behaviour should be.

Gyula


Re: HA Cluster restart behaviour

Posted by Márton Balassi <ba...@gmail.com>.

I also think that the current mechanism is weird. IMHO it makes sense to
add the flag to both the start and stop scripts.


Re: HA Cluster restart behaviour

Posted by Ufuk Celebi <uc...@apache.org>.

Yes, it's expected, but you are certainly not the first one to be
confused by this behaviour.

The reasoning behind the current behaviour is that we don't want users
accidentally removing jobs, which seems worse than requiring users to
cancel manually. We thought about adding a flag to the start scripts
to clear the jobs either on startup or on shutdown. What's your opinion
on this?
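
To make the trade-off concrete, here is a toy Python model of the behaviour.
It is only a sketch and not Flink's recovery code: the ToyCluster class, the
JSON file standing in for the persisted HA state, and the clear_jobs_on_start
flag are all made up for illustration.

```python
import json
import os
import tempfile


class ToyCluster:
    """Toy model: job IDs are persisted to a file that outlives the process."""

    def __init__(self, store_path, clear_jobs_on_start=False):
        self.store_path = store_path
        if clear_jobs_on_start and os.path.exists(store_path):
            os.remove(store_path)  # hypothetical flag: drop persisted jobs
        self.running = self._load()  # recover whatever is still persisted

    def _load(self):
        if not os.path.exists(self.store_path):
            return []
        with open(self.store_path) as f:
            return json.load(f)

    def _save(self):
        with open(self.store_path, "w") as f:
            json.dump(self.running, f)

    def submit(self, job_id):
        self.running.append(job_id)
        self._save()

    def cancel(self, job_id):
        self.running.remove(job_id)
        self._save()

    def stop(self):
        # Like a plain process kill: in-memory state is lost, the store is untouched.
        self.running = []


store = os.path.join(tempfile.mkdtemp(), "jobs.json")
cluster = ToyCluster(store)
cluster.submit("job-a")
cluster.submit("job-b")
cluster.stop()  # stopped without cancelling the jobs

restarted = ToyCluster(store)
recovered = list(restarted.running)  # jobs come back after the restart
restarted.stop()

cleared = ToyCluster(store, clear_jobs_on_start=True).running  # nothing recovered
print(recovered, cleared)
```

In this model, stopping never removes jobs; only an explicit cancel or the
hypothetical flag does, which is the default we are debating.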

– Ufuk
