
Posted to user@flink.apache.org by vishalovercome <vi...@moengage.com> on 2020/12/16 21:12:56 UTC

Re: Challenges Deploying Flink With Savepoints On Kubernetes

I'm not sure if this addresses the original concern. For instance, consider
this sequence:

1. The job starts from a savepoint.
2. The job creates a few checkpoints.
3. The JobManager (there is just one in Kubernetes) crashes and restarts with
the commands specified in the Kubernetes manifest, which contains the savepoint path.

Will ZooKeeper-based HA ensure that this savepoint path is ignored?

I've asked this and various other questions here:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Will-job-manager-restarts-lead-to-repeated-savepoint-restoration-tp40188.html



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Challenges Deploying Flink With Savepoints On Kubernetes

Posted by vishalovercome <vi...@moengage.com>.

Thanks for your reply!

What I have seen is that the job terminates when there's an intermittent loss
of connectivity with ZooKeeper. This is in fact the most common reason our
jobs are terminating at this point. Worse, the job is unable to restore from a
checkpoint during some (not all) of these terminations. In these scenarios,
won't the job try to recover from a savepoint?

I've gone through various tickets reporting stability issues due to
ZooKeeper that you've mentioned you intend to resolve soon. But until
ZooKeeper-based HA is stable, should we assume that the job will repeatedly
restore from savepoints? I would rather rely on Kafka offsets to resume
where the job left off than on savepoints.
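For reference, the Flink Kafka connector documentation states that the configured start position (for example, committed group offsets) does not apply when a job is restored from a checkpoint or savepoint; each partition then resumes from the offset stored in that snapshot. The following is a minimal Python model of that precedence, not connector code: `resolve_start_offsets` and the offset values are hypothetical, and partitions missing from the snapshot are not modeled.

```python
from typing import Dict, Optional

# Hypothetical model: partition number -> next offset to read.
Offsets = Dict[int, int]


def resolve_start_offsets(
    restored_state: Optional[Offsets],
    committed_group_offsets: Offsets,
) -> Offsets:
    """Return the offsets a Kafka source would resume from in this model.

    Offsets from a restored checkpoint/savepoint take precedence; the
    configured start position (here: committed group offsets) is used only
    when the job starts without any restored state.
    """
    if restored_state is not None:
        return dict(restored_state)
    return dict(committed_group_offsets)


committed = {0: 5_000, 1: 4_800}       # offsets committed to Kafka later on
savepoint_offsets = {0: 1_000, 1: 900}  # offsets captured in the savepoint

# Restoring from the savepoint rewinds to its offsets, ignoring the commits.
print(resolve_start_offsets(savepoint_offsets, committed))  # {0: 1000, 1: 900}
# Starting without restored state picks up the committed group offsets.
print(resolve_start_offsets(None, committed))  # {0: 5000, 1: 4800}
```

In this model, resuming from Kafka offsets alone means starting the job without any savepoint or checkpoint, so every operator also starts with empty state.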

Re: Challenges Deploying Flink With Savepoints On Kubernetes

Posted by Till Rohrmann <tr...@apache.org>.

Flink should try to pick the latest checkpoint and will only use the
savepoint if no newer checkpoint could be found.

Cheers,
Till
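A minimal Python sketch of the selection rule described in this reply, applied to the three-step sequence from the original question. It assumes a simplified model in which the HA store (ZooKeeper) holds the job's completed checkpoints and the Kubernetes manifest always passes the same savepoint. `RestorePoint`, `choose_restore_point` and the storage paths are hypothetical illustrations, not Flink APIs; the real decision is made inside the JobManager.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RestorePoint:
    """A completed checkpoint or a savepoint (hypothetical model type)."""
    checkpoint_id: int
    path: str
    is_savepoint: bool = False


def choose_restore_point(
    ha_checkpoints: List[RestorePoint],
    manifest_savepoint: Optional[RestorePoint],
) -> Optional[RestorePoint]:
    """Prefer the latest checkpoint found in the HA store; fall back to the
    savepoint from the manifest only when no checkpoint is found."""
    if ha_checkpoints:
        # "Latest" = highest checkpoint ID among the recovered checkpoints.
        return max(ha_checkpoints, key=lambda cp: cp.checkpoint_id)
    # No checkpoint found: use the savepoint (None means a fresh start).
    return manifest_savepoint


# 1. The job starts from a savepoint (hypothetical path).
savepoint = RestorePoint(10, "s3://bucket/savepoints/sp-10", is_savepoint=True)
# 2. The job creates a few checkpoints, recorded in the HA store.
checkpoints = [
    RestorePoint(i, f"s3://bucket/checkpoints/chk-{i}") for i in (11, 12, 13)
]
# 3. The JobManager restarts with the same savepoint argument.
print(choose_restore_point(checkpoints, savepoint).path)  # s3://bucket/checkpoints/chk-13
print(choose_restore_point([], savepoint).path)  # s3://bucket/savepoints/sp-10
```

The second call corresponds to the follow-up concern: in this model, whenever no checkpoint is found in the HA store, the savepoint from the manifest is used again and the job rewinds to it.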
