Posted to user@flink.apache.org by Varun Chakravarthy Senthilnathan <Va...@infosys.com> on 2021/03/03 06:26:51 UTC

Flink Zookeeper leader change v 1.9.X

Hi,

We are using Flink version 1.9.1, and in a long-running environment we encountered the issue described in https://issues.apache.org/jira/browse/FLINK-14091
While we work on upgrading our version, we have two questions:

  1.  Why does ZooKeeper go for a leader change? As far as we checked, there was no scaling in our cluster at all, and the load was minimal. Is there any reason for the ZooKeeper leader change to happen?
  2.  Is there a way to reproduce the ZooKeeper leader change manually, to verify whether the version upgrade helped us?

Regards,
Varun.


Re: Flink Zookeeper leader change v 1.9.X

Posted by Chesnay Schepler <ch...@apache.org>.
1) This could occur due to a number of reasons, like processes crashing, 
network issues between ZK and Flink, or the JobManager being stuck in 
some blocking operation for a long time. You will need to take a look at 
the ZK/Flink logs to narrow things down.

2) For FLINK-14091 the issue was not just a ZK leader change but that 
the ZooKeeper connection was suspended, i.e., the connection broke down. 
I'd think the best way to replicate that is to shut down ZK for a bit, 
or make it otherwise unreachable. To trigger a plain leader change, the 
easiest way would be to kill the leading JobManager.
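
When running such a test, it can help to confirm from the outside that ZK really is unreachable, or to see which ensemble member currently leads. The following is only a rough sketch, not part of Flink: it sends ZooKeeper's `srvr` four-letter command to each server and reports the `Mode:` line of the reply. The function names `parse_mode` and `probe` are made up for this example, port 2181 is assumed, and on newer ZooKeeper versions the command must be allowed by `4lw.commands.whitelist`.

```python
import socket
import sys


def parse_mode(srvr_output):
    """Return the value of the 'Mode:' line in a `srvr` reply, or None."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def probe(host, port=2181, timeout=2.0):
    """Send `srvr` to one ZooKeeper server (hypothetical helper).

    Returns the reported mode (e.g. leader, follower, standalone),
    'unknown' if the reply has no Mode line, or 'unreachable' if the
    connection fails or times out.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"srvr")
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # server closes the socket after replying
                    break
                chunks.append(data)
    except OSError:
        return "unreachable"
    reply = b"".join(chunks).decode("utf-8", "replace")
    return parse_mode(reply) or "unknown"


if __name__ == "__main__":
    # Hosts are taken from the command line, e.g.:
    #   python zk_probe.py zk1 zk2 zk3
    for host in sys.argv[1:]:
        print(host, probe(host))
```

Running it before, during, and after the ZK outage gives a simple timeline to compare against the Flink logs.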
