Posted to issues@flink.apache.org by "Till Rohrmann (Jira)" <ji...@apache.org> on 2020/07/27 14:50:00 UTC

[jira] [Comment Edited] (FLINK-18733) Jobmanager cannot start in HA mode with Zookeeper

    [ https://issues.apache.org/jira/browse/FLINK-18733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165756#comment-17165756 ] 

Till Rohrmann edited comment on FLINK-18733 at 7/27/20, 2:49 PM:
-----------------------------------------------------------------

Thanks for reporting this issue [~lilyevsky]. Could you share with us the cluster logs for further debugging? It would also be helpful to better understand in which environment you are deploying Flink. We bumped the ZooKeeper version from {{3.4.10}} to {{3.4.14}} for the {{1.11.0}} release (FLINK-18042 & FLINK-16955). This change might be causing the problems you are observing.


was (Author: till.rohrmann):
Thanks for reporting this issue [~lilyevsky]. Could you share with us the cluster logs for further debugging? It would also be helpful to better understand in which environment you are deploying Flink. We bumped the ZooKeeper version from {{3.4.10}} to {{3.4.14}} for the {{1.11.0}} release (FLINK-18042). This change might be causing the problems you are observing.

> Jobmanager cannot start in HA mode with Zookeeper
> -------------------------------------------------
>
>                 Key: FLINK-18733
>                 URL: https://issues.apache.org/jira/browse/FLINK-18733
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.11.1
>            Reporter: Leonid Ilyevsky
>            Priority: Major
>
> When configured in HA mode, the Jobmanager cannot start at all. First, it issues warnings like this:
> {quote}{{2020-07-27 08:58:23,197 WARN org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn [] - Session 0x0 for server *nj1dvloglab01.liquidnet.biz/<unresolved>:2181*, unexpected error, closing socket connection and attempting reconnect}}
>  {{java.lang.IllegalArgumentException: *Unable to canonicalize address* nj1dvloglab01.liquidnet.biz/<unresolved>:2181 because it's not resolvable}}
>  {{ at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:65) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]}}
>  {{ at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]}}
>  {{ at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1001) ~[flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]}}
>  {{ at org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060) [flink-shaded-zookeeper-3.4.14.jar:3.4.14-11.0]}}
> {quote}
> After a few attempts to connect to ZooKeeper, it finally fails:
> {quote}2020-07-27 08:59:35,055 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error occurred in the cluster entrypoint.
>  org.apache.flink.util.FlinkException: Unhandled error in ZooKeeperLeaderElectionService: Ensure path threw exception
>  at org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService.unhandledError(ZooKeeperLeaderElectionService.java:430) ~[flink-dist_2.12-1.11.1.jar:1.11.1]
> {quote}
>  
> The same HA configuration works fine for me in Flink 1.10.0.
>  
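
The "Unable to canonicalize address ... because it's not resolvable" warning quoted above means the ZooKeeper client was handed a socket address it treated as unresolved. As a rough diagnostic sketch (not part of Flink or ZooKeeper; the class name ResolveCheck and the method isResolvable are hypothetical), the following checks whether the JVM that runs the JobManager can resolve the quorum host name at all. It assumes the host and port are passed on the command line and otherwise falls back to localhost:2181.

```java
import java.net.InetSocketAddress;

// Hypothetical helper, for illustration only: reports whether this JVM
// can resolve a host name to an IP address.
public class ResolveCheck {

    // new InetSocketAddress(host, port) attempts name resolution eagerly;
    // isUnresolved() is true when that attempt failed.
    static boolean isResolvable(String host, int port) {
        InetSocketAddress addr = new InetSocketAddress(host, port);
        return !addr.isUnresolved();
    }

    public static void main(String[] args) {
        // Assumed defaults; pass the real ZooKeeper quorum host and port instead.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        System.out.println(host + ":" + port + " resolvable=" + isResolvable(host, port));
    }
}
```

If this reports that the host is resolvable while the JobManager still logs the warning, DNS is probably not the cause, and the ZooKeeper client version or the JVM version would be the next things to compare with the working 1.10.0 setup.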


