Posted to issues@flink.apache.org by "Chesnay Schepler (Jira)" <ji...@apache.org> on 2020/04/29 13:37:00 UTC

[jira] [Comment Edited] (FLINK-17443) Flink's ZK in HA mode setup is unable to start up if any of the zk hosts are unreachable

    [ https://issues.apache.org/jira/browse/FLINK-17443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095457#comment-17095457 ] 

Chesnay Schepler edited comment on FLINK-17443 at 4/29/20, 1:36 PM:
--------------------------------------------------------------------

I'm not actively working on it; I can assign it to you. You basically have to do the thing you proposed for Flink, just against the [flink-shaded|https://github.com/apache/flink-shaded/] repo.

I would then close this ticket as a duplicate.


was (Author: zentol):
I'm not actively working on it; I can assign it to you. You basically have to the thing you proposed for Flink, just against the [flink-shaded|https://github.com/apache/flink-shaded/] repo.

I would then close this ticket as a duplicate.

> Flink's ZK in HA mode setup is unable to start up if any of the zk hosts are unreachable
> ----------------------------------------------------------------------------------------
>
>                 Key: FLINK-17443
>                 URL: https://issues.apache.org/jira/browse/FLINK-17443
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>            Reporter: Piyush Narang
>            Priority: Major
>              Labels: pull-request-available
>
> We occasionally hit an issue where our Flink cluster will not start up if any of the ZooKeeper hosts passed in the "high-availability.zookeeper.quorum" config setting are unreachable (in the sample below, a host name that cannot be resolved).
> This seems to stem from us being on an older ZooKeeper dependency version (3.4.10). The problem has been fixed as part of https://issues.apache.org/jira/browse/ZOOKEEPER-1576 in the 3.4.x branch ([https://github.com/apache/zookeeper/commit/be1409cc9a14ac2e28693e0e02a0ba6d9713565e]).
> A sample error we see is shown below.
> {code:java}
> java.net.UnknownHostException: zk01-pa4.hpc.criteo.prod: Name or service not known
>     at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
>     at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
>     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
>     at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
>     at java.net.InetAddress.getAllByName(InetAddress.java:1193)
>     at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>     at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
>     at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
>     at org.apache.flink.shaded.curator.org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:29)
>     at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:150)
>     at org.apache.flink.shaded.curator.org.apache.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94)
>     at org.apache.flink.shaded.curator.org.apache.curator.HandleHolder.getZooKeeper(HandleHolder.java:55)
>     at org.apache.flink.shaded.curator.org.apache.curator.ConnectionState.reset(ConnectionState.java:262)
>     at org.apache.flink.shaded.curator.org.apache.curator.ConnectionState.start(ConnectionState.java:109)
>     at org.apache.flink.shaded.curator.org.apache.curator.CuratorZookeeperClient.start(CuratorZookeeperClient.java:191)
>     at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.start(CuratorFrameworkImpl.java:259)
>     at org.apache.flink.runtime.util.ZooKeeperUtils.startCuratorFramework(ZooKeeperUtils.java:131)
>     at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:123)
>     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:292)
>     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:257)
> {code}
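
The stack trace shows the failure originating in StaticHostProvider.<init>, which calls InetAddress.getAllByName for the quorum hosts while the ZooKeeper client is being constructed, so a single unresolvable name aborts HA service creation. As a rough illustration of the more lenient behaviour the linked ZooKeeper fix is after (skip hosts that cannot be resolved, fail only if none can), here is a minimal Java sketch. It assumes quorum entries of the form host:port; LenientQuorumResolver, Resolver and resolveQuorum are hypothetical names, and this is not ZooKeeper's or Flink's actual code.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating lenient quorum resolution; NOT ZooKeeper's or Flink's code.
final class LenientQuorumResolver {

    /** Resolves one host name; abstracted so DNS lookups can be faked in tests. */
    @FunctionalInterface
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    /** Port assumed when an entry has no ":port" suffix (ZooKeeper's usual client port). */
    static final int DEFAULT_PORT = 2181;

    private LenientQuorumResolver() {}

    /**
     * Resolves a comma-separated "host:port" quorum string. Hosts that fail to
     * resolve are logged and skipped; an exception is thrown only if no host
     * resolves at all. Bare IPv6 literals are not handled by this sketch.
     */
    static List<InetSocketAddress> resolveQuorum(String quorum, Resolver resolver) {
        List<InetSocketAddress> resolved = new ArrayList<>();
        for (String entry : quorum.split(",")) {
            String hostPort = entry.trim();
            if (hostPort.isEmpty()) {
                continue;
            }
            int colon = hostPort.lastIndexOf(':');
            String host = colon >= 0 ? hostPort.substring(0, colon) : hostPort;
            int port = colon >= 0 ? Integer.parseInt(hostPort.substring(colon + 1)) : DEFAULT_PORT;
            try {
                for (InetAddress address : resolver.resolve(host)) {
                    resolved.add(new InetSocketAddress(address, port));
                }
            } catch (UnknownHostException e) {
                // One bad entry no longer aborts start-up: log it and move on.
                System.err.println("Skipping unresolvable ZooKeeper host " + host + ": " + e.getMessage());
            }
        }
        if (resolved.isEmpty()) {
            // Still fail fast when the whole quorum is unusable.
            throw new IllegalArgumentException("No ZooKeeper host in '" + quorum + "' could be resolved");
        }
        return resolved;
    }

    /** Convenience overload that resolves through the JDK's own lookup. */
    static List<InetSocketAddress> resolveQuorum(String quorum) {
        return resolveQuorum(quorum, InetAddress::getAllByName);
    }
}
```

Throwing only when the resolved list is empty keeps the fail-fast behaviour for a completely misconfigured quorum while tolerating a single bad entry. The actual upstream fix is the ZooKeeper commit linked in the issue description.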



--
This message was sent by Atlassian Jira
(v8.3.4#803005)