Posted to commits@helix.apache.org by "Changgeng Li (JIRA)" <ji...@apache.org> on 2015/07/29 00:11:06 UTC

[jira] [Updated] (HELIX-608) NPE and unable to reconnect to zookeeper after a network outage

     [ https://issues.apache.org/jira/browse/HELIX-608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Changgeng Li updated HELIX-608:
-------------------------------
    Description: 
ERROR 2015-07-28 17:12:15,010 [main-EventThread] org.apache.zookeeper.ClientCnxn: Error while calling watcher
java.lang.RuntimeException: Exception while restarting zk client
        at org.I0Itec.zkclient.ZkClient.processStateChanged(ZkClient.java:462) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient.process(ZkClient.java:368) ~[telegraph.jar:?]
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531) [telegraph.jar:?]
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507) [telegraph.jar:?]
Caused by: org.I0Itec.zkclient.exception.ZkException: Unable to connect to zeusglobalwo035.dummy.com:2181,zeusglobalwo044.frc3.dummy.com:2181/instagram-a
        at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:66) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient.reconnect(ZkClient.java:935) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient.processStateChanged(ZkClient.java:458) ~[telegraph.jar:?]
        ... 3 more
Caused by: java.net.UnknownHostException: zeusglobalwo035.dummy.com: Temporary failure in name resolution
        at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) ~[?:1.7.0_72]
        at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901) ~[?:1.7.0_72]
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293) ~[?:1.7.0_72]
        at java.net.InetAddress.getAllByName0(InetAddress.java:1246) ~[?:1.7.0_72]
        at java.net.InetAddress.getAllByName(InetAddress.java:1162) ~[?:1.7.0_72]
        at java.net.InetAddress.getAllByName(InetAddress.java:1098) ~[?:1.7.0_72]
        at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:387) ~[telegraph.jar:?]
        at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:332) ~[telegraph.jar:?]
        at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:383) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:64) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient.reconnect(ZkClient.java:935) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient.processStateChanged(ZkClient.java:458) ~[telegraph.jar:?]
        ... 3 more
INFO  2015-07-28 17:12:15,010 [main-EventThread] org.apache.zookeeper.ClientCnxn: EventThread shut down
ERROR 2015-07-28 17:12:15,014 [ZkClient-EventThread-184-zeusglobalwo035.dummy.com:2181,zeusglobalwo044.frc3.dummy.com:2181/instagram-a] org.I0Itec.zkclient.ZkEventThread: Error handling event ZkEvent[Children of /telegraph/INSTANCES/10.211.12.21_9000/MESSAGES changed sent to org.apache.helix.manager.zk.ZkCallbackHandler@71bd5cfa]
java.lang.NullPointerException
        at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95) ~[telegraph.jar:?]
        at org.apache.helix.manager.zk.ZkClient$2.call(ZkClient.java:195) ~[telegraph.jar:?]
        at org.apache.helix.manager.zk.ZkClient$2.call(ZkClient.java:192) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675) ~[telegraph.jar:?]
        at org.apache.helix.manager.zk.ZkClient.exists(ZkClient.java:192) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient.exists(ZkClient.java:445) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:566) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) [telegraph.jar:?]
ERROR 2015-07-28 17:12:15,015 [ZkClient-EventThread-184-zeusglobalwo035.dummy.com:2181,zeusglobalwo044.frc3.dummy.com:2181/instagram-a] org.I0Itec.zkclient.ZkEventThread: Error handling event ZkEvent[Children of /telegraph/EXTERNALVIEW changed sent to org.apache.helix.manager.zk.ZkCallbackHandler@35d1655]
java.lang.NullPointerException
        at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95) ~[telegraph.jar:?]
        at org.apache.helix.manager.zk.ZkClient$2.call(ZkClient.java:195) ~[telegraph.jar:?]
        at org.apache.helix.manager.zk.ZkClient$2.call(ZkClient.java:192) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675) ~[telegraph.jar:?]
        at org.apache.helix.manager.zk.ZkClient.exists(ZkClient.java:192) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient.exists(ZkClient.java:445) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:566) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) [telegraph.jar:?]


  was:
I noticed that one of the participants is no longer a live instance in ZooKeeper after a network outage, even though its Java process is still running. I have to restart the Java process to make it live again.

Found the following logs:

ERROR 2015-07-28 17:12:15,010 [main-EventThread] org.apache.zookeeper.ClientCnxn: Error while calling watcher
java.lang.RuntimeException: Exception while restarting zk client
        at org.I0Itec.zkclient.ZkClient.processStateChanged(ZkClient.java:462) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient.process(ZkClient.java:368) ~[telegraph.jar:?]
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531) [telegraph.jar:?]
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507) [telegraph.jar:?]
Caused by: org.I0Itec.zkclient.exception.ZkException: Unable to connect to zeusglobalwo035.dummy.com:2181,zeusglobalwo044.frc3.dummy.com:2181/instagram-a
        at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:66) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient.reconnect(ZkClient.java:935) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient.processStateChanged(ZkClient.java:458) ~[telegraph.jar:?]
        ... 3 more
Caused by: java.net.UnknownHostException: zeusglobalwo035.dummy.com: Temporary failure in name resolution
        at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) ~[?:1.7.0_72]
        at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901) ~[?:1.7.0_72]
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293) ~[?:1.7.0_72]
        at java.net.InetAddress.getAllByName0(InetAddress.java:1246) ~[?:1.7.0_72]
        at java.net.InetAddress.getAllByName(InetAddress.java:1162) ~[?:1.7.0_72]
        at java.net.InetAddress.getAllByName(InetAddress.java:1098) ~[?:1.7.0_72]
        at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:387) ~[telegraph.jar:?]
        at org.apache.zookeeper.ClientCnxn.<init>(ClientCnxn.java:332) ~[telegraph.jar:?]
        at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:383) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkConnection.connect(ZkConnection.java:64) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient.reconnect(ZkClient.java:935) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient.processStateChanged(ZkClient.java:458) ~[telegraph.jar:?]
        ... 3 more
INFO  2015-07-28 17:12:15,010 [main-EventThread] org.apache.zookeeper.ClientCnxn: EventThread shut down
ERROR 2015-07-28 17:12:15,014 [ZkClient-EventThread-184-zeusglobalwo035.dummy.com:2181,zeusglobalwo044.frc3.dummy.com:2181/instagram-a] org.I0Itec.zkclient.ZkEventThread: Error handling event ZkEvent[Children of /telegraph/INSTANCES/10.211.12.21_9000/MESSAGES changed sent to org.apache.helix.manager.zk.ZkCallbackHandler@71bd5cfa]
java.lang.NullPointerException
        at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95) ~[telegraph.jar:?]
        at org.apache.helix.manager.zk.ZkClient$2.call(ZkClient.java:195) ~[telegraph.jar:?]
        at org.apache.helix.manager.zk.ZkClient$2.call(ZkClient.java:192) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675) ~[telegraph.jar:?]
        at org.apache.helix.manager.zk.ZkClient.exists(ZkClient.java:192) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient.exists(ZkClient.java:445) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:566) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) [telegraph.jar:?]
ERROR 2015-07-28 17:12:15,015 [ZkClient-EventThread-184-zeusglobalwo035.dummy.com:2181,zeusglobalwo044.frc3.dummy.com:2181/instagram-a] org.I0Itec.zkclient.ZkEventThread: Error handling event ZkEvent[Children of /telegraph/EXTERNALVIEW changed sent to org.apache.helix.manager.zk.ZkCallbackHandler@35d1655]
java.lang.NullPointerException
        at org.I0Itec.zkclient.ZkConnection.exists(ZkConnection.java:95) ~[telegraph.jar:?]
        at org.apache.helix.manager.zk.ZkClient$2.call(ZkClient.java:195) ~[telegraph.jar:?]
        at org.apache.helix.manager.zk.ZkClient$2.call(ZkClient.java:192) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675) ~[telegraph.jar:?]
        at org.apache.helix.manager.zk.ZkClient.exists(ZkClient.java:192) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient.exists(ZkClient.java:445) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:566) ~[telegraph.jar:?]
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) [telegraph.jar:?]



> NPE and unable to reconnect to zookeeper after a network outage
> ---------------------------------------------------------------
>
>                 Key: HELIX-608
>                 URL: https://issues.apache.org/jira/browse/HELIX-608
>             Project: Apache Helix
>          Issue Type: Bug
>          Components: helix-core
>    Affects Versions: 0.7.1
>            Reporter: Changgeng Li
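
One possible reading of the traces above, offered as an assumption rather than a confirmed diagnosis: the reconnect drops the old ZooKeeper handle, the single attempt to build a new one fails on the DNS lookup, and later calls such as exists() then hit a null handle. The sketch below reproduces that pattern with hypothetical stand-in classes (ReconnectSketch, Connection, Handle and flakyFactory are made up for illustration and are not zkclient or Helix APIs) and contrasts it with a reconnect that retries with exponential backoff.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

public class ReconnectSketch {

    /** Hypothetical stand-in for a ZooKeeper client handle. */
    static final class Handle {
        boolean exists(String path) { return true; }
    }

    /** Hypothetical connection wrapper; NOT the real ZkConnection. */
    static final class Connection {
        private final Callable<Handle> factory;
        private Handle handle; // null while disconnected

        Connection(Callable<Handle> factory) { this.factory = factory; }

        void close() { handle = null; }

        void connect() throws Exception { handle = factory.call(); }

        boolean isConnected() { return handle != null; }

        /** Single attempt: if connect() throws, the handle stays null. */
        void reconnectOnce() throws Exception {
            close();
            connect();
        }

        /** Retries connect() with exponential backoff; returns true on success. */
        boolean reconnectWithRetry(int maxAttempts, long initialBackoffMs)
                throws InterruptedException {
            close();
            long backoff = initialBackoffMs;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    connect();
                    return true;
                } catch (Exception e) {
                    if (attempt == maxAttempts) {
                        break; // give up; caller decides what to do next
                    }
                    Thread.sleep(backoff);
                    backoff *= 2;
                }
            }
            return false;
        }

        /** Fails with a descriptive exception instead of an NPE when disconnected. */
        boolean exists(String path) {
            if (handle == null) {
                throw new IllegalStateException("not connected");
            }
            return handle.exists(path);
        }
    }

    /** Factory that throws UnknownHostException for the first `failures` calls. */
    static Callable<Handle> flakyFactory(int failures) {
        int[] calls = {0};
        return () -> {
            if (calls[0]++ < failures) {
                throw new UnknownHostException("zk.example.invalid");
            }
            return new Handle();
        };
    }

    public static void main(String[] args) throws Exception {
        Connection naive = new Connection(flakyFactory(2));
        try {
            naive.reconnectOnce();
        } catch (Exception e) {
            System.out.println("single attempt failed: " + e);
        }
        System.out.println("naive connected: " + naive.isConnected());

        Connection retrying = new Connection(flakyFactory(2));
        System.out.println("retry connected: " + retrying.reconnectWithRetry(5, 10L));
    }
}
```

Whether a retry loop like this belongs in the client library or in the application is a design choice; the sketch only shows that a transient name-resolution failure does not have to leave the connection permanently without a handle.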



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)