Posted to issues@storm.apache.org by "Anthony Milbourne (JIRA)" <ji...@apache.org> on 2017/02/23 16:59:44 UTC

[jira] [Commented] (STORM-2290) Upgrading zookeeper to 3.4.9 for stability

    [ https://issues.apache.org/jira/browse/STORM-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880842#comment-15880842 ] 

Anthony Milbourne commented on STORM-2290:
------------------------------------------

We run a Storm cluster (v1.0.2) on AWS, backed by three ZooKeeper instances.  Because AWS sometimes terminates VMs, we occasionally lose a ZooKeeper instance.  When that happens, the hostname of that instance can no longer be resolved, because AWS has taken the VM away.  We noticed that in this case Storm fails to connect to ZooKeeper, even though two ZooKeeper instances are still running.  It fails with an exception like this:
 
{noformat}
java.net.UnknownHostException: zookeeper3
  at java.net.InetAddress.getAllByName0(InetAddress.java:1280) 
  at java.net.InetAddress.getAllByName(InetAddress.java:1192) 
  at java.net.InetAddress.getAllByName(InetAddress.java:1126) 
  at org.apache.storm.shade.org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61) 
  at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445) 
  at org.apache.storm.shade.org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:29) 
  at org.apache.storm.shade.org.apache.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:150) 
  at org.apache.storm.shade.org.apache.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94) 
  at org.apache.storm.shade.org.apache.curator.HandleHolder.getZooKeeper(HandleHolder.java:55) 
  at org.apache.storm.shade.org.apache.curator.ConnectionState.reset(ConnectionState.java:218) 
  at org.apache.storm.shade.org.apache.curator.ConnectionState.start(ConnectionState.java:103) 
  at org.apache.storm.shade.org.apache.curator.CuratorZookeeperClient.start(CuratorZookeeperClient.java:190) 
  at org.apache.storm.shade.org.apache.curator.framework.imps.CuratorFrameworkImpl.start(CuratorFrameworkImpl.java:259) 
  at org.apache.storm.zookeeper$mk_client.doInvoke(zookeeper.clj:86) 
  at clojure.lang.RestFn.invoke(RestFn.java:494)
  at org.apache.storm.cluster_state.zookeeper_state_factory$_mkState.invoke(zookeeper_state_factory.clj:28) 
  at org.apache.storm.cluster_state.zookeeper_state_factory.mkState(Unknown Source) 
  <SNIP REST OF STACKTRACE>
{noformat}

Having done some research, it looks like this error is caused by a bug in the ZooKeeper client library.  It is tracked here:
[https://issues.apache.org/jira/browse/ZOOKEEPER-1576]
That issue has been resolved in the 3.5.x branch of ZooKeeper.  However, after 2.5 years and 3 releases, the 3.5.x branch is still in alpha :-(.
 
Although ZooKeeper 3.5.x is in alpha, there is a branch of Curator (3.x.x) that uses it.  Storm, however, uses Curator 2.x.x, possibly because that branch doesn't rely on alpha code.  So the bug is still unpatched in Storm.

I realise that an upgrade to alpha code may be too much of a risk, but this problem is a serious issue for those running Storm on AWS (and presumably on other cloud providers), so perhaps it is worth considering?
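
For illustration only, here is a minimal Java sketch of a possible client-side stop-gap: drop hostnames that do not currently resolve before building the connect string, so that one missing host cannot fail the whole lookup the way StaticHostProvider.<init> does in the stack trace above. This is not the ZOOKEEPER-1576 fix and it is not code from Storm or Curator. ZkHostFilter, Resolver, filterResolvable and connectString are hypothetical names, and the port is an assumed example value.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of Storm, Curator, or ZooKeeper. */
final class ZkHostFilter {

    /** Hypothetical seam so that name resolution can be swapped out in tests. */
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    private ZkHostFilter() {
    }

    /** Returns the hosts that currently resolve, keeping their original order. */
    static List<String> filterResolvable(List<String> hosts, Resolver resolver) {
        List<String> usable = new ArrayList<>();
        for (String host : hosts) {
            try {
                resolver.resolve(host);
                usable.add(host);
            } catch (UnknownHostException e) {
                // The host has no DNS entry right now (e.g. a terminated VM):
                // skip it instead of letting the exception abort the client.
            }
        }
        return usable;
    }

    /** Joins hosts into a "host1:port,host2:port" connect string. */
    static String connectString(List<String> hosts, int port) {
        StringBuilder sb = new StringBuilder();
        for (String host : hosts) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(host).append(':').append(port);
        }
        return sb.toString();
    }
}
```

In real use the resolver would be InetAddress::getAllByName, e.g. ZkHostFilter.filterResolvable(hosts, InetAddress::getAllByName), with the result passed to connectString together with the client port (2181 is ZooKeeper's default). The trade-off is that a host filtered out at startup stays out until the client is rebuilt, so this only narrows the problem rather than fixing it.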

> Upgrading zookeeper to 3.4.9 for stability
> ------------------------------------------
>
>                 Key: STORM-2290
>                 URL: https://issues.apache.org/jira/browse/STORM-2290
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 1.0.2
>            Reporter: Sachin Goyal
>
> We should upgrade zookeeper to 3.4.9 as it brings in a lot of stability improvements (http://zookeeper.apache.org/releases.html) and storm is still using 3.4.6 (https://github.com/apache/storm/blob/master/pom.xml)
> One serious issue affecting zookeeper 3.4.6 is https://issues.apache.org/jira/browse/ZOOKEEPER-1506 which prohibits zookeeper from getting a quorum and hence affects storm's stability as well.


