Posted to dev@bookkeeper.apache.org by "Ivan Kelly (JIRA)" <ji...@apache.org> on 2011/09/01 18:16:09 UTC

[jira] [Commented] (BOOKKEEPER-63) Hedwig PubSubServer must wait for its Zookeeper client to be connected upon startup

    [ https://issues.apache.org/jira/browse/BOOKKEEPER-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095375#comment-13095375 ] 

Ivan Kelly commented on BOOKKEEPER-63:
--------------------------------------

Looks good, but you should also check the return value of CountDownLatch.await. If a timeout has occurred, the code should LOG.fatal and throw an exception. Also, could you name the patch BOOKKEEPER-63.diff or .patch etc.? The extension doesn't matter, but having patches named after the issue makes it easier to see what I'm working with in my source tree.
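
A minimal sketch of the check suggested above, not the code from the attached patch. The class name ConnectionWait and the helper awaitConnected are hypothetical, and the latch is assumed to be counted down by the ZooKeeper watcher once the client reports that it is connected. java.util.logging is used so the example is self-contained; it has no FATAL level, so SEVERE stands in for LOG.fatal.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

public class ConnectionWait {
    private static final Logger LOG = Logger.getLogger(ConnectionWait.class.getName());

    /**
     * Blocks until the latch reaches zero or the timeout elapses.
     * Hypothetical helper: the latch is assumed to be counted down by the
     * ZooKeeper watcher when the client becomes connected.
     */
    static void awaitConnected(CountDownLatch connected, long timeout, TimeUnit unit)
            throws IOException, InterruptedException {
        // await(timeout, unit) returns false if the timeout elapsed
        // before the count reached zero.
        if (!connected.await(timeout, unit)) {
            // SEVERE stands in for LOG.fatal (assumption, see lead-in).
            LOG.severe("Timed out waiting for ZooKeeper client to connect");
            throw new IOException("Timed out waiting for ZooKeeper client to connect");
        }
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        connected.countDown(); // simulate the watcher seeing the connected event
        awaitConnected(connected, 1, TimeUnit.SECONDS);
        System.out.println("connected");
    }
}
```

Ignoring the boolean returned by await would let startup continue with an unconnected client after the timeout, which is the failure the issue describes; throwing makes the timeout visible to the caller.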

> Hedwig PubSubServer must wait for its Zookeeper client to be connected upon startup
> -----------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-63
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-63
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: hedwig-server
>            Reporter: Matthieu Morel
>            Priority: Minor
>         Attachments: patch-testcase.txt, patch-v2.txt, patch.txt
>
>
> When a PubSubServer is instantiated in *non-standalone* mode, it creates a ZkTopicManager which takes a Zookeeper client as an argument.
> Unfortunately, this Zookeeper client may not be connected yet (not in CONNECTED state yet), and when this is the case, creation of ZkTopicManager fails, leading to failure of the PubSubServer startup.
> Typical error (adapted, line numbers take into account commented patching code):
> java.io.IOException: org.apache.hedwig.exceptions.PubSubException$ServiceDownException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hedwig/standalone/hosts/x.x.x.x:4080:9876
> 	at org.apache.hedwig.server.netty.PubSubServer.instantiateTopicManager(PubSubServer.java:170)
> 	at org.apache.hedwig.server.netty.PubSubServer$3.run(PubSubServer.java:294)
> 	at java.lang.Thread.run(Thread.java:680)
> Caused by: org.apache.hedwig.exceptions.PubSubException$ServiceDownException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hedwig/standalone/hosts/x.x.x.x:4080:9876
> 	at org.apache.hedwig.server.topics.ZkTopicManager$4.safeProcessResult(ZkTopicManager.java:146)
> etc...
> This is particularly problematic when running tests that need to pass a config to the PubSubServer.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira