Posted to notifications@accumulo.apache.org by "Keith Turner (JIRA)" <ji...@apache.org> on 2012/10/10 02:54:02 UTC

[jira] [Updated] (ACCUMULO-131) ZookeeperInstance gets stuck when given bad host

     [ https://issues.apache.org/jira/browse/ACCUMULO-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner updated ACCUMULO-131:
----------------------------------

    Affects Version/s: 1.4.1
                       1.4.0
               Status: Patch Available  (was: Open)

There is no way to distinguish a bogus host from a good host.  I define a bogus host as a reachable machine:port where zookeeper will never run.  A good host is a machine:port where zookeeper is running or will run in the future.  The code already handles the case where you give it a bad DNS name; this is clearly a bad host, and it does not retry.

I have attached a patch that changes the behavior of ZooSession.  The patch throws an exception if zookeeper cannot be connected to within 2x the zookeeper timeout.  This patch significantly changes the behavior of Accumulo.  Without this patch, if zookeeper went down, a new Accumulo client would just wait indefinitely until it came back up.  With this patch it will time out.  What are people's opinions about applying this to 1.4?
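
To make the proposed behavior concrete, here is a minimal sketch of a retry loop that gives up after 2x the session timeout. It is not the attached patch and not the actual ZooSession code: the class BoundedConnect, the Connector interface, the connectWithDeadline method, and the retrySleepMs parameter are all hypothetical names made up for illustration, and the choice of IOException for the final failure is an assumption.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class BoundedConnect {

  /** Hypothetical stand-in for opening a session; not an Accumulo or ZooKeeper API. */
  public interface Connector<T> {
    T connect() throws IOException;
  }

  /**
   * Retries connector.connect() until it succeeds or until 2x sessionTimeoutMs
   * has elapsed, then throws instead of retrying forever.
   */
  public static <T> T connectWithDeadline(Connector<T> connector, long sessionTimeoutMs,
      long retrySleepMs) throws IOException, InterruptedException {
    // Use the monotonic clock so wall-clock adjustments do not affect the deadline.
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(2 * sessionTimeoutMs);
    IOException last = null;
    while (true) {
      try {
        return connector.connect();
      } catch (IOException e) {
        // Covers ConnectException, which is a subclass of IOException.
        last = e;
      }
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        throw new IOException("Failed to connect within 2x the session timeout ("
            + sessionTimeoutMs + " ms)", last);
      }
      // Sleep between attempts, but never past the deadline.
      long remainingMs = Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
      Thread.sleep(Math.min(retrySleepMs, remainingMs));
    }
  }
}
```

With a loop like this, a client pointed at a bogus host fails after a bounded wait, at the cost that a client started while zookeeper is temporarily down also fails rather than waiting for it to return.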
                
> ZookeeperInstance gets stuck when given bad host
> ------------------------------------------------
>
>                 Key: ACCUMULO-131
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-131
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.4.0, 1.4.1
>            Reporter: Keith Turner
>            Assignee: Eric Newton
>             Fix For: 1.4.2
>
>
> Keith Massey reported the following issue on the mailing list.
> {quote}
> A user of ours recently filed a bug with us because our code hung forever when she gave us an address for a zookeeper that was not running. I think I've traced the problem into org.apache.accumulo.core.zookeeper.ZooSession.connect(). If the connection to the zookeeper fails it throws a ConnectException, which gets caught by the catch (IOException) block, which logs the message and keeps trying infinitely. It's definitely user error passing in an invalid zookeeper. But shouldn't that method bail out after some time?
> Thanks.
> Keith
> {quote}
