
Posted to notifications@accumulo.apache.org by "Josh Elser (JIRA)" <ji...@apache.org> on 2014/11/13 18:34:34 UTC

[jira] [Comment Edited] (ACCUMULO-3036) 1.5 MiniCluster fails to start, forces clients to wait indefinitely

    [ https://issues.apache.org/jira/browse/ACCUMULO-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210060#comment-14210060 ] 

Josh Elser edited comment on ACCUMULO-3036 at 11/13/14 5:33 PM:
----------------------------------------------------------------

One easy fix would be to watch ZooKeeper and wait for each started process to acquire its lock. If any of them fails to do so within some period of time, we can abort.

If we return from {{start()}} before the locks are actually held, the client is just going to sit there spinning its wheels trying to connect anyway. This would also be generally applicable to all versions, not just 1.5.
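
A rough sketch of the wait-then-abort idea, not actual MiniAccumuloCluster code: the class {{LockWaiter}}, the method {{awaitLock}}, and its parameters are hypothetical names made up for illustration. The ZooKeeper lookup is deliberately left behind a {{BooleanSupplier}}, since how the lock node is checked is an assumption here; the sketch polls instead of registering a watcher, and it uses Java 8 types for brevity.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper: not part of Accumulo.
public class LockWaiter {

    /**
     * Polls lockHeld until it reports true. If the timeout elapses first,
     * throws an unchecked exception so that start() fails fast instead of
     * leaving the client to retry forever.
     *
     * lockHeld is assumed to wrap whatever ZooKeeper check tells us the
     * process has acquired its lock.
     */
    public static void awaitLock(String processName, BooleanSupplier lockHeld,
                                 long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!lockHeld.getAsBoolean()) {
            // Compare via subtraction so the check is safe against nanoTime overflow.
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException(processName
                        + " did not acquire its ZooKeeper lock within " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and abort the wait.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for " + processName, e);
            }
        }
    }
}
```

{{start()}} would call something like this once per spawned process (Master, each TServer) right before returning, so a process that dies on startup turns into an exception rather than a hung client.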


was (Author: elserj):
One easy fix would be to watch ZooKeeper and wait for the locks for the started processes to be acquired. If they fail to do so after some period of time, we can abort.

If we return on {{start()}} before the locks are actually held, the client is just going to be sitting there spinning its wheels trying to connect anyways.

> 1.5 MiniCluster fails to start, forces clients to wait indefinitely
> -------------------------------------------------------------------
>
>                 Key: ACCUMULO-3036
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3036
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: mini
>    Affects Versions: 1.5.0, 1.5.1
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 1.5.3
>
>
> Over in Pig land, a user reported that a test using MiniAccumuloCluster hung until the JUnit timeout was hit.
> Eventually, the problem was diagnosed as a bad classpath (an old version of Thrift was included and used), which caused the TServer and Master to bail out immediately. However, the client sat indefinitely, trying and failing to connect.
> MAC#start should not return before we're sure that the processes are actually up and running (a very quick smoke test).
> It looks like ACCUMULO-1537 introduced a call to SetGoalState on the Master before MAC#start returns, which would (I assume) fail and throw a RuntimeException if the Master had died. Including that change in 1.5 may be sufficient to fix the underlying issue the user was seeing.


