Posted to notifications@accumulo.apache.org by "Josh Elser (JIRA)" <ji...@apache.org> on 2014/12/08 20:59:14 UTC

[jira] [Comment Edited] (ACCUMULO-3269) nondeterministic failure of MiniAccumuloClusterStartStopTest

    [ https://issues.apache.org/jira/browse/ACCUMULO-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238372#comment-14238372 ] 

Josh Elser edited comment on ACCUMULO-3269 at 12/8/14 7:58 PM:
---------------------------------------------------------------

So, I don't understand why this helps, but when we start ZooKeeper and wait to be able to connect to it, we "spin" fast: we get an exception and immediately retry.

Adding a {{Thread.sleep(1000)}} in the catch block around sending "ruok" to the ZooKeeper server appears to prevent the failure. I went from being unable to run the minicluster module's unit tests repeatedly for more than a few minutes to being able to run them for 20 minutes...
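
A minimal sketch of the retry loop described above, with a pause between attempts instead of an immediate retry. This is not the code in {{MiniAccumuloClusterImpl}}: the class name {{ZooKeeperRuokPoller}}, the method names, and the timeout parameters are hypothetical and chosen for illustration. Only the "ruok" probe and the sleep between retries come from the comment; the sketch assumes the server answers "imok" when healthy, and it sleeps after any failed attempt rather than only inside the catch block.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Accumulo.
class ZooKeeperRuokPoller {

  // Sends "ruok" once and returns whatever the server replies (at most 4 bytes).
  static String sendRuok(String host, int port, int timeoutMs) throws IOException {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMs);
      socket.setSoTimeout(timeoutMs);
      socket.getOutputStream().write("ruok".getBytes(StandardCharsets.US_ASCII));
      socket.getOutputStream().flush();

      InputStream in = socket.getInputStream();
      byte[] reply = new byte[4];
      int n = 0;
      while (n < reply.length) {
        int r = in.read(reply, n, reply.length - n);
        if (r < 0) {
          break; // server closed the connection
        }
        n += r;
      }
      return new String(reply, 0, n, StandardCharsets.US_ASCII);
    }
  }

  // Polls until the server answers "imok" or maxWaitMs elapses.
  // Returns true if the server answered in time, false otherwise.
  static boolean waitForZooKeeper(String host, int port, long maxWaitMs, long sleepMs)
      throws InterruptedException {
    long deadline = System.nanoTime() + maxWaitMs * 1_000_000L;
    while (deadline - System.nanoTime() > 0) {
      try {
        if ("imok".equals(sendRuok(host, port, 1000))) {
          return true;
        }
      } catch (IOException e) {
        // Typically java.net.ConnectException: Connection refused
        // while the server is still starting; fall through and retry.
      }
      // Pause before the next attempt instead of spinning.
      Thread.sleep(sleepMs);
    }
    return false;
  }
}
```

For example, {{ZooKeeperRuokPoller.waitForZooKeeper("localhost", 2181, 20000, 1000)}} would wait up to 20 seconds with a one-second pause between attempts; the host and port here are assumptions, not values taken from the test.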

I have no idea why this "helps".

For context: I had actually called out to {{netstat}} and used procfs to figure out what magical process was already bound to the port and preventing ZK from coming up, and noticed that I suddenly stopped getting failures. I assumed that the latency from making those calls is what started to make this work (as the output from those commands never showed me anything useful).


> nondeterministic failure of MiniAccumuloClusterStartStopTest
> ------------------------------------------------------------
>
>                 Key: ACCUMULO-3269
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3269
>             Project: Accumulo
>          Issue Type: Bug
>            Reporter: Adam Fuchs
>            Assignee: Josh Elser
>             Fix For: 1.7.0
>
>
> When building in master (mvn package -P assemble) I got the following error. I ran the build again (also mvn package -P assemble, with no clean in between) and the whole build succeeded.
> {code}
> Running org.apache.accumulo.minicluster.MiniAccumuloClusterStartStopTest
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 31.103 sec <<< FAILURE! - in org.apache.accumulo.minicluster.MiniAccumuloClusterStartStopTest
> multipleStopsIsAllowed(org.apache.accumulo.minicluster.MiniAccumuloClusterStartStopTest)  Time elapsed: 20.016 sec  <<< ERROR!
> org.apache.accumulo.minicluster.impl.ZooKeeperBindException: Zookeeper did not start within 20 seconds. Check the logs in /tmp/junit1360063600921880650/logs for errors.  Last exception: java.net.ConnectException: Connection refused
> 	at org.apache.accumulo.minicluster.impl.MiniAccumuloClusterImpl.start(MiniAccumuloClusterImpl.java:548)
> 	at org.apache.accumulo.minicluster.MiniAccumuloCluster.start(MiniAccumuloCluster.java:72)
> 	at org.apache.accumulo.minicluster.MiniAccumuloClusterStartStopTest.multipleStopsIsAllowed(MiniAccumuloClusterStartStopTest.java:57)
> {code}


