Posted to notifications@accumulo.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2014/01/16 21:17:20 UTC

[jira] [Commented] (ACCUMULO-2198) Concurrent randomwalk fails with unbalanced servers

    [ https://issues.apache.org/jira/browse/ACCUMULO-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873878#comment-13873878 ] 

ASF subversion and git services commented on ACCUMULO-2198:
-----------------------------------------------------------

Commit cd4eac0d7e2820321db9fc9cdfc8dc89f7dd53d2 in branch refs/heads/1.4.5-SNAPSHOT from [~bhavanki]
[ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=cd4eac0 ]

ACCUMULO-2198 Concurrent randomwalk: add teardown, fix server balance check

The Concurrent randomwalk test had been using a test node property to remember when
servers were last found unbalanced, but this property was not cleaned up between
runs. Therefore, if a new Concurrent test was started some time later, it would pick
up the stale timestamp property from the previous run. This commit removes the
property during test teardown, and also moves the tracking from a node property to
test state.

In addition, the test logic would reset the timestamp every time servers were found
unbalanced, provided the 15-minute allowance hadn't expired. This commit fixes that
issue as well. With the fix, the test may report unbalanced servers more often, and
those reports will be correct.

Lastly, the test in 1.5.x fails only after three checks have found the servers
unbalanced. This commit backports that requirement to 1.4.x.

The timestamp reset and three-check fixes were added to 1.5.x in commit 0ee7e5a8.
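
The three behaviors described above can be illustrated with a minimal sketch. This is not the code from the commit: the class BalanceTracker, its method names, the state keys, and the exact way the three checks are counted (here, only checks made after the 15-minute allowance has expired) are all assumptions made for illustration. A plain map stands in for the randomwalk test state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the balance check; not the Accumulo CheckBalance class.
class BalanceTracker {
    static final long ALLOWANCE_MS = 15 * 60 * 1000L; // 15-minute allowance from the issue
    static final int MAX_FAILED_CHECKS = 3;           // three failed checks before failing

    // Hypothetical keys; the real names used in the test state are not given in the message.
    private static final String FIRST_UNBALANCED = "firstUnbalancedTime";
    private static final String FAILED_CHECKS = "failedChecks";

    // Per-run test state, as opposed to a node property that outlives the run.
    private final Map<String, Object> state = new HashMap<>();

    // Returns true when the test should fail because servers stayed unbalanced.
    boolean check(boolean balanced, long nowMs) {
        if (balanced) {
            clear();
            return false;
        }
        Long first = (Long) state.get(FIRST_UNBALANCED);
        if (first == null) {
            // Record the timestamp only once; later unbalanced checks must not reset it.
            state.put(FIRST_UNBALANCED, nowMs);
            return false;
        }
        if (nowMs - first <= ALLOWANCE_MS) {
            return false; // still within the allowance
        }
        int failures = ((Integer) state.getOrDefault(FAILED_CHECKS, 0)) + 1;
        state.put(FAILED_CHECKS, failures);
        return failures >= MAX_FAILED_CHECKS;
    }

    // Called from test teardown so the next run starts without a stale timestamp.
    void tearDown() {
        clear();
    }

    private void clear() {
        state.remove(FIRST_UNBALANCED);
        state.remove(FAILED_CHECKS);
    }
}
```

In this sketch, a run that is unbalanced at minutes 0, 10, 16, 17 and 18 fails only on the last check, and a check made after tearDown() starts a fresh allowance.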


> Concurrent randomwalk fails with unbalanced servers
> ---------------------------------------------------
>
>                 Key: ACCUMULO-2198
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2198
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.4.4
>            Reporter: Bill Havanki
>            Assignee: Bill Havanki
>              Labels: randomwalk, test
>
> Intermittently, the Concurrent randomwalk test fails with:
> {noformat}
> java.lang.Exception: Error running node Concurrent.xml
>         at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:259)
> ...
> Caused by: java.lang.Exception: Error running node ct.CheckBalance
>         at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:259)
>         at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:251)
>         ... 8 more
> Caused by: java.lang.Exception: servers are unbalanced!
>         at org.apache.accumulo.server.test.randomwalk.concurrent.CheckBalance.visit(CheckBalance.java:74)
>         at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:251)
>         ... 9 more
> {noformat}
> In one case, the 15-minute allowance for balancing was measured from a prior run of Concurrent.xml within the same overall test run. In another case, the time span began at a point when HDFS failed to contact a datanode.


