Posted to dev@zookeeper.apache.org by "Donny Nadolny (JIRA)" <ji...@apache.org> on 2015/06/03 18:05:38 UTC

[jira] [Updated] (ZOOKEEPER-2204) LearnerSnapshotThrottlerTest.testHighContentionWithTimeout fails occasionally

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Donny Nadolny updated ZOOKEEPER-2204:
-------------------------------------
    Attachment: ZOOKEEPER-2204.patch

> LearnerSnapshotThrottlerTest.testHighContentionWithTimeout fails occasionally
> -----------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2204
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2204
>             Project: ZooKeeper
>          Issue Type: Test
>    Affects Versions: 3.5.0
>            Reporter: Donny Nadolny
>            Assignee: Donny Nadolny
>            Priority: Minor
>         Attachments: ZOOKEEPER-2204.patch
>
>
> The {{LearnerSnapshotThrottler}} allows only 2 concurrent snapshots, and if 2 snapshots are already in progress it waits up to 200ms for one to complete. That is not enough time for {{testHighContentionWithTimeout}} to pass consistently: on a cold JVM running just this one test, I was able to get it to fail 3 times in around 50 runs. The 200ms timeout is hit if there is a long enough delay between a thread calling {{LearnerSnapshot snap = throttler.beginSnapshot(false);}} and {{throttler.endSnapshot();}}.
> The test also fails spuriously on the build server; see https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2747/testReport/org.apache.zookeeper.server.quorum/LearnerSnapshotThrottlerTest/testHighContentionWithTimeout/ for an example.
> I have bumped the timeout up to 5 seconds (which should be more than enough for warmup / GC pauses) and added logging to the {{catch (Exception e)}} block to assist in debugging any future issues.
> An alternative approach would be to separate out the results gathered from the threads: although we only record true/false, there are really three possible outcomes:
> 1. The {{snapshotNumber}} was <= 2, meaning the individual call operated correctly
> 2. The {{snapshotNumber}} was > 2, meaning the test should definitely fail
> 3. We were unable to snapshot in the time given, so we can't determine if we should fail or pass (although if we have "enough" successes from #1 with no failures from #2 maybe we would pass the test anyway).
> Bumping up the timeout is easier.
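
The three-outcome alternative described above could be sketched roughly as follows. This is not the attached patch and not ZooKeeper code: {{StandInThrottler}}, {{Outcome}}, {{classify}}, {{run}} and {{passes}} are hypothetical names, and a {{java.util.concurrent.Semaphore}} stands in for the real throttler. The "enough successes" threshold ({{minSuccesses}}) is likewise an assumption, since the description leaves it open.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names come from ZooKeeper.
class ThreeOutcomeSketch {
    enum Outcome { WITHIN_LIMIT, OVER_LIMIT, TIMED_OUT }

    // Stand-in throttler: at most maxConcurrent holders, with a bounded wait.
    static final class StandInThrottler {
        private final Semaphore permits;
        private final long timeoutMillis;
        private final AtomicInteger inProgress = new AtomicInteger();

        StandInThrottler(int maxConcurrent, long timeoutMillis) {
            this.permits = new Semaphore(maxConcurrent);
            this.timeoutMillis = timeoutMillis;
        }

        // Returns the number of snapshots in progress (including this one),
        // or -1 if no slot became free within the timeout.
        int begin() throws InterruptedException {
            if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return -1;
            }
            return inProgress.incrementAndGet();
        }

        void end() {
            inProgress.decrementAndGet();
            permits.release();
        }
    }

    // Maps one thread's observation onto the three outcomes from the description.
    static Outcome classify(int snapshotNumber, int limit) {
        if (snapshotNumber < 0) {
            return Outcome.TIMED_OUT;
        }
        return snapshotNumber <= limit ? Outcome.WITHIN_LIMIT : Outcome.OVER_LIMIT;
    }

    // Runs `threads` contending workers and counts each outcome.
    static Map<Outcome, Integer> run(int threads, int limit, long timeoutMillis,
                                     long holdMillis) throws Exception {
        StandInThrottler throttler = new StandInThrottler(limit, timeoutMillis);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Outcome>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(() -> {
                    int n = throttler.begin();
                    if (n < 0) {
                        return Outcome.TIMED_OUT;
                    }
                    try {
                        Thread.sleep(holdMillis); // simulate snapshot work
                        return classify(n, limit);
                    } finally {
                        throttler.end();
                    }
                }));
            }
            Map<Outcome, Integer> counts = new EnumMap<>(Outcome.class);
            for (Outcome o : Outcome.values()) {
                counts.put(o, 0);
            }
            for (Future<Outcome> f : futures) {
                counts.merge(f.get(), 1, Integer::sum);
            }
            return counts;
        } finally {
            pool.shutdownNow();
        }
    }

    // Fail only on a definite violation; timeouts alone are inconclusive.
    static boolean passes(Map<Outcome, Integer> counts, int minSuccesses) {
        return counts.get(Outcome.OVER_LIMIT) == 0
                && counts.get(Outcome.WITHIN_LIMIT) >= minSuccesses;
    }

    public static void main(String[] args) throws Exception {
        Map<Outcome, Integer> counts = run(6, 2, 5000, 10);
        System.out.println(counts + " passes=" + passes(counts, 1));
    }
}
```

With this split, a slow machine that produces only timeouts no longer looks like a correctness failure, at the cost of having to pick a success threshold. That extra decision is part of why simply raising the timeout is the simpler fix.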



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)