Posted to issues@geode.apache.org by "Juan José Ramos Cassella (JIRA)" <ji...@apache.org> on 2019/08/15 15:44:00 UTC
[jira] [Resolved] (GEODE-7062) CI Failure: DistributedLockServiceDUnitTest.testSuspendLockingBlocksUntilNoLocks
[ https://issues.apache.org/jira/browse/GEODE-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Juan José Ramos Cassella resolved GEODE-7062.
---------------------------------------------
Resolution: Fixed
Fix Version/s: 1.11.0
> CI Failure: DistributedLockServiceDUnitTest.testSuspendLockingBlocksUntilNoLocks
> --------------------------------------------------------------------------------
>
> Key: GEODE-7062
> URL: https://issues.apache.org/jira/browse/GEODE-7062
> Project: Geode
> Issue Type: Bug
> Components: tests
> Reporter: Juan José Ramos Cassella
> Assignee: Juan José Ramos Cassella
> Priority: Major
> Labels: GeodeCommons, flaky-test
> Fix For: 1.11.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The test {{testSuspendLockingBlocksUntilNoLocks}} from class {{DistributedLockServiceDUnitTest}} failed twice in CI runs [967|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/967] and [969|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/969].
> Results for the first failure are available [here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-results/distributedTest/1565222926/] and for the second one [here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-results/distributedTest/1565246507/].
> Archived artifacts for the first failure are available [here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-artifacts/1565222926/distributedtestfiles-OpenJDK8-1.11.0-SNAPSHOT.0015.tgz] and for the second one [here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-artifacts/1565246507/distributedtestfiles-OpenJDK8-1.11.0-SNAPSHOT.0015.tgz].
> The issue appears to be a race condition while firing an asynchronous thread on a remote {{VM}} through the following code:
> {code:title=DistributedLockServiceDUnitTest.java|borderStyle=solid}
> VM vm1 = getVM(1);
> vm1.invokeAsync(new SerializableRunnable("Lock & unlock in vm1") {
>   @Override
>   public void run() {
>     DistributedLockService service2 = getServiceNamed(name);
>     assertThat(service2.lock("lock", -1, -1)).isTrue();
>     synchronized (monitor) {
>       try {
>         monitor.wait();
>       } catch (InterruptedException ex) {
>         out.println("Unexpected InterruptedException");
>         fail("interrupted");
>       }
>     }
>     service2.unlock("lock");
>   }
> });
> // Let vm1's thread get the lock and go into wait()
> sleep(100);
> {code}
> If the asynchronous thread has not started on the remote {{VM}} and acquired the lock by the time the 100 millisecond sleep ends, the test fails, because the thread on the local {{VM}} is then able to invoke {{suspendLocking}} right away:
> {code:title=DistributedLockServiceDUnitTest.java|borderStyle=solid}
> Thread thread = new Thread(new Runnable() {
>   @Override
>   public void run() {
>     setGot(service.suspendLocking(-1));
>     setDone(true);
>     service.resumeLocking();
>   }
> });
> setGot(false);
> setDone(false);
> thread.start();
> // Let thread start, make sure it's blocked in suspendLocking
> sleep(100);
> assertThat(getGot() || getDone())
>     .withFailMessage("Before release, got: " + getGot() + ", done: " + getDone()).isFalse();
> {code}
> Increasing the sleep time might reduce recurrences of the issue. Another option would be to investigate how to make the test wait *until* the asynchronous invocation has started on the remote {{VM}} instead of arbitrarily sleeping for 100 milliseconds.
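The second option can be illustrated with a minimal, single-JVM sketch. It is not the fix that was applied for this issue and it does not use the Geode test framework: a {{java.util.concurrent.locks.ReentrantLock}} stands in for the distributed lock, and the class name {{AwaitInsteadOfSleep}} and the helper {{awaitCondition}} are hypothetical names introduced only for illustration. The idea is to poll for the state the test depends on (the lock is held, the second thread is queued) with a timeout, rather than sleeping for a fixed interval and hoping the other thread got there.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

public class AwaitInsteadOfSleep {

  /** Polls the condition until it holds or the timeout elapses; returns whether it held. */
  static boolean awaitCondition(BooleanSupplier condition, long timeoutMillis)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(10);
    }
    return true;
  }

  /** Returns true if the waiter was blocked while the lock was held and finished afterwards. */
  static boolean runScenario() throws InterruptedException {
    ReentrantLock lock = new ReentrantLock(); // assumption: local stand-in for the distributed lock
    Object monitor = new Object();
    boolean[] release = {false}; // only read and written while holding the monitor

    // Plays the role of the remote thread: take the lock, then wait to be released.
    Thread holder = new Thread(() -> {
      lock.lock();
      try {
        synchronized (monitor) {
          while (!release[0]) {
            monitor.wait();
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        lock.unlock();
      }
    });
    holder.start();

    // Instead of sleep(100): wait until the holder really owns the lock.
    if (!awaitCondition(lock::isLocked, 5_000)) {
      throw new IllegalStateException("holder never acquired the lock");
    }

    // Plays the role of the local thread that must block while the lock is held.
    AtomicBoolean done = new AtomicBoolean(false);
    Thread waiter = new Thread(() -> {
      lock.lock();
      try {
        done.set(true);
      } finally {
        lock.unlock();
      }
    });
    waiter.start();

    // Instead of sleep(100): wait until the waiter is actually queued on the lock.
    if (!awaitCondition(() -> lock.hasQueuedThread(waiter), 5_000)) {
      throw new IllegalStateException("waiter never blocked on the lock");
    }
    boolean blockedWhileHeld = !done.get();

    // Release the holder and let both threads finish.
    synchronized (monitor) {
      release[0] = true;
      monitor.notifyAll();
    }
    holder.join();
    waiter.join();
    return blockedWhileHeld && done.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("blocked while held, then completed: " + runScenario());
  }
}
```

In the real test the condition would have to be observable across VMs, so this only shows the shape of the change: the fixed sleeps become bounded waits on an explicit condition, and a slow thread start results in a longer wait instead of a spurious failure.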
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)