Posted to issues@geode.apache.org by "Juan José Ramos Cassella (JIRA)" <ji...@apache.org> on 2019/08/15 15:44:00 UTC

[jira] [Resolved] (GEODE-7062) CI Failure: DistributedLockServiceDUnitTest.testSuspendLockingBlocksUntilNoLocks

     [ https://issues.apache.org/jira/browse/GEODE-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juan José Ramos Cassella resolved GEODE-7062.
---------------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.11.0

> CI Failure: DistributedLockServiceDUnitTest.testSuspendLockingBlocksUntilNoLocks
> --------------------------------------------------------------------------------
>
>                 Key: GEODE-7062
>                 URL: https://issues.apache.org/jira/browse/GEODE-7062
>             Project: Geode
>          Issue Type: Bug
>          Components: tests
>            Reporter: Juan José Ramos Cassella
>            Assignee: Juan José Ramos Cassella
>            Priority: Major
>              Labels: GeodeCommons, flaky-test
>             Fix For: 1.11.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The test {{testSuspendLockingBlocksUntilNoLocks}} from class {{DistributedLockServiceDUnitTest}} failed twice in CI runs [967|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/967] and [969|https://concourse.apachegeode-ci.info/teams/main/pipelines/apache-develop-main/jobs/DistributedTestOpenJDK8/builds/969].
> Results for the first failure are available [here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-results/distributedTest/1565222926/] and for the second one [here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-results/distributedTest/1565246507/].
> Archived artifacts for the first failure are available [here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-artifacts/1565222926/distributedtestfiles-OpenJDK8-1.11.0-SNAPSHOT.0015.tgz] and for the second one [here|http://files.apachegeode-ci.info/builds/apache-develop-main/1.11.0-SNAPSHOT.0015/test-artifacts/1565246507/distributedtestfiles-OpenJDK8-1.11.0-SNAPSHOT.0015.tgz].
> The issue appears to be a race condition when the test launches an asynchronous task on a remote {{VM}} with the following code:
> {code:title=DistributedLockServiceDUnitTest.java|borderStyle=solid}
>     VM vm1 = getVM(1);
>     vm1.invokeAsync(new SerializableRunnable("Lock & unlock in vm1") {
>       @Override
>       public void run() {
>         DistributedLockService service2 = getServiceNamed(name);
>         assertThat(service2.lock("lock", -1, -1)).isTrue();
>         synchronized (monitor) {
>           try {
>             monitor.wait();
>           } catch (InterruptedException ex) {
>             out.println("Unexpected InterruptedException");
>             fail("interrupted");
>           }
>         }
>         service2.unlock("lock");
>       }
>     });
>     // Let vm1's thread get the lock and go into wait()
>     sleep(100);
> {code}
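The core problem is that {{sleep(100)}} establishes no ordering with the remote task. It only bounds how long the test waits, not what the other thread has done by then. The following single-JVM sketch models this. {{SleepRaceDemo}} is a hypothetical class, not Geode code. A {{java.util.concurrent.locks.ReentrantLock}} stands in for the distributed lock, and a {{CountDownLatch}} stands in for a remote {{VM}} that is slow to schedule the task. Because the worker is held back until after the check, the lock is deterministically still free once the sleep ends. That is exactly the state the test silently assumes cannot happen:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical single-JVM model of the race (not Geode code): the ReentrantLock stands in
// for the distributed lock and the latch for a remote VM that is slow to start the task.
final class SleepRaceDemo {

  // Returns whether the worker held the lock once the fixed sleep had elapsed.
  static boolean lockHeldAfterSleep(long sleepMillis) {
    ReentrantLock lock = new ReentrantLock();
    CountDownLatch delayedStart = new CountDownLatch(1); // models scheduling delay
    Thread worker = new Thread(() -> {
      try {
        delayedStart.await(); // the task was submitted but has not started running
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      lock.lock(); // in the real test, vm1 would now wait on the monitor
      lock.unlock();
    });
    worker.start();
    try {
      Thread.sleep(sleepMillis);      // mirrors sleep(100) in the test
      boolean held = lock.isLocked(); // the test silently assumes this is true
      delayedStart.countDown();       // only now does the "remote" task run
      worker.join();
      return held;
    } catch (InterruptedException e) {
      delayedStart.countDown();
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("lock held after sleep: " + lockHeldAfterSleep(100));
  }
}
```

Running {{main}} prints {{lock held after sleep: false}}. A longer sleep only makes this outcome less likely. It does not rule it out.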
> If the remote thread has not yet acquired the lock when the 100 millisecond sleep ends, the test fails. The thread on the local {{VM}} can then invoke {{suspendLocking}} right away instead of blocking:
> {code:title=DistributedLockServiceDUnitTest.java|borderStyle=solid}
>     Thread thread = new Thread(new Runnable() {
>       @Override
>       public void run() {
>         setGot(service.suspendLocking(-1));
>         setDone(true);
>         service.resumeLocking();
>       }
>     });
>     setGot(false);
>     setDone(false);
>     thread.start();
>     // Let thread start, make sure it's blocked in suspendLocking
>     sleep(100);
>     assertThat(getGot() || getDone())
>         .withFailMessage("Before release, got: " + getGot() + ", done: " + getDone()).isFalse();
> {code}
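The second {{sleep(100)}} has a related weakness. It is meant to show that the local thread is blocked in {{suspendLocking}}, but it can only observe that the thread has not finished *yet*. One option is to poll until the thread is actually parked, as Awaitility-style helpers do. The sketch below is a dependency-free, single-JVM illustration. {{BlockedThreadCheck}}, {{isParked}} and {{awaitCondition}} are hypothetical names. A {{ReentrantLock}} held by the main thread stands in for the lock held in vm1, and a contender thread stands in for {{suspendLocking(-1)}}. {{Thread.getState()}} only reports that a thread is waiting, not what it is waiting on. {{ReentrantLock.hasQueuedThread}} narrows that down here, but {{DistributedLockService}} may not expose an equivalent, in which case the real test would have to rely on the thread state alone:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

// Hypothetical helpers (not Geode code) that replace a fixed sleep with polling.
final class BlockedThreadCheck {

  // True if the thread is parked or blocked. This says *that* it waits, not *where*.
  static boolean isParked(Thread thread) {
    Thread.State state = thread.getState();
    return state == Thread.State.WAITING
        || state == Thread.State.TIMED_WAITING
        || state == Thread.State.BLOCKED;
  }

  // Polls the condition roughly every millisecond until it holds or the timeout expires.
  static boolean awaitCondition(BooleanSupplier condition, long timeout, TimeUnit unit) {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(1));
    }
    return true;
  }

  // Models the second snippet: the main thread holds the lock (as vm1 does) and the
  // contender must block on it (as suspendLocking(-1) should).
  static boolean contenderBlocksWhileLockHeld() {
    ReentrantLock lock = new ReentrantLock();
    lock.lock();
    Thread contender = new Thread(() -> {
      lock.lock();
      lock.unlock();
    });
    contender.start();
    try {
      return awaitCondition(
          () -> isParked(contender) && lock.hasQueuedThread(contender), 10, TimeUnit.SECONDS);
    } finally {
      lock.unlock(); // release so the contender can acquire the lock and finish
      try {
        contender.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  public static void main(String[] args) {
    System.out.println("contender blocked: " + contenderBlocksWhileLockHeld());
  }
}
```

The 10 second timeout is only an upper bound. In the normal case the condition holds within a few milliseconds, so the check is both faster and more reliable than a fixed sleep.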
> Increasing the sleep time might reduce the likelihood of the issue recurring. A more robust option would be to investigate how to make the test wait *until* the asynchronous invocation has started on the remote {{VM}} and acquired the lock, instead of arbitrarily sleeping for 100 milliseconds.
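The following is a minimal sketch of the second option, assuming both threads run in one JVM. The lock holder counts down a {{CountDownLatch}} only *after* it holds the lock, and the test awaits that latch with a timeout instead of sleeping. A second latch replaces the {{monitor.wait()}}/{{notify}} pair. {{LockHandshake}} is a hypothetical class, and {{ReentrantLock}} stands in for {{DistributedLockService}}. A latch cannot be shared across DUnit {{VM}}s directly. In the real test, the "lock acquired" signal would need some cross-VM mechanism, for example a flag set in vm1 and polled through a remote invocation. That part is an assumption and is not shown:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical single-JVM sketch (not Geode code) of a handshake replacing sleep(100).
final class LockHandshake {

  // Returns whether the main thread observes the lock as held once the handshake completes.
  static boolean lockHeldAfterHandshake() {
    ReentrantLock lock = new ReentrantLock();            // stands in for the distributed lock
    CountDownLatch lockAcquired = new CountDownLatch(1); // "vm1's thread has the lock"
    CountDownLatch release = new CountDownLatch(1);      // replaces monitor.wait()/notify
    Thread holder = new Thread(() -> {
      lock.lock();
      try {
        lockAcquired.countDown(); // signal only once the lock is really held
        release.await();          // keep holding the lock until the test releases it
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        lock.unlock();
      }
    });
    holder.start();
    try {
      if (!lockAcquired.await(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("holder never acquired the lock");
      }
      return lock.isLocked(); // guaranteed true here, unlike after a fixed sleep
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      release.countDown(); // let the holder unlock and exit
      try {
        holder.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  public static void main(String[] args) {
    System.out.println("lock held after handshake: " + lockHeldAfterHandshake());
  }
}
```

Running {{main}} prints {{lock held after handshake: true}}. Unlike the sleep-based version, this result does not depend on thread scheduling. The timeout only turns a hang into a clear failure.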



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)