Posted to commits@cassandra.apache.org by "Andres de la Peña (Jira)" <ji...@apache.org> on 2021/10/04 13:52:00 UTC
[jira] [Commented] (CASSANDRA-16953) Flaky replaceAliveHost test from hostreplacement.HostReplacementTest

    [ https://issues.apache.org/jira/browse/CASSANDRA-16953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423955#comment-17423955 ] 

Andres de la Peña commented on CASSANDRA-16953:
-----------------------------------------------

The failure can be easily reproduced with the multiplexer. For example, [this run|https://app.circleci.com/pipelines/github/adelapena/cassandra/954/workflows/abc53f58-5585-4e85-8c24-822fb03b9d98] with 100 repetitions reproduces the failure 26 times.

The error happens when {{AbstractCluster#close()}} tries to shut down the instances concurrently. Either of the first two nodes can get a rejected connection when trying to connect to the third node, which is the one that wasn't able to do the replacement and is kept running.

The test apparently [passes|https://app.circleci.com/pipelines/github/adelapena/cassandra/959/workflows/6e8c2754-6c44-43a7-8190-8cce5b6ec604] if we shut down the third node before shutting down the other two instances, [this way|https://github.com/adelapena/cassandra/commit/f730853db1cb557676ae46b248030677b895a91f], but I'm not sure what is interfering with the parallel shutdown.
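
To illustrate the ordering idea only, here is a minimal, self-contained sketch. It is not the code from the linked commit and does not use the in-jvm dtest API: {{Node}}, {{closeOrdered}} and the one-minute timeouts are hypothetical stand-ins. It waits for one node to finish shutting down and only then shuts down the remaining nodes in parallel, waiting on their futures with a timeout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class OrderedShutdown {
    // Hypothetical stand-in for a cluster instance; NOT the in-jvm dtest API.
    static final class Node {
        final int id;

        Node(int id) { this.id = id; }

        // Simulates an asynchronous shutdown by recording the node id in a shared log.
        CompletableFuture<Void> shutdown(List<Integer> log) {
            return CompletableFuture.runAsync(() -> log.add(id));
        }
    }

    // Shuts down 'first' and waits for it, then shuts down 'rest' in parallel.
    // Returns the order in which the shutdowns completed.
    static List<Integer> closeOrdered(Node first, List<Node> rest) throws Exception {
        List<Integer> log = Collections.synchronizedList(new ArrayList<>());

        // Step 1: block until the first node is fully down (timeout is an assumption).
        first.shutdown(log).get(1, TimeUnit.MINUTES);

        // Step 2: only now start the remaining shutdowns concurrently.
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (Node node : rest) {
            futures.add(node.shutdown(log));
        }

        // A timed get() like this one throws TimeoutException if a shutdown hangs.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                         .get(1, TimeUnit.MINUTES);
        return log;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> order = closeOrdered(new Node(3), Arrays.asList(new Node(1), new Node(2)));
        System.out.println("first shut down: node" + order.get(0)); // prints "first shut down: node3"
    }
}
```

In this sketch the relative order of the first two nodes is still left to the executor; only the third node is guaranteed to be down before they start closing.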

> Flaky replaceAliveHost test from hostreplacement.HostReplacementTest
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-16953
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16953
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest/java
>            Reporter: Ruslan Fomkin
>            Assignee: Andres de la Peña
>            Priority: Normal
>
> {{replaceAliveHost}} from {{org.apache.cassandra.distributed.test.hostreplacement.HostReplacementTest}} has failed a number of times in different CircleCI builds on Java 8 and Java 11. See [the last failure|https://app.circleci.com/pipelines/github/k-rus/cassandra/14/workflows/3af46462-d162-4997-a49e-1ca10cd2392b/jobs/126/tests#failed-test-0]. The log is the same across the different failures:
> {code:java}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
> 	at org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:70)
> 	at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:476)
> 	at org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:850)
> 	at org.apache.cassandra.distributed.test.hostreplacement.HostReplacementTest.replaceAliveHost(HostReplacementTest.java:145)
> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Caused by: java.util.concurrent.TimeoutException
> 	at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
> 	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
> 	at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:468)
> {code}
