Posted to issues@ratis.apache.org by "Kiyoshi Mizumaru (Jira)" <ji...@apache.org> on 2022/08/05 07:59:00 UTC

[jira] [Commented] (RATIS-1653) TestNettyDataStreamChainTopologyWithGrpcCluster fails sometimes

    [ https://issues.apache.org/jira/browse/RATIS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17575627#comment-17575627 ] 

Kiyoshi Mizumaru commented on RATIS-1653:
-----------------------------------------

I don't understand the details of the test, but it seems to me that there are two different patterns in the logs leading up to the point where "Failed to send DataStreamWindowRequest:seqNum=12" is recorded.

In the first pattern, seen in console.out.fail-000, GrpcLogAppender records "appendEntries Timeout" after GrpcUtil records "Timed out gracefully shutting down connection:". In the second pattern, seen in console.out.fail-001, GrpcLogAppender records "appendEntries Timeout" with cid=1.
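
To tell the two patterns apart across many failed runs, a small helper along the following lines might be useful. This is only a sketch: the class FailurePatternClassifier and its Kind enum are hypothetical names and not part of Ratis. It assumes that each message quoted above appears on a single line of the console output, and that "cid=1" appears on the same line as "appendEntries Timeout".

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Ratis): classifies a console log by what precedes the failure. */
class FailurePatternClassifier {
  enum Kind {
    SHUTDOWN_THEN_APPEND_TIMEOUT, // pattern seen in console.out.fail-000
    APPEND_TIMEOUT_CID_1,         // pattern seen in console.out.fail-001
    APPEND_TIMEOUT_OTHER,         // "appendEntries Timeout" seen, but neither pattern above
    NO_APPEND_TIMEOUT,            // failure recorded without any "appendEntries Timeout" before it
    NO_FAILURE                    // the failure message never appears
  }

  // Message fragments quoted from the logs; the surrounding line layout is not assumed.
  static final String FAILURE = "Failed to send DataStreamWindowRequest:seqNum=12";
  static final String SHUTDOWN_TIMEOUT = "Timed out gracefully shutting down connection:";
  static final String APPEND_TIMEOUT = "appendEntries Timeout";
  // Word boundaries so that cid=12, cid=100, etc. are not mistaken for cid=1.
  static final Pattern CID_1 = Pattern.compile("\\bcid=1\\b");

  static Kind classify(List<String> lines) {
    boolean sawShutdownTimeout = false;
    boolean sawShutdownThenAppend = false;
    boolean sawAppendCid1 = false;
    boolean sawAppend = false;
    for (String line : lines) {
      if (line.contains(FAILURE)) {
        // Only events recorded before the first failure line are considered.
        if (sawShutdownThenAppend) return Kind.SHUTDOWN_THEN_APPEND_TIMEOUT;
        if (sawAppendCid1) return Kind.APPEND_TIMEOUT_CID_1;
        return sawAppend ? Kind.APPEND_TIMEOUT_OTHER : Kind.NO_APPEND_TIMEOUT;
      }
      if (line.contains(SHUTDOWN_TIMEOUT)) {
        sawShutdownTimeout = true;
      } else if (line.contains(APPEND_TIMEOUT)) {
        sawAppend = true;
        if (sawShutdownTimeout) sawShutdownThenAppend = true;
        if (CID_1.matcher(line).find()) sawAppendCid1 = true;
      }
    }
    return Kind.NO_FAILURE;
  }
}
```

Feeding it the lines of each console.out.fail-* file (for example via java.nio.file.Files.readAllLines) would show how often each pattern occurs. When both patterns appear in one log, the sketch reports the first one; that ordering is an arbitrary choice.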

Does seqNum=12 have any special meaning here? Any advice on what changes I should make to make this test more robust?

> TestNettyDataStreamChainTopologyWithGrpcCluster fails sometimes
> ---------------------------------------------------------------
>
>                 Key: RATIS-1653
>                 URL: https://issues.apache.org/jira/browse/RATIS-1653
>             Project: Ratis
>          Issue Type: Bug
>          Components: server, test
>    Affects Versions: 2.3.0
>         Environment: > java -version
> openjdk version "1.8.0_332"
> OpenJDK Runtime Environment (Temurin)(build 1.8.0_332-b09)
> OpenJDK 64-Bit Server VM (Temurin)(build 25.332-b09, mixed mode)
> > mvn -version
> Apache Maven 3.8.6 (84538c9988a25aec085021c365c560670ad80f63)
> Maven home: /home/maru/.sdkman/candidates/maven/current
> Java version: 1.8.0_332, vendor: Temurin, runtime: /home/maru/.sdkman/candidates/java/8.0.332-tem/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "5.15.0-43-generic", arch: "amd64", family: "unix"
>            Reporter: Kiyoshi Mizumaru
>            Priority: Major
>         Attachments: ratis-1653.tbz
>
>
> {{Sometimes I see mvn test fail with the following error:}}
> {code:java}
> [INFO] Running org.apache.ratis.datastream.TestNettyDataStreamChainTopologyWithGrpcCluster
> [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 94.492 s <<< FAILURE! - in org.apache.ratis.datastream.TestNettyDataStreamChainTopologyWithGrpcCluster
> [ERROR] testMultipleStreamsMultipleServersStepDownLeader(org.apache.ratis.datastream.TestNettyDataStreamChainTopologyWithGrpcCluster)  Time elapsed: 63.959 s  <<< ERROR!
> java.util.concurrent.CompletionException: org.apache.ratis.protocol.exceptions.TimeoutIOException: Timeout 10000ms: Failed to send DataStreamWindowRequest:seqNum=12,DataStreamRequestHeader:clientId=client-B7213C09F5FF,type=STREAM_DATA,id=418,offset=7575945,length=0
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>     at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:783)
>     at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
>     at org.apache.ratis.client.impl.OrderedStreamAsync.lambda$scheduleWithTimeout$7(OrderedStreamAsync.java:172)
>     at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:141)
>     at org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:155)
>     at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:38)
>     at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:79)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.ratis.protocol.exceptions.TimeoutIOException: Timeout 10000ms: Failed to send DataStreamWindowRequest:seqNum=12,DataStreamRequestHeader:clientId=client-B7213C09F5FF,type=STREAM_DATA,id=418,offset=7575945,length=0
>     ... 12 more {code}
> I've uploaded the output of the TestNettyDataStreamChainTopologyWithGrpcCluster#testMultipleServersStepdownLeader test to my gist: https://gist.github.com/kmizumar/4eefb95ac7677ab47442e3e17c920645



--
This message was sent by Atlassian Jira
(v8.20.10#820010)