Posted to issues@ozone.apache.org by "ArafatKhan2198 (via GitHub)" <gi...@apache.org> on 2023/11/29 14:31:31 UTC
[PR] HDDS-9766 Intermittent AlreadyClosedException in TestCommitWatcher.testReleaseBuffersOnException [ozone]
ArafatKhan2198 opened a new pull request, #5700:
URL: https://github.com/apache/ozone/pull/5700
## What changes were proposed in this pull request?
The test fails intermittently with the following chain of nested exceptions: `ExecutionException`, `AlreadyClosedException`, `RaftRetryFailureException`, `NotLeaderException`.
```
org.apache.hadoop.hdds.scm.storage.TestCommitWatcher.testReleaseBuffersOnException -- Time elapsed: 28.88 s <<< ERROR!
java.util.concurrent.ExecutionException: org.apache.ratis.protocol.exceptions.AlreadyClosedException: SlidingWindow$Client:client-2D6A59F17A72->RAFT is closed.
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at org.apache.hadoop.hdds.scm.storage.TestCommitWatcher.testReleaseBuffersOnException(TestCommitWatcher.java:301)
...
Caused by: org.apache.ratis.protocol.exceptions.AlreadyClosedException: SlidingWindow$Client:client-2D6A59F17A72->RAFT is closed.
at org.apache.ratis.util.SlidingWindow$Client.alreadyClosed(SlidingWindow.java:406)
...
Caused by: org.apache.ratis.protocol.exceptions.RaftRetryFailureException: Failed RaftClientRequest:client-2D6A59F17A72->31ca1c78-6c1a-481a-9835-ed9e1bd72d7b@group-4121B0A26A9F, cid=40, seq=1*, Watch(0), null for 3 attempts with RequestTypeDependentRetryPolicy{WRITE->ExceptionDependentRetry(maxAttempts=2147483647; defaultPolicy=MultipleLinearRandomRetry[5x5s, 5x10s, 5x15s, 5x20s, 5x25s, 10x60s]; map={org.apache.ratis.protocol.exceptions.GroupMismatchException->NoRetry, org.apache.ratis.protocol.exceptions.NotReplicatedException->NoRetry, org.apache.ratis.protocol.exceptions.ResourceUnavailableException->org.apache.ratis.retry.ExponentialBackoffRetry@571a7a22, org.apache.ratis.protocol.exceptions.StateMachineException->NoRetry, org.apache.ratis.protocol.exceptions.TimeoutIOException->org.apache.ratis.retry.ExponentialBackoffRetry@571a7a22}), WATCH->ExceptionDependentRetry(maxAttempts=2147483647; defaultPolicy=MultipleLinearRandomRetry[5x5s, 5x10s, 5x15s, 5x20s, 5x25s, 10x60s]; map=
{org.apache.ratis.protocol.exceptions.GroupMismatchException->NoRetry, org.apache.ratis.protocol.exceptions.NotReplicatedException->NoRetry, org.apache.ratis.protocol.exceptions.ResourceUnavailableException->org.apache.ratis.retry.ExponentialBackoffRetry@571a7a22, org.apache.ratis.protocol.exceptions.StateMachineException->NoRetry, org.apache.ratis.protocol.exceptions.TimeoutIOException->NoRetry})}
at org.apache.ratis.client.impl.RaftClientImpl.noMoreRetries(RaftClientImpl.java:353)
... 20 more
Caused by: org.apache.ratis.protocol.exceptions.NotLeaderException: Server 31ca1c78-6c1a-481a-9835-ed9e1bd72d7b@group-4121B0A26A9F is not the leader 66f0d3a6-9de9-4155-af63-8575b18d2e1d|10.1.0.11:15121
at org.apache.ratis.client.impl.ClientProtoUtils.toRaftClientReply(ClientProtoUtils.java:397)
at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:310)
... 11 more
```
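The nested `Caused by` entries in the trace above can also be inspected programmatically. The following is only a minimal sketch of walking the cause chain of an `ExecutionException` thrown by `CompletableFuture.get()`. It is not code from this PR: the exception classes `AlreadyClosed`, `RetryFailure`, and `NotLeader` are hypothetical stand-ins for the Ratis types, and `failedWatch()` merely simulates a failed watch request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class CauseChainExample {

  // Hypothetical stand-ins for the Ratis exception types named in the trace.
  static class NotLeader extends Exception {
    NotLeader(String msg) { super(msg); }
  }

  static class RetryFailure extends Exception {
    RetryFailure(String msg, Throwable cause) { super(msg, cause); }
  }

  static class AlreadyClosed extends Exception {
    AlreadyClosed(String msg, Throwable cause) { super(msg, cause); }
  }

  /** Lists simple class names from the outermost throwable down to the root cause. */
  static List<String> causeChain(Throwable t) {
    List<String> names = new ArrayList<>();
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      names.add(cur.getClass().getSimpleName());
    }
    return names;
  }

  /** Simulates (assumption) a future that failed with the same nesting as the trace. */
  static CompletableFuture<Void> failedWatch() {
    Throwable root = new NotLeader("server is not the leader");
    Throwable retry = new RetryFailure("no more retries", root);
    Throwable closed = new AlreadyClosed("client is closed", retry);
    CompletableFuture<Void> future = new CompletableFuture<>();
    future.completeExceptionally(closed);
    return future;
  }

  /** Waits on the future and summarizes the cause chain if it failed. */
  static String describeFailure(CompletableFuture<?> future) {
    try {
      future.get();
      return "completed normally";
    } catch (ExecutionException e) {
      // get() wraps the original failure in an ExecutionException.
      return String.join(" <- ", causeChain(e));
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "interrupted";
    }
  }

  public static void main(String[] args) {
    System.out.println(describeFailure(failedWatch()));
  }
}
```

Reading the chain from the innermost cause outward suggests the sequence of events in the trace: a leader change (`NotLeaderException`) exhausted the retries for the watch request, after which the client was closed.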
Proposed fix:
- Reduce the number of datanodes and pipelines in the test's MiniOzoneCluster configuration.
- With fewer datanodes and pipelines, resource contention and timing issues are less likely. Such issues often cause intermittent failures that are hard to reproduce and diagnose.
## What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-9766
## How was this patch tested?
Ran the test 300 times in my fork; all runs passed:
Test Run 1: https://github.com/ArafatKhan2198/ozone/actions/runs/7031682772
Test Run 2: https://github.com/ArafatKhan2198/ozone/actions/runs/7031685390
Test Run 3: https://github.com/ArafatKhan2198/ozone/actions/runs/7031688216
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org
Re: [PR] HDDS-9766 Intermittent AlreadyClosedException in TestCommitWatcher.testReleaseBuffersOnException [ozone]
Posted by "ArafatKhan2198 (via GitHub)" <gi...@apache.org>.
ArafatKhan2198 commented on PR #5700:
URL: https://github.com/apache/ozone/pull/5700#issuecomment-1832120339
@adoroszlai
Re: [PR] HDDS-9766. Intermittent AlreadyClosedException in TestCommitWatcher.testReleaseBuffersOnException [ozone]
Posted by "nandakumar131 (via GitHub)" <gi...@apache.org>.
nandakumar131 commented on PR #5700:
URL: https://github.com/apache/ozone/pull/5700#issuecomment-1832647735
+1, LGTM.
Re: [PR] HDDS-9766. Intermittent AlreadyClosedException in TestCommitWatcher.testReleaseBuffersOnException [ozone]
Posted by "nandakumar131 (via GitHub)" <gi...@apache.org>.
nandakumar131 merged PR #5700:
URL: https://github.com/apache/ozone/pull/5700