Posted to issues@ozone.apache.org by "Ritesh H Shukla (Jira)" <ji...@apache.org> on 2022/01/13 23:37:00 UTC

[jira] [Commented] (HDDS-6109) Ozone Client should retry unflushed buffers on new pipeline on GroupMismatch Exception.

    [ https://issues.apache.org/jira/browse/HDDS-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475845#comment-17475845 ] 

Ritesh H Shukla commented on HDDS-6109:
---------------------------------------

This bug was introduced via [https://github.com/apache/ozone/commit/565972c162819dc4f57d17e4b47f2f47e3bd9c55]

cc [~sammichen] [~captainzmc] 

> Ozone Client should retry unflushed buffers on new pipeline on GroupMismatch Exception.
> ---------------------------------------------------------------------------------------
>
>                 Key: HDDS-6109
>                 URL: https://issues.apache.org/jira/browse/HDDS-6109
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Client
>            Reporter: Ritesh H Shukla
>            Assignee: Ritesh H Shukla
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, if the pipeline is closed in the middle of a write, the client gets a GroupMismatch Exception, which results in an exception for the caller using the client. See the test at https://github.com/kerneltime/ozone/blob/a43735eba7a2eea7769ea146a136aebae3b8b84b/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java#L175-L284
> {code}
> 2021-12-14 14:38:49,683 [Command processor thread] INFO server.RaftServer$Division (ServerState.java:close(419)) - 2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87: closes. applyIndex: 2
> 2021-12-14 14:38:49,683 [2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87-SegmentedRaftLogWorker] INFO segmented.SegmentedRaftLogWorker (SegmentedRaftLogWorker.java:run(327)) - 2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87-SegmentedRaftLogWorker was interrupted, exiting. There are 0 tasks remaining in the queue.
> 2021-12-14 14:38:49,686 [Command processor thread] INFO segmented.SegmentedRaftLogWorker (SegmentedRaftLogWorker.java:close(237)) - 2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87-SegmentedRaftLogWorker close()
> 2021-12-14 14:38:49,691 [Command processor thread] INFO server.RaftServer$Division (RaftServerImpl.java:groupRemove(382)) - 2d07f9d1-28a1-49bc-a902-d2a1291cbdf1@group-89F59A98FF87: Succeed to remove RaftStorageDirectory Storage Directory /Users/ritesh/IdeaProjects/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-4ef3409b-a4e4-4564-b417-667c302b8de2/datanode-1/data/ratis/pipelineXXX
> 2021-12-14 14:38:49,691 [Command processor thread] INFO commandhandler.ClosePipelineCommandHandler (ClosePipelineCommandHandler.java:handle(78)) - Close Pipeline PipelineID=pipelineXXX command on datanode 2d07f9d1-28a1-49bc-a902-d2a1291cbdf1.
> 2021-12-14 14:38:49,728 [EventQueue-PipelineReportForPipelineReportHandler] INFO pipeline.PipelineReportHandler (PipelineReportHandler.java:processPipelineReport(113)) - Reported pipeline PipelineID=pipelineXXX is not found
> 2021-12-14 14:38:51,926 [Listener at 127.0.0.1/52003] WARN scm.XceiverClientRatis (XceiverClientRatis.java:watchForCommit(266)) - 3 way commit failed on pipeline Pipeline[ Id: pipelineXXX, Nodes: 8c998abc-6bf8-426d-ae41-6d32c225dbb3\{ip: 192.168.86.246, host: 21884.lan, ports: [REPLICATION=52022, RATIS=52023, RATIS_ADMIN=52023, RATIS_SERVER=52023, STANDALONE=52024], networkLocation: /default-rack, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}82f2254c-9af0-4452-9f3a-881c3df8ce31\{ip: 192.168.86.246, host: 21884.lan, ports: [REPLICATION=52016, RATIS=52017, RATIS_ADMIN=52017, RATIS_SERVER=52017, STANDALONE=52018], networkLocation: /default-rack, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}2d07f9d1-28a1-49bc-a902-d2a1291cbdf1\{ip: 192.168.86.246, host: 21884.lan, ports: [REPLICATION=52019, RATIS=52020, RATIS_ADMIN=52020, RATIS_SERVER=52020, STANDALONE=52021], networkLocation: /default-rack, certSerialId: null, persistedOpState: IN_SERVICE, persistedOpStateExpiryEpochSec: 0}, ReplicationConfig: RATIS/THREE, State:OPEN, leaderId:82f2254c-9af0-4452-9f3a-881c3df8ce31, CreationTimestamp2021-12-14T14:38:39.305-08:00[America/Los_Angeles]]
> java.util.concurrent.ExecutionException: org.apache.ratis.protocol.exceptions.RaftRetryFailureException: Failed RaftClientRequest:client-214E4F4A64F9->8c998abc-6bf8-426d-ae41-6d32c225dbb3@group-89F59A98FF87, cid=37, seq=0, Watch-ALL_COMMITTED(6), null for 2 attempts with RequestTypeDependentRetryPolicy\{WRITE->org.apache.ratis.retry.ExceptionDependentRetry@7754720f, WATCH->org.apache.ratis.retry.ExceptionDependentRetry@514c16e5}
> at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> at org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:263)
> at org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:199)
> at org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchOnLastIndex(CommitWatcher.java:166)
> at org.apache.hadoop.hdds.scm.storage.RatisBlockOutputStream.sendWatchForCommit(RatisBlockOutputStream.java:101)
> at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:373)
> at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:533)
> at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:547)
> at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:137)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleStreamAction(KeyOutputStream.java:495)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:469)
> at org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:522)
> at org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:61)
> at org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailures.testContainerStateMachineTransitionOnUnhealthyReplicas(TestContainerStateMachineFailures.java:225)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
> at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
> at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:235)
> at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:54)
> Caused by: org.apache.ratis.protocol.exceptions.RaftRetryFailureException: Failed RaftClientRequest:client-214E4F4A64F9->8c998abc-6bf8-426d-ae41-6d32c225dbb3@group-89F59A98FF87, cid=37, seq=0, Watch-ALL_COMMITTED(6), null for 2 attempts with RequestTypeDependentRetryPolicy\{WRITE->org.apache.ratis.retry.ExceptionDependentRetry@7754720f, WATCH->org.apache.ratis.retry.ExceptionDependentRetry@514c16e5}
> at org.apache.ratis.client.impl.RaftClientImpl.noMoreRetries(RaftClientImpl.java:272)
> {code}
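
The behavior requested in the issue title (retry unflushed buffers on a new pipeline) could be sketched roughly as follows. This is a toy model and not Ozone's implementation: the Pipeline, Client and GroupMismatch classes, their methods, and the retry limit are all hypothetical stand-ins invented for illustration. The only idea taken from the issue is that data not yet acknowledged by a flush stays buffered on the client so it can be replayed when the current pipeline turns out to be closed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RetryUnflushedSketch {

  /** Hypothetical stand-in for the error seen when the pipeline's Raft group is gone. */
  static class GroupMismatch extends Exception {
    GroupMismatch(String msg) { super(msg); }
  }

  /** Hypothetical pipeline: data is durable only after a successful flush. */
  static class Pipeline {
    final String id;
    final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private boolean closed;

    Pipeline(String id) { this.id = id; }

    /** Closing drops whatever was written but not yet flushed. */
    void close() { closed = true; pending.clear(); }

    void write(String chunk) throws GroupMismatch {
      if (closed) throw new GroupMismatch("pipeline " + id + " is closed");
      pending.add(chunk);
    }

    void flush() throws GroupMismatch {
      if (closed) throw new GroupMismatch("pipeline " + id + " is closed");
      committed.addAll(pending);
      pending.clear();
    }
  }

  /** Hypothetical client that keeps unflushed chunks until a flush succeeds. */
  static class Client {
    private final Iterator<Pipeline> pipelines;
    private final int maxRetries;
    private final List<String> unflushed = new ArrayList<>();
    private Pipeline current;

    Client(List<Pipeline> available, int maxRetries) {
      this.pipelines = available.iterator();
      this.current = pipelines.next();
      this.maxRetries = Math.max(0, maxRetries);
    }

    /** Buffer locally; nothing is sent until flush(). */
    void write(String chunk) { unflushed.add(chunk); }

    /** Send and flush; on GroupMismatch, replay the same buffers on the next pipeline. */
    void flush() throws GroupMismatch {
      GroupMismatch last = null;
      for (int attempt = 0; attempt <= maxRetries; attempt++) {
        try {
          for (String chunk : unflushed) current.write(chunk);
          current.flush();
          unflushed.clear();  // acknowledged, safe to release
          return;
        } catch (GroupMismatch e) {
          last = e;
          if (!pipelines.hasNext()) break;  // nowhere left to retry
          current = pipelines.next();
        }
      }
      throw last;
    }
  }

  public static void main(String[] args) throws GroupMismatch {
    Pipeline p1 = new Pipeline("p1");
    Pipeline p2 = new Pipeline("p2");
    Client client = new Client(Arrays.asList(p1, p2), 1);

    client.write("a");
    client.flush();        // "a" is committed on p1
    client.write("b");
    client.write("c");
    p1.close();            // pipeline closed between write and flush
    client.flush();        // "b" and "c" are replayed on p2

    System.out.println(p1.committed + " " + p2.committed);  // prints "[a] [b, c]"
  }
}
```

In this model the caller never sees the mismatch as long as another pipeline is available. Once the pipelines or the retry budget run out, the last GroupMismatch is rethrown, which corresponds to the failure mode shown in the log above.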


