Posted to issues@ozone.apache.org by "Lokesh Jain (Jira)" <ji...@apache.org> on 2020/12/22 13:06:00 UTC

[jira] [Updated] (HDDS-3611) Ozone client should not consider closed container error as failure

     [ https://issues.apache.org/jira/browse/HDDS-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lokesh Jain updated HDDS-3611:
------------------------------
    Resolution: Later
        Status: Resolved  (was: Patch Available)

> Ozone client should not consider closed container error as failure
> ------------------------------------------------------------------
>
>                 Key: HDDS-3611
>                 URL: https://issues.apache.org/jira/browse/HDDS-3611
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Client
>            Reporter: Lokesh Jain
>            Assignee: Lokesh Jain
>            Priority: Critical
>              Labels: TriagePending, pull-request-available
>
> A ContainerNotOpenException is thrown by the datanode when a client writes to a container that is not open. Currently the Ozone client treats this as a failure and increments the retry count. Once the client reaches the configured retry count, it fails the write. MapReduce jobs were seen failing due to this error with the default retry count of 5.
> The idea is to exclude errors caused by closed containers from the retry count, so that Ozone client writes do not fail due to closed container exceptions.
> {code:java}
> 2020-05-15 02:20:28,375 ERROR [main] org.apache.hadoop.ozone.client.io.KeyOutputStream: Retry request failed. retries get failed due to exceeded maximum allowed retries number: 5
> java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.CompletionException: java.util.concurrent.CompletionException: org.apache.ratis.protocol.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.ContainerNotOpenException from Server e2eec12f-02c5-46e2-9c23-14d6445db219@group-A3BF3ABDC307: Container 15 in CLOSED state
>         at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.setIoException(BlockOutputStream.java:551)
>         at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.lambda$writeChunkToContainer$3(BlockOutputStream.java:638)
>         at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:884)
>         at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:866)
>         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>         at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>         at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:99)
>         at org.apache.ratis.client.impl.OrderedAsync$PendingOrderedRequest.setReply(OrderedAsync.java:60)
>         at org.apache.ratis.util.SlidingWindow$RequestMap.setReply(SlidingWindow.java:143)
>         at org.apache.ratis.util.SlidingWindow$Client.receiveReply(SlidingWindow.java:314)
>         at org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$9(OrderedAsync.java:242)
>         at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
>         at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>         at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>         at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>         at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.lambda$onNext$0(GrpcClientProtocolClient.java:284)
>         at java.util.Optional.ifPresent(Optional.java:159)
>         at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:340)
>         at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.access$100(GrpcClientProtocolClient.java:264)
>         at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:284)
>         at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:267)
>         at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:436)
>         at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:658)
> ...{code}
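
The proposal in the description can be illustrated with a minimal, self-contained sketch. It is not the Ozone client's actual code: `ClosedContainerRetrySketch`, `WriteAttempt`, `writeWithRetries`, and the nested `ContainerNotOpenException` are hypothetical stand-ins defined here only for illustration. The sketch assumes a retry loop in which a closed-container error triggers another attempt without consuming the retry budget, while any other `IOException` still counts toward the configured maximum.

```java
import java.io.IOException;

public class ClosedContainerRetrySketch {

    /** Hypothetical stand-in for the datanode's "container not open" error. */
    public static class ContainerNotOpenException extends IOException {
        public ContainerNotOpenException(String message) {
            super(message);
        }
    }

    /** Hypothetical single write attempt (not an Ozone interface). */
    public interface WriteAttempt {
        void run() throws IOException;
    }

    /**
     * Runs the attempt until it succeeds. Closed-container errors are retried
     * without being counted; every other IOException increments the retry count,
     * and the write fails once that count exceeds maxRetries.
     *
     * @return the total number of attempts made, including the successful one
     */
    public static int writeWithRetries(WriteAttempt attempt, int maxRetries)
            throws IOException {
        int countedRetries = 0;
        int totalAttempts = 0;
        while (true) {
            totalAttempts++;
            try {
                attempt.run();
                return totalAttempts;
            } catch (ContainerNotOpenException e) {
                // Assumption: a real client would move the write to a different,
                // open container here. The retry count is left untouched.
            } catch (IOException e) {
                countedRetries++;
                if (countedRetries > maxRetries) {
                    throw new IOException(
                        "Exceeded maximum allowed retries: " + maxRetries, e);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate 7 consecutive closed-container errors, then a success.
        int[] closedErrorsLeft = {7};
        int attempts = writeWithRetries(() -> {
            if (closedErrorsLeft[0]-- > 0) {
                throw new ContainerNotOpenException("Container 15 in CLOSED state");
            }
        }, 5);
        System.out.println("Write succeeded after " + attempts + " attempts");
    }
}
```

With a retry limit of 5, the simulated write still succeeds even though it hits seven closed-container errors, because none of them are counted. Note that this sketch loops indefinitely if every attempt reports a closed container; a real client would likely want a separate bound or backoff for that case.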



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org