Posted to issues@ozone.apache.org by "Sammi Chen (Jira)" <ji...@apache.org> on 2021/08/27 12:23:00 UTC

[jira] [Resolved] (HDDS-5688) Rpc should not retry if exception is ContainerNotFoundException

     [ https://issues.apache.org/jira/browse/HDDS-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sammi Chen resolved HDDS-5688.
------------------------------
    Resolution: Fixed

> Rpc should not retry if exception is ContainerNotFoundException
> ---------------------------------------------------------------
>
>                 Key: HDDS-5688
>                 URL: https://issues.apache.org/jira/browse/HDDS-5688
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Sammi Chen
>            Assignee: Sammi Chen
>            Priority: Major
>              Labels: pull-request-available
>
> SCM HA is enabled. When "ozone admin container info" is run with a nonexistent container ID, the command retries many times before it stops. Here is the output of the first three retries:
> Hadoop UGI authentication : TAUTH
> com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.ratis.ServerNotLeaderException): Server:7aac262f-5828-448d-a1aa-cd8a3e344b4b is not the leader. Suggested leader is Server:qy-ozone-common-v1-scm-1.tencent-distribute.com:9860.
>         at org.apache.hadoop.hdds.ratis.ServerNotLeaderException.convertToNotLeaderException(ServerNotLeaderException.java:106)
>         at org.apache.hadoop.hdds.scm.ha.RatisUtil.checkRatisException(RatisUtil.java:191)
>         at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:150)
>         at org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:48216)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1024)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2993)
> , while invoking $Proxy19.submitRequest over nodeId=scm2,nodeAddress=qy-ozone-common-v1-scm-2.tencent-distribute.com/11.32.183.209:9860 after 1 failover attempts. Trying to failover after sleeping for 2000ms.
> com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.scm.container.ContainerNotFoundException): ID #481
>         at org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.lambda$getContainer$0(ContainerManagerImpl.java:147)
>         at java.util.Optional.orElseThrow(Optional.java:290)
>         at org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.getContainer(ContainerManagerImpl.java:147)
>         at org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.getContainerWithPipelineCommon(SCMClientProtocolServer.java:236)
>         at org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.getContainerWithPipeline(SCMClientProtocolServer.java:275)
>         at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.getContainerWithPipeline(StorageContainerLocationProtocolServerSideTranslatorPB.java:396)
>         at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.processRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:189)
>         at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
>         at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:155)
>         at org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:48216)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1024)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2993)
> , while invoking $Proxy19.submitRequest over nodeId=scm1,nodeAddress=qy-ozone-common-v1-scm-1.tencent-distribute.com/11.32.205.14:9860 after 2 failover attempts. Trying to failover after sleeping for 2000ms.
> com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.ratis.ServerNotLeaderException): Server:9e77f811-8df6-4a59-9642-0f40d6f01764 is not the leader. Suggested leader is Server:qy-ozone-common-v1-scm-1.tencent-distribute.com:9860.
>         at org.apache.hadoop.hdds.ratis.ServerNotLeaderException.convertToNotLeaderException(ServerNotLeaderException.java:106)
>         at org.apache.hadoop.hdds.scm.ha.RatisUtil.checkRatisException(RatisUtil.java:191)
>         at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:150)
>         at org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:48216)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1024)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2993)
> , while invoking $Proxy19.submitRequest over nodeId=scm3,nodeAddress=qy-ozone-common-v1-scm-3.tencent-distribute.com/11.0.119.77:9860 after 3 failover attempts. Trying to failover after sleeping for 2000ms.
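
In the trace above, the second attempt reaches the leader (scm1), which answers with ContainerNotFoundException, yet the client fails over again. A minimal sketch of the decision the title asks for follows. It is not the actual Ozone patch: the class RetryDecisionSketch, the method decide, the Action enum, and the failover limit of 15 are all hypothetical, and only the two exception class names are taken from the trace. It also assumes the client can see the remote exception's class name as a string.

```java
public class RetryDecisionSketch {
    /** What the client does after a failed call (hypothetical enum). */
    enum Action { FAIL, FAILOVER_AND_RETRY }

    // Exception class names copied from the trace in this issue.
    static final String NOT_LEADER =
        "org.apache.hadoop.hdds.ratis.ServerNotLeaderException";
    static final String CONTAINER_NOT_FOUND =
        "org.apache.hadoop.hdds.scm.container.ContainerNotFoundException";

    /**
     * Decide whether to fail over to another SCM node.
     *
     * @param remoteClassName class name carried by the remote exception
     * @param failovers       failover attempts made so far
     * @param maxFailovers    assumed upper bound on failover attempts
     */
    static Action decide(String remoteClassName, int failovers, int maxFailovers) {
        // A missing container is a definitive answer from the leader:
        // asking another node cannot change it, so fail immediately.
        if (CONTAINER_NOT_FOUND.equals(remoteClassName)) {
            return Action.FAIL;
        }
        // Give up once the failover budget is used.
        if (failovers >= maxFailovers) {
            return Action.FAIL;
        }
        // Anything else (for example a not-leader reply) is treated as
        // retriable in this sketch; a real policy may be stricter.
        return Action.FAILOVER_AND_RETRY;
    }

    public static void main(String[] args) {
        // First attempt in the trace: scm2 is not the leader.
        System.out.println(decide(NOT_LEADER, 1, 15));
        // Second attempt in the trace: the leader reports a missing container.
        System.out.println(decide(CONTAINER_NOT_FOUND, 2, 15));
    }
}
```

With a check like this, the command in the description would stop at the second attempt instead of sleeping 2000ms and failing over again.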



--
This message was sent by Atlassian Jira
(v8.3.4#803005)