Posted to issues@ozone.apache.org by "Sammi Chen (Jira)" <ji...@apache.org> on 2021/08/27 07:08:00 UTC
[jira] [Created] (HDDS-5688) Rpc should not retry if the exception is ContainerNotFoundException
Sammi Chen created HDDS-5688:
--------------------------------
Summary: Rpc should not retry if the exception is ContainerNotFoundException
Key: HDDS-5688
URL: https://issues.apache.org/jira/browse/HDDS-5688
Project: Apache Ozone
Issue Type: Bug
Reporter: Sammi Chen
Assignee: Sammi Chen
SCM HA is enabled. When "ozone admin container info" is run with a nonexistent container ID, the command retries many times before it stops. Here is the output of the first three retries:
Hadoop UGI authentication : TAUTH
com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.ratis.ServerNotLeaderException): Server:7aac262f-5828-448d-a1aa-cd8a3e344b4b is not the leader. Suggested leader is Server:qy-ozone-common-v1-scm-1.tencent-distribute.com:9860.
at org.apache.hadoop.hdds.ratis.ServerNotLeaderException.convertToNotLeaderException(ServerNotLeaderException.java:106)
at org.apache.hadoop.hdds.scm.ha.RatisUtil.checkRatisException(RatisUtil.java:191)
at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:150)
at org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:48216)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1024)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2993)
, while invoking $Proxy19.submitRequest over nodeId=scm2,nodeAddress=qy-ozone-common-v1-scm-2.tencent-distribute.com/11.32.183.209:9860 after 1 failover attempts. Trying to failover after sleeping for 2000ms.
com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.scm.container.ContainerNotFoundException): ID #481
at org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.lambda$getContainer$0(ContainerManagerImpl.java:147)
at java.util.Optional.orElseThrow(Optional.java:290)
at org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.getContainer(ContainerManagerImpl.java:147)
at org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.getContainerWithPipelineCommon(SCMClientProtocolServer.java:236)
at org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.getContainerWithPipeline(SCMClientProtocolServer.java:275)
at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.getContainerWithPipeline(StorageContainerLocationProtocolServerSideTranslatorPB.java:396)
at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.processRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:189)
at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:155)
at org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:48216)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1024)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2993)
, while invoking $Proxy19.submitRequest over nodeId=scm1,nodeAddress=qy-ozone-common-v1-scm-1.tencent-distribute.com/11.32.205.14:9860 after 2 failover attempts. Trying to failover after sleeping for 2000ms.
com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.ratis.ServerNotLeaderException): Server:9e77f811-8df6-4a59-9642-0f40d6f01764 is not the leader. Suggested leader is Server:qy-ozone-common-v1-scm-1.tencent-distribute.com:9860.
at org.apache.hadoop.hdds.ratis.ServerNotLeaderException.convertToNotLeaderException(ServerNotLeaderException.java:106)
at org.apache.hadoop.hdds.scm.ha.RatisUtil.checkRatisException(RatisUtil.java:191)
at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:150)
at org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:48216)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:529)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1024)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2002)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2993)
, while invoking $Proxy19.submitRequest over nodeId=scm3,nodeAddress=qy-ozone-common-v1-scm-3.tencent-distribute.com/11.0.119.77:9860 after 3 failover attempts. Trying to failover after sleeping for 2000ms.
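In the output above, the second attempt reaches the suggested leader (scm1) and gets a definitive ContainerNotFoundException, yet the client still fails over to scm3. The following is a minimal sketch of the requested behavior, not the actual Ozone or Hadoop retry code: the class RetryDecisionSketch, the RemoteFailure stand-in, the Action enum and the decide method are all hypothetical names introduced for illustration. Only the exception class names are taken from the stack traces. The idea is to check the remote exception class before failing over, and to stop immediately when the answer cannot change on another SCM.

```java
import java.util.Collections;
import java.util.Set;

public class RetryDecisionSketch {

    /** Hypothetical stand-in for a remote exception that carries the server-side class name. */
    static class RemoteFailure extends Exception {
        private final String remoteClassName;

        RemoteFailure(String remoteClassName, String message) {
            super(message);
            this.remoteClassName = remoteClassName;
        }

        String getRemoteClassName() {
            return remoteClassName;
        }
    }

    /** What the client should do after a failed call. */
    enum Action { FAIL, FAILOVER_AND_RETRY }

    // Assumption: exceptions listed here describe a final answer from the leader,
    // so asking another SCM cannot produce a different result.
    private static final Set<String> NON_RETRIABLE = Collections.singleton(
        "org.apache.hadoop.hdds.scm.container.ContainerNotFoundException");

    /**
     * Decides whether to fail over after a failed attempt.
     *
     * @param failure     the failure reported by the server
     * @param attempts    number of attempts already made
     * @param maxAttempts upper bound on attempts (hypothetical parameter)
     */
    static Action decide(RemoteFailure failure, int attempts, int maxAttempts) {
        if (NON_RETRIABLE.contains(failure.getRemoteClassName())) {
            return Action.FAIL; // definitive answer: surface it to the user at once
        }
        if (attempts >= maxAttempts) {
            return Action.FAIL; // retry budget exhausted
        }
        return Action.FAILOVER_AND_RETRY; // e.g. ServerNotLeaderException
    }

    public static void main(String[] args) {
        RemoteFailure notLeader = new RemoteFailure(
            "org.apache.hadoop.hdds.ratis.ServerNotLeaderException",
            "Server is not the leader");
        RemoteFailure notFound = new RemoteFailure(
            "org.apache.hadoop.hdds.scm.container.ContainerNotFoundException",
            "ID #481");

        System.out.println("not leader -> " + decide(notLeader, 1, 15));
        System.out.println("not found  -> " + decide(notFound, 2, 15));
    }
}
```

With a check of this kind, the sequence above would end at the second attempt with the ContainerNotFoundException reported to the user, while ServerNotLeaderException would still trigger a failover.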