Posted to issues@ozone.apache.org by "Bharat Viswanadham (Jira)" <ji...@apache.org> on 2021/04/01 10:31:00 UTC
[jira] [Updated] (HDDS-5058) Make getScmInfo retry for a duration
[ https://issues.apache.org/jira/browse/HDDS-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bharat Viswanadham updated HDDS-5058:
-------------------------------------
Description:
Previously, during OM init, getScmInfo was retried with RetryForEverWithFixedSleep, but this was removed as part of the SCM HA work.
This Jira proposes retrying getScmInfo for a certain duration, instead of retrying forever with a fixed sleep.
We have seen this issue in a few docker test CI runs: OM init failed after 15 retries because SCM started later.
{code:java}
om1_1 | 2021-03-31 17:03:48,184 [main] WARN server.ServerUtils: ozone.om.db.dirs is not configured. We recommend adding this setting. Falling back to ozone.metadata.dirs instead.
om1_1 | 2021-03-31 17:03:52,453 [main] INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.ConnectException: Call From om1/172.20.0.4 to scm2:9863 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking $Proxy31.send over nodeId=scm2,nodeAddress=scm2/172.20.0.6:9863 after 1 failover attempts. Trying to failover after sleeping for 2000ms.
om1_1 | 2021-03-31 17:03:54,455 [main] INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.ConnectException: Call From om1/172.20.0.4 to scm3:9863 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking $Proxy31.send over nodeId=scm3,nodeAddress=scm3/172.20.0.7:9863 after 2 failover attempts. Trying to failover after sleeping for 2000ms.
om1_1 | 2021-03-31 17:03:56,457 [main] INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.ConnectException: Call From om1/172.20.0.4 to scm1:9863 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking $Proxy31.send over nodeId=scm1,nodeAddress=scm1/172.20.0.8:9863 after 3 failover attempts. Trying to failover after sleeping for 2000ms.
om1_1 | 2021-03-31 17:03:58,466 [main] INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.ConnectException: Call From om1/172.20.0.4 to scm2:9863 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking $Proxy31.send over nodeId=scm2,nodeAddress=scm2/172.20.0.6:9863 after 4 failover attempts. Trying to failover after sleeping for 2000ms.
om1_1 | 2021-03-31 17:04:00,498 [main] INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.ConnectException: Call From om1/172.20.0.4 to scm3:9863 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking $Proxy31.send over nodeId=scm3,nodeAddress=scm3/172.20.0.7:9863 after 5 failover attempts. Trying to failover after sleeping for 2000ms.
om1_1 | 2021-03-31 17:04:02,522 [main] INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.ConnectException: Call From om1/172.20.0.4 to scm1:9863 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking $Proxy31.send over nodeId=scm1,nodeAddress=scm1/172.20.0.8:9863 after 6 failover attempts. Trying to failover after sleeping for 2000ms.
om1_1 | 2021-03-31 17:04:04,533 [main] INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.ConnectException: Call From om1/172.20.0.4 to scm2:9863 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking $Proxy31.send over nodeId=scm2,nodeAddress=scm2/172.20.0.6:9863 after 7 failover attempts. Trying to failover after sleeping for 2000ms.
om1_1 | 2021-03-31 17:04:06,535 [main] INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.ConnectException: Call From om1/172.20.0.4 to scm3:9863 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking $Proxy31.send over nodeId=scm3,nodeAddress=scm3/172.20.0.7:9863 after 8 failover attempts. Trying to failover after sleeping for 2000ms.
om1_1 | 2021-03-31 17:04:08,537 [main] INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.ConnectException: Call From om1/172.20.0.4 to scm1:9863 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking $Proxy31.send over nodeId=scm1,nodeAddress=scm1/172.20.0.8:9863 after 9 failover attempts. Trying to failover after sleeping for 2000ms.
om1_1 | 2021-03-31 17:04:10,541 [main] INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.ConnectException: Call From om1/172.20.0.4 to scm2:9863 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking $Proxy31.send over nodeId=scm2,nodeAddress=scm2/172.20.0.6:9863 after 10 failover attempts. Trying to failover after sleeping for 2000ms.
om1_1 | 2021-03-31 17:04:12,543 [main] INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.ConnectException: Call From om1/172.20.0.4 to scm3:9863 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking $Proxy31.send over nodeId=scm3,nodeAddress=scm3/172.20.0.7:9863 after 11 failover attempts. Trying to failover after sleeping for 2000ms.
om1_1 | 2021-03-31 17:04:14,546 [main] INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.ConnectException: Call From om1/172.20.0.4 to scm1:9863 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking $Proxy31.send over nodeId=scm1,nodeAddress=scm1/172.20.0.8:9863 after 12 failover attempts. Trying to failover after sleeping for 2000ms.
om1_1 | 2021-03-31 17:04:16,550 [main] INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.ConnectException: Call From om1/172.20.0.4 to scm2:9863 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking $Proxy31.send over nodeId=scm2,nodeAddress=scm2/172.20.0.6:9863 after 13 failover attempts. Trying to failover after sleeping for 2000ms.
om1_1 | 2021-03-31 17:04:18,553 [main] INFO retry.RetryInvocationHandler: com.google.protobuf.ServiceException: java.net.ConnectException: Call From om1/172.20.0.4 to scm3:9863 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking $Proxy31.send over nodeId=scm3,nodeAddress=scm3/172.20.0.7:9863 after 14 failover attempts. Trying to failover after sleeping for 2000ms.
om1_1 | 2021-03-31 17:04:20,795 [main] ERROR om.OzoneManager: Could not initialize OM version file
om1_1 | org.apache.hadoop.ipc.RemoteException(org.apache.ratis.protocol.exceptions.NotLeaderException): Server 9cb7a7ae-4c40-401c-b1c6-55728c1f0907@group-C35E1BD0DE21 is not the leader
om1_1 | at org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.triggerNotLeaderException(SCMRatisServerImpl.java:245)
om1_1 | at org.apache.hadoop.hdds.scm.protocol.ScmBlockLocationProtocolServerSideTranslatorPB.send(ScmBlockLocationProtocolServerSideTranslatorPB.java:108)
om1_1 | at org.apache.hadoop.hdds.protocol.proto.ScmBlockLocationProtocolProtos$ScmBlockLocationProtocolService$2.callBlockingMethod(ScmBlockLocationProtocolProtos.java:13874)
om1_1 | at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
om1_1 | at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
om1_1 | at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029)
om1_1 | at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957)
om1_1 | at java.base/java.security.AccessController.doPrivileged(Native Method)
om1_1 | at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
om1_1 | at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
om1_1 | at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957)
om1_1 |
{code}
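In the log above, the failover attempts are spaced 2000 ms apart and span roughly 28 seconds (17:03:52 to 17:04:20) before OM init gives up. A minimal sketch of the proposed behaviour, retrying for a bounded duration with a fixed sleep, might look like the following. The class name BoundedRetry, the method retryForDuration, and the durations used are hypothetical and are not taken from the Ozone code base; Hadoop's RetryPolicies class also provides retryUpToMaximumTimeWithFixedSleep, which may be a better fit in practice, but whether the fix uses it is not stated here.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical helper, not Ozone code: retries a call until a deadline passes. */
final class BoundedRetry {
  private BoundedRetry() { }

  /**
   * Invokes {@code call} until it succeeds or {@code maxDuration} has elapsed,
   * sleeping {@code sleep} between failed attempts. The last failure is rethrown.
   */
  static <T> T retryForDuration(Callable<T> call, Duration maxDuration,
      Duration sleep) throws Exception {
    final long deadline = System.nanoTime() + maxDuration.toNanos();
    while (true) {
      try {
        return call.call();
      } catch (InterruptedException e) {
        // Do not retry on interruption; restore the flag and propagate.
        Thread.currentThread().interrupt();
        throw e;
      } catch (Exception e) {
        // Give up if another sleep would take us past the deadline.
        if (deadline - System.nanoTime() < sleep.toNanos()) {
          throw e;
        }
        Thread.sleep(sleep.toMillis());
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for getScmInfo: fails twice, as if SCM were still starting.
    int[] attempts = {0};
    String info = retryForDuration(() -> {
      if (++attempts[0] < 3) {
        throw new java.net.ConnectException("Connection refused");
      }
      return "scm-info";
    }, Duration.ofSeconds(5), Duration.ofMillis(100));
    System.out.println(info + " after " + attempts[0] + " attempts");
  }
}
```

Bounding by elapsed time rather than by attempt count keeps the wait independent of how many SCM nodes the client fails over between, while still letting OM init fail eventually if SCM never comes up.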
was:
Previously, during OM init, getScmInfo was retried with RetryForEverWithFixedSleep, but this was removed as part of the SCM HA work.
This Jira proposes retrying getScmInfo for a certain duration, instead of retrying forever with a fixed sleep.
We have seen this issue in a few docker tests: OM init failed after 15 retries because SCM started later.
> Make getScmInfo retry for a duration
> ------------------------------------
>
> Key: HDDS-5058
> URL: https://issues.apache.org/jira/browse/HDDS-5058
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Bharat Viswanadham
> Assignee: Bharat Viswanadham
> Priority: Major
>
> Previously, during OM init, getScmInfo was retried with RetryForEverWithFixedSleep, but this was removed as part of the SCM HA work.
> This Jira proposes retrying getScmInfo for a certain duration, instead of retrying forever with a fixed sleep.
> We have seen this issue in a few docker test CI runs: OM init failed after 15 retries because SCM started later.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)