Posted to issues@ozone.apache.org by "Bharat Viswanadham (Jira)" <ji...@apache.org> on 2021/05/11 11:02:00 UTC

[jira] [Resolved] (HDDS-5200) Fix scm roles command if one of the host is unresolvable

     [ https://issues.apache.org/jira/browse/HDDS-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharat Viswanadham resolved HDDS-5200.
--------------------------------------
    Fix Version/s: 1.2.0
       Resolution: Fixed

> Fix scm roles command if one of the host is unresolvable
> --------------------------------------------------------
>
>                 Key: HDDS-5200
>                 URL: https://issues.apache.org/jira/browse/HDDS-5200
>             Project: Apache Ozone
>          Issue Type: Sub-task
>          Components: SCM HA
>            Reporter: Bharat Viswanadham
>            Assignee: Bharat Viswanadham
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.2.0
>
>
> {code:java}
> while invoking $Proxy19.submitRequest over nodeId=scm3,nodeAddress=scm3/172.19.0.7:9860 after 8 failover attempts. Trying to failover after sleeping for 2000ms.
> com.google.protobuf.ServiceException: java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "scm1":9860; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost, while invoking $Proxy19.submitRequest over nodeId=scm1,nodeAddress=scm1:9860 after 9 failover attempts. Trying to failover after sleeping for 2000ms.
> com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdds.ratis.ServerNotLeaderException): Server:7fc05067-c144-4c14-880b-e47f1e40599b is not the leader. Suggested leader is Server:scm3:9860.
> 	at org.apache.hadoop.hdds.ratis.ServerNotLeaderException.convertToNotLeaderException(ServerNotLeaderException.java:106)
> 	at org.apache.hadoop.hdds.scm.ha.RatisUtil.checkRatisException(RatisUtil.java:191)
> 	at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:144)
> 	at org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:43838)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957)
> 	at java.base/java.security.AccessController.doPrivileged(Native Method)
> 	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957)
> , while invoking $Proxy19.submitRequest over nodeId=scm2,nodeAddress=scm2/172.19.0.2:9860 after 10 failover attempts. Trying to failover after sleeping for 2000ms.
> com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(java.net.UnknownHostException): scm1
> 	at java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:797)
> 	at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1509)
> 	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1368)
> 	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1302)
> 	at java.base/java.net.InetAddress.getByName(InetAddress.java:1252)
> 	at org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.getRatisRoles(SCMRatisServerImpl.java:233)
> 	at org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.getScmInfo(SCMClientProtocolServer.java:579)
> 	at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.getScmInfo(StorageContainerLocationProtocolServerSideTranslatorPB.java:506)
> 	at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.processRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:249)
> 	at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
> 	at org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocolServerSideTranslatorPB.submitRequest(StorageContainerLocationProtocolServerSideTranslatorPB.java:149)
> 	at org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:43838)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957)
> 	at java.base/java.security.AccessController.doPrivileged(Native Method)
> 	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957)
> , while invoking $Proxy19.submitRequest over nodeId=scm3,nodeAddress=scm3/172.19.0.7:9860 after 11 failover attempts. Trying to failover after sleeping for 2000ms.
> {code}
> If one of the hosts is unresolvable, the roles command keeps failing.
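
The trace above shows the UnknownHostException being thrown from InetAddress.getByName inside getRatisRoles, so a single unresolvable peer fails the whole request on every SCM and the client keeps failing over. The following is only a minimal sketch of one way to tolerate that, not the change actually committed for this issue: the class name RolesSketch, the helper names, and the "<unresolved>" placeholder are all hypothetical, and it uses only java.net.InetAddress from the JDK.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: list peers without letting one bad host name
// abort the whole listing. Not the actual Ozone implementation.
class RolesSketch {
  // Assumed placeholder reported for a peer whose host cannot be resolved.
  static final String UNRESOLVED = "<unresolved>";

  // Resolve a host name to an IP string; return the placeholder
  // instead of propagating UnknownHostException to the caller.
  static String resolveOrPlaceholder(String host) {
    try {
      return InetAddress.getByName(host).getHostAddress();
    } catch (UnknownHostException e) {
      return UNRESOLVED;
    }
  }

  // Produce one "host:port -> ip" line per peer, in input order.
  static List<String> describePeers(List<String> hostPorts) {
    List<String> lines = new ArrayList<>();
    for (String hostPort : hostPorts) {
      int idx = hostPort.lastIndexOf(':');
      // Treat the whole string as the host when no port is present.
      String host = idx < 0 ? hostPort : hostPort.substring(0, idx);
      lines.add(hostPort + " -> " + resolveOrPlaceholder(host));
    }
    return lines;
  }

  public static void main(String[] args) {
    // "scm1.invalid" uses the reserved .invalid TLD, so it should not resolve.
    List<String> peers = Arrays.asList("127.0.0.1:9860", "scm1.invalid:9860");
    for (String line : describePeers(peers)) {
      System.out.println(line);
    }
  }
}
```

With this shape, the resolvable peers are still reported and the unresolvable one is flagged, rather than every request ending in a RemoteException. Whether a placeholder or omitting the address entirely is the better output is a design choice this sketch does not settle.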



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
