Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/06/08 08:18:07 UTC

[GitHub] [ozone] bharatviswa504 opened a new pull request #2312: HDDS-5317. BootStrapped SCM fails to bootstrap if it connects to another bootstrapped SCM first.

bharatviswa504 opened a new pull request #2312:
URL: https://github.com/apache/ozone/pull/2312


   ## What changes were proposed in this pull request?
   
   On the SCM server side, if the exception is an SCMSecurityException with error code NOT_A_PRIMARY_SCM, it is converted to a RetriableWithFailOverException. This way, the FailOverProxyProvider fails over and retries the request on the next SCM.
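   
   A minimal, self-contained sketch of that idea, for illustration only. It does not use the real Ozone or Hadoop classes: `SketchSecurityException`, `SketchRetriableException`, `ScmNode`, `translate`, `serverSide`, and `callWithFailover` are hypothetical stand-ins, and the actual change lives in `RatisUtil.checkRatisException` and the client-side proxy provider.
   
   ```java
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.util.List;
   
   // Hypothetical stand-ins for illustration; not the real Ozone classes.
   class FailoverSketch {
     enum ErrorCode { NOT_A_PRIMARY_SCM, OTHER }
   
     // Stand-in for a security exception that carries an error code.
     static class SketchSecurityException extends IOException {
       final ErrorCode code;
       SketchSecurityException(String msg, ErrorCode code) {
         super(msg);
         this.code = code;
       }
     }
   
     // Stand-in for an exception that tells the client to try the next node.
     static class SketchRetriableException extends IOException {
       SketchRetriableException(IOException cause) {
         super(cause);
       }
     }
   
     interface ScmNode {
       String getScmCertificate() throws IOException;
     }
   
     // Server side: only NOT_A_PRIMARY_SCM is wrapped as retriable;
     // every other exception is passed through unchanged.
     static IOException translate(IOException e) {
       if (e instanceof SketchSecurityException
           && ((SketchSecurityException) e).code == ErrorCode.NOT_A_PRIMARY_SCM) {
         return new SketchRetriableException(e);
       }
       return e;
     }
   
     // Wraps a node so that its failures go through translate().
     static ScmNode serverSide(ScmNode node) {
       return () -> {
         try {
           return node.getScmCertificate();
         } catch (IOException e) {
           throw translate(e);
         }
       };
     }
   
     // Client side: move to the next node only on a retriable failure;
     // any other IOException propagates immediately.
     static String callWithFailover(List<ScmNode> nodes) throws IOException {
       IOException last = new IOException("no SCM nodes configured");
       for (ScmNode node : nodes) {
         try {
           return node.getScmCertificate();
         } catch (SketchRetriableException e) {
           last = e;
         }
       }
       throw last;
     }
   
     // Node order mirrors the manual test: a non-primary SCM is tried first.
     static String demo() {
       ScmNode scm2 = serverSide(() -> {
         throw new SketchSecurityException(
             "not the primary SCM", ErrorCode.NOT_A_PRIMARY_SCM);
       });
       ScmNode scm1 = serverSide(() -> "cert-from-primary");
       try {
         return callWithFailover(List.of(scm2, scm1));
       } catch (IOException e) {
         throw new UncheckedIOException(e);
       }
     }
   
     public static void main(String[] args) {
       System.out.println(demo());
     }
   }
   ```
   
   In this sketch the first node rejects the request as non-primary, the wrapper marks that failure as retriable, and the client loop moves on to the next node instead of giving up.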
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5317
   
   ## How was this patch tested?
   
   Tested manually with docker-compose: changed the order of the node IDs to scm2,scm3,scm1
   and started SCM3, so that it connects to scm2 first, then checked whether it is able to bootstrap.
   
   SCM3 connected to SCM2 first, and SCM2 threw a RetriableWithFailOverException:
   ```
   scm2.org_1   | org.apache.hadoop.hdds.scm.ha.RetriableWithFailOverException: org.apache.hadoop.hdds.security.exception.SCMSecurityException: Get SCM Certificate can be run only primary SCM
   scm2.org_1   | 	at org.apache.hadoop.hdds.scm.ha.RatisUtil.checkRatisException(RatisUtil.java:206)
   scm2.org_1   | 	at org.apache.hadoop.hdds.scm.protocol.SCMSecurityProtocolServerSideTranslatorPB.processRequest(SCMSecurityProtocolServerSideTranslatorPB.java:157)
   scm2.org_1   | 	at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
   scm2.org_1   | 	at org.apache.hadoop.hdds.scm.protocol.SCMSecurityProtocolServerSideTranslatorPB.submitRequest(SCMSecurityProtocolServerSideTranslatorPB.java:97)
   scm2.org_1   | 	at org.apache.hadoop.hdds.protocol.proto.SCMSecurityProtocolProtos$SCMSecurityProtocolService$2.callBlockingMethod(SCMSecurityProtocolProtos.java:15124)
   scm2.org_1   | 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
   scm2.org_1   | 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
   scm2.org_1   | 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029)
   scm2.org_1   | 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957)
   scm2.org_1   | 	at java.base/java.security.AccessController.doPrivileged(Native Method)
   scm2.org_1   | 	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
   scm2.org_1   | 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
   scm2.org_1   | 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957)
   scm2.org_1   | Caused by: org.apache.hadoop.hdds.security.exception.SCMSecurityException: Get SCM Certificate can be run only primary SCM
   scm2.org_1   | 	at org.apache.hadoop.hdds.scm.server.SCMSecurityProtocolServer.getSCMCertificate(SCMSecurityProtocolServer.java:200)
   scm2.org_1   | 	at org.apache.hadoop.hdds.scm.protocol.SCMSecurityProtocolServerSideTranslatorPB.getSCMCertificate(SCMSecurityProtocolServerSideTranslatorPB.java:228)
   scm2.org_1   | 	at org.apache.hadoop.hdds.scm.protocol.SCMSecurityProtocolServerSideTranslatorPB.processRequest(SCMSecurityProtocolServerSideTranslatorPB.java:127)
   scm2.org_1   | 	... 11 more
   ```
   
   SCM3 bootstrap is successful.
   ```
   scm3.org_1   | 2021-06-08 08:11:53,076 [main] INFO server.StorageContainerManager: SCM BootStrap  is successful for ClusterID CID-74d4b242-a5d7-4b07-8677-f75f0207c0e8, SCMID d7a4c94b-423a-45ae-b04a-9474584206d1
   scm3.org_1   | 2021-06-08 08:11:53,076 [main] INFO server.StorageContainerManager: Primary SCM Node ID 4f54d4de-8942-47b0-a88e-99e5d1bbcad7
   scm3.org_1   | 2021-06-08 08:11:53,086 [shutdown-hook-0] INFO server.StorageContainerManagerStarter: SHUTDOWN_MSG:
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 merged pull request #2312: HDDS-5317. BootStrapped SCM fails to bootstrap if it connects to another bootstrapped SCM first.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 merged pull request #2312:
URL: https://github.com/apache/ozone/pull/2312


   




[GitHub] [ozone] bharatviswa504 commented on pull request #2312: HDDS-5317. BootStrapped SCM fails to bootstrap if it connects to another bootstrapped SCM first.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2312:
URL: https://github.com/apache/ozone/pull/2312#issuecomment-858284939


   Thank you @xiaoyuyao for the review.




[GitHub] [ozone] bharatviswa504 commented on pull request #2312: HDDS-5317. BootStrapped SCM fails to bootstrap if it connects to another bootstrapped SCM first.

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2312:
URL: https://github.com/apache/ozone/pull/2312#issuecomment-858284761


   The latest commit is just a code comment change and does not change any code.
   As the previous run had a clean CI, proceeding with the commit.
   




[GitHub] [ozone] xiaoyuyao commented on a change in pull request #2312: HDDS-5317. BootStrapped SCM fails to bootstrap if it connects to another bootstrapped SCM first.

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #2312:
URL: https://github.com/apache/ozone/pull/2312#discussion_r648564809



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/RatisUtil.java
##########
@@ -190,6 +195,23 @@ public static void checkRatisException(IOException e, String port,
       throw new ServiceException(ServerNotLeaderException
           .convertToNotLeaderException(nle,
               SCMRatisServerImpl.getSelfPeerId(scmId), port));
+    } else if (e instanceof SCMSecurityException) {
+      // For this error client needs to retry on next SCM.

Review comment:
       Can you be more specific? Change "this error" to "NOT_A_PRIMARY_SCM error".



