Posted to issues@ozone.apache.org by "Bharat Viswanadham (Jira)" <ji...@apache.org> on 2021/06/08 07:44:00 UTC
[jira] [Commented] (HDDS-5317) BootStrapped SCM fails to bootstrap if it connects to another bootstrapped SCM first.
[ https://issues.apache.org/jira/browse/HDDS-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359107#comment-17359107 ]
Bharat Viswanadham commented on HDDS-5317:
------------------------------------------
*Exception:*
{code:java}
2021-06-07 13:58:14,062 ERROR org.apache.hadoop.hdds.scm.ha.HASecurityUtils: Error while fetching/storing SCM signed certificate.
org.apache.hadoop.hdds.security.exception.SCMSecurityException: Get SCM Certificate can be run only primary SCM
at org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.handleError(SCMSecurityProtocolClientSideTranslatorPB.java:126)
at org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.submitRequest(SCMSecurityProtocolClientSideTranslatorPB.java:108)
at org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.getSCMCertChain(SCMSecurityProtocolClientSideTranslatorPB.java:205)
at org.apache.hadoop.hdds.scm.ha.HASecurityUtils.getRootCASignedSCMCert(HASecurityUtils.java:149)
at org.apache.hadoop.hdds.scm.ha.HASecurityUtils.initializeSecurity(HASecurityUtils.java:102)
at org.apache.hadoop.hdds.scm.server.StorageContainerManager.scmBootstrap(StorageContainerManager.java:900)
at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter$SCMStarterHelper.bootStrap(StorageContainerManagerStarter.java:179)
at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.bootStrapScm(StorageContainerManagerStarter.java:129)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at picocli.CommandLine.executeUserObject(CommandLine.java:1952)
at picocli.CommandLine.access$1100(CommandLine.java:145)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2332)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2326)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2291)
at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2152)
at picocli.CommandLine.parseWithHandlers(CommandLine.java:2530)
at picocli.CommandLine.parseWithHandler(CommandLine.java:2465)
at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:96)
at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:87)
at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.main(StorageContainerManagerStarter.java:57)
2021-06-07 13:58:14,067 INFO org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter: SHUTDOWN_MSG:
{code}
> BootStrapped SCM fails to bootstrap if it connects to another bootstrapped SCM first.
> -------------------------------------------------------------------------------------
>
> Key: HDDS-5317
> URL: https://issues.apache.org/jira/browse/HDDS-5317
> Project: Apache Ozone
> Issue Type: Bug
> Components: SCM HA, Security
> Reporter: Bharat Viswanadham
> Assignee: Bharat Viswanadham
> Priority: Blocker
>
> A GetSCMCertificate request can land on a non-leader SCM, but the root CA runs only on the primary SCM.
> So when an SCM is bootstrapped and happens to connect first to another bootstrapped SCM, the request fails with an SCMSecurityResponse whose status is NOT_A_PRIMARY_SCM. Because this comes back as a normal response rather than as an exception, failover does not happen.
> *SCMSecurityProtocolClientSideTranslatorPB*
> {code:java}
> private SCMSecurityResponse handleError(SCMSecurityResponse resp)
> throws SCMSecurityException {
> if (resp.getStatus() != SCMSecurityProtocolProtos.Status.OK) {
> throw new SCMSecurityException(resp.getMessage(),
> SCMSecurityException.ErrorCode.values()[resp.getStatus().ordinal()]);
> }
> return resp;
> }
> {code}
> One possible fix: on the server, check whether the exception is an SCMSecurityException with error code NOT_A_PRIMARY_SCM and, if so, return a RetriableWithFailOverException instead. The FailOverProxyProvider then fails over and retries the request on the next SCM.
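A minimal, dependency-free sketch of the failure and of the proposed fix, under stated assumptions: `Scm`, `Response`, `Status`, `callWithoutFailover` and `callWithFailover` are hypothetical names, and the local `RetriableWithFailOverException` only mimics the semantics described in the issue (it tells the client to fail over and retry). The real client goes through SCMSecurityProtocolClientSideTranslatorPB and a failover proxy provider, not a simple loop, so this models the shape of the problem only.

{code:java}
import java.util.Arrays;
import java.util.List;

// Hypothetical model of HDDS-5317. None of these types are real Ozone or
// Hadoop classes; they only mimic status-in-response vs. retriable exception.
public class ScmFailoverSketch {

  // Assumed stand-in for the exception named in the issue: it signals the
  // client-side proxy to fail over to the next SCM and retry.
  static class RetriableWithFailOverException extends Exception {
    RetriableWithFailOverException(String msg) {
      super(msg);
    }
  }

  enum Status { OK, NOT_A_PRIMARY_SCM }

  // Simplified stand-in for SCMSecurityResponse: a status plus a payload.
  static final class Response {
    final Status status;
    final String payload;

    Response(Status status, String payload) {
      this.status = status;
      this.payload = payload;
    }
  }

  // Hypothetical SCM node; only the primary runs the root CA.
  static final class Scm {
    final String id;
    final boolean primary;

    Scm(String id, boolean primary) {
      this.id = id;
      this.primary = primary;
    }

    // Current behavior: a non-primary SCM answers with an error status.
    Response getScmCertificateAsResponse(String csr) {
      return primary
          ? new Response(Status.OK, sign(csr))
          : new Response(Status.NOT_A_PRIMARY_SCM, id + " is not the primary SCM");
    }

    // Proposed behavior: a non-primary SCM throws a retriable exception.
    String getScmCertificate(String csr) throws RetriableWithFailOverException {
      if (!primary) {
        throw new RetriableWithFailOverException(id + " is not the primary SCM");
      }
      return sign(csr);
    }

    private String sign(String csr) {
      return "cert(" + csr + ") signed by " + id;
    }
  }

  // Current behavior (simplified): the first SCM contacted returns a normal
  // response, which the failover logic does not act on, so no retry happens.
  static Response callWithoutFailover(List<Scm> scms, String csr) {
    return scms.get(0).getScmCertificateAsResponse(csr);
  }

  // Proposed behavior: move to the next SCM on RetriableWithFailOverException.
  static String callWithFailover(List<Scm> scms, String csr) {
    RetriableWithFailOverException last = null;
    for (Scm scm : scms) {
      try {
        return scm.getScmCertificate(csr);
      } catch (RetriableWithFailOverException e) {
        last = e; // fail over and retry on the next SCM
      }
    }
    throw new IllegalStateException("no primary SCM reachable", last);
  }

  public static void main(String[] args) {
    // The bootstrapping SCM ("scm3") happens to contact another
    // bootstrapped SCM ("scm2") before the primary ("scm1").
    List<Scm> scms = Arrays.asList(new Scm("scm2", false), new Scm("scm1", true));

    System.out.println(callWithoutFailover(scms, "scm3").status); // NOT_A_PRIMARY_SCM
    System.out.println(callWithFailover(scms, "scm3")); // cert(scm3) signed by scm1
  }
}
{code}

The design point the sketch isolates is that failover is driven by the exception type: as long as a non-primary SCM reports NOT_A_PRIMARY_SCM inside an ordinary response, the client never gets a signal to try the next SCM.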
--
This message was sent by Atlassian Jira
(v8.3.4#803005)