Posted to issues@ozone.apache.org by "Hanisha Koneru (Jira)" <ji...@apache.org> on 2021/08/06 16:43:00 UTC

[jira] [Commented] (HDDS-5547) Generation of raftgroupId should not depend on OM service id

    [ https://issues.apache.org/jira/browse/HDDS-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394891#comment-17394891 ] 

Hanisha Koneru commented on HDDS-5547:
--------------------------------------

The idea behind using the service ID to generate the RaftGroupId was that all OMs need to know the RaftGroupId to join the ring. If the RaftGroupId is not derived from configuration, we would need another protocol through which OMs talk to each other, with one OM initially acting as the primary that generates the RaftGroupId and propagates it to the other OMs.
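The point of deriving the ID from shared configuration is that every OM can compute the same value independently, with no coordination step. A minimal sketch of that idea, assuming a name-based (MD5, type 3) UUID built from the service ID string; a Ratis RaftGroupId is built from a UUID, but this is not necessarily Ozone's exact code, and the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/**
 * Sketch: each OM derives the group UUID from the shared service ID,
 * so no OM-to-OM protocol is needed. Names are hypothetical.
 */
public class GroupIdFromServiceId {

  // Name-based UUID: identical input bytes always produce the identical UUID.
  static UUID groupUuidFor(String omServiceId) {
    return UUID.nameUUIDFromBytes(omServiceId.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    // Two OMs configured with the same service ID agree on the group ID.
    UUID onOm1 = groupUuidFor("omservice");
    UUID onOm2 = groupUuidFor("omservice");
    System.out.println(onOm1.equals(onOm2)); // true

    // Renaming the service ID yields a different group ID, so it no longer
    // matches the Raft group already persisted on disk.
    UUID renamed = groupUuidFor("omservice-new");
    System.out.println(onOm1.equals(renamed)); // false
  }
}
```

The same determinism is what makes the scheme fragile: the group ID is only as stable as the service ID it is computed from.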

> Generation of raftgroupId should not depend on OM service id
> ------------------------------------------------------------
>
>                 Key: HDDS-5547
>                 URL: https://issues.apache.org/jira/browse/HDDS-5547
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Bharat Viswanadham
>            Assignee: Bharat Viswanadham
>            Priority: Major
>
> In OM HA, the Raft group ID is generated from the OM service ID.
> So, if the OM service ID changes, OM startup fails with the error below:
> {code:java}
> 2021-08-05 12:20:03,043 ERROR org.apache.hadoop.ozone.om.OzoneManagerStarter: OM start failed with exception
> java.io.IOException: java.lang.IllegalStateException: ILLEGAL TRANSITION: In OzoneManagerStateMachine:om1:group-8A65FD498CB6, RUNNING -> STARTING
>         at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
>         at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
>         at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:71)
>         at org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:354)
>         at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:371)
>         at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.start(OzoneManagerRatisServer.java:390)
>         at org.apache.hadoop.ozone.om.OzoneManager.start(OzoneManager.java:1109)
>         at org.apache.hadoop.ozone.om.OzoneManagerStarter$OMStarterHelper.start(OzoneManagerStarter.java:126)
>         at org.apache.hadoop.ozone.om.OzoneManagerStarter.startOm(OzoneManagerStarter.java:79)
>         at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:67)
>         at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:38)
>         at picocli.CommandLine.executeUserObject(CommandLine.java:1933)
>         at picocli.CommandLine.access$1100(CommandLine.java:145)
>         at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2332)
>         at picocli.CommandLine$RunLast.handle(CommandLine.java:2326)
>         at picocli.CommandLine$RunLast.handle(CommandLine.java:2291)
>         at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2152)
>         at picocli.CommandLine.parseWithHandlers(CommandLine.java:2530)
>         at picocli.CommandLine.parseWithHandler(CommandLine.java:2465)
>         at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:96)
>         at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:87)
>         at org.apache.hadoop.ozone.om.OzoneManagerStarter.main(OzoneManagerStarter.java:51)
> Caused by: java.lang.IllegalStateException: ILLEGAL TRANSITION: In OzoneManagerStateMachine:om1:group-8A65FD498CB6, RUNNING -> STARTING
>         at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
>         at org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:121)
>         at org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:164)
>         at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:268)
>         at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.initialize(OzoneManagerStateMachine.java:127)
>         at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:120)
>         at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:193)
>         at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$4(RaftServerProxy.java:266)
>         at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> One possible solution:
> If a Ratis group directory already exists, keep using it, since it belongs to an existing cluster whose group ID we cannot change. For new clusters, we could derive the group ID from the clusterID, which never changes for an Ozone cluster; this way OM would tolerate changes to the service ID config.
> This is just one idea; we can discuss other approaches to solve this issue.
> Right now, OM effectively does not allow the OM service ID to be changed.
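The proposed fallback order can be sketched as follows. This assumes each Raft group lives in a subdirectory of the OM's Ratis storage directory named after the group's UUID string; the class, method, and parameter names are hypothetical, and this is an illustration of the idea in the issue, not an actual Ozone patch:

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/**
 * Sketch of the proposal: reuse an existing Raft group directory if present,
 * otherwise derive the group UUID from the immutable cluster ID.
 * Assumption: group directories are named after the group's UUID string.
 */
public class GroupIdResolver {

  static UUID resolveGroupUuid(File ratisStorageDir, String clusterId) {
    List<UUID> existing = new ArrayList<>();
    File[] children = ratisStorageDir.listFiles(File::isDirectory);
    if (children != null) {
      for (File child : children) {
        try {
          existing.add(UUID.fromString(child.getName()));
        } catch (IllegalArgumentException notAGroupDir) {
          // Directory name is not a UUID (e.g. "snapshot"); skip it.
        }
      }
    }
    if (existing.size() == 1) {
      // Existing cluster: keep the group ID already persisted on disk.
      return existing.get(0);
    }
    if (existing.size() > 1) {
      throw new IllegalStateException("Multiple Raft group dirs found: " + existing);
    }
    // New cluster: the cluster ID never changes, so a later rename of the
    // service ID cannot change the group ID.
    return UUID.nameUUIDFromBytes(clusterId.getBytes(StandardCharsets.UTF_8));
  }
}
```

Checking the disk first keeps existing clusters on their current group ID, while new clusters get an ID decoupled from the service ID.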



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
