Posted to issues@ozone.apache.org by "Bharat Viswanadham (Jira)" <ji...@apache.org> on 2021/08/06 07:29:00 UTC

[jira] [Updated] (HDDS-5547) Generation of raftgroupId should not depend on OM service id

     [ https://issues.apache.org/jira/browse/HDDS-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bharat Viswanadham updated HDDS-5547:
-------------------------------------
    Summary: Generation of raftgroupId should not depend on OM service id  (was: Generation of raftgroupId should not depend on service id)

> Generation of raftgroupId should not depend on OM service id
> ------------------------------------------------------------
>
>                 Key: HDDS-5547
>                 URL: https://issues.apache.org/jira/browse/HDDS-5547
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Bharat Viswanadham
>            Assignee: Bharat Viswanadham
>            Priority: Major
>
> In OM HA, the raftGroupID is generated from the OM service ID.
> So, if the OM service ID changes, OM startup fails with the error below:
> {code:java}
> 2021-08-05 12:20:03,043 ERROR org.apache.hadoop.ozone.om.OzoneManagerStarter: OM start failed with exception
> java.io.IOException: java.lang.IllegalStateException: ILLEGAL TRANSITION: In OzoneManagerStateMachine:om1:group-8A65FD498CB6, RUNNING -> STARTING
>         at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
>         at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
>         at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:71)
>         at org.apache.ratis.server.impl.RaftServerProxy.getImpls(RaftServerProxy.java:354)
>         at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:371)
>         at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.start(OzoneManagerRatisServer.java:390)
>         at org.apache.hadoop.ozone.om.OzoneManager.start(OzoneManager.java:1109)
>         at org.apache.hadoop.ozone.om.OzoneManagerStarter$OMStarterHelper.start(OzoneManagerStarter.java:126)
>         at org.apache.hadoop.ozone.om.OzoneManagerStarter.startOm(OzoneManagerStarter.java:79)
>         at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:67)
>         at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:38)
>         at picocli.CommandLine.executeUserObject(CommandLine.java:1933)
>         at picocli.CommandLine.access$1100(CommandLine.java:145)
>         at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2332)
>         at picocli.CommandLine$RunLast.handle(CommandLine.java:2326)
>         at picocli.CommandLine$RunLast.handle(CommandLine.java:2291)
>         at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:2152)
>         at picocli.CommandLine.parseWithHandlers(CommandLine.java:2530)
>         at picocli.CommandLine.parseWithHandler(CommandLine.java:2465)
>         at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:96)
>         at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:87)
>         at org.apache.hadoop.ozone.om.OzoneManagerStarter.main(OzoneManagerStarter.java:51)
> Caused by: java.lang.IllegalStateException: ILLEGAL TRANSITION: In OzoneManagerStateMachine:om1:group-8A65FD498CB6, RUNNING -> STARTING
>         at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
>         at org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:121)
>         at org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:164)
>         at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:268)
>         at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.initialize(OzoneManagerStateMachine.java:127)
>         at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:120)
>         at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:193)
>         at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$4(RaftServerProxy.java:266)
>         at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
> One possible solution:
> If a Ratis group dir already exists, use it as is, since that is an existing cluster that we cannot change. For new clusters, we could use the clusterID, which does not change for an Ozone cluster; this way we would tolerate a change to the service ID config.
> This is just one idea; we can discuss other approaches to fix this issue.
> Right now, OM does not allow the OM service ID to be changed.
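
The proposal above could be sketched roughly as follows, using only the JDK. This is an illustration under stated assumptions, not the actual Ozone or Ratis code: the class RaftGroupIdSketch and its methods are hypothetical names, the group ID is assumed to be a name-based UUID derived from a string, and the Ratis storage directory is assumed to hold one subdirectory per group, named after the group's UUID.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.UUID;

// Hypothetical sketch; none of these names come from Ozone or Ratis.
final class RaftGroupIdSketch {

  // Name-based UUID: the same input string always yields the same UUID,
  // so a different input (e.g. a changed service ID) yields a different group ID.
  static UUID deriveGroupId(String name) {
    return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
  }

  // Assumption: an existing group shows up as a subdirectory whose name
  // parses as a UUID. Returns the first such UUID found, if any.
  static Optional<UUID> findExistingGroupId(Path ratisStorageDir) throws IOException {
    if (!Files.isDirectory(ratisStorageDir)) {
      return Optional.empty();
    }
    try (DirectoryStream<Path> children = Files.newDirectoryStream(ratisStorageDir)) {
      for (Path child : children) {
        if (!Files.isDirectory(child)) {
          continue;
        }
        try {
          return Optional.of(UUID.fromString(child.getFileName().toString()));
        } catch (IllegalArgumentException notAGroupDir) {
          // Directory name is not a UUID; keep looking.
        }
      }
    }
    return Optional.empty();
  }

  // Existing cluster: reuse the group ID already on disk.
  // New cluster: derive the group ID from the cluster ID instead of the service ID.
  static UUID chooseGroupId(Path ratisStorageDir, String clusterId) throws IOException {
    Optional<UUID> existing = findExistingGroupId(ratisStorageDir);
    return existing.isPresent() ? existing.get() : deriveGroupId(clusterId);
  }

  public static void main(String[] args) throws IOException {
    Path storage = Files.createTempDirectory("ratis-sketch");
    // The cluster ID and service ID values here are made up for the demo.
    System.out.println("new cluster:      " + chooseGroupId(storage, "CID-example"));

    // Simulate a cluster that was formatted with a service-ID-derived group.
    UUID old = deriveGroupId("omservice1");
    Files.createDirectory(storage.resolve(old.toString()));
    System.out.println("existing cluster: " + chooseGroupId(storage, "CID-example"));
  }
}
```

With this shape, a later change to the service ID config would no longer affect which group ID is chosen: an existing cluster keeps whatever is on disk, and a new cluster keys off the cluster ID. How a real fix should handle a storage directory that contains more than one group is left open here.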



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
