Posted to issues@ozone.apache.org by "Ethan Rose (Jira)" <ji...@apache.org> on 2021/10/20 20:40:07 UTC

[jira] [Updated] (HDDS-3379) Clients unable to failover after the OzoneManager leader is restart in MiniOzoneChaosCluster

     [ https://issues.apache.org/jira/browse/HDDS-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Rose updated HDDS-3379:
-----------------------------
    Target Version/s: 1.3.0  (was: 1.2.0)

I am managing the 1.2.0 release, and more than 600 issues are currently targeted for 1.2.0, so I am moving this issue's target version to 1.3.0.

If you are actively working on this Jira issue and believe it should be targeted for the 1.2.0 release, please reach out to me via Apache email or Slack.

> Clients unable to failover after the OzoneManager leader is restart in MiniOzoneChaosCluster
> --------------------------------------------------------------------------------------------
>
>                 Key: HDDS-3379
>                 URL: https://issues.apache.org/jira/browse/HDDS-3379
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Manager
>            Reporter: Mukul Kumar Singh
>            Priority: Major
>              Labels: MiniOzoneChaosCluster, TriagePending
>
> Clients are unable to fail over after the OzoneManager leader is restarted in MiniOzoneChaosCluster.
> This happens after the following restart events:
> {code}
> ➜  chaos-2020-04-11-21-51-52-IST egrep "iniOzoneHAClusterImp|Failures" complete.log
> 2020-04-11 21:52:08,296 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10804
> 2020-04-11 21:52:08,387 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10810
> 2020-04-11 21:52:08,485 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10816
> 2020-04-11 21:52:22,845 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  failure.Failures (FailureManager.java:start(66)) - starting failure manager 60 60 SECONDS
> 2020-04-11 21:53:22,850 [pool-59-thread-1] INFO  failure.Failures (FailureManager.java:fail(56)) - time failure with OzoneManagerRestartFailure
> 2020-04-11 21:53:22,853 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:shutdownOzoneManager(211)) - Shutting down OzoneManager omNode-3
> 2020-04-11 21:53:22,988 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:restartOzoneManager(228)) - Restarting OzoneManager omNode-3
> 	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:229)
> 	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:223)
> 	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.lambda$fail$0(Failures.java:101)
> 	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.fail(Failures.java:98)
> 2020-04-11 21:54:22,849 [pool-59-thread-1] INFO  failure.Failures (FailureManager.java:fail(56)) - time failure with OzoneManagerRestartFailure
> 2020-04-11 21:54:22,850 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:shutdownOzoneManager(211)) - Shutting down OzoneManager omNode-1
> 2020-04-11 21:54:22,895 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:restartOzoneManager(228)) - Restarting OzoneManager omNode-1
> 	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:229)
> 	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:223)
> 	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.lambda$fail$0(Failures.java:101)
> 	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.fail(Failures.java:98)
> ➜  chaos-2020-04-11-21-51-52-IST
> {code}
> These restarts result in the following exception in the load generator:
> {code}
> 2020-04-11 21:54:24,201 [pool-360-thread-4] ERROR loadgenerators.LoadExecutors (LoadExecutors.java:load(67)) - FilesystemLoadGenerator LOADGEN: Exiting due to exception
> java.io.IOException: java.io.IOException: Could not determine or connect to OM Leader.
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:229)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:199)
>         at org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46)
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>         at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>         at org.apache.hadoop.ozone.utils.LoadBucket$WriteOp.doPostOp(LoadBucket.java:176)
>         at org.apache.hadoop.ozone.utils.LoadBucket$Op.execute(LoadBucket.java:132)
>         at org.apache.hadoop.ozone.utils.LoadBucket$WriteOp.execute(LoadBucket.java:153)
>         at org.apache.hadoop.ozone.utils.LoadBucket.writeKey(LoadBucket.java:76)
>         at org.apache.hadoop.ozone.loadgenerators.FilesystemLoadGenerator.generateLoad(FilesystemLoadGenerator.java:47)
>         at org.apache.hadoop.ozone.loadgenerators.LoadExecutors.load(LoadExecutors.java:65)
>         at org.apache.hadoop.ozone.loadgenerators.LoadExecutors.lambda$startLoad$0(LoadExecutors.java:89)
>         at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Could not determine or connect to OM Leader.
>         at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.submitRequest(OzoneManagerProtocolClientSideTranslatorPB.java:429)
>         at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:843)
>         at sun.reflect.GeneratedMethodAccessor80.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:71)
>         at com.sun.proxy.$Proxy65.allocateBlock(Unknown Source)
>         at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateNewBlock(BlockOutputStreamEntryPool.java:281)
>         at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateBlockIfNeeded(BlockOutputStreamEntryPool.java:327)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:208)
> {code}
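The error message suggests the client gave up before it found a reachable OM leader. As a rough illustration of what client-side failover has to do in this situation, here is a minimal Java sketch under stated assumptions: it is not Ozone's actual failover code, and `LeaderFailoverSketch`, `Reply`, and `submitWithFailover` are hypothetical names. It cycles through the OM nodes in round-robin order, treats both a "not leader" reply and an unreachable node as a reason to fail over, and throws an `IOException` carrying the same message as in the report once a fixed attempt budget is used up.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sketch of client-side OM leader failover.
 * This is NOT Ozone's failover proxy provider; all names are illustrative.
 */
public class LeaderFailoverSketch {

  /** Outcome of sending one request to one OM node (hypothetical). */
  public enum Reply { SUCCESS, NOT_LEADER, UNREACHABLE }

  /**
   * Sends a request to OM nodes in round-robin order, starting at startIndex,
   * until one node accepts it as leader or maxAttempts requests have been made.
   *
   * @return the id of the node that accepted the request
   * @throws IOException if no node accepted the request within maxAttempts
   */
  public static String submitWithFailover(List<String> omNodes, int startIndex,
      int maxAttempts, Function<String, Reply> send) throws IOException {
    int index = startIndex;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      String node = omNodes.get(index % omNodes.size());
      if (send.apply(node) == Reply.SUCCESS) {
        return node;
      }
      // NOT_LEADER or UNREACHABLE: fail over to the next node in the ring.
      index++;
    }
    // Same message as in the report; here it means the attempt budget ran out.
    throw new IOException("Could not determine or connect to OM Leader.");
  }

  public static void main(String[] args) throws IOException {
    List<String> nodes = Arrays.asList("omNode-1", "omNode-2", "omNode-3");
    // Assume omNode-1 is restarting and omNode-2 is the current leader.
    String leader = submitWithFailover(nodes, 0, 6, node -> {
      if (node.equals("omNode-1")) {
        return Reply.UNREACHABLE;
      }
      if (node.equals("omNode-2")) {
        return Reply.SUCCESS;
      }
      return Reply.NOT_LEADER;
    });
    // prints "Request served by omNode-2"
    System.out.println("Request served by " + leader);
  }
}
```

In this sketch, an attempt budget that is too small to outlast a restart window makes the call fail even though a leader eventually becomes available; the report itself does not establish whether that is what happens in HDDS-3379.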



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
