Posted to issues@ozone.apache.org by "Arpit Agarwal (Jira)" <ji...@apache.org> on 2020/06/01 21:51:00 UTC

[jira] [Updated] (HDDS-3379) Clients unable to failover after the OzoneManager leader is restart in MiniOzoneChaosCluster

     [ https://issues.apache.org/jira/browse/HDDS-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDDS-3379:
--------------------------------
    Target Version/s: 0.6.0
              Labels: MiniOzoneChaosCluster TriagePending  (was: MiniOzoneChaosCluster)

> Clients unable to failover after the OzoneManager leader is restart in MiniOzoneChaosCluster
> --------------------------------------------------------------------------------------------
>
>                 Key: HDDS-3379
>                 URL: https://issues.apache.org/jira/browse/HDDS-3379
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Manager
>            Reporter: Mukul Kumar Singh
>            Priority: Major
>              Labels: MiniOzoneChaosCluster, TriagePending
>
> Clients are unable to fail over after the OzoneManager leader is restarted in MiniOzoneChaosCluster.
> This happens after the following restart events:
> {code}
> ➜  chaos-2020-04-11-21-51-52-IST egrep "iniOzoneHAClusterImp|Failures" complete.log
> 2020-04-11 21:52:08,296 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10804
> 2020-04-11 21:52:08,387 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10810
> 2020-04-11 21:52:08,485 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10816
> 2020-04-11 21:52:22,845 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  failure.Failures (FailureManager.java:start(66)) - starting failure manager 60 60 SECONDS
> 2020-04-11 21:53:22,850 [pool-59-thread-1] INFO  failure.Failures (FailureManager.java:fail(56)) - time failure with OzoneManagerRestartFailure
> 2020-04-11 21:53:22,853 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:shutdownOzoneManager(211)) - Shutting down OzoneManager omNode-3
> 2020-04-11 21:53:22,988 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:restartOzoneManager(228)) - Restarting OzoneManager omNode-3
> 	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:229)
> 	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:223)
> 	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.lambda$fail$0(Failures.java:101)
> 	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.fail(Failures.java:98)
> 2020-04-11 21:54:22,849 [pool-59-thread-1] INFO  failure.Failures (FailureManager.java:fail(56)) - time failure with OzoneManagerRestartFailure
> 2020-04-11 21:54:22,850 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:shutdownOzoneManager(211)) - Shutting down OzoneManager omNode-1
> 2020-04-11 21:54:22,895 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:restartOzoneManager(228)) - Restarting OzoneManager omNode-1
> 	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:229)
> 	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:223)
> 	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.lambda$fail$0(Failures.java:101)
> 	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.fail(Failures.java:98)
> ➜  chaos-2020-04-11-21-51-52-IST
> {code}
> This results in the following exception in the filesystem load generator:
> {code}
> 2020-04-11 21:54:24,201 [pool-360-thread-4] ERROR loadgenerators.LoadExecutors (LoadExecutors.java:load(67)) - FilesystemLoadGenerator LOADGEN: Exiting due to exception
> java.io.IOException: java.io.IOException: Could not determine or connect to OM Leader.
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:229)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:199)
>         at org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46)
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>         at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>         at org.apache.hadoop.ozone.utils.LoadBucket$WriteOp.doPostOp(LoadBucket.java:176)
>         at org.apache.hadoop.ozone.utils.LoadBucket$Op.execute(LoadBucket.java:132)
>         at org.apache.hadoop.ozone.utils.LoadBucket$WriteOp.execute(LoadBucket.java:153)
>         at org.apache.hadoop.ozone.utils.LoadBucket.writeKey(LoadBucket.java:76)
>         at org.apache.hadoop.ozone.loadgenerators.FilesystemLoadGenerator.generateLoad(FilesystemLoadGenerator.java:47)
>         at org.apache.hadoop.ozone.loadgenerators.LoadExecutors.load(LoadExecutors.java:65)
>         at org.apache.hadoop.ozone.loadgenerators.LoadExecutors.lambda$startLoad$0(LoadExecutors.java:89)
>         at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Could not determine or connect to OM Leader.
>         at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.submitRequest(OzoneManagerProtocolClientSideTranslatorPB.java:429)
>         at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:843)
>         at sun.reflect.GeneratedMethodAccessor80.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:71)
>         at com.sun.proxy.$Proxy65.allocateBlock(Unknown Source)
>         at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateNewBlock(BlockOutputStreamEntryPool.java:281)
>         at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateBlockIfNeeded(BlockOutputStreamEntryPool.java:327)
>         at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:208)
> {code}
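The error `Could not determine or connect to OM Leader.` is raised from `OzoneManagerProtocolClientSideTranslatorPB.submitRequest` once the client stops looking for a leader. As background, here is a minimal sketch of the general shape of client-side failover in an HA setup. It tries the node believed to be the leader, moves to the next node on an `IOException`, and fails after a fixed attempt budget. It is an illustration under those assumptions, not Ozone's client code. `FailoverSketch`, `NodeCall`, `FailoverClient`, `maxAttempts` and `retryDelayMillis` are all hypothetical names.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of client-side leader failover; not Ozone's actual client code. */
public class FailoverSketch {

    /** A remote call against one node that may fail with an IOException. */
    interface NodeCall<N, R> {
        R apply(N node) throws IOException;
    }

    /** Tries the cached leader first, then walks the node list round-robin. */
    static final class FailoverClient<N> {
        private final List<N> nodes;          // candidate OM nodes, e.g. omNode-1..3
        private final int maxAttempts;        // total attempts before giving up
        private final long retryDelayMillis;  // pause between failed attempts
        private int current = 0;              // index of the node believed to be leader

        FailoverClient(List<N> nodes, int maxAttempts, long retryDelayMillis) {
            if (nodes.isEmpty()) {
                throw new IllegalArgumentException("at least one node is required");
            }
            this.nodes = nodes;
            this.maxAttempts = maxAttempts;
            this.retryDelayMillis = retryDelayMillis;
        }

        <R> R submit(NodeCall<N, R> call) throws IOException {
            IOException last = null;
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                N node = nodes.get(current);
                try {
                    // On success 'current' is kept, so later calls go straight to this leader.
                    return call.apply(node);
                } catch (IOException e) {
                    last = e;
                    current = (current + 1) % nodes.size(); // fail over to the next node
                    if (attempt + 1 < maxAttempts) {
                        sleepBeforeRetry();
                    }
                }
            }
            // Budget exhausted: report an error like the one in the trace above.
            throw new IOException("Could not determine or connect to OM Leader.", last);
        }

        int currentIndex() {
            return current;
        }

        private void sleepBeforeRetry() throws InterruptedIOException {
            if (retryDelayMillis <= 0) {
                return;
            }
            try {
                Thread.sleep(retryDelayMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                throw new InterruptedIOException("interrupted while failing over");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulated state: the client still points at omNode-3, but omNode-2 is now the leader.
        final String leader = "omNode-2";
        FailoverClient<String> client =
            new FailoverClient<>(Arrays.asList("omNode-3", "omNode-1", "omNode-2"), 6, 0);
        String reply = client.submit(node -> {
            if (!node.equals(leader)) {
                throw new IOException(node + " is not the leader");
            }
            return "allocateBlock served by " + node;
        });
        System.out.println(reply); // prints "allocateBlock served by omNode-2"
    }
}
```

In a loop of this shape, the attempt budget multiplied by the retry delay bounds how long the client waits. If a new leader takes longer than that to emerge, the call fails even though the cluster recovers later. The logs above do not establish whether that is what happened in this run.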

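The first log shows the chaos harness starting its failure manager with `60 60 SECONDS` at 21:52:22. It then restarts one OzoneManager one minute later each time: omNode-3 at 21:53:22 and omNode-1 at 21:54:22. Reading those three values as initial delay, period and time unit is an assumption. Under that reading, the schedule behaves like this sketch built on `ScheduledExecutorService.scheduleAtFixedRate`. The random choice of victim and the names `ChaosScheduleSketch` and `startFailures` are hypothetical, not Ozone's `FailureManager`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch of a periodic OM restart injector; not Ozone's FailureManager. */
public class ChaosScheduleSketch {

    /**
     * Every period, picks one OM at random and passes it to restartAction, which stands in
     * for "shut down, then restart" of that OzoneManager.
     */
    static ScheduledExecutorService startFailures(List<String> omNodes,
                                                  Consumer<String> restartAction,
                                                  long initialDelay, long period,
                                                  TimeUnit unit, Random random) {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        // scheduleAtFixedRate suppresses later runs if one run throws, so catch failures here.
        pool.scheduleAtFixedRate(() -> {
            try {
                String victim = omNodes.get(random.nextInt(omNodes.size()));
                restartAction.accept(victim);
            } catch (RuntimeException e) {
                System.err.println("failure injection failed: " + e);
            }
        }, initialDelay, period, unit);
        return pool;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> oms = Arrays.asList("omNode-1", "omNode-2", "omNode-3");
        CountDownLatch twoRestarts = new CountDownLatch(2);
        // Scaled down from 60/60 SECONDS to 100/100 MILLISECONDS so the demo ends quickly.
        ScheduledExecutorService pool = startFailures(oms, om -> {
            System.out.println("Restarting OzoneManager " + om);
            twoRestarts.countDown();
        }, 100, 100, TimeUnit.MILLISECONDS, new Random());
        twoRestarts.await(5, TimeUnit.SECONDS);
        pool.shutdownNow();
    }
}
```

In the reported run, the load generator failed at 21:54:24,201, about two seconds after the second restart (omNode-1).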


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
