Posted to issues@ozone.apache.org by "Mukul Kumar Singh (Jira)" <ji...@apache.org> on 2020/04/12 03:15:00 UTC

[jira] [Created] (HDDS-3379) Clients unable to failover after the OzoneManager leader is restarted in MiniOzoneChaosCluster

Mukul Kumar Singh created HDDS-3379:
---------------------------------------

             Summary: Clients unable to failover after the OzoneManager leader is restarted in MiniOzoneChaosCluster
                 Key: HDDS-3379
                 URL: https://issues.apache.org/jira/browse/HDDS-3379
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
          Components: Ozone Manager
            Reporter: Mukul Kumar Singh


Clients are unable to fail over after the OzoneManager leader is restarted in MiniOzoneChaosCluster.

The failure occurs after the following OzoneManager restart events (omNode-3 at 21:53:22, then omNode-1 at 21:54:22):

{code}
➜  chaos-2020-04-11-21-51-52-IST egrep "iniOzoneHAClusterImp|Failures" complete.log
2020-04-11 21:52:08,296 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10804
2020-04-11 21:52:08,387 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10810
2020-04-11 21:52:08,485 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10816
2020-04-11 21:52:22,845 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  failure.Failures (FailureManager.java:start(66)) - starting failure manager 60 60 SECONDS
2020-04-11 21:53:22,850 [pool-59-thread-1] INFO  failure.Failures (FailureManager.java:fail(56)) - time failure with OzoneManagerRestartFailure
2020-04-11 21:53:22,853 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:shutdownOzoneManager(211)) - Shutting down OzoneManager omNode-3
2020-04-11 21:53:22,988 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:restartOzoneManager(228)) - Restarting OzoneManager omNode-3
	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:229)
	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:223)
	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.lambda$fail$0(Failures.java:101)
	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.fail(Failures.java:98)
2020-04-11 21:54:22,849 [pool-59-thread-1] INFO  failure.Failures (FailureManager.java:fail(56)) - time failure with OzoneManagerRestartFailure
2020-04-11 21:54:22,850 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:shutdownOzoneManager(211)) - Shutting down OzoneManager omNode-1
2020-04-11 21:54:22,895 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:restartOzoneManager(228)) - Restarting OzoneManager omNode-1
	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:229)
	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:223)
	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.lambda$fail$0(Failures.java:101)
	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.fail(Failures.java:98)
{code}


The load generator then exits with the following exception, raised while the client tries to allocate a block from the OM:

{code}
2020-04-11 21:54:24,201 [pool-360-thread-4] ERROR loadgenerators.LoadExecutors (LoadExecutors.java:load(67)) - FilesystemLoadGenerator LOADGEN: Exiting due to exception
java.io.IOException: java.io.IOException: Could not determine or connect to OM Leader.
        at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:229)
        at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:199)
        at org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
        at org.apache.hadoop.ozone.utils.LoadBucket$WriteOp.doPostOp(LoadBucket.java:176)
        at org.apache.hadoop.ozone.utils.LoadBucket$Op.execute(LoadBucket.java:132)
        at org.apache.hadoop.ozone.utils.LoadBucket$WriteOp.execute(LoadBucket.java:153)
        at org.apache.hadoop.ozone.utils.LoadBucket.writeKey(LoadBucket.java:76)
        at org.apache.hadoop.ozone.loadgenerators.FilesystemLoadGenerator.generateLoad(FilesystemLoadGenerator.java:47)
        at org.apache.hadoop.ozone.loadgenerators.LoadExecutors.load(LoadExecutors.java:65)
        at org.apache.hadoop.ozone.loadgenerators.LoadExecutors.lambda$startLoad$0(LoadExecutors.java:89)
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Could not determine or connect to OM Leader.
        at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.submitRequest(OzoneManagerProtocolClientSideTranslatorPB.java:429)
        at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:843)
        at sun.reflect.GeneratedMethodAccessor80.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:71)
        at com.sun.proxy.$Proxy65.allocateBlock(Unknown Source)
        at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateNewBlock(BlockOutputStreamEntryPool.java:281)
        at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateBlockIfNeeded(BlockOutputStreamEntryPool.java:327)
        at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:208)
{code}
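
For readers unfamiliar with the expected behaviour, the following is a minimal sketch of client-side failover, not the Ozone implementation. It assumes a client that cycles through the known OM nodes and gives up after a bounded number of attempts. The class FailoverSketch, the OmCall interface, and the callWithFailover method are hypothetical names introduced only for this illustration; the final error message is copied from the trace above.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class FailoverSketch {
    /** Hypothetical stand-in for a single RPC sent to one OM node. */
    interface OmCall<T> {
        T invoke(String omNodeId) throws IOException;
    }

    /**
     * Tries the OM nodes in round-robin order, making at most maxAttempts
     * calls in total. Returns the first successful result; if every attempt
     * fails, throws an IOException that wraps the last failure.
     */
    static <T> T callWithFailover(List<String> omNodeIds, int maxAttempts,
                                  long retryDelayMillis, OmCall<T> call)
            throws IOException {
        if (omNodeIds.isEmpty()) {
            throw new IllegalArgumentException("No OM nodes configured");
        }
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String node = omNodeIds.get(attempt % omNodeIds.size());
            try {
                return call.invoke(node);
            } catch (IOException e) {
                // Node is down or is not the leader: remember why, then move on.
                last = e;
                try {
                    Thread.sleep(retryDelayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IOException("Interrupted while failing over", ie);
                }
            }
        }
        throw new IOException("Could not determine or connect to OM Leader.", last);
    }

    public static void main(String[] args) throws IOException {
        List<String> nodes = Arrays.asList("omNode-1", "omNode-2", "omNode-3");
        // Assumed scenario: omNode-1 was just restarted and omNode-2 now leads.
        String result = callWithFailover(nodes, 6, 10L, node -> {
            if (!node.equals("omNode-2")) {
                throw new IOException(node + " is not the leader");
            }
            return "block allocated by " + node;
        });
        System.out.println(result);
    }
}
```

In this sketch a restart of the leader is survivable as long as some other node accepts the call within the attempt budget. The exception reported in this issue corresponds to the case where the budget is exhausted without any node answering as leader.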



