Posted to dev@qpid.apache.org by "Alex Rudyy (JIRA)" <ji...@apache.org> on 2016/02/17 16:59:18 UTC

[jira] [Updated] (QPID-7078) [Java Broker,HA] BDB HA VHN in master role designated as primary can sporadically transit into unknown role after loosing second replica node

     [ https://issues.apache.org/jira/browse/QPID-7078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Rudyy updated QPID-7078:
-----------------------------
    Summary: [Java Broker,HA] BDB HA VHN in master role designated as primary can sporadically transit into unknown role after loosing second replica node  (was: [Java Broker,HA] BDB HA VHN in master role designated as primary can sporadically tramsit into unknown role after loosing second replica node)

> [Java Broker,HA] BDB HA VHN in master role designated as primary can sporadically transit into unknown role after loosing second replica node
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: QPID-7078
>                 URL: https://issues.apache.org/jira/browse/QPID-7078
>             Project: Qpid
>          Issue Type: Bug
>          Components: Java Broker
>    Affects Versions: 0.32, qpid-java-6.0, qpid-java-6.0.1, qpid-java-6.1
>            Reporter: Alex Rudyy
>         Attachments: TEST-org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest.testDesignatedPrimaryContinuesAfterSecondaryStopped.txt
>
>
> Failure of test TwoNodeTest#testDesignatedPrimaryContinuesAfterSecondaryStopped revealed an unexpected behavior of BDB JE: the master node designated as primary suddenly transits into the UNKNOWN role after the second replica node is shut down.
> The test failed as follows:
> {noformat}
> testDesignatedPrimaryContinuesAfterSecondaryStopped(org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest)  Time elapsed: 7.236 sec  <<< ERROR!
> javax.jms.JMSException: Error registering consumer: org.apache.qpid.QpidException: Fail-over exception interrupted basic consume.
> 	at org.apache.qpid.client.AMQSession.registerConsumer(AMQSession.java:3093)
> 	at org.apache.qpid.client.AMQSession.access$400(AMQSession.java:94)
> 	at org.apache.qpid.client.AMQSession$5.execute(AMQSession.java:2094)
> 	at org.apache.qpid.client.AMQSession$5.execute(AMQSession.java:2069)
> 	at org.apache.qpid.client.AMQConnectionDelegate_8_0.executeRetrySupport(AMQConnectionDelegate_8_0.java:416)
> 	at org.apache.qpid.client.AMQConnection.executeRetrySupport(AMQConnection.java:737)
> 	at org.apache.qpid.client.failover.FailoverRetrySupport.execute(FailoverRetrySupport.java:90)
> 	at org.apache.qpid.client.AMQSession.createConsumerImpl(AMQSession.java:2067)
> 	at org.apache.qpid.client.AMQSession.createConsumer(AMQSession.java:989)
> 	at org.apache.qpid.client.AMQConnection.retrieveVirtualHostPropertiesIfNecessary(AMQConnection.java:809)
> 	at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:796)
> 	at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:771)
> 	at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:765)
> 	at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:88)
> 	at org.apache.qpid.test.utils.QpidBrokerTestCase.assertProducingConsuming(QpidBrokerTestCase.java:1256)
> 	at org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest.testDesignatedPrimaryContinuesAfterSecondaryStopped(TwoNodeTest.java:108)
> Caused by: org.apache.qpid.client.failover.FailoverException: Failing over about to start
> 	at org.apache.qpid.client.AMQProtocolHandler.notifyFailoverStarting(AMQProtocolHandler.java:434)
> 	at org.apache.qpid.client.AMQProtocolHandler$1.run(AMQProtocolHandler.java:287)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> On the broker side, the transition into the UNKNOWN state was logged as follows:
> {noformat}
> 10:15:44,279 B-10000 DEBUG [Group-Change-Learner:test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.s.b.r.DatabasePinger Ping transaction completed
> 10:15:44,279 B-10000 DEBUG [IO-/127.0.0.1:58662] o.a.q.s.p.v.BrokerDecoder Frame handled in 1344 ms.
> 10:15:44,279 B-10000 INFO  [MASTER nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001(1)] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade The node 'test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001' state is UNKNOWN
> 10:15:44,279 B-10000 DEBUG [StateChange-test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade Received BDB event, new BDB state UNKNOWN Facade state : OPEN
> 10:15:44,279 B-10000 INFO  [StateChange-test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.v.b.BDBHAVirtualHostNodeImpl Received BDB event indicating transition from state MASTER to UNKNOWN for nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001
> 10:15:44,280 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.c.u.TaskExecutorImpl Performing Task['close' on 'BDBHAVirtualHostImpl [id=3e9eac0d-ff2e-4469-a7ed-aded200c0881, name=test]']
> 10:15:44,281 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.m.AbstractConfiguredObject Closing BDBHAVirtualHostImpl : test
> 2016-02-17 10:15:44,281 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.v.AbstractVirtualHost Closing connection registry :1 connections.
> 10:15:44,282 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.c.u.TaskExecutorImpl Task['close' on 'BDBHAVirtualHostImpl [id=3e9eac0d-ff2e-4469-a7ed-aded200c0881, name=test]'] performed successfully with result: null
> 10:15:44,283 B-10000 DEBUG [Broker-Config] o.a.q.s.c.u.TaskExecutorImpl Performing Task['close' on '/127.0.0.1:58662(guest)']
> 10:15:44,284 B-10000 DEBUG [Broker-Config] o.a.q.s.m.AbstractConfiguredObject Closing AMQPConnection_0_8 : [1] 127.0.0.1:58662
> 10:15:44,284 B-10000 DEBUG [Broker-Config] o.a.q.s.c.u.TaskExecutorImpl Task['close' on '/127.0.0.1:58662(guest)'] performed successfully with result: null
> {noformat}
> The transition into the UNKNOWN state should not happen, because the MASTER node is designated as primary. The exhibited behavior indicates a BDB JE bug.
> It is unclear whether the JE Environment can recover from this unexpected flip into the UNKNOWN state. If JE can recover, then on the next transition into MASTER the VHN should recover the VH, and connected applications can continue as usual. If JE cannot recover, then the BDB HA VHN will not recover automatically from this condition, as we do not restart the environment on MasterUnknownException. Operator intervention would then be required to restart the BDB HA VHN.
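
The expected behavior can be illustrated with a small, self-contained Java model. This is a sketch under stated assumptions, not Qpid or BDB JE code: `NodeState`, `DesignatedPrimaryModel.canRemainMaster` and `VirtualHostNodeSketch` are hypothetical names, and the quorum rule assumes, as the JE documentation describes for its designated-primary setting, that the override applies only to a two-node electable group. It shows why the surviving master should keep the MASTER role, and why the VHN restores its virtual host only if JE reports MASTER again after UNKNOWN.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical local enum mirroring the role names in the broker log.
// It is NOT the BDB JE ReplicatedEnvironment.State type.
enum NodeState { MASTER, REPLICA, UNKNOWN }

final class DesignatedPrimaryModel {

    // Simple-majority quorum over the electable group. In a two-node group this
    // needs both nodes, which is why a designated-primary override exists.
    static boolean hasSimpleMajority(int electableGroupSize, int activeNodes) {
        return activeNodes >= electableGroupSize / 2 + 1;
    }

    // Assumption: the designated-primary override only applies when a two-node
    // electable group has lost its peer and the surviving node is the primary.
    static boolean canRemainMaster(int electableGroupSize, int activeNodes,
                                   boolean designatedPrimary) {
        if (hasSimpleMajority(electableGroupSize, activeNodes)) {
            return true;
        }
        return designatedPrimary && electableGroupSize == 2 && activeNodes == 1;
    }
}

// Hypothetical sketch of a virtual host node reacting to role changes, following
// the sequence in the broker log: UNKNOWN closes the virtual host (and with it the
// client connections); a later MASTER event activates it again.
final class VirtualHostNodeSketch {
    private boolean virtualHostActive;
    private final List<String> actions = new ArrayList<>();

    void onStateChange(NodeState newState) {
        switch (newState) {
            case MASTER:
                if (!virtualHostActive) {
                    virtualHostActive = true;
                    actions.add("activate virtual host");
                }
                break;
            case UNKNOWN:
                if (virtualHostActive) {
                    virtualHostActive = false;
                    actions.add("close virtual host and its connections");
                }
                break;
            default:
                // REPLICA handling is out of scope for this sketch.
                break;
        }
    }

    boolean isVirtualHostActive() { return virtualHostActive; }

    List<String> actions() { return actions; }
}
```

With `canRemainMaster(2, 1, true)` the surviving designated primary keeps the MASTER role, which is what the test expects; the failure above corresponds to JE reporting UNKNOWN anyway. In the sketch, a MASTER, UNKNOWN, MASTER sequence closes and then reactivates the virtual host, whereas if JE never reports MASTER again the host stays closed, mirroring the need for operator intervention.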



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
