Posted to dev@qpid.apache.org by "Alex Rudyy (JIRA)" <ji...@apache.org> on 2016/02/17 16:57:18 UTC

[jira] [Created] (QPID-7078) [Java Broker,HA] BDB HA VHN in master role designated as primary can sporadically transit into unknown role after losing second replica node

Alex Rudyy created QPID-7078:
--------------------------------

             Summary: [Java Broker,HA] BDB HA VHN in master role designated as primary can sporadically transit into unknown role after losing second replica node
                 Key: QPID-7078
                 URL: https://issues.apache.org/jira/browse/QPID-7078
             Project: Qpid
          Issue Type: Bug
          Components: Java Broker
    Affects Versions: qpid-java-6.0, 0.32, qpid-java-6.0.1, qpid-java-6.1
            Reporter: Alex Rudyy
         Attachments: TEST-org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest.testDesignatedPrimaryContinuesAfterSecondaryStopped.txt

The failure of test TwoNodeTest#testDesignatedPrimaryContinuesAfterSecondaryStopped revealed unexpected behavior in BDB JE: a master node designated as primary suddenly transits into the UNKNOWN role after the second replica node is shut down.

The test failed as follows:
{noformat}
testDesignatedPrimaryContinuesAfterSecondaryStopped(org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest)  Time elapsed: 7.236 sec  <<< ERROR!
javax.jms.JMSException: Error registering consumer: org.apache.qpid.QpidException: Fail-over exception interrupted basic consume.
	at org.apache.qpid.client.AMQSession.registerConsumer(AMQSession.java:3093)
	at org.apache.qpid.client.AMQSession.access$400(AMQSession.java:94)
	at org.apache.qpid.client.AMQSession$5.execute(AMQSession.java:2094)
	at org.apache.qpid.client.AMQSession$5.execute(AMQSession.java:2069)
	at org.apache.qpid.client.AMQConnectionDelegate_8_0.executeRetrySupport(AMQConnectionDelegate_8_0.java:416)
	at org.apache.qpid.client.AMQConnection.executeRetrySupport(AMQConnection.java:737)
	at org.apache.qpid.client.failover.FailoverRetrySupport.execute(FailoverRetrySupport.java:90)
	at org.apache.qpid.client.AMQSession.createConsumerImpl(AMQSession.java:2067)
	at org.apache.qpid.client.AMQSession.createConsumer(AMQSession.java:989)
	at org.apache.qpid.client.AMQConnection.retrieveVirtualHostPropertiesIfNecessary(AMQConnection.java:809)
	at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:796)
	at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:771)
	at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:765)
	at org.apache.qpid.client.AMQConnection.createSession(AMQConnection.java:88)
	at org.apache.qpid.test.utils.QpidBrokerTestCase.assertProducingConsuming(QpidBrokerTestCase.java:1256)
	at org.apache.qpid.server.store.berkeleydb.replication.TwoNodeTest.testDesignatedPrimaryContinuesAfterSecondaryStopped(TwoNodeTest.java:108)
Caused by: org.apache.qpid.client.failover.FailoverException: Failing over about to start
	at org.apache.qpid.client.AMQProtocolHandler.notifyFailoverStarting(AMQProtocolHandler.java:434)
	at org.apache.qpid.client.AMQProtocolHandler$1.run(AMQProtocolHandler.java:287)
	at java.lang.Thread.run(Thread.java:745)
{noformat}

On the broker side, the transition into the UNKNOWN state occurred as follows:
{noformat}
10:15:44,279 B-10000 DEBUG [Group-Change-Learner:test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.s.b.r.DatabasePinger Ping transaction completed
10:15:44,279 B-10000 DEBUG [IO-/127.0.0.1:58662] o.a.q.s.p.v.BrokerDecoder Frame handled in 1344 ms.
10:15:44,279 B-10000 INFO  [MASTER nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001(1)] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade The node 'test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001' state is UNKNOWN
10:15:44,279 B-10000 DEBUG [StateChange-test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.s.b.r.ReplicatedEnvironmentFacade Received BDB event, new BDB state UNKNOWN Facade state : OPEN
10:15:44,279 B-10000 INFO  [StateChange-test:nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001] o.a.q.s.v.b.BDBHAVirtualHostNodeImpl Received BDB event indicating transition from state MASTER to UNKNOWN for nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001
10:15:44,280 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.c.u.TaskExecutorImpl Performing Task['close' on 'BDBHAVirtualHostImpl [id=3e9eac0d-ff2e-4469-a7ed-aded200c0881, name=test]']
10:15:44,281 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.m.AbstractConfiguredObject Closing BDBHAVirtualHostImpl : test
2016-02-17 10:15:44,281 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.v.AbstractVirtualHost Closing connection registry :1 connections.
10:15:44,282 B-10000 DEBUG [VirtualHostNode-nodetestDesignatedPrimaryContinuesAfterSecondaryStopped10001-Config] o.a.q.s.c.u.TaskExecutorImpl Task['close' on 'BDBHAVirtualHostImpl [id=3e9eac0d-ff2e-4469-a7ed-aded200c0881, name=test]'] performed successfully with result: null
10:15:44,283 B-10000 DEBUG [Broker-Config] o.a.q.s.c.u.TaskExecutorImpl Performing Task['close' on '/127.0.0.1:58662(guest)']
10:15:44,284 B-10000 DEBUG [Broker-Config] o.a.q.s.m.AbstractConfiguredObject Closing AMQPConnection_0_8 : [1] 127.0.0.1:58662
10:15:44,284 B-10000 DEBUG [Broker-Config] o.a.q.s.c.u.TaskExecutorImpl Task['close' on '/127.0.0.1:58662(guest)'] performed successfully with result: null
{noformat}

The transition into the UNKNOWN state should not happen, because the MASTER node is designated as primary. The exhibited behavior indicates a bug in BDB JE.

It is unclear whether the JE Environment can recover from this unexpected flip into the UNKNOWN state. If JE can recover, then on the next transition into MASTER the VHN should recover the VH, and connected applications can continue as usual. If JE cannot recover, then the BDB HA VHN will not recover automatically from this condition, because we do not restart the environment on MasterUnknownException. Operator intervention would then be required to restart the BDB HA VHN.
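
To make the recoverable case concrete, here is a minimal, self-contained sketch of the role handling described above. It is an illustration only and not the broker's actual code: the class RoleTransitionSketch, its Role enum and the onRoleChange method are hypothetical names, and the model simply assumes that the VH is closed whenever the node leaves MASTER and activated again when MASTER is regained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of a VHN reacting to role changes; not the Qpid implementation. */
public class RoleTransitionSketch {

    /** Role names mirror those in the broker log above; the enum itself is illustrative. */
    enum Role { MASTER, REPLICA, UNKNOWN, DETACHED }

    private Role current = Role.UNKNOWN;
    private boolean virtualHostActive = false;
    private final List<String> actions = new ArrayList<>();

    /** Assumed policy: the VH is active only while the node holds the MASTER role. */
    void onRoleChange(Role newRole) {
        Role previous = current;
        current = newRole;
        if (newRole == Role.MASTER && !virtualHostActive) {
            virtualHostActive = true;
            actions.add("activate VH (" + previous + " -> " + newRole + ")");
        } else if (newRole != Role.MASTER && virtualHostActive) {
            // Corresponds to the 'close' task on the VH seen in the log after MASTER -> UNKNOWN.
            virtualHostActive = false;
            actions.add("close VH (" + previous + " -> " + newRole + ")");
        }
    }

    boolean isVirtualHostActive() {
        return virtualHostActive;
    }

    List<String> actions() {
        return actions;
    }

    public static void main(String[] args) {
        RoleTransitionSketch node = new RoleTransitionSketch();
        node.onRoleChange(Role.MASTER);   // normal start-up as master
        node.onRoleChange(Role.UNKNOWN);  // the unexpected flip reported in this issue
        node.onRoleChange(Role.MASTER);   // only happens if JE recovers on its own
        node.actions().forEach(System.out::println);
    }
}
```

In this model the VH comes back only if a second MASTER event is ever delivered. If JE stays in UNKNOWN, nothing reactivates the VH, which matches the case where operator intervention is needed.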


