Posted to dev@qpid.apache.org by "Alex Rudyy (Jira)" <ji...@apache.org> on 2019/10/01 08:03:00 UTC

[jira] [Updated] (QPID-8366) [Broker-J] The loss of BDB HA majority on invocation of house keeping operations can crash the broker

     [ https://issues.apache.org/jira/browse/QPID-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Rudyy updated QPID-8366:
-----------------------------
    Fix Version/s: qpid-java-broker-7.1.5
                   qpid-java-broker-7.0.9
                   qpid-java-broker-8.0.0

> [Broker-J] The loss of BDB HA majority on invocation of house keeping operations can crash the broker
> -----------------------------------------------------------------------------------------------------
>
>                 Key: QPID-8366
>                 URL: https://issues.apache.org/jira/browse/QPID-8366
>             Project: Qpid
>          Issue Type: Task
>          Components: Broker-J
>    Affects Versions: qpid-java-broker-7.1.0, qpid-java-broker-7.0.4, qpid-java-broker-7.0.5, qpid-java-broker-7.0.6, qpid-java-broker-7.0.7, qpid-java-broker-7.1.1, qpid-java-broker-7.1.2, qpid-java-broker-7.0.8, qpid-java-broker-7.1.3, qpid-java-broker-7.1.4
>            Reporter: Alex Rudyy
>            Priority: Major
>             Fix For: qpid-java-broker-8.0.0, qpid-java-broker-7.0.9, qpid-java-broker-7.1.5
>
>
> The {{ConnectionScopedRuntimeException}} thrown from the {{VirtualHost}} {{House Keeping}} thread when house keeping operations such as {{checkMessageStatus}} invoke {{MessageStore}} operations can crash the broker. An example of such an exception stack trace (from Qpid Broker version 7.0.6) is provided below:
> {noformat}
> 2019-09-27 07:53:38,168 ERROR [virtualhost-test-pool-1] (o.a.q.s.Main) - Uncaught exception, shutting down.
> org.apache.qpid.server.util.ConnectionScopedRuntimeException: Required number of nodes not reachable
>         at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.handleDatabaseException(ReplicatedEnvironmentFacade.java:495)
>         at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.commit(ReplicatedEnvironmentFacade.java:332)
>         at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.removeMessage(AbstractBDBMessageStore.java:288)
>         at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.remove(AbstractBDBMessageStore.java:1090)
>         at org.apache.qpid.server.message.AbstractServerMessageImpl.decrementReference(AbstractServerMessageImpl.java:118)
>         at org.apache.qpid.server.message.AbstractServerMessageImpl.access$500(AbstractServerMessageImpl.java:37)
>         at org.apache.qpid.server.message.AbstractServerMessageImpl$Reference.release(AbstractServerMessageImpl.java:309)
>         at org.apache.qpid.server.queue.QueueEntryImpl.dispose(QueueEntryImpl.java:557)
>         at org.apache.qpid.server.queue.QueueEntryImpl.delete(QueueEntryImpl.java:572)
>         at org.apache.qpid.server.queue.AbstractQueue$11.postCommit(AbstractQueue.java:1729)
>         at org.apache.qpid.server.txn.AutoCommitTransaction.dequeue(AutoCommitTransaction.java:92)
>         at org.apache.qpid.server.queue.AbstractQueue.dequeueEntry(AbstractQueue.java:1722)
>         at org.apache.qpid.server.queue.AbstractQueue.dequeueEntry(AbstractQueue.java:1717)
>         at org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1761)
>         at org.apache.qpid.server.queue.AbstractQueue.checkMessageStatus(AbstractQueue.java:2165)
>         at org.apache.qpid.server.virtualhost.AbstractVirtualHost$VirtualHostHouseKeepingTask.execute(AbstractVirtualHost.java:1965)
>         at org.apache.qpid.server.virtualhost.HouseKeepingTask$1.run(HouseKeepingTask.java:56)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at org.apache.qpid.server.virtualhost.HouseKeepingTask.run(HouseKeepingTask.java:51)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: com.sleepycat.je.rep.InsufficientAcksException: (JE 7.4.5) Transaction: -3459038252  VLSN: 10,380,435,448, initiated at: 07:53:20.  Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 2. Missing replica acks: 2. Timeout: 15000ms. FeederState=acc3_2(3)[MASTER]
> Current feeds:
>  acc3_1: feederVLSN=10,380,435,456 replicaTxnEndVLSN=10,380,435,396
>  acc3: feederVLSN=10,380,435,456 replicaTxnEndVLSN=10,380,435,396
>         at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205)
>         at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:189)
>         at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1426)
>         at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1385)
>         at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:228)
>         at com.sleepycat.je.txn.Txn.commit(Txn.java:772)
>         at com.sleepycat.je.Transaction.doCommit(Transaction.java:621)
>         at com.sleepycat.je.Transaction.commit(Transaction.java:401)
>         at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.commit(ReplicatedEnvironmentFacade.java:328)
>         ... 25 common frames omitted
> {noformat}
> The issue reported with the stack trace above occurred when the BDB HA {{VirtualHost}} was trying to delete an expired message, but its BDB HA group lost the majority when the {{VirtualHost}} tried to commit the BDB HA transaction for the message deletion. The majority loss is communicated to the caller as a {{ConnectionScopedRuntimeException}}. It seems we need to catch and handle {{ConnectionScopedRuntimeException}} in House Keeping operations.
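The handling suggested in the issue could look roughly like the following Java sketch. It is hypothetical and is not the actual Broker-J change: HouseKeepingGuard, HouseKeepingOperation and runGuarded are invented names, and the nested ConnectionScopedRuntimeException is a local stand-in for org.apache.qpid.server.util.ConnectionScopedRuntimeException so that the example compiles on its own. It catches only that exception type, logs it, and reports the step as skipped, so a loss of majority during a house keeping step does not escape to the uncaught exception handler.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical sketch: guard a house keeping step so that a connection-scoped
 * store failure (e.g. loss of BDB HA majority) is logged and the step is left
 * for the next scheduled run, instead of propagating out of the task.
 */
public class HouseKeepingGuard
{
    private static final Logger LOGGER = Logger.getLogger(HouseKeepingGuard.class.getName());

    /** Local stand-in for org.apache.qpid.server.util.ConnectionScopedRuntimeException. */
    public static class ConnectionScopedRuntimeException extends RuntimeException
    {
        public ConnectionScopedRuntimeException(String message, Throwable cause)
        {
            super(message, cause);
        }
    }

    /** Hypothetical unit of house keeping work, e.g. checking one queue's message status. */
    @FunctionalInterface
    public interface HouseKeepingOperation
    {
        void execute();
    }

    /**
     * Runs the operation. Returns true if it completed, false if it was abandoned
     * because of a ConnectionScopedRuntimeException. Any other RuntimeException
     * is not caught here and still propagates to the caller.
     */
    public static boolean runGuarded(String name, HouseKeepingOperation operation)
    {
        try
        {
            operation.execute();
            return true;
        }
        catch (ConnectionScopedRuntimeException e)
        {
            LOGGER.log(Level.WARNING, "House keeping operation '" + name
                    + "' abandoned; it will be retried on the next run", e);
            return false;
        }
    }

    public static void main(String[] args)
    {
        boolean completed = runGuarded("noop", () -> { });
        boolean skipped = runGuarded("checkMessageStatus", () -> {
            throw new ConnectionScopedRuntimeException("Required number of nodes not reachable", null);
        });
        // Prints "true false": the first step completed, the second was skipped.
        System.out.println(completed + " " + skipped);
    }
}
```

Catching only ConnectionScopedRuntimeException keeps unrelated programming errors visible. Whether simply skipping and retrying is safe for every house keeping operation (for example, when the in-memory queue entry was already deleted but the store commit failed) is an assumption that would need to be checked against the broker code.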



--
This message was sent by Atlassian Jira
(v8.3.4#803005)