You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@activemq.apache.org by "Andrew (JIRA)" <ji...@apache.org> on 2015/06/01 14:21:17 UTC

[jira] [Created] (AMQ-5815) Frequency IOException cast dead lock while using kahadb in high concurrency environment

Andrew created AMQ-5815:
---------------------------

             Summary: Frequency IOException cast dead lock while using kahadb in high concurrency environment
                 Key: AMQ-5815
                 URL: https://issues.apache.org/jira/browse/AMQ-5815
             Project: ActiveMQ
          Issue Type: Bug
          Components: Broker
    Affects Versions: 5.10.0
         Environment: Master/Slave AMQ + kahadb(NAS using NFS)
            Reporter: Andrew


When the concurrency is in high level, iops more than 3000/s, we find that most of the amq connections are blocked.We can't either get or push any data through them unless restart.This issue appears *frequency*.

There is many IOExceptions in the amq logs.

{code}
2015-05-28 00:37:08,544 | ERROR | Checkpoint failed | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
java.io.IOException: Input/output error
        at java.io.RandomAccessFile.write(Native Method)[:1.6.0_30]
        at java.io.RandomAccessFile.writeLong(RandomAccessFile.java:1001)[:1.6.0_30]
        at org.apache.activemq.util.RecoverableRandomAccessFile.writeLong(RecoverableRandomAccessFile.java:305)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.disk.page.PageFile.writeBatch(PageFile.java:1062)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.disk.page.PageFile.flush(PageFile.java:516)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1512)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.MessageDatabase$17.execute(MessageDatabase.java:1484)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.disk.page.Transaction.execute(Transaction.java:779)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1481)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:929)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.MessageDatabase$3.run(MessageDatabase.java:357)[activemq-kahadb-store-5.10.0.jar:5.10.0]
2015-05-28 00:37:08,547 | INFO  | Ignoring IO exception, java.io.IOException: Input/output error | org.apache.activemq.util.DefaultIOExceptionHandler | ActiveMQ Journal Checkpoint Worker
java.io.IOException: Input/output error
        at java.io.RandomAccessFile.write(Native Method)[:1.6.0_30]
        at java.io.RandomAccessFile.writeLong(RandomAccessFile.java:1001)[:1.6.0_30]
        at org.apache.activemq.util.RecoverableRandomAccessFile.writeLong(RecoverableRandomAccessFile.java:305)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.disk.page.PageFile.writeBatch(PageFile.java:1062)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.disk.page.PageFile.flush(PageFile.java:516)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1512)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.MessageDatabase$17.execute(MessageDatabase.java:1484)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.disk.page.Transaction.execute(Transaction.java:779)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.MessageDatabase.checkpointUpdate(MessageDatabase.java:1481)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.MessageDatabase.checkpointCleanup(MessageDatabase.java:929)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.MessageDatabase$3.run(MessageDatabase.java:357)[activemq-kahadb-store-5.10.0.jar:5.10.0]
2015-05-28 00:37:08,580 | INFO  | KahaDB: Recovering checkpoint thread after death | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Transport: tcp:///10.40.50.31:58171@61616
2015-05-28 00:37:17,285 | INFO  | Journal failed while writing at: 2359276 | org.apache.activemq.store.kahadb.disk.journal.DataFileAppender | ActiveMQ Data File Writer
2015-05-28 00:37:17,287 | ERROR | KahaDB failed to store to Journal | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Transport: tcp:///10.40.50.17:4487@61616
java.io.IOException: Input/output error
        at sun.nio.ch.FileChannelImpl.force0(Native Method)[:1.6.0_30]
        at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:348)[:1.6.0_30]
        at org.apache.activemq.util.RecoverableRandomAccessFile.sync(RecoverableRandomAccessFile.java:402)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.disk.journal.DataFileAppender.processQueue(DataFileAppender.java:373)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.disk.journal.DataFileAppender$1.run(DataFileAppender.java:193)[activemq-kahadb-store-5.10.0.jar:5.10.0]
2015-05-28 00:37:17,291 | ERROR | KahaDB failed to store to Journal | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Transport: tcp:///10.40.50.17:4442@61616
java.io.IOException: Input/output error
        at sun.nio.ch.FileChannelImpl.force0(Native Method)[:1.6.0_30]
        at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:348)[:1.6.0_30]
        at org.apache.activemq.util.RecoverableRandomAccessFile.sync(RecoverableRandomAccessFile.java:402)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.disk.journal.DataFileAppender.processQueue(DataFileAppender.java:373)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.disk.journal.DataFileAppender$1.run(DataFileAppender.java:193)[activemq-kahadb-store-5.10.0.jar:5.10.0]
2015-05-28 00:37:17,293 | INFO  | Ignoring IO exception, java.io.IOException: Input/output error | org.apache.activemq.util.DefaultIOExceptionHandler | ActiveMQ Transport: tcp:///10.40.50.17:4442@61616
java.io.IOException: Input/output error
        at sun.nio.ch.FileChannelImpl.force0(Native Method)[:1.6.0_30]
        at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:348)[:1.6.0_30]
        at org.apache.activemq.util.RecoverableRandomAccessFile.sync(RecoverableRandomAccessFile.java:402)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.disk.journal.DataFileAppender.processQueue(DataFileAppender.java:373)[activemq-kahadb-store-5.10.0.jar:5.10.0]
        at org.apache.activemq.store.kahadb.disk.journal.DataFileAppender$1.run(DataFileAppender.java:193)[activemq-kahadb-store-5.10.0.jar:5.10.0]
{code}


With the aid of jstack ,we find dead lock(How can I upload the whole jtack log?)
The blocked connections are all waiting for 0x00007f90500034e0
{code}
[01:08:01] "ActiveMQ Transport: tcp:///10.40.50.17:4479@61616" daemon prio=10 tid=0x00007f8f8c015000 nid=0x2ece waiting for monitor entry [0x00007f8f1f6e4000]
[01:08:01]    java.lang.Thread.State: BLOCKED (on object monitor)
[01:08:01]         at org.apache.activemq.transaction.LocalTransaction.rollback(LocalTransaction.java:94)
[01:08:01]         - waiting to lock <0x00007f90500034e0> (a org.apache.activemq.store.kahadb.KahaDBTransactionStore)
[01:08:01]         at org.apache.activemq.transaction.LocalTransaction.commit(LocalTransaction.java:74)
[01:08:01]         at org.apache.activemq.broker.TransactionBroker.commitTransaction(TransactionBroker.java:253)
[01:08:01]         at org.apache.activemq.broker.MutableBrokerFilter.commitTransaction(MutableBrokerFilter.java:112)
[01:08:01]         at org.apache.activemq.broker.TransportConnection.processCommitTransactionOnePhase(TransportConnection.java:453)
[01:08:01]         at org.apache.activemq.command.TransactionInfo.visit(TransactionInfo.java:100)
[01:08:01]         at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:294)
[01:08:01]         at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:148)
[01:08:01]         at org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
[01:08:01]         at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
[01:08:01]         at org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:270)
[01:08:01]         at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
[01:08:01]         at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:214)
[01:08:01]         at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
[01:08:01]         at java.lang.Thread.run(Thread.java:662)
{code}

Only one connection(Thread) get this lock.But it is waiting for another Read lock 0x00007f90530ed8d8
{code}
[01:08:01] "ActiveMQ Transport: tcp:///10.40.50.17:4484@61616" daemon prio=10 tid=0x00007f8fa8039800 nid=0x2ed2 waiting on condition [0x00007f8f1f2e0000]
[01:08:01]    java.lang.Thread.State: WAITING (parking)
[01:08:01]         at sun.misc.Unsafe.park(Native Method)
[01:08:01]         - parking to wait for  <0x00007f90530ed8d8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
[01:08:01]         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
[01:08:01]         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
[01:08:01]         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
[01:08:01]         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
[01:08:01]         at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
[01:08:01]         at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:972)
[01:08:01]         at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:958)
[01:08:01]         at org.apache.activemq.store.kahadb.KahaDBTransactionStore.rollback(KahaDBTransactionStore.java:312)
[01:08:01]         at org.apache.activemq.transaction.LocalTransaction.rollback(LocalTransaction.java:94)
[01:08:01]         - locked <0x00007f90500034e0> (a org.apache.activemq.store.kahadb.KahaDBTransactionStore)
[01:08:01]         at org.apache.activemq.transaction.LocalTransaction.commit(LocalTransaction.java:74)
[01:08:01]         at org.apache.activemq.broker.TransactionBroker.commitTransaction(TransactionBroker.java:253)
[01:08:01]         at org.apache.activemq.broker.MutableBrokerFilter.commitTransaction(MutableBrokerFilter.java:112)
[01:08:01]         at org.apache.activemq.broker.TransportConnection.processCommitTransactionOnePhase(TransportConnection.java:453)
[01:08:01]         at org.apache.activemq.command.TransactionInfo.visit(TransactionInfo.java:100)
[01:08:01]         at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:294)
[01:08:01]         at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:148)
[01:08:01]         at org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
[01:08:01]         at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
[01:08:01]         at org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:270)
[01:08:01]         at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
[01:08:01]         at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:214)
[01:08:01]         at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
[01:08:01]         at java.lang.Thread.run(Thread.java:662)
{code}

At last we find 0x00007f90530ed8d8 is in many write lock waiting.But none write lock hodes it. The dead lock occurs.
{code}
[01:07:56] "ActiveMQ Transport: tcp:///10.40.50.47:60701@61616" daemon prio=10 tid=0x00007f8f7cbc5000 nid=0x3ec8 waiting on condition [0x00007f900d864000]
[01:07:56]    java.lang.Thread.State: WAITING (parking)
[01:07:56]         at sun.misc.Unsafe.park(Native Method)
[01:07:56]         - parking to wait for  <0x00007f90530ed8d8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
[01:07:56]         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
[01:07:56]         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
[01:07:56]         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
[01:07:56]         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
[01:07:56]         at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
[01:07:56]         at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:972)
[01:07:56]         at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:958)
[01:07:56]         at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.removeMessage(KahaDBStore.java:453)
[01:07:56]         at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.removeAsyncMessage(KahaDBStore.java:407)
[01:07:56]         at org.apache.activemq.store.kahadb.KahaDBTransactionStore.removeAsyncMessage(KahaDBTransactionStore.java:498)
[01:07:56]         at org.apache.activemq.store.kahadb.KahaDBTransactionStore$1.removeAsyncMessage(KahaDBTransactionStore.java:185)
[01:07:56]         at org.apache.activemq.broker.region.Queue.acknowledge(Queue.java:998)
[01:07:56]         at org.apache.activemq.broker.region.Queue.removeMessage(Queue.java:1781)
[01:07:56]         at org.apache.activemq.broker.region.QueueSubscription.acknowledge(QueueSubscription.java:63)
[01:07:56]         at org.apache.activemq.broker.region.PrefetchSubscription.acknowledge(PrefetchSubscription.java:235)
[01:07:56]         - locked <0x00007f905d10b1e8> (a java.lang.Object)
[01:07:56]         at org.apache.activemq.broker.region.AbstractRegion.acknowledge(AbstractRegion.java:446)
[01:07:56]         at org.apache.activemq.broker.region.RegionBroker.acknowledge(RegionBroker.java:461)
[01:07:56]         at org.apache.activemq.broker.BrokerFilter.acknowledge(BrokerFilter.java:82)
[01:07:56]         at org.apache.activemq.broker.TransactionBroker.acknowledge(TransactionBroker.java:277)
[01:07:56]         at org.apache.activemq.broker.MutableBrokerFilter.acknowledge(MutableBrokerFilter.java:92)
[01:07:56]         at org.apache.activemq.broker.TransportConnection.processMessageAck(TransportConnection.java:505)
[01:07:56]         at org.apache.activemq.command.MessageAck.visit(MessageAck.java:245)
[01:07:56]         at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:294)
[01:07:56]         at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:148)
[01:07:56]         at org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
[01:07:56]         at org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
[01:07:56]         at org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:270)
[01:07:56]         at org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
[01:07:56]         at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:214)
[01:07:56]         at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:196)
[01:07:56]         at java.lang.Thread.run(Thread.java:662)
{code}

In my opinion, IOExceptions may lead to losing persisten messages and so on ,but the broker shouldn't stop service.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)