Posted to dev@qpid.apache.org by "Alex Rudyy (JIRA)" <ji...@apache.org> on 2015/08/05 17:50:06 UTC

[jira] [Comment Edited] (QPID-3521) failover process for the 0-8 client does not clear the pre-dispatch queue

    [ https://issues.apache.org/jira/browse/QPID-3521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653924#comment-14653924 ] 

Alex Rudyy edited comment on QPID-3521 at 8/5/15 3:50 PM:
----------------------------------------------------------

It seems that the changes implemented in revision [r1693542|https://svn.apache.org/r1693542] might cause a deadlock on the 0-9 path when a Session is closed whilst failover is in progress. Here is a thread dump demonstrating the issue:
{noformat}
"Failover" prio=10 tid=0x00007fe0d804e000 nid=0x657c waiting on condition [0x00007fe0cf1f0000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000f41c2528> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
        at org.apache.qpid.client.AMQSession.drainDispatchQueue(AMQSession.java:2306)
        at org.apache.qpid.client.AMQSession.drainDispatchQueueWithDispatcher(AMQSession.java:3697)
        at org.apache.qpid.client.AMQSession_0_8.resubscribe(AMQSession_0_8.java:186)
        at org.apache.qpid.client.AMQConnectionDelegate_8_0.resubscribeSessions(AMQConnectionDelegate_8_0.java:379)
        at org.apache.qpid.client.AMQConnection.resubscribeSessions(AMQConnection.java:1387)
        at org.apache.qpid.client.failover.FailoverHandler.run(FailoverHandler.java:221)
        - locked <0x00000000f35412c0> (a java.lang.Object)
        at java.lang.Thread.run(Thread.java:745)

"Dispatcher-2-Conn-84" prio=10 tid=0x00007fe1341c0000 nid=0x657a waiting for monitor entry [0x00007fe0cf4f3000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.qpid.client.AMQSession$Dispatcher.dispatchMessage(AMQSession.java:3492)
        - waiting to lock <0x00000000f35c8e78> (a java.lang.Object)
        - locked <0x00000000f3cbea70> (a java.lang.Object)
        at org.apache.qpid.client.AMQSession$Dispatcher.access$1000(AMQSession.java:3279)
        at org.apache.qpid.client.AMQSession.dispatch(AMQSession.java:3272)
        at org.apache.qpid.client.message.UnprocessedMessage.dispatch(UnprocessedMessage.java:54)
        at org.apache.qpid.client.AMQSession$Dispatcher.run(AMQSession.java:3410)
        - locked <0x00000000f3cbea70> (a java.lang.Object)
        at java.lang.Thread.run(Thread.java:745)

"main" prio=10 tid=0x00007fe134008800 nid=0x58f1 waiting for monitor entry [0x00007fe13d822000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.qpid.client.AMQSession.close(AMQSession.java:728)
        - waiting to lock <0x00000000f35412c0> (a java.lang.Object)
        - locked <0x00000000f35c8e78> (a java.lang.Object)
        at org.apache.qpid.client.AMQSession.close(AMQSession.java:447)
        at org.apache.qpid.client.failover.FailoverBehaviourTest.sessionCloseWhileFailoverImpl(FailoverBehaviourTest.java:1705)
        at org.apache.qpid.client.failover.FailoverBehaviourTest.testClientAcknowledgedSessionCloseWhileFailover(FailoverBehaviourTest.java:702)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at junit.framework.TestCase.runTest(TestCase.java:176)
        at org.apache.qpid.test.utils.QpidTestCase.runTest(QpidTestCase.java:171)
        at junit.framework.TestCase.runBare(TestCase.java:141)
        at org.apache.qpid.test.utils.QpidBrokerTestCase.runBare(QpidBrokerTestCase.java:332)
        at junit.framework.TestResult$1.protect(TestResult.java:122)
        at junit.framework.TestResult.runProtected(TestResult.java:142)
        at junit.framework.TestResult.run(TestResult.java:125)
        at junit.framework.TestCase.run(TestCase.java:129)
        at org.apache.qpid.test.utils.QpidTestCase.run(QpidTestCase.java:156)
        at junit.framework.TestSuite.runTest(TestSuite.java:255)
        at junit.framework.TestSuite.run(TestSuite.java:250)
        at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
        at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
        at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
        at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
        at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
        at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
        at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)

{noformat}

In the thread dump above, Session#close() is invoked from the "main" thread. As part of Session#close(), the main thread acquires _messageDeliveryLock (<0x00000000f35c8e78>) and then waits for the failover mutex (<0x00000000f35412c0>). That mutex is held by the "Failover" thread, which is itself waiting for a Dispatcher thread to drain the pre-dispatch queue. However, the Dispatcher thread needs _messageDeliveryLock to perform the clean-up, so it stays in the BLOCKED state and the application hangs.
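
The wait cycle described above can be reproduced in isolation. The following is only a minimal sketch, not the Qpid client code: the class name FailoverDeadlockSketch, the method runScenario and all variable names are hypothetical. It also assumes ReentrantLock and timed waits in place of the plain object monitors and untimed waits seen in the dump, so that the demonstration terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class FailoverDeadlockSketch {

    /** Returns {closerGotFailoverMutex, failoverSawDrain, dispatcherGotDeliveryLock}. */
    public static boolean[] runScenario(long timeoutMs) {
        // Hypothetical stand-ins for the failover mutex and _messageDeliveryLock.
        ReentrantLock failoverMutex = new ReentrantLock();
        ReentrantLock messageDeliveryLock = new ReentrantLock();
        CountDownLatch drained = new CountDownLatch(1);        // released only by the dispatcher
        CountDownLatch firstLocksHeld = new CountDownLatch(2); // closer and failover hold their first lock
        CountDownLatch attemptsDone = new CountDownLatch(3);   // every thread has finished its timed wait

        AtomicBoolean closerGotMutex = new AtomicBoolean();
        AtomicBoolean failoverSawDrain = new AtomicBoolean();
        AtomicBoolean dispatcherGotLock = new AtomicBoolean();

        // Like "main" in Session#close(): holds the delivery lock, then wants the failover mutex.
        Thread closer = new Thread(() -> {
            messageDeliveryLock.lock();
            try {
                firstLocksHeld.countDown();
                firstLocksHeld.await();
                if (failoverMutex.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                    closerGotMutex.set(true);
                    failoverMutex.unlock();
                }
                attemptsDone.countDown();
                attemptsDone.await(); // keep the delivery lock until all attempts are over
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                messageDeliveryLock.unlock();
            }
        });

        // Like "Failover": holds the failover mutex, then waits for the queue to be drained.
        Thread failover = new Thread(() -> {
            failoverMutex.lock();
            try {
                firstLocksHeld.countDown();
                firstLocksHeld.await();
                failoverSawDrain.set(drained.await(timeoutMs, TimeUnit.MILLISECONDS));
                attemptsDone.countDown();
                attemptsDone.await(); // keep the failover mutex until all attempts are over
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                failoverMutex.unlock();
            }
        });

        // Like "Dispatcher": needs the delivery lock before it can signal that the queue is drained.
        Thread dispatcher = new Thread(() -> {
            try {
                firstLocksHeld.await();
                if (messageDeliveryLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                    try {
                        dispatcherGotLock.set(true);
                        drained.countDown();
                    } finally {
                        messageDeliveryLock.unlock();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                attemptsDone.countDown();
            }
        });

        Thread[] threads = {closer, failover, dispatcher};
        for (Thread t : threads) {
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the sketch threads", e);
        }
        return new boolean[] {closerGotMutex.get(), failoverSawDrain.get(), dispatcherGotLock.get()};
    }

    public static void main(String[] args) {
        boolean[] r = runScenario(200);
        System.out.println("closer got failover mutex:    " + r[0]);
        System.out.println("failover saw drained queue:   " + r[1]);
        System.out.println("dispatcher got delivery lock: " + r[2]);
    }
}
```

In this sketch each thread keeps its first lock until all three timed waits have finished, so none of the three attempts can succeed. With untimed waits, as in the thread dump, the same cycle would never resolve.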


> failover process for the 0-8 client does not clear the pre-dispatch queue
> -------------------------------------------------------------------------
>
>                 Key: QPID-3521
>                 URL: https://issues.apache.org/jira/browse/QPID-3521
>             Project: Qpid
>          Issue Type: Bug
>          Components: Java Client
>            Reporter: Robbie Gemmell
>            Assignee: Keith Wall
>              Labels: failover
>         Attachments: clear-dispatch-queue-on-failover.diff
>
>
> The failover process for the 0-8 client does not clear the pre-dispatch queue, only the consumer receive queue.
> This is currently masked by an issue with the rollbackMark. The changes made in QPID-3546 to fix the 0-10 client path need to be applied to the 0-8/9/9-1 client path when this issue is resolved.



