Posted to dev@activemq.apache.org by "Bob Wiegand (JIRA)" <ji...@apache.org> on 2011/08/26 17:28:29 UTC

[jira] [Commented] (AMQCPP-376) Deadlock in IOTransport when network of brokers restart and failover is used.

    [ https://issues.apache.org/jira/browse/AMQCPP-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091823#comment-13091823 ] 

Bob Wiegand commented on AMQCPP-376:
------------------------------------

I was eager to try out the updated code, since we were also experiencing problems (deadlock) with 3.4.0 failover.  The stack was different in my case (attached).

I have built the trunk (r1161214) on Linux and Windows.  A few trivial modifications were needed to compile on Windows (patch and list of files missing from VS project attached). 

I ran 10-20 failover tests with 10 clients, each both publishing and consuming messages.  In every case but one, all clients failed over.  The exception was one Linux client, which crashed during the failover.  I realize that such a report has minimal value, since I have tons of code running on top of ActiveMQ CPP, including JNI, but I thought I would at least put it out there in case you can glean anything from it (Java crash file attached).

> Deadlock in IOTransport when network of brokers restart and failover is used. 
> ------------------------------------------------------------------------------
>
>                 Key: AMQCPP-376
>                 URL: https://issues.apache.org/jira/browse/AMQCPP-376
>             Project: ActiveMQ C++ Client
>          Issue Type: Bug
>          Components: Other C++ Clients
>    Affects Versions: 3.4.0
>         Environment: ActiveMQ-CPP  ver - 3.4.0
> Broker  5.3.1
> Machine: Linux mars 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
> gcc version: 4.1.2 20080704 (Red Hat 4.1.2-44)
>            Reporter: igor khaustov
>            Assignee: Timothy Bish
>         Attachments: bt_1.txt, bt_2.txt
>
>
> The problem description:
> We run a network of brokers (4 in number).
> Broker URI: 'failover://(tcp://10.10.13.20:61616,tcp://10.10.13.22:61616,tcp://10.10.13.24:61616,tcp://10.10.13.26:61616)?randomize=true&connection.closeTimeout=10000&transport.soTimeout=3000&timeout=3000&connection.useAsyncSend=true&connection.alwaysSyncSend=false'
> The producer loads the broker with 1000 messages/sec. We test the producer's behavior during failover by restarting all 4 brokers in a row while messages are being sent, and we get the deadlock shown below.
> Note: the problem was tested only with a network of brokers.
> The backtrace (only relevant threads):
> +Thread 16 (process 26892):+
> *#0  0x00000032ef00ce74 in __lll_lock_wait () from /lib64/libpthread.so.0*
> #1  0x00000032ef008874 in _L_lock_106 () from /lib64/libpthread.so.0
> #2  0x00000032ef0082e0 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x0000000000dc5a04 in decaf::internal::util::concurrent::MutexImpl::lock (handle=0xfefdd38) at decaf/internal/util/concurrent/unix/MutexImpl.cpp:77
> #4  0x0000000000bd9092 in decaf::util::concurrent::Mutex::lock (this=0xff54100) at decaf/util/concurrent/Mutex.cpp:111
> #5  0x0000000000d51f3f in decaf::util::AbstractCollection<decaf::lang::Pointer<activemq::transport::Transport, decaf::util::concurrent::atomic::AtomicRefCounter> >::lock (this=0xff540f8) at ./decaf/util/AbstractCollection.h:331
> #6  0x0000000000bd86c9 in decaf::util::concurrent::Lock::lock (this=0x4c7b9c90) at decaf/util/concurrent/Lock.cpp:54
> #7  0x0000000000bd883a in Lock (this=0x4c7b9c90, object=0xff54188, intiallyLocked=true) at decaf/util/concurrent/Lock.cpp:32
> *#8  0x0000000000d47a77 in activemq::transport::failover::CloseTransportsTask::add (this=0xff540e8, transport=@0x4c7b9cf0) at activemq/transport/failover/CloseTransportsTask.cpp:46*
> #9  0x0000000000b1b748 in activemq::transport::failover::FailoverTransport::handleTransportFailure (this=0xffed498, error=@0x4c7b9ee0) at activemq/transport/failover/FailoverTransport.cpp:483
> #10 0x0000000000b41a06 in activemq::transport::failover::FailoverTransportListener::onException (this=0xfde2e58, ex=@0x4c7b9ee0) at activemq/transport/failover/FailoverTransportListener.cpp:76
> #11 0x0000000000d34813 in activemq::transport::TransportFilter::fire (this=0x10627498, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:54
> #12 0x0000000000d34841 in activemq::transport::TransportFilter::onException (this=0x10627498, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:46
> #13 0x0000000000d34813 in activemq::transport::TransportFilter::fire (this=0xfeeb558, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:54
> #14 0x0000000000d34841 in activemq::transport::TransportFilter::onException (this=0xfeeb558, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:46
> #15 0x0000000000d554c8 in activemq::transport::inactivity::InactivityMonitor::onException (this=0xfeeb558, ex=@0x4c7b9ee0) at activemq/transport/inactivity/InactivityMonitor.cpp:312
> #16 0x0000000000d34813 in activemq::transport::TransportFilter::fire (this=0x1020c118, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:54
> #17 0x0000000000d34841 in activemq::transport::TransportFilter::onException (this=0x1020c118, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:46
> #18 0x0000000000d326f2 in activemq::transport::IOTransport::fire (this=0xdce10b8, ex=@0x4c7b9ee0) at activemq/transport/IOTransport.cpp:87
> #19 0x0000000000d32982 in activemq::transport::IOTransport::run (this=0xdce10b8) at activemq/transport/IOTransport.cpp:264
> #20 0x0000000000baad49 in decaf::lang::ThreadProperties::runCallback (properties=0x105871d8) at decaf/lang/Thread.cpp:137
> #21 0x0000000000ba9068 in threadWorker (arg=0x105871d8) at decaf/lang/Thread.cpp:190
> #22 0x00000032ef006367 in start_thread () from /lib64/libpthread.so.0
> #23 0x00000032ee4d30ad in clone () from /lib64/libc.so.6
> +Thread 9 (process 14470):+
> *#0  0x00000032ef00a899 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0*
> #1  0x0000000000dc54b3 in decaf::internal::util::concurrent::ConditionImpl::wait (condition=0x1072d2b8) at decaf/internal/util/concurrent/unix/ConditionImpl.cpp:101
> #2  0x0000000000bd9033 in decaf::util::concurrent::Mutex::wait (this=0x105871d8) at decaf/util/concurrent/Mutex.cpp:126
> #3  0x0000000000ba8538 in decaf::lang::Thread::join (this=0x12a4a418) at decaf/lang/Thread.cpp:452
> #4  0x0000000000d32c28 in activemq::transport::IOTransport::close (this=0xdce10b8) at activemq/transport/IOTransport.cpp:222
> #5  0x0000000000d34bfe in activemq::transport::TransportFilter::close (this=0x1020c118) at activemq/transport/TransportFilter.cpp:106
> #6  0x0000000000b47d3a in activemq::transport::tcp::TcpTransport::close (this=0x1020c118) at activemq/transport/tcp/TcpTransport.cpp:74
> #7  0x0000000000d34bfe in activemq::transport::TransportFilter::close (this=0xfeeb558) at activemq/transport/TransportFilter.cpp:106
> #8  0x0000000000d554ec in activemq::transport::inactivity::InactivityMonitor::close (this=0xfeeb558) at activemq/transport/inactivity/InactivityMonitor.cpp:300
> #9  0x0000000000d77867 in activemq::wireformat::openwire::OpenWireFormatNegotiator::close (this=0x10627498) at activemq/wireformat/openwire/OpenWireFormatNegotiator.cpp:248
> *#10 0x0000000000d478ff in activemq::transport::failover::CloseTransportsTask::iterate (this=0xff540e8) at activemq/transport/failover/CloseTransportsTask.cpp:75*
> #11 0x0000000000d25891 in activemq::threads::CompositeTaskRunner::iterate (this=0xddc0108) at activemq/threads/CompositeTaskRunner.cpp:173
> #12 0x0000000000d25ae4 in activemq::threads::CompositeTaskRunner::run (this=0xddc0108) at activemq/threads/CompositeTaskRunner.cpp:107
> #13 0x0000000000baad49 in decaf::lang::ThreadProperties::runCallback (properties=0xfeeb2b8) at decaf/lang/Thread.cpp:137
> #14 0x0000000000ba9068 in threadWorker (arg=0xfeeb2b8) at decaf/lang/Thread.cpp:190
> #15 0x00000032ef006367 in start_thread () from /lib64/libpthread.so.0
> #16 0x00000032ee4d30ad in clone () from /lib64/libc.so.6
> As you can see, +Thread 16+ is in lock_wait for *_synchronized( &transports )_* in activemq::transport::failover::CloseTransportsTask::add.
> The *_synchronized( &transports )_* is locked by +Thread 9+ in activemq::threads::CompositeTaskRunner::iterate. But +Thread 9+ is in pthread_cond_wait, which has to be signalled by +Thread 16+.
> Kind regards.
> Igor.
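
The lock-ordering problem in the quoted backtrace (one thread holds the transports lock while joining a reader thread, and that reader thread needs the same lock to queue another transport) can be illustrated outside of ActiveMQ-CPP. The following is only a minimal sketch using std::mutex and std::thread rather than the Decaf classes; FakeTransport, CloseTask and runDemo are hypothetical names invented for this illustration, and the approach shown (take the item off the queue under the lock, then release the lock before the blocking close) is one common way to avoid this pattern, not necessarily the change made in trunk.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a transport whose close() joins its reader thread.
class FakeTransport {
public:
    // onFailure runs on the reader thread once close() has started,
    // mimicking an exception being fired while the transport is being closed.
    explicit FakeTransport(std::function<void()> onFailure)
        : reader_([this, cb = std::move(onFailure)] {
              while (!closing_.load()) std::this_thread::yield();
              cb();
          }) {}

    ~FakeTransport() { close(); }

    void close() {
        closing_.store(true);
        if (reader_.joinable()) reader_.join();  // blocks until the reader exits
    }

private:
    std::atomic<bool> closing_{false};  // declared before reader_ so it is initialized first
    std::thread reader_;
};

// Hypothetical close task: queues transports and closes them one at a time.
class CloseTask {
public:
    void add(std::shared_ptr<FakeTransport> transport) {
        std::lock_guard<std::mutex> guard(mutex_);
        pending_.push_back(std::move(transport));
    }

    // Closes one queued transport; returns false if the queue was empty.
    bool iterate() {
        std::shared_ptr<FakeTransport> next;
        {
            std::lock_guard<std::mutex> guard(mutex_);
            if (pending_.empty()) return false;
            next = std::move(pending_.front());
            pending_.pop_front();
        }  // lock released here, before the blocking join inside close()
        next->close();
        return true;
    }

    std::size_t pending() {
        std::lock_guard<std::mutex> guard(mutex_);
        return pending_.size();
    }

private:
    std::mutex mutex_;
    std::deque<std::shared_ptr<FakeTransport>> pending_;
};

// Reproduces the shape of the reported scenario: the reader thread of the
// transport being closed calls add() while iterate() is joining it.
std::size_t runDemo() {
    CloseTask task;
    auto replacement = std::make_shared<FakeTransport>([] {});
    auto failing = std::make_shared<FakeTransport>([&] {
        // Would block forever if iterate() still held the lock during close().
        task.add(replacement);
    });
    task.add(failing);
    task.iterate();         // joins the reader thread of `failing`
    return task.pending();  // the transport queued by the reader thread
}
```

If iterate() instead called next->close() inside the locked scope, the reader thread would wait on the mutex in add() while iterate() waited in join(), which is the same cycle as Thread 16 and Thread 9 in the quoted backtrace.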
