Posted to dev@activemq.apache.org by "igor khaustov (JIRA)" <ji...@apache.org> on 2011/07/07 08:39:19 UTC

[jira] [Issue Comment Edited] (AMQCPP-376) Deadlock in IOTransport when network of brokers restart and failover is used.

    [ https://issues.apache.org/jira/browse/AMQCPP-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061061#comment-13061061 ] 

igor khaustov edited comment on AMQCPP-376 at 7/7/11 6:38 AM:
--------------------------------------------------------------

Hi Timothy, thanks for the response.
I have attached full backtrace examples.
Igor.

      was (Author: igorkh):
    full backtrace examples.
  
> Deadlock in IOTransport when network of brokers restart and failover is used. 
> ------------------------------------------------------------------------------
>
>                 Key: AMQCPP-376
>                 URL: https://issues.apache.org/jira/browse/AMQCPP-376
>             Project: ActiveMQ C++ Client
>          Issue Type: Bug
>          Components: Other C++ Clients
>    Affects Versions: 3.4.0
>         Environment: ActiveMQ-CPP  ver - 3.4.0
> Broker  5.3.1
> Machine: Linux mars 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
> gcc version: 4.1.2 20080704 (Red Hat 4.1.2-44))
>            Reporter: igor khaustov
>            Assignee: Timothy Bish
>         Attachments: bt_1.txt, bt_2.txt
>
>
> The problem description:
> We run a network of brokers (4 in total).
> Broker URI: 'failover://(tcp://10.10.13.20:61616,tcp://10.10.13.22:61616,tcp://10.10.13.24:61616,tcp://10.10.13.26:61616)?randomize=true&connection.closeTimeout=10000&transport.soTimeout=3000&timeout=3000&connection.useAsyncSend=true&connection.alwaysSyncSend=false'
> The producer loads the broker with 1000 messages/sec. We are testing the producer's behavior during failover by restarting all 4 brokers one after another while messages are being sent, and we get the deadlock shown below.
> Note: the problem was tested only with a network of brokers.
> The backtrace ( only relevant threads ):
> +Thread 16 (process 26892):+
> *#0  0x00000032ef00ce74 in __lll_lock_wait () from /lib64/libpthread.so.0*
> #1  0x00000032ef008874 in _L_lock_106 () from /lib64/libpthread.so.0
> #2  0x00000032ef0082e0 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x0000000000dc5a04 in decaf::internal::util::concurrent::MutexImpl::lock (handle=0xfefdd38) at decaf/internal/util/concurrent/unix/MutexImpl.cpp:77
> #4  0x0000000000bd9092 in decaf::util::concurrent::Mutex::lock (this=0xff54100) at decaf/util/concurrent/Mutex.cpp:111
> #5  0x0000000000d51f3f in decaf::util::AbstractCollection<decaf::lang::Pointer<activemq::transport::Transport, decaf::util::concurrent::atomic::AtomicRefCounter> >::lock (this=0xff540f8) at ./decaf/util/AbstractCollection.h:331
> #6  0x0000000000bd86c9 in decaf::util::concurrent::Lock::lock (this=0x4c7b9c90) at decaf/util/concurrent/Lock.cpp:54
> #7  0x0000000000bd883a in Lock (this=0x4c7b9c90, object=0xff54188, intiallyLocked=true) at decaf/util/concurrent/Lock.cpp:32
> *#8  0x0000000000d47a77 in activemq::transport::failover::CloseTransportsTask::add (this=0xff540e8, transport=@0x4c7b9cf0) at activemq/transport/failover/CloseTransportsTask.cpp:46*
> #9  0x0000000000b1b748 in activemq::transport::failover::FailoverTransport::handleTransportFailure (this=0xffed498, error=@0x4c7b9ee0) at activemq/transport/failover/FailoverTransport.cpp:483
> #10 0x0000000000b41a06 in activemq::transport::failover::FailoverTransportListener::onException (this=0xfde2e58, ex=@0x4c7b9ee0) at activemq/transport/failover/FailoverTransportListener.cpp:76
> #11 0x0000000000d34813 in activemq::transport::TransportFilter::fire (this=0x10627498, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:54
> #12 0x0000000000d34841 in activemq::transport::TransportFilter::onException (this=0x10627498, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:46
> #13 0x0000000000d34813 in activemq::transport::TransportFilter::fire (this=0xfeeb558, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:54
> #14 0x0000000000d34841 in activemq::transport::TransportFilter::onException (this=0xfeeb558, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:46
> #15 0x0000000000d554c8 in activemq::transport::inactivity::InactivityMonitor::onException (this=0xfeeb558, ex=@0x4c7b9ee0) at activemq/transport/inactivity/InactivityMonitor.cpp:312
> #16 0x0000000000d34813 in activemq::transport::TransportFilter::fire (this=0x1020c118, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:54
> #17 0x0000000000d34841 in activemq::transport::TransportFilter::onException (this=0x1020c118, ex=@0x4c7b9ee0) at activemq/transport/TransportFilter.cpp:46
> #18 0x0000000000d326f2 in activemq::transport::IOTransport::fire (this=0xdce10b8, ex=@0x4c7b9ee0) at activemq/transport/IOTransport.cpp:87
> #19 0x0000000000d32982 in activemq::transport::IOTransport::run (this=0xdce10b8) at activemq/transport/IOTransport.cpp:264
> #20 0x0000000000baad49 in decaf::lang::ThreadProperties::runCallback (properties=0x105871d8) at decaf/lang/Thread.cpp:137
> #21 0x0000000000ba9068 in threadWorker (arg=0x105871d8) at decaf/lang/Thread.cpp:190
> #22 0x00000032ef006367 in start_thread () from /lib64/libpthread.so.0
> #23 0x00000032ee4d30ad in clone () from /lib64/libc.so.6
> +Thread 9 (process 14470):+
> *#0  0x00000032ef00a899 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0*
> #1  0x0000000000dc54b3 in decaf::internal::util::concurrent::ConditionImpl::wait (condition=0x1072d2b8) at decaf/internal/util/concurrent/unix/ConditionImpl.cpp:101
> #2  0x0000000000bd9033 in decaf::util::concurrent::Mutex::wait (this=0x105871d8) at decaf/util/concurrent/Mutex.cpp:126
> #3  0x0000000000ba8538 in decaf::lang::Thread::join (this=0x12a4a418) at decaf/lang/Thread.cpp:452
> #4  0x0000000000d32c28 in activemq::transport::IOTransport::close (this=0xdce10b8) at activemq/transport/IOTransport.cpp:222
> #5  0x0000000000d34bfe in activemq::transport::TransportFilter::close (this=0x1020c118) at activemq/transport/TransportFilter.cpp:106
> #6  0x0000000000b47d3a in activemq::transport::tcp::TcpTransport::close (this=0x1020c118) at activemq/transport/tcp/TcpTransport.cpp:74
> #7  0x0000000000d34bfe in activemq::transport::TransportFilter::close (this=0xfeeb558) at activemq/transport/TransportFilter.cpp:106
> #8  0x0000000000d554ec in activemq::transport::inactivity::InactivityMonitor::close (this=0xfeeb558) at activemq/transport/inactivity/InactivityMonitor.cpp:300
> #9  0x0000000000d77867 in activemq::wireformat::openwire::OpenWireFormatNegotiator::close (this=0x10627498) at activemq/wireformat/openwire/OpenWireFormatNegotiator.cpp:248
> *#10 0x0000000000d478ff in activemq::transport::failover::CloseTransportsTask::iterate (this=0xff540e8) at activemq/transport/failover/CloseTransportsTask.cpp:75*
> #11 0x0000000000d25891 in activemq::threads::CompositeTaskRunner::iterate (this=0xddc0108) at activemq/threads/CompositeTaskRunner.cpp:173
> #12 0x0000000000d25ae4 in activemq::threads::CompositeTaskRunner::run (this=0xddc0108) at activemq/threads/CompositeTaskRunner.cpp:107
> #13 0x0000000000baad49 in decaf::lang::ThreadProperties::runCallback (properties=0xfeeb2b8) at decaf/lang/Thread.cpp:137
> #14 0x0000000000ba9068 in threadWorker (arg=0xfeeb2b8) at decaf/lang/Thread.cpp:190
> #15 0x00000032ef006367 in start_thread () from /lib64/libpthread.so.0
> #16 0x00000032ee4d30ad in clone () from /lib64/libc.so.6
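> As you can see, +Thread 16+ is blocked in lock_wait on *_synchronized( &transports )_* in activemq::transport::failover::CloseTransportsTask::add.
> The *_synchronized( &transports )_* lock is held by +Thread 9+, which entered activemq::transport::failover::CloseTransportsTask::iterate from activemq::threads::CompositeTaskRunner::iterate. But +Thread 9+ is in pthread_cond_wait (inside decaf::lang::Thread::join, called from activemq::transport::IOTransport::close), and that wait has to be signalled by +Thread 16+, so neither thread can proceed.
> Kind regards.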
> Igor.
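
For readers who want the shape of the problem outside the ActiveMQ-CPP sources, here is a minimal sketch in standard C++ (C++14). It is not the library's code and not the fix applied for AMQCPP-376. std::thread and std::mutex stand in for the decaf classes, and FakeTransport, CloserSketch and runSketch are hypothetical names made up for illustration. The sketch shows one common way to avoid this kind of lock-then-join cycle: detach the pending list while holding the lock, then close (join) outside it.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a transport that owns a reader thread
// (the role IOTransport plays in the backtrace).
class FakeTransport {
public:
    FakeTransport(std::mutex& queueLock, std::atomic<int>& failuresReported)
        : reader([lock = &queueLock, count = &failuresReported] {
              // Simulate the reader noticing a broken connection a bit later.
              std::this_thread::sleep_for(std::chrono::milliseconds(50));
              // Like Thread 16: the failure handler needs the queue lock.
              std::lock_guard<std::mutex> guard(*lock);
              ++(*count);
          }) {}

    // Like Thread 9 in IOTransport::close: blocks until the reader exits.
    void close() {
        if (reader.joinable()) {
            reader.join();
        }
    }

    ~FakeTransport() { close(); }

private:
    std::thread reader;
};

// Hypothetical stand-in for a task that queues transports and closes them.
class CloserSketch {
public:
    void add(std::unique_ptr<FakeTransport> transport) {
        std::lock_guard<std::mutex> guard(lock);
        pending.push_back(std::move(transport));
    }

    // Deadlock-prone shape (deliberately not implemented): calling close()
    // while still holding `lock` would join a reader thread that is itself
    // waiting for `lock`, so neither side could make progress.

    // Safer shape: take the batch under the lock, close outside the lock.
    std::size_t closeAll() {
        std::vector<std::unique_ptr<FakeTransport>> batch;
        {
            std::lock_guard<std::mutex> guard(lock);
            batch.swap(pending);
        }
        for (auto& transport : batch) {
            transport->close();  // readers can still take `lock` here
        }
        return batch.size();
    }

    std::mutex lock;  // declared before `pending` so it outlives the transports

private:
    std::vector<std::unique_ptr<FakeTransport>> pending;
};

// Queues four fake transports (one per broker in the report), closes them,
// and returns how many reader threads managed to report their failure.
int runSketch() {
    std::atomic<int> failuresReported{0};
    CloserSketch closer;
    for (int i = 0; i < 4; ++i) {
        closer.add(std::make_unique<FakeTransport>(closer.lock, failuresReported));
    }
    closer.closeAll();
    return failuresReported.load();
}
```

Calling runSketch() from a test program returns once every reader thread has taken the lock and exited. If closeAll() instead kept the lock_guard alive across the close() calls, the first join would block forever as soon as a reader reached its failure handler, which is the same cycle the two backtraces show.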

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira