You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@activemq.apache.org by James Green <ja...@gmail.com> on 2012/11/01 12:27:37 UTC

Re: Network bridge is stuck, repeatedly requires broker restart

Having implemented queue and destination filtering between spokes and the
hub, the connections lasted just over 24 hours.

It's just died. Same traces on the hub as before. On the spoke with which
comms have died:

Name: ActiveMQ Transport: ssl://hub/x.x.x.x:61617
State: WAITING on
java.util.concurrent.locks.ReentrantLock$NonfairSync@4864ae35 owned by:
ActiveMQ BrokerService[zorin] Task-3350
Total blocked: 68  Total waited: 1,862

Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:871)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1201)
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:66)
org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
org.apache.activemq.network.DemandForwardingBridgeSupport.serviceRemoteCommand(DemandForwardingBridgeSupport.java:467)
org.apache.activemq.network.DemandForwardingBridgeSupport$2.onCommand(DemandForwardingBridgeSupport.java:179)
org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116)
org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:256)
org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
org.apache.activemq.transport.tcp.SslTransport.doConsume(SslTransport.java:91)
org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:222)
org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:204)
java.lang.Thread.run(Thread.java:636)

Name: ActiveMQ BrokerService[zorin] Task-3350
State: RUNNABLE
Total blocked: 0  Total waited: 668

Stack trace:
java.net.SocketOutputStream.socketWrite0(Native Method)
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
java.net.SocketOutputStream.write(SocketOutputStream.java:153)
sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:314)
sun.security.ssl.OutputRecord.write(OutputRecord.java:303)
sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:763)
sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:751)
sun.security.ssl.AppOutputStream.write(AppOutputStream.java:78)
   - locked sun.security.ssl.AppOutputStream@3109e59b
org.apache.activemq.transport.tcp.TcpBufferedOutputStream.write(TcpBufferedOutputStream.java:96)
java.io.DataOutputStream.write(DataOutputStream.java:107)
   - locked java.io.DataOutputStream@c13c44
org.apache.activemq.openwire.v9.BaseDataStreamMarshaller.tightMarshalByteSequence2(BaseDataStreamMarshaller.java:431)
org.apache.activemq.openwire.v9.MessageMarshaller.tightMarshal2(MessageMarshaller.java:180)
org.apache.activemq.openwire.v9.ActiveMQMessageMarshaller.tightMarshal2(ActiveMQMessageMarshaller.java:89)
org.apache.activemq.openwire.v9.ActiveMQTextMessageMarshaller.tightMarshal2(ActiveMQTextMessageMarshaller.java:89)
org.apache.activemq.openwire.OpenWireFormat.marshal(OpenWireFormat.java:243)
   - locked org.apache.activemq.openwire.OpenWireFormat@e5b94bf
org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:183)
org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:289)
org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:271)
org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:85)
org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:104)
org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
org.apache.activemq.network.DemandForwardingBridgeSupport.serviceLocalCommand(DemandForwardingBridgeSupport.java:726)
org.apache.activemq.network.DemandForwardingBridgeSupport$1.onCommand(DemandForwardingBridgeSupport.java:167)
org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116)
org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
org.apache.activemq.transport.vm.VMTransport.doDispatch(VMTransport.java:138)
org.apache.activemq.transport.vm.VMTransport.dispatch(VMTransport.java:127)
   - locked java.util.concurrent.atomic.AtomicBoolean@1e828ef6
org.apache.activemq.transport.vm.VMTransport.oneway(VMTransport.java:104)
org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
org.apache.activemq.broker.TransportConnection.dispatch(TransportConnection.java:1312)
org.apache.activemq.broker.TransportConnection.processDispatch(TransportConnection.java:838)
org.apache.activemq.broker.TransportConnection.iterate(TransportConnection.java:873)
org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:129)
org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:47)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
java.lang.Thread.run(Thread.java:636)

Is this something that should be reported in JIRA? And should we be
changing the writeTimeout you speak of? If so, to what?

Thanks,

James

On 31 October 2012 16:57, Gary Tully <ga...@gmail.com> wrote:

> from a quick look at one of those stack traces, it looks like a thread
> is blocked on a socket write, it may be that the socket is half closed
> and that the write needs to timeout at the tcp level. that can take
> some time.
>
> There is a writeTimeout filter that can help at the activemq level.
>
> Have a peek at https://issues.apache.org/jira/browse/AMQ-1993
>
>
> On 30 October 2012 21:31, James Green <ja...@gmail.com> wrote:
> > This is happening repeatedly causing memory consumption to balloon
> locally
> > and message queues to build remotely.
> >
> > Googling around isn't revealing much. Can anyone take a look and tell me
> if
> > this stuff is normal or not? Need to narrow down the investigations.
> >
> > Thanks,
> >
> > James
> >
> > On 30 October 2012 11:32, James Green <ja...@gmail.com> wrote:
> >
> >> Hub with three spokes. The network between the hub and one of these
> spokes
> >> seems to be dying irregularly yet frequently.
> >>
> >> I managed to get a thread dump of the hub's ssl connector to this spoke
> >> along with the thread it was waiting on:
> >>
> >> http://pastebin.com/ZjeSxDK7
> >>
> >> I also captched a full thread dump on the hub:
> >>
> >> http://pastebin.com/Qu113JhQ
> >>
> >> On the spoke the thread dump button was disabled but I captured the
> >> equivalent individual threads:
> >>
> >> http://pastebin.com/1hwb1BDp
> >>
> >> FWIW we just restarted the spoke and the network connector appears to be
> >> stuck once again. Not sure what we can actually do now...
> >>
> >> Can anyone offer advice, perhaps a JIRA issue? This is with 5.7.0.
> >>
> >> James
> >>
> >>
>
>
>
> --
> http://redhat.com
> http://blog.garytully.com
>

Re: Network bridge is stuck, repeatedly requires broker restart

Posted by James Green <ja...@gmail.com>.
A follow-up. Our hub and spoke were restarted. The connection dropped
within a couple of hours.

This time, subsequently, we are having trouble with Stomp transport
threads. Check this out:

Name: ActiveMQ Transport Server Thread Handler:
stomp://localhost:61612?trace=true
State: TIMED_WAITING on
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@104273f
Total blocked: 0  Total waited: 6,613

Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2081)
java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:423)
org.apache.activemq.transport.tcp.TcpTransportServer$1.run(TcpTransportServer.java:351)
java.lang.Thread.run(Thread.java:679)

I am assuming the daemon thread that listens for stomp connections on port
61612 is the above, and it's waiting on... something?

Having to restart again! What's the best way forward here, anyone?

Thanks again,

James

On 1 November 2012 12:18, James Green <ja...@gmail.com> wrote:

> Gary,
>
> Spokes reach out to the hub. The hub is in an office with a busy ADSL line
> attached. Connection drops are to be expected. We have observed some spokes
> drop and reconnect absolutely fine. Obviously in more recent times
> sometimes this connections are becoming hung. We wondered if this might be
> because of increased traffic load over the line as the network has been
> reliable for many months previously.
>
> The past week or more hanging drops have been occuring every few hours. We
> filtered the traffic to see if this reduced or eliminated the problem - so
> far one drop being experienced now in 24 hours.
>
> The thread remains blocked at the moment. We have evidence to suggest the
> connection to this spoke ("quarrel") dropped approximately two hours ago.
>
> Checking out the OS layer:
>
> jamesg@hub:~$ sudo netstat -natp | grep x.x.x.x
> tcp6  103343  78476 hub:61617         x.x.x.x:44493    ESTABLISHED
> 20345/java
>
> Spoke:
> jamesg@quarrel:~$ sudo netstat -natp | grep 61617
> tcp6  113111 146448 x.x.x.x:44493    hub:61617      ESTABLISHED 27968/java
>
> Clearly something is in fact present. These two are both Ubuntu boxes (all
> ours are). Should we be tweaking something at the OS level, or perhaps
> switching from ssl:// to ssl+nio://?
>
> Floating ideas now!
>
> Thanks,
>
> James
>
>
> On 1 November 2012 11:55, Gary Tully <ga...@gmail.com> wrote:
>
>> actually soWriteTimeout does not help with ssl, because the ssl stream
>> has an exclusive lock that blocks any call to close.
>>
>> you will have to deal with the blocked write at the OS level via the
>> tcp stack timeout configuration.
>>
>> But the real root cause is understanding why the connection drops in
>> the first place.
>>
>> Do you see the thread blocked on a write eventually timing out when
>> the OS detects the half closed socket? It should do once a remote
>> close has been initiated.
>>
>> Or is the problem that the hub/spoke does not really close the
>> connection when it sees a problem?
>>
>> It is odd that it is a 24hr thing,  what happens at that time.
>>
>> On 1 November 2012 11:27, James Green <ja...@gmail.com> wrote:
>> > Having implemented queue and destination filtering between spokes and
>> the
>> > hub, the connections lasted just over 24 hours.
>> >
>> > It's just died. Same traces on the hub as before. On the spoke with
>> which
>> > comms have died:
>> >
>> > Name: ActiveMQ Transport: ssl://hub/x.x.x.x:61617
>> > State: WAITING on
>> > java.util.concurrent.locks.ReentrantLock$NonfairSync@4864ae35 owned by:
>> > ActiveMQ BrokerService[zorin] Task-3350
>> > Total blocked: 68  Total waited: 1,862
>> >
>> > Stack trace:
>> > sun.misc.Unsafe.park(Native Method)
>> > java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>> >
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
>> >
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:871)
>> >
>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1201)
>> >
>> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
>> > java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
>> >
>> org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:66)
>> >
>> org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
>> >
>> org.apache.activemq.network.DemandForwardingBridgeSupport.serviceRemoteCommand(DemandForwardingBridgeSupport.java:467)
>> >
>> org.apache.activemq.network.DemandForwardingBridgeSupport$2.onCommand(DemandForwardingBridgeSupport.java:179)
>> >
>> org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116)
>> >
>> org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
>> >
>> org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
>> >
>> org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:256)
>> >
>> org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
>> >
>> org.apache.activemq.transport.tcp.SslTransport.doConsume(SslTransport.java:91)
>> >
>> org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:222)
>> >
>> org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:204)
>> > java.lang.Thread.run(Thread.java:636)
>> >
>> > Name: ActiveMQ BrokerService[zorin] Task-3350
>> > State: RUNNABLE
>> > Total blocked: 0  Total waited: 668
>> >
>> > Stack trace:
>> > java.net.SocketOutputStream.socketWrite0(Native Method)
>> > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
>> > java.net.SocketOutputStream.write(SocketOutputStream.java:153)
>> > sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:314)
>> > sun.security.ssl.OutputRecord.write(OutputRecord.java:303)
>> >
>> sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:763)
>> > sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:751)
>> > sun.security.ssl.AppOutputStream.write(AppOutputStream.java:78)
>> >    - locked sun.security.ssl.AppOutputStream@3109e59b
>> >
>> org.apache.activemq.transport.tcp.TcpBufferedOutputStream.write(TcpBufferedOutputStream.java:96)
>> > java.io.DataOutputStream.write(DataOutputStream.java:107)
>> >    - locked java.io.DataOutputStream@c13c44
>> >
>> org.apache.activemq.openwire.v9.BaseDataStreamMarshaller.tightMarshalByteSequence2(BaseDataStreamMarshaller.java:431)
>> >
>> org.apache.activemq.openwire.v9.MessageMarshaller.tightMarshal2(MessageMarshaller.java:180)
>> >
>> org.apache.activemq.openwire.v9.ActiveMQMessageMarshaller.tightMarshal2(ActiveMQMessageMarshaller.java:89)
>> >
>> org.apache.activemq.openwire.v9.ActiveMQTextMessageMarshaller.tightMarshal2(ActiveMQTextMessageMarshaller.java:89)
>> >
>> org.apache.activemq.openwire.OpenWireFormat.marshal(OpenWireFormat.java:243)
>> >    - locked org.apache.activemq.openwire.OpenWireFormat@e5b94bf
>> >
>> org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:183)
>> >
>> org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:289)
>> >
>> org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:271)
>> >
>> org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:85)
>> >
>> org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:104)
>> >
>> org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
>> >
>> org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
>> >
>> org.apache.activemq.network.DemandForwardingBridgeSupport.serviceLocalCommand(DemandForwardingBridgeSupport.java:726)
>> >
>> org.apache.activemq.network.DemandForwardingBridgeSupport$1.onCommand(DemandForwardingBridgeSupport.java:167)
>> >
>> org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116)
>> >
>> org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
>> >
>> org.apache.activemq.transport.vm.VMTransport.doDispatch(VMTransport.java:138)
>> >
>> org.apache.activemq.transport.vm.VMTransport.dispatch(VMTransport.java:127)
>> >    - locked java.util.concurrent.atomic.AtomicBoolean@1e828ef6
>> >
>> org.apache.activemq.transport.vm.VMTransport.oneway(VMTransport.java:104)
>> >
>> org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
>> >
>> org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
>> >
>> org.apache.activemq.broker.TransportConnection.dispatch(TransportConnection.java:1312)
>> >
>> org.apache.activemq.broker.TransportConnection.processDispatch(TransportConnection.java:838)
>> >
>> org.apache.activemq.broker.TransportConnection.iterate(TransportConnection.java:873)
>> >
>> org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:129)
>> >
>> org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:47)
>> >
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> > java.lang.Thread.run(Thread.java:636)
>> >
>> > Is this something that should be reported in JIRA? And should we be
>> > changing the writeTimeout you speak of? If so, to what?
>> >
>> > Thanks,
>> >
>> > James
>> >
>> > On 31 October 2012 16:57, Gary Tully <ga...@gmail.com> wrote:
>> >
>> >> from a quick look at one of those stack traces, it looks like a thread
>> >> is blocked on a socket write, it may be that the socket is half closed
>> >> and that the write needs to timeout at the tcp level. that can take
>> >> some time.
>> >>
>> >> There is a writeTimeout filter that can help at the activemq level.
>> >>
>> >> Have a peek at https://issues.apache.org/jira/browse/AMQ-1993
>> >>
>> >>
>> >> On 30 October 2012 21:31, James Green <ja...@gmail.com>
>> wrote:
>> >> > This is happening repeatedly causing memory consumption to balloon
>> >> locally
>> >> > and message queues to build remotely.
>> >> >
>> >> > Googling around isn't revealing much. Can anyone take a look and
>> tell me
>> >> if
>> >> > this stuff is normal or not? Need to narrow down the investigations.
>> >> >
>> >> > Thanks,
>> >> >
>> >> > James
>> >> >
>> >> > On 30 October 2012 11:32, James Green <ja...@gmail.com>
>> wrote:
>> >> >
>> >> >> Hub with three spokes. The network between the hub and one of these
>> >> spokes
>> >> >> seems to be dying irregularly yet frequently.
>> >> >>
>> >> >> I managed to get a thread dump of the hub's ssl connector to this
>> spoke
>> >> >> along with the thread it was waiting on:
>> >> >>
>> >> >> http://pastebin.com/ZjeSxDK7
>> >> >>
>> >> >> I also captched a full thread dump on the hub:
>> >> >>
>> >> >> http://pastebin.com/Qu113JhQ
>> >> >>
>> >> >> On the spoke the thread dump button was disabled but I captured the
>> >> >> equivalent individual threads:
>> >> >>
>> >> >> http://pastebin.com/1hwb1BDp
>> >> >>
>> >> >> FWIW we just restarted the spoke and the network connector appears
>> to be
>> >> >> stuck once again. Not sure what we can actually do now...
>> >> >>
>> >> >> Can anyone offer advice, perhaps a JIRA issue? This is with 5.7.0.
>> >> >>
>> >> >> James
>> >> >>
>> >> >>
>> >>
>> >>
>> >>
>> >> --
>> >> http://redhat.com
>> >> http://blog.garytully.com
>> >>
>>
>>
>>
>> --
>> http://redhat.com
>> http://blog.garytully.com
>>
>
>

Re: Network bridge is stuck, repeatedly requires broker restart

Posted by James Green <ja...@gmail.com>.
Gary,

Spokes reach out to the hub. The hub is in an office with a busy ADSL line
attached. Connection drops are to be expected. We have observed some spokes
drop and reconnect absolutely fine. Obviously in more recent times
sometimes this connections are becoming hung. We wondered if this might be
because of increased traffic load over the line as the network has been
reliable for many months previously.

The past week or more hanging drops have been occuring every few hours. We
filtered the traffic to see if this reduced or eliminated the problem - so
far one drop being experienced now in 24 hours.

The thread remains blocked at the moment. We have evidence to suggest the
connection to this spoke ("quarrel") dropped approximately two hours ago.

Checking out the OS layer:

jamesg@hub:~$ sudo netstat -natp | grep x.x.x.x
tcp6  103343  78476 hub:61617         x.x.x.x:44493    ESTABLISHED
20345/java

Spoke:
jamesg@quarrel:~$ sudo netstat -natp | grep 61617
tcp6  113111 146448 x.x.x.x:44493    hub:61617      ESTABLISHED 27968/java

Clearly something is in fact present. These two are both Ubuntu boxes (all
ours are). Should we be tweaking something at the OS level, or perhaps
switching from ssl:// to ssl+nio://?

Floating ideas now!

Thanks,

James

On 1 November 2012 11:55, Gary Tully <ga...@gmail.com> wrote:

> actually soWriteTimeout does not help with ssl, because the ssl stream
> has an exclusive lock that blocks any call to close.
>
> you will have to deal with the blocked write at the OS level via the
> tcp stack timeout configuration.
>
> But the real root cause is understanding why the connection drops in
> the first place.
>
> Do you see the thread blocked on a write eventually timing out when
> the OS detects the half closed socket? It should do once a remote
> close has been initiated.
>
> Or is the problem that the hub/spoke does not really close the
> connection when it sees a problem?
>
> It is odd that it is a 24hr thing,  what happens at that time.
>
> On 1 November 2012 11:27, James Green <ja...@gmail.com> wrote:
> > Having implemented queue and destination filtering between spokes and the
> > hub, the connections lasted just over 24 hours.
> >
> > It's just died. Same traces on the hub as before. On the spoke with which
> > comms have died:
> >
> > Name: ActiveMQ Transport: ssl://hub/x.x.x.x:61617
> > State: WAITING on
> > java.util.concurrent.locks.ReentrantLock$NonfairSync@4864ae35 owned by:
> > ActiveMQ BrokerService[zorin] Task-3350
> > Total blocked: 68  Total waited: 1,862
> >
> > Stack trace:
> > sun.misc.Unsafe.park(Native Method)
> > java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:871)
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1201)
> >
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
> > java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
> >
> org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:66)
> >
> org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
> >
> org.apache.activemq.network.DemandForwardingBridgeSupport.serviceRemoteCommand(DemandForwardingBridgeSupport.java:467)
> >
> org.apache.activemq.network.DemandForwardingBridgeSupport$2.onCommand(DemandForwardingBridgeSupport.java:179)
> >
> org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116)
> >
> org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
> >
> org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
> >
> org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:256)
> >
> org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
> >
> org.apache.activemq.transport.tcp.SslTransport.doConsume(SslTransport.java:91)
> >
> org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:222)
> > org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:204)
> > java.lang.Thread.run(Thread.java:636)
> >
> > Name: ActiveMQ BrokerService[zorin] Task-3350
> > State: RUNNABLE
> > Total blocked: 0  Total waited: 668
> >
> > Stack trace:
> > java.net.SocketOutputStream.socketWrite0(Native Method)
> > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
> > java.net.SocketOutputStream.write(SocketOutputStream.java:153)
> > sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:314)
> > sun.security.ssl.OutputRecord.write(OutputRecord.java:303)
> >
> sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:763)
> > sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:751)
> > sun.security.ssl.AppOutputStream.write(AppOutputStream.java:78)
> >    - locked sun.security.ssl.AppOutputStream@3109e59b
> >
> org.apache.activemq.transport.tcp.TcpBufferedOutputStream.write(TcpBufferedOutputStream.java:96)
> > java.io.DataOutputStream.write(DataOutputStream.java:107)
> >    - locked java.io.DataOutputStream@c13c44
> >
> org.apache.activemq.openwire.v9.BaseDataStreamMarshaller.tightMarshalByteSequence2(BaseDataStreamMarshaller.java:431)
> >
> org.apache.activemq.openwire.v9.MessageMarshaller.tightMarshal2(MessageMarshaller.java:180)
> >
> org.apache.activemq.openwire.v9.ActiveMQMessageMarshaller.tightMarshal2(ActiveMQMessageMarshaller.java:89)
> >
> org.apache.activemq.openwire.v9.ActiveMQTextMessageMarshaller.tightMarshal2(ActiveMQTextMessageMarshaller.java:89)
> >
> org.apache.activemq.openwire.OpenWireFormat.marshal(OpenWireFormat.java:243)
> >    - locked org.apache.activemq.openwire.OpenWireFormat@e5b94bf
> >
> org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:183)
> >
> org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:289)
> >
> org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:271)
> >
> org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:85)
> >
> org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:104)
> >
> org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
> >
> org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
> >
> org.apache.activemq.network.DemandForwardingBridgeSupport.serviceLocalCommand(DemandForwardingBridgeSupport.java:726)
> >
> org.apache.activemq.network.DemandForwardingBridgeSupport$1.onCommand(DemandForwardingBridgeSupport.java:167)
> >
> org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116)
> >
> org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
> >
> org.apache.activemq.transport.vm.VMTransport.doDispatch(VMTransport.java:138)
> >
> org.apache.activemq.transport.vm.VMTransport.dispatch(VMTransport.java:127)
> >    - locked java.util.concurrent.atomic.AtomicBoolean@1e828ef6
> > org.apache.activemq.transport.vm.VMTransport.oneway(VMTransport.java:104)
> >
> org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
> >
> org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
> >
> org.apache.activemq.broker.TransportConnection.dispatch(TransportConnection.java:1312)
> >
> org.apache.activemq.broker.TransportConnection.processDispatch(TransportConnection.java:838)
> >
> org.apache.activemq.broker.TransportConnection.iterate(TransportConnection.java:873)
> >
> org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:129)
> >
> org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:47)
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > java.lang.Thread.run(Thread.java:636)
> >
> > Is this something that should be reported in JIRA? And should we be
> > changing the writeTimeout you speak of? If so, to what?
> >
> > Thanks,
> >
> > James
> >
> > On 31 October 2012 16:57, Gary Tully <ga...@gmail.com> wrote:
> >
> >> from a quick look at one of those stack traces, it looks like a thread
> >> is blocked on a socket write, it may be that the socket is half closed
> >> and that the write needs to timeout at the tcp level. that can take
> >> some time.
> >>
> >> There is a writeTimeout filter that can help at the activemq level.
> >>
> >> Have a peek at https://issues.apache.org/jira/browse/AMQ-1993
> >>
> >>
> >> On 30 October 2012 21:31, James Green <ja...@gmail.com> wrote:
> >> > This is happening repeatedly causing memory consumption to balloon
> >> locally
> >> > and message queues to build remotely.
> >> >
> >> > Googling around isn't revealing much. Can anyone take a look and tell
> me
> >> if
> >> > this stuff is normal or not? Need to narrow down the investigations.
> >> >
> >> > Thanks,
> >> >
> >> > James
> >> >
> >> > On 30 October 2012 11:32, James Green <ja...@gmail.com>
> wrote:
> >> >
> >> >> Hub with three spokes. The network between the hub and one of these
> >> spokes
> >> >> seems to be dying irregularly yet frequently.
> >> >>
> >> >> I managed to get a thread dump of the hub's ssl connector to this
> spoke
> >> >> along with the thread it was waiting on:
> >> >>
> >> >> http://pastebin.com/ZjeSxDK7
> >> >>
> >> >> I also captched a full thread dump on the hub:
> >> >>
> >> >> http://pastebin.com/Qu113JhQ
> >> >>
> >> >> On the spoke the thread dump button was disabled but I captured the
> >> >> equivalent individual threads:
> >> >>
> >> >> http://pastebin.com/1hwb1BDp
> >> >>
> >> >> FWIW we just restarted the spoke and the network connector appears
> to be
> >> >> stuck once again. Not sure what we can actually do now...
> >> >>
> >> >> Can anyone offer advice, perhaps a JIRA issue? This is with 5.7.0.
> >> >>
> >> >> James
> >> >>
> >> >>
> >>
> >>
> >>
> >> --
> >> http://redhat.com
> >> http://blog.garytully.com
> >>
>
>
>
> --
> http://redhat.com
> http://blog.garytully.com
>

Re: Network bridge is stuck, repeatedly requires broker restart

Posted by Gary Tully <ga...@gmail.com>.
actually soWriteTimeout does not help with ssl, because the ssl stream
has an exclusive lock that blocks any call to close.

you will have to deal with the blocked write at the OS level via the
tcp stack timeout configuration.

But the real root cause is understanding why the connection drops in
the first place.

Do you see the thread blocked on a write eventually timing out when
the OS detects the half closed socket? It should do once a remote
close has been initiated.

Or is the problem that the hub/spoke does not really close the
connection when it sees a problem?

It is odd that it is a 24hr thing,  what happens at that time.

On 1 November 2012 11:27, James Green <ja...@gmail.com> wrote:
> Having implemented queue and destination filtering between spokes and the
> hub, the connections lasted just over 24 hours.
>
> It's just died. Same traces on the hub as before. On the spoke with which
> comms have died:
>
> Name: ActiveMQ Transport: ssl://hub/x.x.x.x:61617
> State: WAITING on
> java.util.concurrent.locks.ReentrantLock$NonfairSync@4864ae35 owned by:
> ActiveMQ BrokerService[zorin] Task-3350
> Total blocked: 68  Total waited: 1,862
>
> Stack trace:
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:871)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1201)
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
> org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:66)
> org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
> org.apache.activemq.network.DemandForwardingBridgeSupport.serviceRemoteCommand(DemandForwardingBridgeSupport.java:467)
> org.apache.activemq.network.DemandForwardingBridgeSupport$2.onCommand(DemandForwardingBridgeSupport.java:179)
> org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116)
> org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
> org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
> org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:256)
> org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
> org.apache.activemq.transport.tcp.SslTransport.doConsume(SslTransport.java:91)
> org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:222)
> org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:204)
> java.lang.Thread.run(Thread.java:636)
>
> Name: ActiveMQ BrokerService[zorin] Task-3350
> State: RUNNABLE
> Total blocked: 0  Total waited: 668
>
> Stack trace:
> java.net.SocketOutputStream.socketWrite0(Native Method)
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
> java.net.SocketOutputStream.write(SocketOutputStream.java:153)
> sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:314)
> sun.security.ssl.OutputRecord.write(OutputRecord.java:303)
> sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:763)
> sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:751)
> sun.security.ssl.AppOutputStream.write(AppOutputStream.java:78)
>    - locked sun.security.ssl.AppOutputStream@3109e59b
> org.apache.activemq.transport.tcp.TcpBufferedOutputStream.write(TcpBufferedOutputStream.java:96)
> java.io.DataOutputStream.write(DataOutputStream.java:107)
>    - locked java.io.DataOutputStream@c13c44
> org.apache.activemq.openwire.v9.BaseDataStreamMarshaller.tightMarshalByteSequence2(BaseDataStreamMarshaller.java:431)
> org.apache.activemq.openwire.v9.MessageMarshaller.tightMarshal2(MessageMarshaller.java:180)
> org.apache.activemq.openwire.v9.ActiveMQMessageMarshaller.tightMarshal2(ActiveMQMessageMarshaller.java:89)
> org.apache.activemq.openwire.v9.ActiveMQTextMessageMarshaller.tightMarshal2(ActiveMQTextMessageMarshaller.java:89)
> org.apache.activemq.openwire.OpenWireFormat.marshal(OpenWireFormat.java:243)
>    - locked org.apache.activemq.openwire.OpenWireFormat@e5b94bf
> org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:183)
> org.apache.activemq.transport.AbstractInactivityMonitor.doOnewaySend(AbstractInactivityMonitor.java:289)
> org.apache.activemq.transport.AbstractInactivityMonitor.oneway(AbstractInactivityMonitor.java:271)
> org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:85)
> org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:104)
> org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
> org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
> org.apache.activemq.network.DemandForwardingBridgeSupport.serviceLocalCommand(DemandForwardingBridgeSupport.java:726)
> org.apache.activemq.network.DemandForwardingBridgeSupport$1.onCommand(DemandForwardingBridgeSupport.java:167)
> org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116)
> org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
> org.apache.activemq.transport.vm.VMTransport.doDispatch(VMTransport.java:138)
> org.apache.activemq.transport.vm.VMTransport.dispatch(VMTransport.java:127)
>    - locked java.util.concurrent.atomic.AtomicBoolean@1e828ef6
> org.apache.activemq.transport.vm.VMTransport.oneway(VMTransport.java:104)
> org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
> org.apache.activemq.transport.ResponseCorrelator.oneway(ResponseCorrelator.java:60)
> org.apache.activemq.broker.TransportConnection.dispatch(TransportConnection.java:1312)
> org.apache.activemq.broker.TransportConnection.processDispatch(TransportConnection.java:838)
> org.apache.activemq.broker.TransportConnection.iterate(TransportConnection.java:873)
> org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:129)
> org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:47)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> java.lang.Thread.run(Thread.java:636)
>
> Is this something that should be reported in JIRA? And should we be
> changing the writeTimeout you speak of? If so, to what?
>
> Thanks,
>
> James
>
> On 31 October 2012 16:57, Gary Tully <ga...@gmail.com> wrote:
>
>> from a quick look at one of those stack traces, it looks like a thread
>> is blocked on a socket write, it may be that the socket is half closed
>> and that the write needs to timeout at the tcp level. that can take
>> some time.
>>
>> There is a writeTimeout filter that can help at the activemq level.
>>
>> Have a peek at https://issues.apache.org/jira/browse/AMQ-1993
>>
>>
>> On 30 October 2012 21:31, James Green <ja...@gmail.com> wrote:
>> > This is happening repeatedly causing memory consumption to balloon
>> locally
>> > and message queues to build remotely.
>> >
>> > Googling around isn't revealing much. Can anyone take a look and tell me
>> if
>> > this stuff is normal or not? Need to narrow down the investigations.
>> >
>> > Thanks,
>> >
>> > James
>> >
>> > On 30 October 2012 11:32, James Green <ja...@gmail.com> wrote:
>> >
>> >> Hub with three spokes. The network between the hub and one of these
>> spokes
>> >> seems to be dying irregularly yet frequently.
>> >>
>> >> I managed to get a thread dump of the hub's ssl connector to this spoke
>> >> along with the thread it was waiting on:
>> >>
>> >> http://pastebin.com/ZjeSxDK7
>> >>
>> >> I also captched a full thread dump on the hub:
>> >>
>> >> http://pastebin.com/Qu113JhQ
>> >>
>> >> On the spoke the thread dump button was disabled but I captured the
>> >> equivalent individual threads:
>> >>
>> >> http://pastebin.com/1hwb1BDp
>> >>
>> >> FWIW we just restarted the spoke and the network connector appears to be
>> >> stuck once again. Not sure what we can actually do now...
>> >>
>> >> Can anyone offer advice, perhaps a JIRA issue? This is with 5.7.0.
>> >>
>> >> James
>> >>
>> >>
>>
>>
>>
>> --
>> http://redhat.com
>> http://blog.garytully.com
>>



-- 
http://redhat.com
http://blog.garytully.com