Posted to users@activemq.apache.org by Charles Anthony <ch...@hpdsoftware.com> on 2006/08/11 14:04:13 UTC

TCP Connection Timeout Problems. Possibly.

Hi All,

We've just had a nasty situation: our ActiveMQ server (standalone, plain
vanilla TCP transport, no persistence, no nuffink) on one of our live
installations suddenly refused to accept any new connections; no clients
could connect. All currently connected clients were fine, and messages were
being processed, sent and received normally. Just no-one else could connect.

After 20 minutes, new connections were suddenly allowed.

The following exception was in our log.

2006-Aug-11 12:17:47.726 aqualive [ActiveMQ Transport Server: tcp://blah:61616]  ERROR org.apache.activemq.broker.TransportConnector - Could not accept connection: java.net.SocketException: Connection reset by peer: socket write error
java.net.SocketException: Connection reset by peer: socket write error
 at java.net.SocketOutputStream.socketWrite0(Native Method)
 at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
 at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
 at org.apache.activemq.transport.tcp.TcpBufferedOutputStream.flush(TcpBufferedOutputStream.java:108)
 at java.io.DataOutputStream.flush(DataOutputStream.java:101)
 at org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:125)
 at org.apache.activemq.transport.InactivityMonitor.oneway(InactivityMonitor.java:141)
 at org.apache.activemq.transport.WireFormatNegotiator.sendWireFormat(WireFormatNegotiator.java:128)
 at org.apache.activemq.transport.WireFormatNegotiator.start(WireFormatNegotiator.java:64)
 at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:52)
 at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:52)
 at org.apache.activemq.broker.TransportConnection.start(TransportConnection.java:75)
 at org.apache.activemq.broker.TransportConnector$1.onAccept(TransportConnector.java:136)
 at org.apache.activemq.transport.tcp.TcpTransportServer.run(TcpTransportServer.java:137)
 at java.lang.Thread.run(Thread.java:534)

My interpretation of the above is that something (a port scanner, maybe? Our
curious IT department?) is connecting to the listening socket, and the
TransportServer is trying to tell the connecting process what the wireformat
is, while the connecting process just sits there: not responding, not
acknowledging, not doing anything at all, yet not closing the connection
either. Because the wireformat is sent from within the accept loop itself
(TcpTransportServer.run -> TransportConnector$1.onAccept ->
TransportConnection.start in the trace above), the transport server is
blocked, preventing anyone else from connecting. After 20 minutes (which I am
guessing is some kind of low-level timeout, seeing as all the default AMQ
timeouts seem to be of the order of 1 to 30 seconds), a low-level TCP
exception is thrown, freeing the whole shebang up.
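To make the suspected failure mode concrete, here is a minimal Java sketch of a single-threaded listener that runs each connection's handshake inline, which is what the trace suggests happens when TcpTransportServer.run calls onAccept and TransportConnection.start. The class SerialAcceptor and the Consumer<Socket> "handshake" callback are hypothetical illustrations, not ActiveMQ code, and the sketch assumes a modern JDK (8+) rather than the 1.4.2 runtime mentioned here.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.function.Consumer;

/**
 * Hypothetical model of a listener that runs each connection's handshake
 * on the accept thread itself. Illustration only, not ActiveMQ code.
 */
class SerialAcceptor implements Runnable {
    private final ServerSocket server;
    // Hypothetical stand-in for transport.start(): may block as long as the peer stalls.
    private final Consumer<Socket> handshake;

    SerialAcceptor(ServerSocket server, Consumer<Socket> handshake) {
        this.server = server;
        this.handshake = handshake;
    }

    @Override
    public void run() {
        while (!server.isClosed()) {
            Socket s;
            try {
                s = server.accept();
            } catch (IOException e) {
                return; // listener closed
            }
            // The handshake runs inline: until it returns, accept() is not
            // called again, so every later client waits in the listen backlog.
            handshake.accept(s);
        }
    }
}
```

With this shape, a later client's TCP connect may still appear to succeed (the kernel queues it in the backlog), but nothing on the broker side services it until the stuck handshake returns or fails.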

I notice there is an InactivityMonitor, and looking at its code there is the
following comment:
// Disable inactivity monitoring while processing a command.

Could this be the case? That, until the wireformat has been negotiated,
there is no timeout configured? Is there anything we can do to reduce this
timeout from 20 minutes? Or have I completely gone down the wrong track?

This is AMQ 4.0, Win2K, JRE 1.4.2
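On the question of shortening the 20-minute wait: in java.net, Socket.setSoTimeout bounds blocking reads only, and there is no equivalent for a blocking write like the socketWrite0 call in the trace. One general workaround, sketched here under the assumption that you control the code that runs the handshake, is a watchdog that closes the socket if the handshake has not finished within a deadline; per the Socket.close() contract, a thread blocked in I/O on that socket then fails with a SocketException. HandshakeDeadline and runWithDeadline are hypothetical names, not an ActiveMQ option, and the code assumes a modern JDK (8+).

```java
import java.io.IOException;
import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper: run a handshake on a socket and force-close the socket
 * if the handshake has not finished within a deadline. Closing the socket
 * makes a thread blocked in a read or write on it fail with SocketException.
 */
final class HandshakeDeadline {
    /** The handshake step, e.g. sending and reading wire-format info. */
    interface Handshake {
        void run(Socket s) throws IOException;
    }

    // One daemon timer thread shared by all connections.
    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "handshake-deadline");
                t.setDaemon(true);
                return t;
            });

    private HandshakeDeadline() {}

    static void runWithDeadline(Socket s, Handshake h, long timeoutMillis) throws IOException {
        // Arm the watchdog before starting the potentially blocking handshake.
        ScheduledFuture<?> watchdog = TIMER.schedule(() -> {
            try {
                s.close(); // unblocks a stuck read or write on s
            } catch (IOException ignored) {
                // nothing useful to do if close itself fails
            }
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            h.run(s);
        } finally {
            // Handshake finished (or failed) first: disarm the watchdog.
            watchdog.cancel(false);
        }
    }
}
```

A stalled peer then costs at most timeoutMillis of the calling thread's time instead of however long the OS takes to give up on the connection.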

Cheers,

Charles


___________________________________________________________
HPD Software Ltd. - Helping Business Finance Business
Email terms and conditions: www.hpdsoftware.com/disclaimer 



Re: TCP Connection Timeout Problems. Possibly.

Posted by Hiram Chirino <hi...@hiramchirino.com>.
Wow. Glad you caught this. And you did a very thorough analysis of the problem.
I think that there is a simple fix for this too. It should be
possible to do the transport.start() call in an async thread. That way
the acceptor thread can't get blocked by a bad connection.
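A minimal sketch of the fix described above, assuming a plain ServerSocket and an ExecutorService: the acceptor thread only calls accept() and hands each new socket to a worker, so a peer that stalls the handshake ties up one worker instead of the whole listener. AsyncAcceptor and the Consumer<Socket> "start" callback are hypothetical stand-ins for TcpTransportServer and transport.start(), not the actual AMQ-875 change, and the code assumes a modern JDK (8+).

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.function.Consumer;

/**
 * Hypothetical listener that only accepts on its own thread and runs each
 * connection's start/handshake on a worker pool. Illustration only.
 */
class AsyncAcceptor implements Runnable {
    private final ServerSocket server;
    private final ExecutorService workers;
    // Hypothetical stand-in for transport.start(); may block if the peer misbehaves.
    private final Consumer<Socket> start;

    AsyncAcceptor(ServerSocket server, ExecutorService workers, Consumer<Socket> start) {
        this.server = server;
        this.workers = workers;
        this.start = start;
    }

    @Override
    public void run() {
        while (!server.isClosed()) {
            Socket s;
            try {
                s = server.accept();
            } catch (IOException e) {
                return; // listener closed
            }
            try {
                // Hand off and go straight back to accept(): a stalled peer
                // now ties up one worker, not the whole listener.
                workers.execute(() -> start.accept(s));
            } catch (RejectedExecutionException e) {
                closeQuietly(s); // pool shut down or saturated
            }
        }
    }

    private static void closeQuietly(Socket s) {
        try {
            s.close();
        } catch (IOException ignored) {
            // best effort
        }
    }
}
```

On its own, an unbounded pool only moves the problem: many stalled peers would each hold a thread. A bounded pool or a per-connection handshake deadline would likely be needed alongside it.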

I opened jira issue:
http://issues.apache.org/activemq/browse/AMQ-875

to track the bug.


-- 
Regards,
Hiram

Blog: http://hiramchirino.com