Posted to users@activemq.apache.org by Jerry Cwiklik <cw...@us.ibm.com> on 2013/08/19 21:51:00 UTC

Broker leaks FDs - Too many open files

Our production broker (v.5.6.0) keeps dying while in heavy use. The broker
log is filled with:

2013-07-28 00:04:08,264 [teTimeout=45000] ERROR TransportConnector - Could not accept connection : java.net.SocketException: Too many open files

2013-07-28 00:04:08,264 [teTimeout=45000] ERROR TransportConnector - Could not accept connection : java.net.SocketException: Too many open files

2013-07-28 00:04:08,264 [teTimeout=45000] ERROR TransportConnector - Could not accept connection : java.net.SocketException: Too many open files
 
This is logged at such a rapid rate that the logs roll and hide the initial
error/warning. We capture the open fd count of the broker's process and
notice that when the broker starts to croak, the count just explodes. Here
is part of the open fd log. The first column shows the broker's open fds,
and a line is logged every 60 secs.

   1284   12569  160194   -- normal count
   1294   12669  161438
   1305   12779  162812
   1318   12909  164426
   1328   13009  165658   --------- FD explosion
   1393   13659  173816
   1528   15009  190748
   1611   15839  201152
   1701   16739  212419
   1951   19239  243520
   2310   22830  290374
   2667   26399  332362
   3013   29859  375262
   3369   33422  422638
   3729   37019  464017
   4111   40841  515342
   4484   44570  561933
   4870   48432  609992
   5249   52219  652157
   5634   56071  705356
   6019   59919  747457
   6484   64571  811476
   6892   68652  862375
   7307   72802  914122
   7727   77002  966555
   8129   81022 1016717
   8336   83090 1042601
   8336   83090 1042584
   8336   83090 1042583
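
For anyone who wants to collect a similar log, here is a minimal sketch of a
per-process fd sampler. It is not the tool that produced the table above; it
assumes a Linux /proc filesystem, and the function names are made up for
illustration. Counts taken this way may differ from lsof-based counts, since
lsof also lists entries that are not file descriptors.

```python
import os
import time


def count_open_fds(pid):
    """Return the number of open file descriptors of a process (Linux only)."""
    # Each entry under /proc/<pid>/fd corresponds to one open descriptor.
    return len(os.listdir(f"/proc/{pid}/fd"))


def sample_fd_counts(pid, samples, interval_secs=60.0):
    """Collect `samples` fd counts, sleeping `interval_secs` between samples."""
    counts = []
    for i in range(samples):
        counts.append(count_open_fds(pid))
        if i < samples - 1:
            time.sleep(interval_secs)
    return counts
```

Calling sample_fd_counts(broker_pid, 30) would take one reading per minute
for half an hour; reading another user's process generally requires running
as that user or as root.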
 
It normally shows ~1300 fds and this is more or less constant over time, but
eventually it rapidly increases to 8336 and the broker becomes unusable. The
ulimit is set to 4094. netstat shows a ton of sockets in CLOSE_WAIT,
suggesting that the broker is not closing its side of a socket.
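
A rough way to track the CLOSE_WAIT count over time without netstat is to
tally socket states from /proc/net/tcp. The sketch below assumes Linux and
its /proc/net/tcp layout (state is the fourth column, as a hex code); the
function names are illustrative. Note that these files cover the whole
network namespace, not just the broker process.

```python
from collections import Counter

# Linux TCP state codes as they appear in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Tally sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is a header
        fields = line.split()
        if len(fields) < 4:
            continue
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts


def read_tcp_states(paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Tally states across the IPv4 and IPv6 tables, skipping missing files."""
    total = Counter()
    for path in paths:
        try:
            with open(path) as f:
                total += count_tcp_states(f.read())
        except FileNotFoundError:
            pass
    return total
```

Logging read_tcp_states()["CLOSE_WAIT"] next to the fd count would show
whether the two climb together when the explosion starts.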

I found a related open issue:
https://issues.apache.org/jira/browse/AMQ-4531?page=com.atlassian.jirafisheyeplugin:fisheye-issuepanel

This Jira states that the problem surfaces in 5.8.0 and when
maximumConnections is set. We don't use this setting and we run an older
version of AMQ. Any ideas how to deal with this? Would closeAsync=false have
any effect?

JC




--
View this message in context: http://activemq.2283324.n4.nabble.com/Broker-leaks-FDs-Too-many-open-files-tp4670496.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Broker leaks FDs - Too many open files

Posted by DV <vi...@gmail.com>.
We've had a similar issue in the past, which went away with the following
changes:

* adding closeAsync=false to transportConnectors
* using nio instead of tcp in transportConnectors
* setting ulimits to unlimited for activemq user
* fine-tuning kahaDB settings (enableIndexWriteAsync="true"
enableJournalDiskSyncs="false" journalMaxFileLength="256mb")
* fine-tuning topic and queue settings (destinationPolicy)
* enabling producer flow control
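
As a rough illustration of the first two items in the list above, the sketch
below rewrites a transportConnector URI to use nio and adds a closeAsync
option. The option name transport.closeAsync is an assumption that mirrors
the "transport." prefix of transport.soWriteTimeout mentioned elsewhere in
this thread; the helper name, host, and port are hypothetical. Check the
resulting URI against the documentation for your broker version before using
it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tune_connector_uri(uri, scheme="nio", extra_options=None):
    """Return `uri` with its scheme swapped and extra query options merged in."""
    if extra_options is None:
        # Assumed option name; verify it for your ActiveMQ version.
        extra_options = {"transport.closeAsync": "false"}
    parts = urlsplit(uri)
    # Keep the options already on the URI and add (or override) the new ones.
    options = dict(parse_qsl(parts.query))
    options.update(extra_options)
    return urlunsplit(
        (scheme, parts.netloc, parts.path, urlencode(options), parts.fragment)
    )
```

For example, passing tcp://0.0.0.0:61616?transport.soWriteTimeout=45000
yields a nio:// URI that keeps the write timeout and appends
transport.closeAsync=false.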

However, all this fine-tuning can only do so much, so ultimately we had to:

* reduce broker usage by splitting datasets onto multiple brokers
* optimize consumers to reduce the length of time a message spends on the
broker

The fewer messages the broker has to hold on to, the less likely you'll run
into some sort of a limit.





-- 
Best regards, Dmitriy V.

Re: Broker leaks FDs - Too many open files

Posted by Pietro Romanazzi <p....@innovapuglia.it>.
Set nofile in /etc/security/limits.conf.
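
To confirm which limit the running broker actually has, one option is to
read /proc/<pid>/limits. This is a minimal sketch assuming Linux; the
function names are illustrative, and "unlimited" is reported as None.

```python
def parse_nofile_limit(limits_text):
    """Extract the (soft, hard) 'Max open files' limits from /proc/<pid>/limits text."""
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # The remainder of the line is: soft limit, hard limit, units.
            soft, hard = line[len(prefix):].split()[:2]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' entry found")


def read_nofile_limit(pid):
    """Read the open-file limits currently in effect for a running process."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_nofile_limit(f.read())
```

Comparing read_nofile_limit(broker_pid) with the value set in limits.conf
shows whether the new setting has reached the broker process, which normally
needs a restart from a fresh login session to pick it up.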

regards,


-----Original Message-----
From: Jerry Cwiklik
Sent: Tuesday, August 20, 2013 6:04 PM
To: users@activemq.apache.org
Subject: Re: Broker leaks FDs - Too many open files

Thanks, Paul. We are running on Linux (SLES). All clients use openwire. The
broker is configured
with producerFlowControl="false", optimizedDispatch="true" for both queues
and topics.

The openwire connector is configured with transport.soWriteTimeout=45000. We
don't use persistence for messaging. The broker's JVM is given 8 GB.

JC





Re: Broker leaks FDs - Too many open files

Posted by Jerry Cwiklik <cw...@us.ibm.com>.
Thanks, Paul. We are running on Linux (SLES). All clients use openwire. The
broker is configured 
with producerFlowControl="false", optimizedDispatch="true" for both queues
and topics.

The openwire connector is configured with transport.soWriteTimeout=45000. We
don't use persistence for messaging. The broker's JVM is given 8 GB.

JC




Re: Broker leaks FDs - Too many open files

Posted by Paul Gale <pa...@gmail.com>.
On Mon, Aug 19, 2013 at 4:57 PM, Jerry Cwiklik <cw...@us.ibm.com> wrote:

> What are the consequences of using closeAsync="false"?



Setting closeAsync to false means that the socket close call is blocking and
is not handled in a separate thread.

This is preferable and common in web applications where STOMP clients, say,
open a connection in the context of a web request, send a message, then
close the connection. Depending on the number of web requests being handled,
this translates into a lot of connect/disconnect traffic for the broker.

It can take up to 30 seconds (on Linux systems, unless otherwise
configured) for the descriptor of a socket that's been 'closed' to be made
available for a new socket. If the close is asynchronous then, as you've
seen, socket open requests are made at a faster rate than closed sockets can
be recycled. Making the close operation synchronous forces the client to
block until it completes, which throttles the rate of open requests and
keeps the descriptor count manageable.
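
The difference between the two modes can be sketched generically as follows.
This is not ActiveMQ's code, just an illustration of blocking versus
hand-off close using plain sockets and a thread pool; the function names are
made up.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def close_sync(sock):
    """Close in the calling thread; the descriptor is released on return."""
    sock.close()


def close_async(sock, executor):
    """Hand the close to a worker thread and return immediately.

    Until the returned future completes, the descriptor may still be open,
    so a caller that keeps accepting connections can outrun the closes.
    """
    return executor.submit(sock.close)
```

With close_sync the caller cannot proceed until the descriptor is released;
with close_async the backlog of pending closes is bounded only by how fast
the worker threads get scheduled.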

What OS are you running your broker on? Please give more detail about your
clients, e.g., are they STOMP based? If they're STOMP based you might want
to consider configuring the STOMP transport connector to be managed over
NIO for greater efficiency.

Thanks,
Paul

Re: Broker leaks FDs - Too many open files

Posted by Jerry Cwiklik <cw...@us.ibm.com>.
Christian, thanks. More questions:

What would be your theory why we see the "explosion" of open FDs? This is
triggered by some event. Any clues as to what that might be? 

Also, isn't it a bug that the broker just goes berserk logging the same thing
over and over? Our broker's logs are filled with the error below. I suspect
that perhaps Camel or Spring, which are used on the client side, are in some
recovery mode trying to establish a connection to the broker, which keeps
failing.

2013-07-28 00:04:08,264 [teTimeout=45000] ERROR TransportConnector - Could not accept connection : java.net.SocketException: Too many open files
 ..

What are the consequences of using closeAsync="false"? 

JC




Re: Broker leaks FDs - Too many open files

Posted by Christian Posta <ch...@gmail.com>.
Yah, give that a try (as seen here
https://issues.apache.org/jira/browse/AMQ-1739).
You could also have a look at
http://activemq.apache.org/maven/apidocs/org/apache/activemq/transport/WriteTimeoutFilter.html
per this jira:
https://issues.apache.org/jira/browse/AMQ-1993





-- 
*Christian Posta*
http://www.christianposta.com/blog
twitter: @christianposta