Posted to users@activemq.apache.org by Sébastien LETHIELLEUX <se...@cecurity.com> on 2021/01/06 17:33:59 UTC

Connections timeout on artemis 2.10/2.16

Hello (again),

I'm trying to find the root cause of a significant number of failed
connection attempts / broken existing connections on an Artemis broker.

The issue has occurred on an embedded Artemis 2.10.1 and on a
standalone 2.16.0 (Tomcat 9, OpenJDK 11).

Two types of errors occur: timeouts during handshakes and broken
existing connections.

For example:

2021-01-04 15:28:53,243 ERROR [org.apache.activemq.artemis.core.server]
AMQ224088: Timeout (10 seconds) on acceptor "netty-ssl" during protocol
handshake with /xxx.xxx.xxx.xxx:41760 has occurred.

2021-01-06 16:56:28,016 WARN  {Thread-16
(ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@f493a59)}
[org.apache.activemq.artemis.core.client] : AMQ212037: Connection
failure to /xxx.xxx.xxx.xxx:49918 has been detected: AMQ229014: Did not
receive data from /xxx.xxx.xxx.xxx:49918 within the 30,000ms connection
TTL. The connection will now be closed. [code=CONNECTION_TIMEDOUT]

Both brokers were deployed on RHEL 7 with artemis-native and libaio (32
logical cores, plenty of RAM). Clients use JMS over OpenWire
(activemq-client).

The investigation of the network infrastructure came up empty-handed, so
I'm exploring the possibility that something is going wrong inside
Artemis itself.

Could the thread pool configured with remotingThreads be too small (we
use the default values)? Looking at the thread stacks via JMX seems to
show plenty of threads happily idle.
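
For reference, a minimal sketch of where those knobs live on an embedded
broker. The parameter names should be checked against the Artemis
configuration docs for the version in use; the values, the keystore
path/password, and the TTL override below are placeholders, not
recommendations.

import org.apache.activemq.artemis.core.config.Configuration;
import org.apache.activemq.artemis.core.config.impl.ConfigurationImpl;
import org.apache.activemq.artemis.core.server.embedded.EmbeddedActiveMQ;

public class EmbeddedBrokerSketch {
    public static void main(String[] args) throws Exception {
        Configuration config = new ConfigurationImpl()
                .setPersistenceEnabled(true)
                .setSecurityEnabled(false)
                // Broker-side override of the per-connection TTL (the 30,000 ms
                // in the AMQ229014 warning comes from the client side).
                .setConnectionTTLOverride(60_000);

        // Acceptor equivalent to "netty-ssl"; keystore path/password are placeholders.
        config.addAcceptorConfiguration("netty-ssl",
                "tcp://0.0.0.0:61617?sslEnabled=true"
                        + ";keyStorePath=/path/to/broker.ks"
                        + ";keyStorePassword=changeit"
                        + ";remotingThreads=64"        // explicit pool size instead of the default
                        + ";handshake-timeout=30");    // handshake timeout in seconds (AMQ224088 reports 10 s by default)

        EmbeddedActiveMQ broker = new EmbeddedActiveMQ();
        broker.setConfiguration(config);
        broker.start();
    }
}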

The clients are known to open and close a lot of connections (we know
it's wrong, and now they know it too, but it should still work). The
number of open connections is usually around 90-100, which hardly seems
like an unbearable burden.
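
Since the clients use activemq-client over OpenWire, one obvious mitigation
on their side would be to reuse connections through a pool instead of opening
one per operation. A minimal sketch, assuming the activemq-jms-pool artifact
is available; the broker URL and queue name are placeholders:

import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Session;

import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.activemq.jms.pool.PooledConnectionFactory;

public class PooledClientSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; the real clients connect to the netty-ssl acceptor.
        ActiveMQConnectionFactory cf =
                new ActiveMQConnectionFactory("ssl://broker.example.com:61617");

        // The pool caps the number of physical connections and reuses them,
        // instead of paying a fresh TCP + TLS handshake per client operation.
        PooledConnectionFactory pooled = new PooledConnectionFactory();
        pooled.setConnectionFactory(cf);
        pooled.setMaxConnections(10);

        Connection connection = pooled.createConnection();
        try {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer =
                    session.createProducer(session.createQueue("example.queue"));
            producer.send(session.createTextMessage("ping"));
        } finally {
            connection.close();  // returns the connection to the pool
        }

        pooled.stop();  // shut the pool down when the application exits
    }
}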

Any ideas or suggestions on what to check/monitor/etc.?

Regards,

SL


Re: Connections timeout on artemis 2.10/2.16

Posted by Domenico Francesco Bruscino <br...@gmail.com>.
Hi Sébastien,

I have seen similar logs in deployments where the TCP port of the acceptor
is used for health checks that just open the connection.
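
For illustration, a bare TCP probe like the sketch below (host and port are
placeholders for the netty-ssl acceptor) connects without ever sending a TLS
ClientHello; a load balancer or monitoring agent doing the same thing can
show up on the broker as AMQ224088 once the handshake timeout expires.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class BareTcpProbe {
    public static void main(String[] args) throws IOException, InterruptedException {
        try (Socket socket = new Socket()) {
            // "Is the port open?" style check: TCP connect only, no TLS handshake.
            socket.connect(new InetSocketAddress("broker.example.com", 61617), 5_000);
            // Holding the socket idle past the acceptor's handshake timeout
            // (10 seconds by default) should trigger the AMQ224088 log line.
            Thread.sleep(15_000);
        }  // socket closed here, as a naive checker would do
    }
}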

Regards,
Domenico

On Wed, Jan 6, 2021 at 18:34 Sébastien LETHIELLEUX <sebastien.lethielleux@cecurity.com> wrote:


Re: Connections timeout on artemis 2.10/2.16

Posted by Justin Bertram <jb...@apache.org>.
You may be hitting ARTEMIS-3117 [1]. I recommend you upgrade to the latest
release (i.e. 2.20.0) and see if that resolves the issue.


Justin

[1] https://issues.apache.org/jira/browse/ARTEMIS-3117

On Thu, Jan 7, 2021 at 4:08 AM <sl...@cecurity.com> wrote:


Re: Connections timeout on artemis 2.10/2.16

Posted by sl...@cecurity.com.
Hello,

No, the failed connections are from the external clients (I do not have
the client environments, nor their code). On the embedded broker, the
server side uses in-VM connectors, which do not seem to have such issues
(and do not go through netty-ssl).

We deployed a standalone Artemis (2.16) to act as a sort of proxy broker
for the embedded one. We see connection failures from clients on it too.
The bridges used to forward messages locally seem fine (but that is a
different context; the clients use JMS over OpenWire).

No, I did not do any sampling with VisualVM. The issue happens mostly in
a production environment, and reliably reproducing the exact problem in
test has been a mixed bag.

I did capture more stack traces last night, at a point when the issue was
occurring more frequently, and it seems the Netty threads were much less
free than during previous observations:

Name: Thread-50 (activemq-netty-threads)
State: BLOCKED on
org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor@662692e8
owned by: Thread-95 (activemq-netty-threads)
Total blocked: 145 739  Total waited: 4 186

Stack trace:
org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor.getSslHandler(NettyAcceptor.java:492)
org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor$4.initChannel(NettyAcceptor.java:403)
io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:129)
io.netty.channel.ChannelInitializer.handlerAdded(ChannelInitializer.java:112)
io.netty.channel.AbstractChannelHandlerContext.callHandlerAdded(AbstractChannelHandlerContext.java:953)
io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:610)
io.netty.channel.DefaultChannelPipeline.access$100(DefaultChannelPipeline.java:46)
io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1461)
io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1126)
io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:651)
io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:515)
io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:428)
io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:487)
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:333)
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)

I found it a bit odd that so many Netty threads were stuck at this point,
but I'm not familiar with Netty internals.
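
If it helps to watch this over time without full thread dumps, a small
JMX-based sketch like the one below lists BLOCKED threads and the owner of
the monitor they are waiting on, which is essentially the console view
quoted above. It has to run inside the broker JVM (or be adapted to a
remote JMX connection), and the "netty" name filter is just an assumption
about the thread naming.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class BlockedNettyThreadsSketch {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        // true/true also collects locked-monitor and synchronizer details.
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            boolean blocked = info.getThreadState() == Thread.State.BLOCKED;
            boolean netty = info.getThreadName().contains("netty");  // naming assumption
            if (blocked && netty) {
                System.out.printf("%s BLOCKED on %s owned by %s (blocked %d times)%n",
                        info.getThreadName(),
                        info.getLockName(),
                        info.getLockOwnerName(),
                        info.getBlockedCount());
            }
        }
    }
}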


On 07/01/2021 at 03:12, Tim Bain wrote:

Re: Connections timeout on artemis 2.10/2.16

Posted by Tim Bain <tb...@alumni.duke.edu>.
For the embedded 2.10.1 broker case, are you saying that connections failed
when made from other threads in the process in which the broker was
embedded? If so, that would seem to rule out the network, since traffic
would never leave the host.

You mentioned capturing a stack trace, but have you done CPU sampling via
VisualVM or a similar tool? CPU sampling isn't a perfectly accurate
technique, but often it gives enough information to narrow in on the cause
of a problem (or to rule out certain possibilities).

Tim

On Wed, Jan 6, 2021, 10:34 AM Sébastien LETHIELLEUX <
sebastien.lethielleux@cecurity.com> wrote:
