You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@activemq.apache.org by "Oliver Deckert (JIRA)" <ji...@apache.org> on 2014/09/25 09:51:34 UTC

[jira] [Updated] (AMQ-5368) SSL handshake stalls broker with NIO

     [ https://issues.apache.org/jira/browse/AMQ-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oliver Deckert updated AMQ-5368:
--------------------------------
      Component/s:     (was: JMS client)
                   Transport
      Description: 
using NIOSSL transport, SSL handshakes for ~5000 connections easily stalls a broker taking 100% CPU

I'm using version ActiveMQ 5.8, but it occurs on 5.9, 5.10 versions as well

doing some profiling, it showed up that the SSL handshake on broker side eats up ~90% of overall CPU time
by checking just the handshake status in very high frequency

top 3 methods sorted by own processor time:
com.sun.net.ssl.internal.ssl.SSLEngineImpl.getHandshakeStatus()
org.apache.activemq.transport.nio.NIOSSLTransport.doHandshake()
com.sun.net.ssl.internal.ssl.SSLEngineImpl.getHSStatus(javax.net.ssl.SSLEngineResult$HandshakeStatus)

the reason is the asynchronous nature of the SSL handshake with NIO, especially the execution of delegated tasks:
- NIOSSLTransport.doHandshake() executes delegated tasks using a TaskRunnerFactory asynchronously
- in the meantime it loops calling SSLEngine.getHandshakeStatus()

to improve the situation I did the following changes:
- run delegated tasks synchronously in method doHandshake (handshake status NEED_TASK) instead of asynchronously
- added some small wait cycles in method secureRead as there is not always data available with NIO (to further reduce the number of calls to SSLEngine.getHandshakeStatus)

after these changes the SSL handshake for several thousand connections in parallel was not a problem anymore


  was:
Start with a client application running qpid v0.24 connecting to activemq v5.8.0 server over amqps. Configure the activemq server to use client SSL certificates for authentication.

{code}
                <!-- aqmp with SSL client certs -->
         <transportConnector name="amqps" uri="amqp+ssl://0.0.0.0:5672?maximumConnections=1000&amp;wireformat.maxFrameSize=104857600&amp;transport.transformer=jms&amp;needClientAuth=true"/>
{code}

This works and messages successfully flow from server to client. Qpid however has a fatal bug where it cannot recover from broken connections, and so attempt to switch to the activemq amqp client to work around this problem.

On the client, we initialise activemq-amqp with the following parameters:

{code}
<!-- activemq -->
  <Parameter name="java.naming.factory.initial" value="org.apache.activemq.jndi.ActiveMQInitialContextFactory" />
  <Parameter name="connectionFactoryNames" value="ConnectionFactory, QueueConnectionFactory, TopicConnectionFactory" />

  <!-- Server to connect to -->
<!-- activemq -->
  <Parameter name="java.naming.provider.url" value="amqp+nio+ssl://amqp.${env:SERVER_ENV}.example.com:5672" />
{code}

With activemq-amqp in place instead of qpid, the client starts up, but no messages are processed. Instead, it is found that the aqmp+nio+ssl provider is spinning the CPU at 100% part of the way through the SSL handshake process.

A thread dump of the spinning thread is as follows:

{code}
Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode):

"localhost-startStop-1" daemon prio=10 tid=0x000000000179b800 nid=0x638e runnable [0x00007fd1fd84a000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
        at sun.nio.ch.IOUtil.read(IOUtil.java:198)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
        - locked <0x00000000c4da50e8> (a java.lang.Object)
        at org.apache.activemq.transport.nio.NIOSSLTransport.secureRead(NIOSSLTransport.java:285)
        at org.apache.activemq.transport.nio.NIOSSLTransport.doHandshake(NIOSSLTransport.java:333)
        at org.apache.activemq.transport.nio.NIOSSLTransport.initializeStreams(NIOSSLTransport.java:128)
        at org.apache.activemq.transport.amqp.AmqpNioSslTransport.initializeStreams(AmqpNioSslTransport.java:43)
        at org.apache.activemq.transport.tcp.TcpTransport.connect(TcpTransport.java:514)
        at org.apache.activemq.transport.nio.NIOTransport.doStart(NIOTransport.java:156)
        at org.apache.activemq.transport.nio.NIOSSLTransport.doStart(NIOSSLTransport.java:356)
        at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
        at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
        at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
        at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
        at org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:273)
        at org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:238)
        at org.apache.activemq.ActiveMQConnectionFactory.createConnection(ActiveMQConnectionFactory.java:184)
...
{code}

If an attempt is made to restart the activemq server, despite the spinning thread on the client the server side disconnection is detected by the client and the following exception is logged and the connection is successfully aborted:

{code}
Caused by: java.io.IOException: javax.net.ssl.SSLException: Received close_notify during handshake
        at org.apache.activemq.transport.nio.NIOSSLTransport.initializeStreams(NIOSSLTransport.java:130)
        at org.apache.activemq.transport.amqp.AmqpNioSslTransport.initializeStreams(AmqpNioSslTransport.java:43)
        at org.apache.activemq.transport.tcp.TcpTransport.connect(TcpTransport.java:514)
        at org.apache.activemq.transport.nio.NIOTransport.doStart(NIOTransport.java:156)
        at org.apache.activemq.transport.nio.NIOSSLTransport.doStart(NIOSSLTransport.java:356)
        at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
        at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
        at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
        at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:58)
        at org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:273)
        ... 33 more
Caused by: javax.net.ssl.SSLException: Received close_notify during handshake
        at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
        at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1619)
        at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1587)
        at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1732)
        at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1060)
        at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:884)
        at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:758)
        at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
        at org.apache.activemq.transport.nio.NIOSSLTransport.secureRead(NIOSSLTransport.java:304)
        at org.apache.activemq.transport.nio.NIOSSLTransport.doHandshake(NIOSSLTransport.java:333)
        at org.apache.activemq.transport.nio.NIOSSLTransport.initializeStreams(NIOSSLTransport.java:128)
        ... 42 more
{code}

What seems to be happening is that amqp+nio+ssl gets part of the way through the handshake fails and goes into a spin, continuing the spin until the TCP connection is killed from the other side, at which point the client aborts and continues as if nothing had happened.


       Patch Info: Patch Available
      Environment: 
java version "1.7.0_65"


  was:
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)


    Fix Version/s:     (was: 5.9.0)
           Labels: nio ssl  (was: amqp nio ssl)

> SSL handshake stalls broker with NIO
> ------------------------------------
>
>                 Key: AMQ-5368
>                 URL: https://issues.apache.org/jira/browse/AMQ-5368
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Transport
>    Affects Versions: 5.8.0
>         Environment: java version "1.7.0_65"
>            Reporter: Oliver Deckert
>              Labels: nio, ssl
>
> using NIOSSL transport, SSL handshakes for ~5000 connections easily stalls a broker taking 100% CPU
> I'm using version ActiveMQ 5.8, but it occurs on 5.9, 5.10 versions as well
> doing some profiling, it showed up that the SSL handshake on broker side eats up ~90% of overall CPU time
> by checking just the handshake status in very high frequency
> top 3 methods sorted by own processor time:
> com.sun.net.ssl.internal.ssl.SSLEngineImpl.getHandshakeStatus()
> org.apache.activemq.transport.nio.NIOSSLTransport.doHandshake()
> com.sun.net.ssl.internal.ssl.SSLEngineImpl.getHSStatus(javax.net.ssl.SSLEngineResult$HandshakeStatus)
> the reason is the asynchronous nature of the SSL handshake with NIO, especially the execution of delegated tasks:
> - NIOSSLTransport.doHandshake() executes delegated tasks using a TaskRunnerFactory asynchronously
> - in the meantime it loops calling SSLEngine.getHandshakeStatus()
> to improve the situation I did the following changes:
> - run delegated tasks synchronously in method doHandshake (handshake status NEED_TASK) instead of asynchronously
> - added some small wait cycles in method secureRead as there is not always data available with NIO (to further reduce the number of calls to SSLEngine.getHandshakeStatus)
> after these changes the SSL handshake for several thousand connections in parallel was not a problem anymore



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)