You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@tomcat.apache.org by David Oswell <do...@gmail.com> on 2017/01/04 16:02:34 UTC

Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?

Hello all,

wondering if anyone has seen similar behaviour - sorry if already reported,
googling hasn't found much other than a somewhat similar older issue on APR
going to 100% CPU when idle due to a poll loop/timeout issue - however this
seems a bit different as this is only occurring with SSL;

Running (ESX Based):
  Server version:        Apache Tomcat/8.5.9
  Server built:          Dec 5 2016 20:18:12 UTC
  Server number:         8.5.9.0
  OS Name:               Windows Server 2012 R2
  OS Version:            6.3
  Architecture:          amd64
  JVM Version:           1.8.0_112-b15
  JVM Vendor:            Oracle Corporation
  Apache Tomcat Native library 1.2.10 using APR version 1.5.2.

  APR capabilities: IPv6 [true], sendfile [true], accept filters [false],
random [true].
  APR/OpenSSL configuration: useAprConnector [false], useOpenSSL [true]
  OpenSSL successfully initialized (OpenSSL 1.0.2j  26 Sep 2016)

Server setup as 4 vCPU / 8GB mem

Problem:
After running load on the server (this is a QA system, so just load tests)
Tomcat goes to 70-100% CPU usage when it should be idle.
This only occurs when putting load on the SSL connectors w/APR. When
re-applying load (to SSL connector), the CPU usage drops to appropriate
levels for the load, but then returns to the higher idle CPU usage.
If applying load back to the non-SSL connector, CPU usage stays high
(higher than after a restart and sending to non-SSL connector first).

Performing the same load tests on non-SSL (still with APR), or non-APR
connectors CPU returns to idle after the load run is completed.

The load applied is 8 persistent connections, which send basic POST
requests, the service does some basic operations and responds, generally -
if not maxed out - in 1-2ms.
Even when using 1 connection, the CPU usage stays constant when the
requests stop.
Again this only occurs when using the SSL/APR connector, if using a non-APR
connector, or the APR-non-SSL connector it's fine.

It seems to be a problem with just how APR-SSL works or possibly OpenSSL ?
Not sure how to further isolate to see which one is the problem.


SSL Connector setup:
    <Connector port="8443"
                protocol="org.apache.coyote.http11.Http11AprProtocol"
                maxThreads="150"
                SSLEnabled="true"
                maxKeepAliveRequests="-1">
        <SSLHostConfig>
            <Certificate
                        certificateFile="conf/localhost-rsa-cert.pem"
                        certificateKeyFile="conf/localhost-rsa-key.pem"
                        certificateKeyPassword="password"
                        type="RSA" />
        </SSLHostConfig>
    </Connector>

Non-SSL Connector:
       <Connector port="8080"
               protocol="org.apache.coyote.http11.Http11AprProtocol"
               connectionTimeout="20000"
               maxKeepAliveRequests="-1"
               redirectPort="8443" >
       </Connector>

Initializing ProtocolHandler ["http-apr-8080"]
Initializing ProtocolHandler ["https-openssl-apr-8443"]

Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?

Posted by David Oswell <do...@gmail.com>.
Hi Mark,

For reproducing, if you're making the requests locally to the server that
might be why you're not seeing the issue occur. I had been testing between
servers (client on server A, tomcat on server B); I tried this morning
doing a wget locally to tomcat and do not see the issue occur (no
WSAECONNABORTED status) while doing the wget remotely does get the aborted
status.


in sslnetwork.c:ssl_socket_recv adding the following debug statements;


 if (s == 0 && (SSL_get_shutdown(con->ssl) & SSL_RECEIVED_SHUTDOWN)) {
                con->shutdown_type = SSL_SHUTDOWN_TYPE_STANDARD;
                return APR_EOF;
            }
printf("ssl_socket_recv:: Have value for i: %d\n", i);
printf("ssl_socket_recv:: Have value for s: %d\n", s);
printf("ssl_socket_recv:: Have value for rv: %d\n", rv);
printf("ssl_socket_recv:: con->shutdown_type: %d\n", con->shutdown_type);
fflush(stdout);

            switch (i) {
                case SSL_ERROR_WANT_READ:
                case SSL_ERROR_WANT_WRITE:




Outputs:
Local server (wget --no-check-certificate  https://localhost:8443/examples
)

ssl_socket_recv:: Have value for i: 5
ssl_socket_recv:: Have value for s: -1
ssl_socket_recv:: Have value for rv: 730054
ssl_socket_recv:: con->shutdown_type: 2



While on remote server performing the same request (wget
--no-check-certificate https://serverB:8443/examples)

ssl_socket_recv:: Have value for i: 5
ssl_socket_recv:: Have value for s: -1
ssl_socket_recv:: Have value for rv: 730053
ssl_socket_recv:: con->shutdown_type: 2




Thanks,





On Fri, Jan 6, 2017 at 5:43 PM, David Oswell <do...@gmail.com> wrote:

> Yep that would, Originally I had just turned the return 0 to throw an
> IOException - which as it would fall through to the else would be the same
> effect.
>
> I'll see if a java client would do the same, I'm using a golang
> client/console app for testing.
> I just tried with wget and see the WSAECONNABORTED status within APR.
> It also seems a bit time dependent, might be a case of whether APR is
> getting to the socket read before or after the FIN packet coming though ??
>
>
>
> On Fri, Jan 6, 2017 at 5:11 PM, Mark Thomas <ma...@apache.org> wrote:
>
>> On 06/01/2017 21:50, David Oswell wrote:
>> > I've somehow gotten the build for tcnative working here (more shocked I
>> > finally got openssl to build!)
>> >
>> > There seems to be a slight difference in how the shutdown occurs. When
>> the
>> > APR_EGENERAL is returned its due to falling through the
>> SSL_ERROR_SYSCALL
>> > block in sslnetwork.c:ssl_socket_recv,
>> > This seems to be due to a difference in the value returned by;
>> > sslnetwork.c:324 :                                rv =
>> > apr_get_netos_error();
>> >
>> > on the bad case (quick socket close), rv is (730053) which
>> > is WSAECONNABORTED   - APR_STATUS_IS_ECONNABORTED
>> > on a good disconnect case (slower socket close) rv is (730054)
>> > = WSAECONNRESET  - APR_STATUS_IS_ECONNRESET
>> >
>> > I suspect a check with APR_STATUS_IS_ECONNABORTED(rv) might be needed to
>> > capture this scenario (WSAECONNABORTED), however I'm not sure how else
>> this
>> > status might occur, and if any of those cases should not flag it as
>> closed
>> > - although reading on WSAECONNABORTED it sounds like this is a close
>> case.
>> > Not sure if it's an exception case or just normal EOF though.
>>
>> Thanks. That is really useful information.
>>
>> I've been trying to re-create the original issue that led to this odd
>> 'treat an error as eagain' code but without success. I have found a
>> couple of other bugs (now fixed) so it wasn't a complete waste of time.
>>
>> When I added this hack I was fairly sure I was missing something and it
>> is looking increasingly like this code was fixing a symptom rather than
>> the root cause. Given that I can't re-create the original problem, I'm
>> leaning towards removing the special handling for EGENERAL and letting
>> it trigger a close.
>>
>> If you remove the
>> else if (-result == Status.APR_EGENERAL && isSecure()) {
>> ...
>> }
>>
>> block, does that fix the problem you are seeing?
>>
>> Mark
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
>> For additional commands, e-mail: users-help@tomcat.apache.org
>>
>>
>

Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?

Posted by David Oswell <do...@gmail.com>.
Yep that would, Originally I had just turned the return 0 to throw an
IOException - which as it would fall through to the else would be the same
effect.

I'll see if a java client would do the same, I'm using a golang
client/console app for testing.
I just tried with wget and see the WSAECONNABORTED status within APR.
It also seems a bit time dependent, might be a case of whether APR is
getting to the socket read before or after the FIN packet coming though ??



On Fri, Jan 6, 2017 at 5:11 PM, Mark Thomas <ma...@apache.org> wrote:

> On 06/01/2017 21:50, David Oswell wrote:
> > I've somehow gotten the build for tcnative working here (more shocked I
> > finally got openssl to build!)
> >
> > There seems to be a slight difference in how the shutdown occurs. When
> the
> > APR_EGENERAL is returned its due to falling through the SSL_ERROR_SYSCALL
> > block in sslnetwork.c:ssl_socket_recv,
> > This seems to be due to a difference in the value returned by;
> > sslnetwork.c:324 :                                rv =
> > apr_get_netos_error();
> >
> > on the bad case (quick socket close), rv is (730053) which
> > is WSAECONNABORTED   - APR_STATUS_IS_ECONNABORTED
> > on a good disconnect case (slower socket close) rv is (730054)
> > = WSAECONNRESET  - APR_STATUS_IS_ECONNRESET
> >
> > I suspect a check with APR_STATUS_IS_ECONNABORTED(rv) might be needed to
> > capture this scenario (WSAECONNABORTED), however I'm not sure how else
> this
> > status might occur, and if any of those cases should not flag it as
> closed
> > - although reading on WSAECONNABORTED it sounds like this is a close
> case.
> > Not sure if it's an exception case or just normal EOF though.
>
> Thanks. That is really useful information.
>
> I've been trying to re-create the original issue that led to this odd
> 'treat an error as eagain' code but without success. I have found a
> couple of other bugs (now fixed) so it wasn't a complete waste of time.
>
> When I added this hack I was fairly sure I was missing something and it
> is looking increasingly like this code was fixing a symptom rather than
> the root cause. Given that I can't re-create the original problem, I'm
> leaning towards removing the special handling for EGENERAL and letting
> it trigger a close.
>
> If you remove the
> else if (-result == Status.APR_EGENERAL && isSecure()) {
> ...
> }
>
> block, does that fix the problem you are seeing?
>
> Mark
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
>
>

Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?

Posted by Mark Thomas <ma...@apache.org>.
On 06/01/2017 21:50, David Oswell wrote:
> I've somehow gotten the build for tcnative working here (more shocked I
> finally got openssl to build!)
> 
> There seems to be a slight difference in how the shutdown occurs. When the
> APR_EGENERAL is returned its due to falling through the SSL_ERROR_SYSCALL
> block in sslnetwork.c:ssl_socket_recv,
> This seems to be due to a difference in the value returned by;
> sslnetwork.c:324 :                                rv =
> apr_get_netos_error();
> 
> on the bad case (quick socket close), rv is (730053) which
> is WSAECONNABORTED   - APR_STATUS_IS_ECONNABORTED
> on a good disconnect case (slower socket close) rv is (730054)
> = WSAECONNRESET  - APR_STATUS_IS_ECONNRESET
> 
> I suspect a check with APR_STATUS_IS_ECONNABORTED(rv) might be needed to
> capture this scenario (WSAECONNABORTED), however I'm not sure how else this
> status might occur, and if any of those cases should not flag it as closed
> - although reading on WSAECONNABORTED it sounds like this is a close case.
> Not sure if it's an exception case or just normal EOF though.

Thanks. That is really useful information.

I've been trying to re-create the original issue that led to this odd
'treat an error as eagain' code but without success. I have found a
couple of other bugs (now fixed) so it wasn't a complete waste of time.

When I added this hack I was fairly sure I was missing something and it
is looking increasingly like this code was fixing a symptom rather than
the root cause. Given that I can't re-create the original problem, I'm
leaning towards removing the special handling for EGENERAL and letting
it trigger a close.

If you remove the
else if (-result == Status.APR_EGENERAL && isSecure()) {
...
}

block, does that fix the problem you are seeing?

Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?

Posted by David Oswell <do...@gmail.com>.
I've somehow gotten the build for tcnative working here (more shocked I
finally got openssl to build!)

There seems to be a slight difference in how the shutdown occurs. When the
APR_EGENERAL is returned its due to falling through the SSL_ERROR_SYSCALL
block in sslnetwork.c:ssl_socket_recv,
This seems to be due to a difference in the value returned by;
sslnetwork.c:324 :                                rv =
apr_get_netos_error();

on the bad case (quick socket close), rv is (730053) which
is WSAECONNABORTED   - APR_STATUS_IS_ECONNABORTED
on a good disconnect case (slower socket close) rv is (730054)
= WSAECONNRESET  - APR_STATUS_IS_ECONNRESET

I suspect a check with APR_STATUS_IS_ECONNABORTED(rv) might be needed to
capture this scenario (WSAECONNABORTED), however I'm not sure how else this
status might occur, and if any of those cases should not flag it as closed
- although reading on WSAECONNABORTED it sounds like this is a close case.
Not sure if it's an exception case or just normal EOF though.



On Fri, Jan 6, 2017 at 10:00 AM, Mark Thomas <ma...@apache.org> wrote:

> On 06/01/2017 10:52, Mark Thomas wrote:
>
> > I can reproduce the (new?) loop error with the following:
> > - clean trunk (9.0.x) build
> > - enable debug logging for AprEndpoint
> > - APR/native 1.2.10
> > - WebSocket drawboard example
> > - hold down F5
> >
> > I see some error messages as per the original problem but fairly quickly
> > Tomcat enters the infinite loop.
> >
> > Next steps are digging into the APR/native code.
>
> It was a different loop related specifically to WebSocket that is now
> fixed.
>
> I'm still looking into this. I'm am currently leaning towards removing
> the suspect APR_EGENERAL handling from Tomcat but I want to see if I can
> re-create the problem that caused it to be added in the first place. I
> haven't had any success so far which leads me to suspect that the root
> cause may have been a Tomcat bug that was fixed in the 8.5.x/trunk
> refactoring. Proving that is going to be tricky.
>
> Mark
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
>
>

Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?

Posted by Mark Thomas <ma...@apache.org>.
On 06/01/2017 10:52, Mark Thomas wrote:

> I can reproduce the (new?) loop error with the following:
> - clean trunk (9.0.x) build
> - enable debug logging for AprEndpoint
> - APR/native 1.2.10
> - WebSocket drawboard example
> - hold down F5
> 
> I see some error messages as per the original problem but fairly quickly
> Tomcat enters the infinite loop.
> 
> Next steps are digging into the APR/native code.

It was a different loop related specifically to WebSocket that is now fixed.

I'm still looking into this. I'm am currently leaning towards removing
the suspect APR_EGENERAL handling from Tomcat but I want to see if I can
re-create the problem that caused it to be added in the first place. I
haven't had any success so far which leads me to suspect that the root
cause may have been a Tomcat bug that was fixed in the 8.5.x/trunk
refactoring. Proving that is going to be tricky.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?

Posted by Mark Thomas <ma...@apache.org>.
On 05/01/2017 22:30, Mark Thomas wrote:
> On 05/01/2017 22:01, David Oswell wrote:
>> After some more digging I've been able to further narrow down the problem
>> somewhat, but still not able to pin point the exact cause;
>> The issue is not load related, but rather seems to be related to the timing
>> of the TCP connection being closed.
>> Depending on the timing the poller and exec appear to get into a loop -
>> drilled thsi down to an error status returned in AprEndpoint - which might
>> need to be thrown as an exception rather than return 0 from
>> the fillReadBuffer in AprEndpoint.
>>
>> Poller thread - AprEndpoint:1573 - Poller(ID 55) wakes ups
>> Poller thread - AprEndpoint:1652 - Poller adds socket to timeout
>> Poller thread - AprEndpoint:1675 - Poller gets rv = 1
>> Poller thread - AprEndpoint:1694 - Poller gets connection (socket
>> Id 565911936 )
>> Poller thread - AprEndpoint:1731 - Poller processesSocket as socket event
>> OPEN_READ
>> Poller thread - AbstractEndpoint:903   - executor executes
>> AprEndpoint$SocketProcessor (id 63)  -> No exception thrown.
>>
>> Exec thread - AprEndpoint:2403 - Socket.recvb result = -20014
>> Interesting comment ?  at AprEndpoint:2445 :
>>             } else if (-result == Status.APR_EGENERAL && isSecure()) {
>>   //Status.APR_EGENERAL=20014
>>                 // Not entirely sure why this is necessary. Testing to date
>> has not
>>                 // identified any issues with this but log it so it can be
>> tracked
>>                 // if it is suspected of causing issues in the future.
>>                 if (log.isDebugEnabled()) {
>>
>> log.debug(sm.getString("socket.apr.read.sslGeneralError", getSocket(),
>> this));
>>                 }
>>                 return 0;
>> Does this need to throw an exception to get caught higher up as an error?
> 
> Oh great. That code.
> 
> It originates here:
> http://svn.apache.org/viewvc?view=revision&revision=1534619
> 
> For the background see this thread:
> http://tomcat.markmail.org/thread/4vspjutd4kzqkc5q
> 
> As far as I could tell, something was happening in the TLS layer that
> APR/native was reporting as an error that wasn't really an error.
> Therefore, I changed Tomcat to ignore the report of an error and carried on.
> 
> What I suspect is happening is that you are seeing a real error that
> Tomcat now isn't treating as an error.
> 
> I do have a working build environment for tc-native on Windows now so
> this is probably worth a re-visit.
> 
> I'll put this at the top of my TODO list for after the 9.0.x and 8.5.x
> releases I've been meaning to start for the last couple of days.
> 
> Given how far you've got with this in a short time, if you wanted to
> continue digging that would be great. My suggestion for a way forward
> would be:
> - enable debug logging for AprEndpoint
> - try and recreate the 20014 error with the WebSocket drawing board
>   example (as described in the links above)
> - trace back into APR/native to figure out a) what is generating that
>   20014 error code and b) what it should really be generating.
> 
> If you need to build APR/native, the instructions are here:
> https://cwiki.apache.org/confluence/display/TOMCAT/Building+the+Tomcat+Native+Connector+binaries+for+Windows

I can reproduce the (new?) loop error with the following:
- clean trunk (9.0.x) build
- enable debug logging for AprEndpoint
- APR/native 1.2.10
- WebSocket drawboard example
- hold down F5

I see some error messages as per the original problem but fairly quickly
Tomcat enters the infinite loop.

Next steps are digging into the APR/native code.

Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?

Posted by Mark Thomas <ma...@apache.org>.
On 05/01/2017 22:01, David Oswell wrote:
> After some more digging I've been able to further narrow down the problem
> somewhat, but still not able to pin point the exact cause;
> The issue is not load related, but rather seems to be related to the timing
> of the TCP connection being closed.
> Depending on the timing the poller and exec appear to get into a loop -
> drilled thsi down to an error status returned in AprEndpoint - which might
> need to be thrown as an exception rather than return 0 from
> the fillReadBuffer in AprEndpoint.
> 
> Poller thread - AprEndpoint:1573 - Poller(ID 55) wakes ups
> Poller thread - AprEndpoint:1652 - Poller adds socket to timeout
> Poller thread - AprEndpoint:1675 - Poller gets rv = 1
> Poller thread - AprEndpoint:1694 - Poller gets connection (socket
> Id 565911936 )
> Poller thread - AprEndpoint:1731 - Poller processesSocket as socket event
> OPEN_READ
> Poller thread - AbstractEndpoint:903   - executor executes
> AprEndpoint$SocketProcessor (id 63)  -> No exception thrown.
> 
> Exec thread - AprEndpoint:2403 - Socket.recvb result = -20014
> Interesting comment ?  at AprEndpoint:2445 :
>             } else if (-result == Status.APR_EGENERAL && isSecure()) {
>   //Status.APR_EGENERAL=20014
>                 // Not entirely sure why this is necessary. Testing to date
> has not
>                 // identified any issues with this but log it so it can be
> tracked
>                 // if it is suspected of causing issues in the future.
>                 if (log.isDebugEnabled()) {
> 
> log.debug(sm.getString("socket.apr.read.sslGeneralError", getSocket(),
> this));
>                 }
>                 return 0;
> Does this need to throw an exception to get caught higher up as an error?

Oh great. That code.

It originates here:
http://svn.apache.org/viewvc?view=revision&revision=1534619

For the background see this thread:
http://tomcat.markmail.org/thread/4vspjutd4kzqkc5q

As far as I could tell, something was happening in the TLS layer that
APR/native was reporting as an error that wasn't really an error.
Therefore, I changed Tomcat to ignore the report of an error and carried on.

What I suspect is happening is that you are seeing a real error that
Tomcat now isn't treating as an error.

I do have a working build environment for tc-native on Windows now so
this is probably worth a re-visit.

I'll put this at the top of my TODO list for after the 9.0.x and 8.5.x
releases I've been meaning to start for the last couple of days.

Given how far you've got with this in a short time, if you wanted to
continue digging that would be great. My suggestion for a way forward
would be:
- enable debug logging for AprEndpoint
- try and recreate the 20014 error with the WebSocket drawing board
  example (as described in the links above)
- trace back into APR/native to figure out a) what is generating that
  20014 error code and b) what it should really be generating.

If you need to build APR/native, the instructions are here:
https://cwiki.apache.org/confluence/display/TOMCAT/Building+the+Tomcat+Native+Connector+binaries+for+Windows


HTH,

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?

Posted by David Oswell <do...@gmail.com>.
After some more digging I've been able to further narrow down the problem
somewhat, but still not able to pin point the exact cause;
The issue is not load related, but rather seems to be related to the timing
of the TCP connection being closed.
Depending on the timing the poller and exec appear to get into a loop -
drilled thsi down to an error status returned in AprEndpoint - which might
need to be thrown as an exception rather than return 0 from
the fillReadBuffer in AprEndpoint.

Poller thread - AprEndpoint:1573 - Poller(ID 55) wakes ups
Poller thread - AprEndpoint:1652 - Poller adds socket to timeout
Poller thread - AprEndpoint:1675 - Poller gets rv = 1
Poller thread - AprEndpoint:1694 - Poller gets connection (socket
Id 565911936 )
Poller thread - AprEndpoint:1731 - Poller processesSocket as socket event
OPEN_READ
Poller thread - AbstractEndpoint:903   - executor executes
AprEndpoint$SocketProcessor (id 63)  -> No exception thrown.

Exec thread - AprEndpoint:2403 - Socket.recvb result = -20014
Interesting comment ?  at AprEndpoint:2445 :
            } else if (-result == Status.APR_EGENERAL && isSecure()) {
  //Status.APR_EGENERAL=20014
                // Not entirely sure why this is necessary. Testing to date
has not
                // identified any issues with this but log it so it can be
tracked
                // if it is suspected of causing issues in the future.
                if (log.isDebugEnabled()) {

log.debug(sm.getString("socket.apr.read.sslGeneralError", getSocket(),
this));
                }
                return 0;
Does this need to throw an exception to get caught higher up as an error?

Exec thread - HJttp11InputBuffer:708 - Reads from buffer, returns nRead=0
Exec thread - AbstractProtocol:789 - process reuqest ; state returns OPEN
Exec thread - AbstractProtocol:2678 - Exec adds socket (id 565911936)
Exec thread - AprEndpoint:1435 - Calls notify() on AprEndpoint$Poller (id
55)

It seems to just bounce between those two operations, the poller adding the
executor for OPEN_READ, but the execute just does a add of the socket for
APR_POLLIN, the poller then adds it back as OPEN_READ again, and exec adds
back to poller for APR_POLLIN.

desc in AprEndpoint contains;
[48, 565911936, 1, 565903728, 0, 0, 0, 0, 0, ....]
desc[0] looks to indicate Poll.APR_POLLHUP
wrapper.pollerFlags = 1






On Wed, Jan 4, 2017 at 5:54 PM, David Oswell <do...@gmail.com> wrote:

> Ok the issue appears to be related to the state the connection handler
> thinks it is in, after the connection is disconnected the state returned
> by org.apache.coyote.http11.Http11Processor.process remains as OPEN,
> which causes the underlying poll thread to get notified and repeats.
>
> Breakpointing on all of the CLOSED states does not get triggered, unless
> at low requests per second, on my server at around 200 requests/second on
> one connection the disconnect doesn't get triggered, but at 1-2 it does.
> Because it doesn't get flagged as disconnect it just constantly gets
> processed and always shows a state of OPEN_READ.
>
> As a side note, this only seems to occur on the servers (have 2 where the
> same behaviour occurs); running it on my local machine the issue doesn't
> appear. So not sure if somehow related to Win2012R2 &APR interaction??? -
> Yes I know the obvious fix is to not use Win2012 :)
>
>
>
> On Wed, Jan 4, 2017 at 12:29 PM, David Oswell <do...@gmail.com> wrote:
>
>> Hi Mark,
>>
>> Thanks for the response - So far I've narrowed down to this;
>>     0 - ntoskrnl.exe!KeSynchronizeExecution+0x2246
>>     1 - ntoskrnl.exe!KeWaitForMultipleObjects+0x135e
>>     2 - ntoskrnl.exe!KeWaitForMultipleObjects+0xdd9
>>     3 - ntoskrnl.exe!KeWaitForSingleObject+0x373
>>     4 - ntoskrnl.exe!KeStallWhileFrozen+0x1977
>>     5 - ntoskrnl.exe!_misaligned_access+0x13f9
>>     6 - ntoskrnl.exe!KeSynchronizeExecution+0x4133
>>     7 - ntdll.dll!RtlCopyMemory+0x3
>>     8 - WS2_32.dll!WSAPoll+0xb9
>>     9 - tcnative-1.dll!Java_org_apache_tomcat_jni_Proc_detach+0xad10
>>     10 - tcnative-1.dll!Java_org_apache_tomcat_jni_Poll_poll+0xac
>>
>> From what I can see there is one thread which is the main cpu user when
>> idle, it cycles through the above and;
>>     jvm.dll!JVM_FindSignal+0x2cba
>>     jvm.dll!JVM_ResolveClass+0x371c7
>>     jvm.dll!JVM_ResolveClass+0x571d7
>>     jvm.dll!JVM_MonitorWait+0xfe
>>     jvm.dll!JVM_FindSignal+0x73f
>>
>> Which looks to correspond to https-openssl-apr-8443-poller thread, which
>> I see cycling through RUNNABLE/TIMED_WAITING states (below are from when
>> there are zero requests occuring);
>>     Name: https-openssl-apr-8443-Poller
>>     State: RUNNABLE
>>     Total blocked: 11,222,146  Total waited: 11,222,507
>>
>>     Stack trace:
>>     org.apache.tomcat.jni.Poll.poll(Native Method)
>>     org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpo
>> int.java:1675)
>>     java.lang.Thread.run(Thread.java:745)
>>
>>  -------------
>>
>>     Name: https-openssl-apr-8443-Poller
>>     State: RUNNABLE
>>     Total blocked: 11,624,292  Total waited: 11,624,681
>>
>>     Stack trace:
>>     org.apache.tomcat.jni.Poll.add(Native Method)
>>     org.apache.tomcat.util.net.AprEndpoint$Poller.addToPoller(
>> AprEndpoint.java:1449)
>>     org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpo
>> int.java:1649)
>>     java.lang.Thread.run(Thread.java:745)
>>
>> ------------------
>>
>>     Name: https-openssl-apr-8443-Poller
>>     State: TIMED_WAITING on org.apache.tomcat.util.net.Apr
>> Endpoint$Poller@4f5e5841
>>     Total blocked: 12,441,180  Total waited: 12,441,614
>>
>>     Stack trace:
>>     java.lang.Object.wait(Native Method)
>>     org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoin
>> t.java:1573)
>>     java.lang.Thread.run(Thread.java:745)
>>
>> ---------------------
>>
>>     Name: https-openssl-apr-8443-Poller
>>     State: RUNNABLE
>>     Total blocked: 13,278,363  Total waited: 13,278,853
>>
>>     Stack trace:
>>     sun.misc.Unsafe.unpark(Native Method)
>>     java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:141)
>>     java.util.concurrent.locks.AbstractQueuedSynchronizer.unpark
>> Successor(AbstractQueuedSynchronizer.java:662)
>>     java.util.concurrent.locks.AbstractQueuedSynchronizer.releas
>> e(AbstractQueuedSynchronizer.java:1264)
>>     java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLoc
>> k.java:457)
>>     java.util.concurrent.LinkedBlockingQueue.signalNotEmpty(Link
>> edBlockingQueue.java:176)
>>     java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockin
>> gQueue.java:430)
>>     org.apache.tomcat.util.threads.TaskQueue.offer(TaskQueue.java:74)
>>     org.apache.tomcat.util.threads.TaskQueue.offer(TaskQueue.java:31)
>>     java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolEx
>> ecutor.java:1361)
>>     org.apache.tomcat.util.threads.ThreadPoolExecutor.execute(
>> ThreadPoolExecutor.java:163)
>>     org.apache.tomcat.util.threads.ThreadPoolExecutor.execute(
>> ThreadPoolExecutor.java:141)
>>     org.apache.tomcat.util.net.AbstractEndpoint.processSocket(Ab
>> stractEndpoint.java:903)
>>     org.apache.tomcat.util.net.AprEndpoint.processSocket(AprEndp
>> oint.java:866)
>>     org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoin
>> t.java:1731)
>>     java.lang.Thread.run(Thread.java:745)
>>
>>
>> If I look at the 8080 poller thread (after load), it's waiting
>> appropriately.
>> I'll try and get the remote debugger connected now and see what time is
>> being passed for waiting.
>>
>>
>>
>> On Wed, Jan 4, 2017 at 11:30 AM, Mark Thomas <ma...@apache.org> wrote:
>>
>>> On 04/01/2017 16:02, David Oswell wrote:
>>> > Hello all,
>>> >
>>> > wondering if anyone has seen similar behaviour - sorry if already
>>> reported,
>>> > googling hasn't found much other than a somewhat similar older issue
>>> on APR
>>> > going to 100% CPU when idle due to a poll loop/timeout issue - however
>>> this
>>> > seems a bit different as this is only occurring with SSL;
>>> >
>>> > Running (ESX Based):
>>> >   Server version:        Apache Tomcat/8.5.9
>>> >   Server built:          Dec 5 2016 20:18:12 UTC
>>> >   Server number:         8.5.9.0
>>> >   OS Name:               Windows Server 2012 R2
>>> >   OS Version:            6.3
>>> >   Architecture:          amd64
>>> >   JVM Version:           1.8.0_112-b15
>>> >   JVM Vendor:            Oracle Corporation
>>> >   Apache Tomcat Native library 1.2.10 using APR version 1.5.2.
>>> >
>>> >   APR capabilities: IPv6 [true], sendfile [true], accept filters
>>> [false],
>>> > random [true].
>>> >   APR/OpenSSL configuration: useAprConnector [false], useOpenSSL [true]
>>> >   OpenSSL successfully initialized (OpenSSL 1.0.2j  26 Sep 2016)
>>> >
>>> > Server setup as 4 vCPU / 8GB mem
>>> >
>>> > Problem:
>>> > After running load on the server (this is a QA system, so just load
>>> tests)
>>> > Tomcat goes to 70-100% CPU usage when it should be idle.
>>> > This only occurs when putting load on the SSL connectors w/APR. When
>>> > re-applying load (to SSL connector), the CPU usage drops to appropriate
>>> > levels for the load, but then returns to the higher idle CPU usage.
>>> > If applying load back to the non-SSL connector, CPU usage stays high
>>> > (higher than after a restart and sending to non-SSL connector first).
>>> >
>>> > Performing the same load tests on non-SSL (still with APR), or non-APR
>>> > connectors CPU returns to idle after the load run is completed.
>>> >
>>> > The load applied is 8 persistent connections, which send basic POST
>>> > requests, the service does some basic operations and responds,
>>> generally -
>>> > if not maxed out - in 1-2ms.
>>> > Even when using 1 connection, the CPU usage stays constant when the
>>> > requests stop.
>>> > Again this only occurs when using the SSL/APR connector, if using a
>>> non-APR
>>> > connector, or the APR-non-SSL connector it's fine.
>>> >
>>> > It seems to be a problem with just how APR-SSL works or possibly
>>> OpenSSL ?
>>>
>>> An issue with the Tomcat code is more likely than with the OpenSSL code.
>>>
>>> It sounds like one or more threads is entering a fairly tight loop. Use
>>> a combination of top to find out which thread(s) is(are) using the CPU
>>> and then a thread dump to find where in the code it is happening.
>>>
>>> Then add a breakpoint at that point in the code, run your test and once
>>> the high CPU starts, attach your IDE and step through the code to see
>>> where/how the loop is happening.
>>>
>>> Instructions for running Tomcat under a debugger are on the Wiki.
>>>
>>> The more of the above you are able to complete, the simpler it is going
>>> to be to get this fixed.
>>>
>>> Mark
>>>
>>>
>>> > Not sure how to further isolate to see which one is the problem.
>>> >
>>> >
>>> > SSL Connector setup:
>>> >     <Connector port="8443"
>>> >                 protocol="org.apache.coyote.http11.Http11AprProtocol"
>>> >                 maxThreads="150"
>>> >                 SSLEnabled="true"
>>> >                 maxKeepAliveRequests="-1">
>>> >         <SSLHostConfig>
>>> >             <Certificate
>>> >                         certificateFile="conf/localhost-rsa-cert.pem"
>>> >                         certificateKeyFile="conf/loca
>>> lhost-rsa-key.pem"
>>> >                         certificateKeyPassword="password"
>>> >                         type="RSA" />
>>> >         </SSLHostConfig>
>>> >     </Connector>
>>> >
>>> > Non-SSL Connector:
>>> >        <Connector port="8080"
>>> >                protocol="org.apache.coyote.http11.Http11AprProtocol"
>>> >                connectionTimeout="20000"
>>> >                maxKeepAliveRequests="-1"
>>> >                redirectPort="8443" >
>>> >        </Connector>
>>> >
>>> > Initializing ProtocolHandler ["http-apr-8080"]
>>> > Initializing ProtocolHandler ["https-openssl-apr-8443"]
>>> >
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
>>> For additional commands, e-mail: users-help@tomcat.apache.org
>>>
>>>
>>
>

Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?

Posted by David Oswell <do...@gmail.com>.
Ok the issue appears to be related to the state the connection handler
thinks it is in, after the connection is disconnected the state returned
by org.apache.coyote.http11.Http11Processor.process remains as OPEN, which
causes the underlying poll thread to get notified and repeats.

Breakpointing on all of the CLOSED states does not get triggered, unless at
low requests per second, on my server at around 200 requests/second on one
connection the disconnect doesn't get triggered, but at 1-2 it does.
Because it doesn't get flagged as disconnect it just constantly gets
processed and always shows a state of OPEN_READ.

As a side note, this only seems to occur on the servers (have 2 where the
same behaviour occurs); running it on my local machine the issue doesn't
appear. So not sure if somehow related to Win2012R2 &APR interaction??? -
Yes I know the obvious fix is to not use Win2012 :)



On Wed, Jan 4, 2017 at 12:29 PM, David Oswell <do...@gmail.com> wrote:

> Hi Mark,
>
> Thanks for the response - So far I've narrowed down to this;
>     0 - ntoskrnl.exe!KeSynchronizeExecution+0x2246
>     1 - ntoskrnl.exe!KeWaitForMultipleObjects+0x135e
>     2 - ntoskrnl.exe!KeWaitForMultipleObjects+0xdd9
>     3 - ntoskrnl.exe!KeWaitForSingleObject+0x373
>     4 - ntoskrnl.exe!KeStallWhileFrozen+0x1977
>     5 - ntoskrnl.exe!_misaligned_access+0x13f9
>     6 - ntoskrnl.exe!KeSynchronizeExecution+0x4133
>     7 - ntdll.dll!RtlCopyMemory+0x3
>     8 - WS2_32.dll!WSAPoll+0xb9
>     9 - tcnative-1.dll!Java_org_apache_tomcat_jni_Proc_detach+0xad10
>     10 - tcnative-1.dll!Java_org_apache_tomcat_jni_Poll_poll+0xac
>
> From what I can see there is one thread which is the main cpu user when
> idle, it cycles through the above and;
>     jvm.dll!JVM_FindSignal+0x2cba
>     jvm.dll!JVM_ResolveClass+0x371c7
>     jvm.dll!JVM_ResolveClass+0x571d7
>     jvm.dll!JVM_MonitorWait+0xfe
>     jvm.dll!JVM_FindSignal+0x73f
>
> Which looks to correspond to https-openssl-apr-8443-poller thread, which I
> see cycling through RUNNABLE/TIMED_WAITING states (below are from when
> there are zero requests occuring);
>     Name: https-openssl-apr-8443-Poller
>     State: RUNNABLE
>     Total blocked: 11,222,146  Total waited: 11,222,507
>
>     Stack trace:
>     org.apache.tomcat.jni.Poll.poll(Native Method)
>     org.apache.tomcat.util.net.AprEndpoint$Poller.run(
> AprEndpoint.java:1675)
>     java.lang.Thread.run(Thread.java:745)
>
>  -------------
>
>     Name: https-openssl-apr-8443-Poller
>     State: RUNNABLE
>     Total blocked: 11,624,292  Total waited: 11,624,681
>
>     Stack trace:
>     org.apache.tomcat.jni.Poll.add(Native Method)
>     org.apache.tomcat.util.net.AprEndpoint$Poller.
> addToPoller(AprEndpoint.java:1449)
>     org.apache.tomcat.util.net.AprEndpoint$Poller.run(
> AprEndpoint.java:1649)
>     java.lang.Thread.run(Thread.java:745)
>
> ------------------
>
>     Name: https-openssl-apr-8443-Poller
>     State: TIMED_WAITING on org.apache.tomcat.util.net.
> AprEndpoint$Poller@4f5e5841
>     Total blocked: 12,441,180  Total waited: 12,441,614
>
>     Stack trace:
>     java.lang.Object.wait(Native Method)
>     org.apache.tomcat.util.net.AprEndpoint$Poller.run(
> AprEndpoint.java:1573)
>     java.lang.Thread.run(Thread.java:745)
>
> ---------------------
>
>     Name: https-openssl-apr-8443-Poller
>     State: RUNNABLE
>     Total blocked: 13,278,363  Total waited: 13,278,853
>
>     Stack trace:
>     sun.misc.Unsafe.unpark(Native Method)
>     java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:141)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(
> AbstractQueuedSynchronizer.java:662)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.release(
> AbstractQueuedSynchronizer.java:1264)
>     java.util.concurrent.locks.ReentrantLock.unlock(
> ReentrantLock.java:457)
>     java.util.concurrent.LinkedBlockingQueue.signalNotEmpty(
> LinkedBlockingQueue.java:176)
>     java.util.concurrent.LinkedBlockingQueue.offer(
> LinkedBlockingQueue.java:430)
>     org.apache.tomcat.util.threads.TaskQueue.offer(TaskQueue.java:74)
>     org.apache.tomcat.util.threads.TaskQueue.offer(TaskQueue.java:31)
>     java.util.concurrent.ThreadPoolExecutor.execute(
> ThreadPoolExecutor.java:1361)
>     org.apache.tomcat.util.threads.ThreadPoolExecutor.
> execute(ThreadPoolExecutor.java:163)
>     org.apache.tomcat.util.threads.ThreadPoolExecutor.
> execute(ThreadPoolExecutor.java:141)
>     org.apache.tomcat.util.net.AbstractEndpoint.processSocket(
> AbstractEndpoint.java:903)
>     org.apache.tomcat.util.net.AprEndpoint.processSocket(
> AprEndpoint.java:866)
>     org.apache.tomcat.util.net.AprEndpoint$Poller.run(
> AprEndpoint.java:1731)
>     java.lang.Thread.run(Thread.java:745)
>
>
> If I look at the 8080 poller thread (after load), it's waiting
> appropriately.
> I'll try and get the remote debugger connected now and see what time is
> being passed for waiting.
>
>
>
> On Wed, Jan 4, 2017 at 11:30 AM, Mark Thomas <ma...@apache.org> wrote:
>
>> On 04/01/2017 16:02, David Oswell wrote:
>> > Hello all,
>> >
>> > wondering if anyone has seen similar behaviour - sorry if already
>> reported,
>> > googling hasn't found much other than a somewhat similar older issue on
>> APR
>> > going to 100% CPU when idle due to a poll loop/timeout issue - however
>> this
>> > seems a bit different as this is only occurring with SSL;
>> >
>> > Running (ESX Based):
>> >   Server version:        Apache Tomcat/8.5.9
>> >   Server built:          Dec 5 2016 20:18:12 UTC
>> >   Server number:         8.5.9.0
>> >   OS Name:               Windows Server 2012 R2
>> >   OS Version:            6.3
>> >   Architecture:          amd64
>> >   JVM Version:           1.8.0_112-b15
>> >   JVM Vendor:            Oracle Corporation
>> >   Apache Tomcat Native library 1.2.10 using APR version 1.5.2.
>> >
>> >   APR capabilities: IPv6 [true], sendfile [true], accept filters
>> [false],
>> > random [true].
>> >   APR/OpenSSL configuration: useAprConnector [false], useOpenSSL [true]
>> >   OpenSSL successfully initialized (OpenSSL 1.0.2j  26 Sep 2016)
>> >
>> > Server setup as 4 vCPU / 8GB mem
>> >
>> > Problem:
>> > After running load on the server (this is a QA system, so just load
>> tests)
>> > Tomcat goes to 70-100% CPU usage when it should be idle.
>> > This only occurs when putting load on the SSL connectors w/APR. When
>> > re-applying load (to SSL connector), the CPU usage drops to appropriate
>> > levels for the load, but then returns to the higher idle CPU usage.
>> > If applying load back to the non-SSL connector, CPU usage stays high
>> > (higher than after a restart and sending to non-SSL connector first).
>> >
>> > Performing the same load tests on non-SSL (still with APR), or non-APR
>> > connectors CPU returns to idle after the load run is completed.
>> >
>> > The load applied is 8 persistent connections, which send basic POST
>> > requests, the service does some basic operations and responds,
>> generally -
>> > if not maxed out - in 1-2ms.
>> > Even when using 1 connection, the CPU usage stays constant when the
>> > requests stop.
>> > Again this only occurs when using the SSL/APR connector, if using a
>> non-APR
>> > connector, or the APR-non-SSL connector it's fine.
>> >
>> > It seems to be a problem with just how APR-SSL works or possibly
>> OpenSSL ?
>>
>> An issue with the Tomcat code is more likely than with the OpenSSL code.
>>
>> It sounds like one or more threads is entering a fairly tight loop. Use
>> a combination of top to find out which thread(s) is(are) using the CPU
>> and then a thread dump to find where in the code it is happening.
>>
>> Then add a breakpoint at that point in the code, run your test and once
>> the high CPU starts, attach your IDE and step through the code to see
>> where/how the loop is happening.
>>
>> Instructions for running Tomcat under a debugger are on the Wiki.
>>
>> The more of the above you are able to complete, the simpler it is going
>> to be to get this fixed.
>>
>> Mark
>>
>>
>> > Not sure how to further isolate to see which one is the problem.
>> >
>> >
>> > SSL Connector setup:
>> >     <Connector port="8443"
>> >                 protocol="org.apache.coyote.http11.Http11AprProtocol"
>> >                 maxThreads="150"
>> >                 SSLEnabled="true"
>> >                 maxKeepAliveRequests="-1">
>> >         <SSLHostConfig>
>> >             <Certificate
>> >                         certificateFile="conf/localhost-rsa-cert.pem"
>> >                         certificateKeyFile="conf/localhost-rsa-key.pem"
>> >                         certificateKeyPassword="password"
>> >                         type="RSA" />
>> >         </SSLHostConfig>
>> >     </Connector>
>> >
>> > Non-SSL Connector:
>> >        <Connector port="8080"
>> >                protocol="org.apache.coyote.http11.Http11AprProtocol"
>> >                connectionTimeout="20000"
>> >                maxKeepAliveRequests="-1"
>> >                redirectPort="8443" >
>> >        </Connector>
>> >
>> > Initializing ProtocolHandler ["http-apr-8080"]
>> > Initializing ProtocolHandler ["https-openssl-apr-8443"]
>> >
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
>> For additional commands, e-mail: users-help@tomcat.apache.org
>>
>>
>

Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?

Posted by David Oswell <do...@gmail.com>.
Hi Mark,

Thanks for the response - So far I've narrowed down to this;
    0 - ntoskrnl.exe!KeSynchronizeExecution+0x2246
    1 - ntoskrnl.exe!KeWaitForMultipleObjects+0x135e
    2 - ntoskrnl.exe!KeWaitForMultipleObjects+0xdd9
    3 - ntoskrnl.exe!KeWaitForSingleObject+0x373
    4 - ntoskrnl.exe!KeStallWhileFrozen+0x1977
    5 - ntoskrnl.exe!_misaligned_access+0x13f9
    6 - ntoskrnl.exe!KeSynchronizeExecution+0x4133
    7 - ntdll.dll!RtlCopyMemory+0x3
    8 - WS2_32.dll!WSAPoll+0xb9
    9 - tcnative-1.dll!Java_org_apache_tomcat_jni_Proc_detach+0xad10
    10 - tcnative-1.dll!Java_org_apache_tomcat_jni_Poll_poll+0xac

From what I can see there is one thread which is the main cpu user when
idle, it cycles through the above and;
    jvm.dll!JVM_FindSignal+0x2cba
    jvm.dll!JVM_ResolveClass+0x371c7
    jvm.dll!JVM_ResolveClass+0x571d7
    jvm.dll!JVM_MonitorWait+0xfe
    jvm.dll!JVM_FindSignal+0x73f

Which looks to correspond to https-openssl-apr-8443-poller thread, which I
see cycling through RUNNABLE/TIMED_WAITING states (below are from when
there are zero requests occuring);
    Name: https-openssl-apr-8443-Poller
    State: RUNNABLE
    Total blocked: 11,222,146  Total waited: 11,222,507

    Stack trace:
    org.apache.tomcat.jni.Poll.poll(Native Method)
    org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1675)
    java.lang.Thread.run(Thread.java:745)

 -------------

    Name: https-openssl-apr-8443-Poller
    State: RUNNABLE
    Total blocked: 11,624,292  Total waited: 11,624,681

    Stack trace:
    org.apache.tomcat.jni.Poll.add(Native Method)

org.apache.tomcat.util.net.AprEndpoint$Poller.addToPoller(AprEndpoint.java:1449)
    org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1649)
    java.lang.Thread.run(Thread.java:745)

------------------

    Name: https-openssl-apr-8443-Poller
    State: TIMED_WAITING on
org.apache.tomcat.util.net.AprEndpoint$Poller@4f5e5841
    Total blocked: 12,441,180  Total waited: 12,441,614

    Stack trace:
    java.lang.Object.wait(Native Method)
    org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1573)
    java.lang.Thread.run(Thread.java:745)

---------------------

    Name: https-openssl-apr-8443-Poller
    State: RUNNABLE
    Total blocked: 13,278,363  Total waited: 13,278,853

    Stack trace:
    sun.misc.Unsafe.unpark(Native Method)
    java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:141)

java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(AbstractQueuedSynchronizer.java:662)

java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1264)
    java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:457)

java.util.concurrent.LinkedBlockingQueue.signalNotEmpty(LinkedBlockingQueue.java:176)

java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:430)
    org.apache.tomcat.util.threads.TaskQueue.offer(TaskQueue.java:74)
    org.apache.tomcat.util.threads.TaskQueue.offer(TaskQueue.java:31)

java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1361)

org.apache.tomcat.util.threads.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:163)

org.apache.tomcat.util.threads.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:141)

org.apache.tomcat.util.net.AbstractEndpoint.processSocket(AbstractEndpoint.java:903)

org.apache.tomcat.util.net.AprEndpoint.processSocket(AprEndpoint.java:866)
    org.apache.tomcat.util.net.AprEndpoint$Poller.run(AprEndpoint.java:1731)
    java.lang.Thread.run(Thread.java:745)


If I look at the 8080 poller thread (after load), it's waiting
appropriately.
I'll try and get the remote debugger connected now and see what time is
being passed for waiting.



On Wed, Jan 4, 2017 at 11:30 AM, Mark Thomas <ma...@apache.org> wrote:

> On 04/01/2017 16:02, David Oswell wrote:
> > Hello all,
> >
> > wondering if anyone has seen similar behaviour - sorry if already
> reported,
> > googling hasn't found much other than a somewhat similar older issue on
> APR
> > going to 100% CPU when idle due to a poll loop/timeout issue - however
> this
> > seems a bit different as this is only occurring with SSL;
> >
> > Running (ESX Based):
> >   Server version:        Apache Tomcat/8.5.9
> >   Server built:          Dec 5 2016 20:18:12 UTC
> >   Server number:         8.5.9.0
> >   OS Name:               Windows Server 2012 R2
> >   OS Version:            6.3
> >   Architecture:          amd64
> >   JVM Version:           1.8.0_112-b15
> >   JVM Vendor:            Oracle Corporation
> >   Apache Tomcat Native library 1.2.10 using APR version 1.5.2.
> >
> >   APR capabilities: IPv6 [true], sendfile [true], accept filters [false],
> > random [true].
> >   APR/OpenSSL configuration: useAprConnector [false], useOpenSSL [true]
> >   OpenSSL successfully initialized (OpenSSL 1.0.2j  26 Sep 2016)
> >
> > Server setup as 4 vCPU / 8GB mem
> >
> > Problem:
> > After running load on the server (this is a QA system, so just load
> tests)
> > Tomcat goes to 70-100% CPU usage when it should be idle.
> > This only occurs when putting load on the SSL connectors w/APR. When
> > re-applying load (to SSL connector), the CPU usage drops to appropriate
> > levels for the load, but then returns to the higher idle CPU usage.
> > If applying load back to the non-SSL connector, CPU usage stays high
> > (higher than after a restart and sending to non-SSL connector first).
> >
> > Performing the same load tests on non-SSL (still with APR), or non-APR
> > connectors CPU returns to idle after the load run is completed.
> >
> > The load applied is 8 persistent connections, which send basic POST
> > requests, the service does some basic operations and responds, generally
> -
> > if not maxed out - in 1-2ms.
> > Even when using 1 connection, the CPU usage stays constant when the
> > requests stop.
> > Again this only occurs when using the SSL/APR connector, if using a
> non-APR
> > connector, or the APR-non-SSL connector it's fine.
> >
> > It seems to be a problem with just how APR-SSL works or possibly OpenSSL
> ?
>
> An issue with the Tomcat code is more likely than with the OpenSSL code.
>
> It sounds like one or more threads is entering a fairly tight loop. Use
> a combination of top to find out which thread(s) is(are) using the CPU
> and then a thread dump to find where in the code it is happening.
>
> Then add a breakpoint at that point in the code, run your test and once
> the high CPU starts, attach your IDE and step through the code to see
> where/how the loop is happening.
>
> Instructions for running Tomcat under a debugger are on the Wiki.
>
> The more of the above you are able to complete, the simpler it is going
> to be to get this fixed.
>
> Mark
>
>
> > Not sure how to further isolate to see which one is the problem.
> >
> >
> > SSL Connector setup:
> >     <Connector port="8443"
> >                 protocol="org.apache.coyote.http11.Http11AprProtocol"
> >                 maxThreads="150"
> >                 SSLEnabled="true"
> >                 maxKeepAliveRequests="-1">
> >         <SSLHostConfig>
> >             <Certificate
> >                         certificateFile="conf/localhost-rsa-cert.pem"
> >                         certificateKeyFile="conf/localhost-rsa-key.pem"
> >                         certificateKeyPassword="password"
> >                         type="RSA" />
> >         </SSLHostConfig>
> >     </Connector>
> >
> > Non-SSL Connector:
> >        <Connector port="8080"
> >                protocol="org.apache.coyote.http11.Http11AprProtocol"
> >                connectionTimeout="20000"
> >                maxKeepAliveRequests="-1"
> >                redirectPort="8443" >
> >        </Connector>
> >
> > Initializing ProtocolHandler ["http-apr-8080"]
> > Initializing ProtocolHandler ["https-openssl-apr-8443"]
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
>
>

Re: Tomcat 8.5 - APR 1.2.10 SSL CPU issue ?

Posted by Mark Thomas <ma...@apache.org>.
On 04/01/2017 16:02, David Oswell wrote:
> Hello all,
> 
> wondering if anyone has seen similar behaviour - sorry if already reported,
> googling hasn't found much other than a somewhat similar older issue on APR
> going to 100% CPU when idle due to a poll loop/timeout issue - however this
> seems a bit different as this is only occurring with SSL;
> 
> Running (ESX Based):
>   Server version:        Apache Tomcat/8.5.9
>   Server built:          Dec 5 2016 20:18:12 UTC
>   Server number:         8.5.9.0
>   OS Name:               Windows Server 2012 R2
>   OS Version:            6.3
>   Architecture:          amd64
>   JVM Version:           1.8.0_112-b15
>   JVM Vendor:            Oracle Corporation
>   Apache Tomcat Native library 1.2.10 using APR version 1.5.2.
> 
>   APR capabilities: IPv6 [true], sendfile [true], accept filters [false],
> random [true].
>   APR/OpenSSL configuration: useAprConnector [false], useOpenSSL [true]
>   OpenSSL successfully initialized (OpenSSL 1.0.2j  26 Sep 2016)
> 
> Server setup as 4 vCPU / 8GB mem
> 
> Problem:
> After running load on the server (this is a QA system, so just load tests)
> Tomcat goes to 70-100% CPU usage when it should be idle.
> This only occurs when putting load on the SSL connectors w/APR. When
> re-applying load (to SSL connector), the CPU usage drops to appropriate
> levels for the load, but then returns to the higher idle CPU usage.
> If applying load back to the non-SSL connector, CPU usage stays high
> (higher than after a restart and sending to non-SSL connector first).
> 
> Performing the same load tests on non-SSL (still with APR), or non-APR
> connectors CPU returns to idle after the load run is completed.
> 
> The load applied is 8 persistent connections, which send basic POST
> requests, the service does some basic operations and responds, generally -
> if not maxed out - in 1-2ms.
> Even when using 1 connection, the CPU usage stays constant when the
> requests stop.
> Again this only occurs when using the SSL/APR connector, if using a non-APR
> connector, or the APR-non-SSL connector it's fine.
> 
> It seems to be a problem with just how APR-SSL works or possibly OpenSSL ?

An issue with the Tomcat code is more likely than with the OpenSSL code.

It sounds like one or more threads is entering a fairly tight loop. Use
a combination of top to find out which thread(s) is(are) using the CPU
and then a thread dump to find where in the code it is happening.

Then add a breakpoint at that point in the code, run your test and once
the high CPU starts, attach your IDE and step through the code to see
where/how the loop is happening.

Instructions for running Tomcat under a debugger are on the Wiki.

The more of the above you are able to complete, the simpler it is going
to be to get this fixed.

Mark


> Not sure how to further isolate to see which one is the problem.
> 
> 
> SSL Connector setup:
>     <Connector port="8443"
>                 protocol="org.apache.coyote.http11.Http11AprProtocol"
>                 maxThreads="150"
>                 SSLEnabled="true"
>                 maxKeepAliveRequests="-1">
>         <SSLHostConfig>
>             <Certificate
>                         certificateFile="conf/localhost-rsa-cert.pem"
>                         certificateKeyFile="conf/localhost-rsa-key.pem"
>                         certificateKeyPassword="password"
>                         type="RSA" />
>         </SSLHostConfig>
>     </Connector>
> 
> Non-SSL Connector:
>        <Connector port="8080"
>                protocol="org.apache.coyote.http11.Http11AprProtocol"
>                connectionTimeout="20000"
>                maxKeepAliveRequests="-1"
>                redirectPort="8443" >
>        </Connector>
> 
> Initializing ProtocolHandler ["http-apr-8080"]
> Initializing ProtocolHandler ["https-openssl-apr-8443"]
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org