Posted to users@tomcat.apache.org by Toom Andreas <An...@volvofinans.se> on 2018/01/12 06:58:29 UTC

Ajp Nio-thread stuck in loop and consuming a lot of cpu

Hi,

We are having an issue with an application running Apache Tomcat 8.0.47 on Oracle JVM 1.8.0_151 on Linux (RHEL 7). The Tomcat process is consuming cpu at a constantly high level even though there is a low amount of incoming traffic. When inspecting the process health using the JMX/JVisualVM CPU sampler we see that there are 4 ajp-nio-exec threads, together with a NioBlockingSelector.BlockPoller thread, that consume most of the cpu.

A stack trace of one of the ajp-nio-exec threads looks like this:

"ajp-nio-48317-exec-14233" - Thread t@201195
   java.lang.Thread.State: TIMED_WAITING
                             at sun.misc.Unsafe.park(Native Method)
                             - parking to wait for <342fab60> (a java.util.concurrent.CountDownLatch$Sync)
                             at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
                             at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
                             at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
                             at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
                             at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitLatch(NioEndpoint.java:1400)
                             at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitReadLatch(NioEndpoint.java:1402)
                             at org.apache.tomcat.util.net.NioBlockingSelector.read(NioBlockingSelector.java:185)
                             at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:250)
                             at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:231)
                             at org.apache.coyote.ajp.AjpNioProcessor.readSocket(AjpNioProcessor.java:194)
                             at org.apache.coyote.ajp.AjpNioProcessor.read(AjpNioProcessor.java:160)
                             at org.apache.coyote.ajp.AbstractAjpProcessor.readMessage(AbstractAjpProcessor.java:1091)
                             at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:804)
                             at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
                             at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1539)
                             at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1495)
                             - locked <279cc7f7> (a org.apache.tomcat.util.net.NioChannel)
                             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                             at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
                             at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
                             - locked <10e1f474> (a java.util.concurrent.ThreadPoolExecutor$Worker)

A stack trace of the NioBlockingSelector.BlockPoller thread looks like this:

"NioBlockingSelector.BlockPoller-2" - Thread t@17
   java.lang.Thread.State: RUNNABLE
                             at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
                             at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
                             at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
                             at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
                             - locked <648895e4> (a sun.nio.ch.Util$3)
                             - locked <17921532> (a java.util.Collections$UnmodifiableSet)
                             - locked <67762111> (a sun.nio.ch.EPollSelectorImpl)
                             at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:105)
                             at org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBlockingSelector.java:339)

   Locked ownable synchronizers:
                             - None

It seems like the worker thread is trying to read from the underlying socket. When we inspect the OS tcp/ip statistics using the ss command (ss -i src :<ajp-port>), the following connections are suspicious:

State      Recv-Q Send-Q Local Address:Port                 Peer Address:Port
CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:53724
                              cubic wscale:7,7 rto:202 rtt:1.724/2.424 ato:40 mss:1448 cwnd:10 ssthresh:31 bytes_acked:343869507 bytes_received:22259372 segs_out:302687 segs_in:141943 send 67.2Mbps lastsnd:1290791171 lastrcv:1290786778 lastack:1290786767 pacing_rate 134.4Mbps retrans:0/433 rcv_rtt:1.875 rcv_space:28960
CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:41990
                              cubic wscale:7,7 rto:202 rtt:1.132/0.899 ato:40 mss:1448 cwnd:10 ssthresh:22 bytes_acked:37347221 bytes_received:3346200 segs_out:35273 segs_in:20608 send 102.3Mbps lastsnd:289840681 lastrcv:289730802 lastack:289720796 pacing_rate 204.5Mbps retrans:0/13 reordering:24 rcv_rtt:1.875 rcv_space:28960
CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:60068
                              cubic wscale:7,7 rto:209 rtt:8.757/14.317 ato:40 mss:1448 cwnd:10 ssthresh:36 bytes_acked:180903401 bytes_received:16401965 segs_out:170017 segs_in:100392 send 13.2Mbps lastsnd:395602314 lastrcv:395602267 lastack:395602257 pacing_rate 26.5Mbps retrans:0/106 rcv_rtt:1.875 rcv_space:28960
CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:50628
                              cubic wscale:7,7 rto:203 rtt:2.992/4.252 ato:40 mss:1448 cwnd:13 ssthresh:25 bytes_acked:4185335 bytes_received:471220 segs_out:4193 segs_in:2181 send 50.3Mbps lastsnd:1036916135 lastrcv:1036871377 lastack:1036871367 pacing_rate 100.6Mbps retrans:0/1 reordering:26 rcv_rtt:6.375 rcv_space:28960

We are running our Tomcat node behind an Apache server and use Tomcat Connector (mod_jk) as a load balancer. When looking at the hosts running the Apache node, the corresponding sockets are closed. To us it seems that Tomcat has not properly closed the underlying socket when Apache/mod_jk initiated a FIN and is instead continuously trying to read from the socket even though it has been closed on the Apache side?
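
For reference, a rough sketch of how both pieces of evidence could be captured together for later correlation - the pgrep pattern, output directory and 60-second interval are just examples, and it assumes the JDK's jstack is on the PATH:

# Example helper (not something Tomcat ships): snapshot the CLOSE-WAIT AJP
# sockets and a thread dump of the Tomcat JVM every minute for later correlation.
PID=$(pgrep -f org.apache.catalina.startup.Bootstrap)   # assumes a single Tomcat instance
PORT=48317                                              # our AJP connector port
OUT=/tmp/ajp-diag
mkdir -p "$OUT"
while true; do
    TS=$(date +%Y%m%d-%H%M%S)
    # AJP-port sockets where the peer has already sent its FIN
    ss -tni state close-wait "( sport = :$PORT )" > "$OUT/ss-$TS.txt"
    # thread dump; its nid=0x... fields are the native (OS) thread ids
    jstack -l "$PID" > "$OUT/jstack-$TS.txt"
    sleep 60
done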

I know that we are one version behind the latest release of Tomcat 8.0.x, but I checked the changelog and could not see any fixed issues matching this problem. Unfortunately we are only seeing this in production and have not been able to reproduce it in test. Has anyone experienced a similar issue before?

For reference our connector configuration in server.xml looks like this:

<Connector port="48317"
               protocol="AJP/1.3"
               maxThreads="200"
               connectionTimeout="600000"
               xpoweredBy="false"
               allowTrace="false"
               URIEncoding="UTF-8"
               requiredSecret="xxx"
               secure="true"
               scheme="https"
               proxyPort="443" />

And our Tomcat Connector configuration for Apache in workers.properties looks like this:

worker.list = status,app

# Create workers for app
worker.app-1.type=ajp13
worker.app-1.host=host8128.corpdomain.internal
worker.app-1.port=48317
worker.app-1.ping_mode=A
worker.app-1.connection_pool_timeout=600
worker.app-1.secret=xxx
worker.app-2.type=ajp13
worker.app-2.host=host8129.corpdomain.internal
worker.app-2.port=48317
worker.app-2.ping_mode=A
worker.app-2.connection_pool_timeout=600
worker.app-2.secret=xxx

# Create a loadbalancer for app
worker.app.type=lb
worker.app.sticky_session=1
worker.app.balance_workers=app-1, app-2

Best regards,
Andreas


Andreas Toom
System Developer/IT
TEL: +46 702 03 04 98
Volvofinans Bank AB <http://www.volvofinans.se/>




Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu

Posted by "Terence M. Bandoian" <te...@tmbsw.com>.
On 1/18/2018 7:45 AM, Rainer Jung wrote:
> Just an addition to one of Mark's questions:
>
> On 17.01.2018 at 22:20, Mark Thomas wrote:
>> Is it always the same threads generating the load or does it move
>> between threads?
>
> Just in case Andreas is not aware: one can check with "top -H -p <PID>".
>
> Using -H lets top show one line per thread instead of one line per 
> process. That way you can easily see CPU use per thread. The "-p 
> <PID>" would simply filter to only show the threads in your java 
> process (by using for <PID> the process ID of your Tomcat java process).
>
> Furthermore the PID column in "top -H" does not actually show the PID, 
> but instead the thread IDs. If you convert those decimal numbers to 
> hex, e.g. using the shell commandline
>
> printf "0x%x\n" <ID>
>
> (and replacing <ID> by the respective thread id of the lines that show 
> high CPU usage), you get a hex number that will point you to the 
> respective thread in the java thread dump by looking for nid=0x... in 
> the thread header lines. The association between the operating system 
> thread number as shown in "top -H" and the nid in the Java thread dump 
> is stable during the lifetime of the process.
>
> Regards,
>
> Rainer
>


Thanks for posting this!

-Terence Bandoian
http://www.tmbsw.com/


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


RE: Ajp Nio-thread stuck in loop and consuming a lot of cpu

Posted by Toom Andreas <An...@volvofinans.se>.
Hi Greg,

Thanks for looking into this issue. Unfortunately I am 100% sure we are running Tomcat 8.0.47 so the referenced patch/fix is already applied.

Best regards,
Andreas

-----Original Message-----
From: Greg Huber [mailto:gregh3269@gmail.com] 
Sent: 18 January 2018 14:56
To: Tomcat Users List <us...@tomcat.apache.org>
Subject: Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu

I have been following this thread - I run the same setup, Apache/mod_jk/Tomcat. There was a problem a while back where the cpu went to max for me, as my server is slow so thread blocking is more of a possibility.

https://bz.apache.org/bugzilla/show_bug.cgi?id=58151

Although this was in version 8.0.23, are you sure you're using the 8.0.47 jars?

Cheers Greg


On 18 January 2018 at 13:45, Rainer Jung <ra...@kippdata.de> wrote:

> Just an addition to one of Mark's questions:
>
> On 17.01.2018 at 22:20, Mark Thomas wrote:
>
>> Is it always the same threads generating the load or does it move 
>> between threads?
>>
>
> Just in case Andreas is not aware: one can check with "top -H -p <PID>".
>
> Using -H lets top show one line per thread instead of one line per 
> process. That way you can easily see CPU use per thread. The "-p <PID>"
> would simply filter to only show the threads in your java process (by 
> using for <PID> the process ID of your Tomcat java process).
>
> Furthermore the PID column in "top -H" does not actually show the PID, 
> but instead the thread IDs. If you convert those decimal numbers to hex, e.g.
> using the shell commandline
>
> printf "0x%x\n" <ID>
>
> (and replacing <ID> by the respective thread id of the lines that show 
> high CPU usage), you get a hex number that will point you to the 
> respective thread in the java thread dump by looking for nid=0x... in 
> the thread header lines. The association between the operating system 
> thread number as shown in "top -H" and the nid in the Java thread dump 
> is stable during the lifetime of the process.
>
> Regards,
>
> Rainer
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
>
>

Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu

Posted by Greg Huber <gr...@gmail.com>.
I have been following this thread - I run the same setup, Apache/mod_jk/Tomcat. There
was a problem a while back where the cpu went to max for me, as my server
is slow so thread blocking is more of a possibility.

https://bz.apache.org/bugzilla/show_bug.cgi?id=58151

Although this was in version 8.0.23, are you sure you're using the 8.0.47 jars?
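
If it helps to double-check, the version that the jars on disk actually report can be printed directly; the CATALINA_HOME path below is only an example, adjust it to the real installation:

# Example only - point CATALINA_HOME at the actual Tomcat installation.
CATALINA_HOME=/opt/tomcat
java -cp "$CATALINA_HOME/lib/catalina.jar" org.apache.catalina.util.ServerInfo
# Prints "Server version: Apache Tomcat/8.0.xx" plus JVM and OS details;
# $CATALINA_HOME/bin/version.sh reports the same information.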

Cheers Greg


On 18 January 2018 at 13:45, Rainer Jung <ra...@kippdata.de> wrote:

> Just an addition to one of Mark's questions:
>
> On 17.01.2018 at 22:20, Mark Thomas wrote:
>
>> Is it always the same threads generating the load or does it move
>> between threads?
>>
>
> Just in case Andreas is not aware: one can check with "top -H -p <PID>".
>
> Using -H lets top show one line per thread instead of one line per
> process. That way you can easily see CPU use per thread. The "-p <PID>"
> would simply filter to only show the threads in your java process (by using
> for <PID> the process ID of your Tomcat java process).
>
> Furthermore the PID column in "top -H" does not actually show the PID, but
> instead the thread IDs. If you convert those decimal numbers to hex, e.g.
> using the shell commandline
>
> printf "0x%x\n" <ID>
>
> (and replacing <ID> by the respective thread id of the lines that show
> high CPU usage), you get a hex number that will point you to the respective
> thread in the java thread dump by looking for nid=0x... in the thread
> header lines. The association between the operating system thread number as
> shown in "top -H" and the nid in the Java thread dump is stable during
> the lifetime of the process.
>
> Regards,
>
> Rainer
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
>
>

RE: Ajp Nio-thread stuck in loop and consuming a lot of cpu

Posted by Toom Andreas <An...@volvofinans.se>.
Hi Rainer,

Thanks for your reply!

We did a restart of the application earlier this week and so far the cpu usage is at normal levels. If/when the cpu usage goes up I will check with your suggested parameters!

Best regards,
Andreas

-----Original Message-----
From: Rainer Jung [mailto:rainer.jung@kippdata.de] 
Sent: 18 January 2018 14:46
To: Tomcat Users List <us...@tomcat.apache.org>
Subject: Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu

Just an addition to one of Mark's questions:

On 17.01.2018 at 22:20, Mark Thomas wrote:
> Is it always the same threads generating the load or does it move 
> between threads?

Just in case Andreas is not aware: one can check with "top -H -p <PID>".

Using -H lets top show one line per thread instead of one line per process. That way you can easily see CPU use per thread. The "-p <PID>" 
would simply filter to only show the threads in your java process (by using for <PID> the process ID of your Tomcat java process).

Furthermore the PID column in "top -H" does not actually show the PID, but instead the thread IDs. If you convert those decimal numbers to hex, e.g. using the shell commandline

printf "0x%x\n" <ID>

(and replacing <ID> by the respective thread id of the lines that show high CPU usage), you get a hex number that will point you to the respective thread in the java thread dump by looking for nid=0x... in the thread header lines. The association between the operating system thread number as shown in "top -H" and the nid in the Java thread dump is stable during the lifetime of the process.

Regards,

Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu

Posted by Rainer Jung <ra...@kippdata.de>.
Just an addition to one of Mark's questions:

On 17.01.2018 at 22:20, Mark Thomas wrote:
> Is it always the same threads generating the load or does it move
> between threads?

Just in case Andreas is not aware: one can check with "top -H -p <PID>".

Using -H lets top show one line per thread instead of one line per 
process. That way you can easily see CPU use per thread. The "-p <PID>" 
would simply filter to only show the threads in your java process (by 
using for <PID> the process ID of your Tomcat java process).

Furthermore the PID column in "top -H" does not actually show the PID, 
but instead the thread IDs. If you convert those decimal numbers to hex, 
e.g. using the shell commandline

printf "0x%x\n" <ID>

(and replacing <ID> by the respective thread id of the lines that show 
high CPU usage), you get a hex number that will point you to the 
respective thread in the java thread dump by looking for nid=0x... in 
the thread header lines. The association between the operating system thread 
number as shown in "top -H" and the nid in the Java thread dump is 
stable during the lifetime of the process.
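
Put together, the lookup can be scripted; a rough sketch (thread dump taken with the JDK's jstack, and the number of header lines skipped from top's batch output may need adjusting for your top version):

PID=12345                       # replace with the Tomcat java process id
jstack "$PID" > /tmp/dump.txt   # thread headers contain nid=0x... native ids

# Five busiest threads of the process; the first column is the OS thread id
top -H -b -n 1 -p "$PID" | tail -n +8 | sort -k9 -nr | head -5 |
while read -r tid rest; do
    hexid=$(printf '0x%x' "$tid")
    echo "=== OS thread $tid ($hexid) ==="
    grep -i "nid=$hexid " /tmp/dump.txt   # matching thread header in the dump
done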

Regards,

Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


RE: Ajp Nio-thread stuck in loop and consuming a lot of cpu

Posted by Toom Andreas <An...@volvofinans.se>.
Hi Mark,

Thanks for the info - I think we will focus on migrating to the 8.5.x branch first. If we experience the same issue on that branch I will restart our investigation and might come back with additional debugging information to see if we can identify the root cause.

Thanks for your help!

Best regards,
Andreas
 
-----Original Message-----
From: Mark Thomas [mailto:markt@apache.org] 
Sent: 19 January 2018 18:46
To: Tomcat Users List <us...@tomcat.apache.org>
Subject: Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu

On 19/01/2018 14:35, Toom Andreas wrote:
> Hi Mark,
> 
> Sorry for the late reply, it has been a couple of busy days.
> 
> We are running our application on two nodes and we have seen the issue on both nodes. These machines are running quad core cpus. What we have usually seen is that the application jvm process is consuming 100% (basically one core) on one machine and 200% (2 cores) on the other machine. The cpu usage/core usage seems to correlate quite well with the number of looping threads that we can observe via jmx.
> 
> The load does not move between threads, it is pinned to the looping thread.
> 
> We have only observed this behavior on one application and only in production. "Unfortunately" we did a deployment this week so both nodes have been restarted which means that at the moment the cpu usage is down to almost 0.
> 
> We are planning to update to Tomcat 8.5.x in the coming week (planned update, not a forced update related to this issue). Are you aware of any major code changes between the 8.0.x and 8.5.x branches related to this functionality  ?

There is a significant refactoring of the connectors between 8.0.x and 8.5.x to reduce code duplication, improve consistency of behaviour between the implementations and to support HTTP/2.

Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu

Posted by Mark Thomas <ma...@apache.org>.
On 19/01/2018 14:35, Toom Andreas wrote:
> Hi Mark,
> 
> Sorry for the late reply, it has been a couple of busy days.
> 
> We are running our application on two nodes and we have seen the issue on both nodes. These machines are running quad core cpus. What we have usually seen is that the application jvm process is consuming 100% (basically one core) on one machine and 200% (2 cores) on the other machine. The cpu usage/core usage seems to correlate quite well with the number of looping threads that we can observe via jmx.
> 
> The load does not move between threads, it is pinned to the looping thread.
> 
> We have only observed this behavior on one application and only in production. "Unfortunately" we did a deployment this week so both nodes have been restarted which means that at the moment the cpu usage is down to almost 0.
> 
> We are planning to update to Tomcat 8.5.x in the coming week (planned update, not a forced update related to this issue). Are you aware of any major code changes between the 8.0.x and 8.5.x branches related to this functionality  ?

There is a significant refactoring of the connectors between 8.0.x and
8.5.x to reduce code duplication, improve consistency of behaviour
between the implementations and to support HTTP/2.

Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


RE: Ajp Nio-thread stuck in loop and consuming a lot of cpu

Posted by Toom Andreas <An...@volvofinans.se>.
Hi Mark,

Sorry for the late reply, it has been a couple of busy days.

We are running our application on two nodes and we have seen the issue on both nodes. These machines are running quad core cpus. What we have usually seen is that the application jvm process is consuming 100% (basically one core) on one machine and 200% (2 cores) on the other machine. The cpu usage/core usage seems to correlate quite well with the number of looping threads that we can observe via jmx.

The load does not move between threads, it is pinned to the looping thread.

We have only observed this behavior on one application and only in production. "Unfortunately" we did a deployment this week so both nodes have been restarted which means that at the moment the cpu usage is down to almost 0.

We are planning to update to Tomcat 8.5.x in the coming week (planned update, not a forced update related to this issue). Are you aware of any major code changes between the 8.0.x and 8.5.x branches related to this functionality?

Best regards,
Andreas

-----Original Message-----
From: Mark Thomas [mailto:markt@apache.org] 
Sent: 17 January 2018 22:20
To: Tomcat Users List <us...@tomcat.apache.org>
Subject: Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu

On 16/01/18 23:21, Mark Thomas wrote:
> On 16/01/18 09:21, Toom Andreas wrote:
>> Hi Mark,
>>
>> We pulled out a CPU Call Tree using JVisualVM instead of YourKit and 
>> I have uploaded a screenshot of it here: https://imgur.com/a/mqYxn
>>
>> There is not much extra information compared to the java thread dump in my initial post but it highlights in which code branch/method call most of the cpu time is spent. Perhaps it will give some additional clues ? Let me know if you think that it is worthwhile to use YourKit instead of JVisualVM and I will try to get that for you instead.
> 
> Thanks. This has given me a couple of ideas. I need to do some more 
> testing. I hope to be able to get this done in the next day or so.

Unfortunately my ideas didn't pan out.

What levels of CPU usage are you observing?

Is it always the same threads generating the load or does it move between threads?

Can you reproduce this on a system you can attach a remote debugger to?

What we really need to do is understand the code path that is being taken and why. I can see the likely code path in the data we have so far but nothing that looks abnormal.

Mark


> 
> Mark
> 
> 
>>
>> Best regards,
>> Andreas
>>
>> -----Original Message-----
>> From: Mark Thomas [mailto:markt@apache.org]
>> Sent: 12 January 2018 11:27
>> To: Tomcat Users List <us...@tomcat.apache.org>
>> Subject: Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu
>>
>> On 12/01/18 06:58, Toom Andreas wrote:
>>> Hi,
>>>
>>> We are having an issue with an application running Apache Tomcat 8.0.47 using Oracle Jvm 1.8.0.151 on Linux (RHEL 7). The Tomcat process is consuming  cpu at a constant high level even though there is a low amount of incoming traffic. When inspecting the process health using JMX /JVisualVM CPU Sampler we see that there are 4 ajp-nio-exec threads together with a NioBlockingSelector.BlockPoller thread that consume most of the cpu.
>>
>> Are you able to determine the code path of the loop? A single stack trace can point us in the right direction but often the root cause can still be tricky to track down.
>>
>> A profiler such as YourKit in CPU tracing mode should be able to provide enough info to figure this out (and won't require any code changes to your system).
>>
>> Remote debugging is the other option but that is less ideal in production.
>>
>> Mark
>>
>>
>>>
>>> A stack trace of one of the ajp-nio-exec threads looks like this:
>>>
>>> "ajp-nio-48317-exec-14233" - Thread t@201195
>>>    java.lang.Thread.State: TIMED_WAITING
>>>                              at sun.misc.Unsafe.park(Native Method)
>>>                              - parking to wait for <342fab60> (a java.util.concurrent.CountDownLatch$Sync)
>>>                              at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>>>                              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>>>                              at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>>>                              at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>>>                              at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitLatch(NioEndpoint.java:1400)
>>>                              at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitReadLatch(NioEndpoint.java:1402)
>>>                              at org.apache.tomcat.util.net.NioBlockingSelector.read(NioBlockingSelector.java:185)
>>>                              at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:250)
>>>                              at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:231)
>>>                              at org.apache.coyote.ajp.AjpNioProcessor.readSocket(AjpNioProcessor.java:194)
>>>                              at org.apache.coyote.ajp.AjpNioProcessor.read(AjpNioProcessor.java:160)
>>>                              at org.apache.coyote.ajp.AbstractAjpProcessor.readMessage(AbstractAjpProcessor.java:1091)
>>>                              at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:804)
>>>                              at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
>>>                              at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1539)
>>>                              at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1495)
>>>                              - locked <279cc7f7> (a org.apache.tomcat.util.net.NioChannel)
>>>                              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>                              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>                              at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>>>                              at 
>>> java.lang.Thread.run(Thread.java:748)
>>>
>>>    Locked ownable synchronizers:
>>>                              - locked <10e1f474> (a
>>> java.util.concurrent.ThreadPoolExecutor$Worker)
>>>
>>> A stack trace of the NioBlockingSelector.BlockPoller thread looks like this:
>>>
>>> "NioBlockingSelector.BlockPoller-2" - Thread t@17
>>>    java.lang.Thread.State: RUNNABLE
>>>                              at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>>                              at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>>>                              at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>>>                              at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>>>                              - locked <648895e4> (a sun.nio.ch.Util$3)
>>>                              - locked <17921532> (a java.util.Collections$UnmodifiableSet)
>>>                              - locked <67762111> (a sun.nio.ch.EPollSelectorImpl)
>>>                              at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:105)
>>>                              at
>>> org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBl
>>> oc
>>> kingSelector.java:339)
>>>
>>>    Locked ownable synchronizers:
>>>                              - None
>>>
>>> It seems like the worker thread is trying to read from the underlying socket. When we inspect the OS tcp/ip statistics using the ss-command (ss -i src :<ajp-port>)  the following connections are suspicious:
>>>
>>> State      Recv-Q Send-Q Local Address:Port                 Peer Address:Port
>>> CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:53724
>>>                               cubic wscale:7,7 rto:202 rtt:1.724/2.424 ato:40 mss:1448 cwnd:10 ssthresh:31 bytes_acked:343869507 bytes_received:22259372 segs_out:302687 segs_in:141943 send 67.2Mbps lastsnd:1290791171 lastrcv:1290786778 lastack:1290786767 pacing_rate 134.4Mbps retrans:0/433 rcv_rtt:1.875 rcv_space:28960
>>> CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:41990
>>>                               cubic wscale:7,7 rto:202 rtt:1.132/0.899 ato:40 mss:1448 cwnd:10 ssthresh:22 bytes_acked:37347221 bytes_received:3346200 segs_out:35273 segs_in:20608 send 102.3Mbps lastsnd:289840681 lastrcv:289730802 lastack:289720796 pacing_rate 204.5Mbps retrans:0/13 reordering:24 rcv_rtt:1.875 rcv_space:28960
>>> CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:60068
>>>                               cubic wscale:7,7 rto:209 rtt:8.757/14.317 ato:40 mss:1448 cwnd:10 ssthresh:36 bytes_acked:180903401 bytes_received:16401965 segs_out:170017 segs_in:100392 send 13.2Mbps lastsnd:395602314 lastrcv:395602267 lastack:395602257 pacing_rate 26.5Mbps retrans:0/106 rcv_rtt:1.875 rcv_space:28960
>>> CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:50628
>>>                               cubic wscale:7,7 rto:203 
>>> rtt:2.992/4.252
>>> ato:40 mss:1448 cwnd:13 ssthresh:25 bytes_acked:4185335
>>> bytes_received:471220 segs_out:4193 segs_in:2181 send 50.3Mbps
>>> lastsnd:1036916135 lastrcv:1036871377 lastack:1036871367 pacing_rate 
>>> 100.6Mbps retrans:0/1 reordering:26 rcv_rtt:6.375 rcv_space:28960
>>>
>>> We are running our Tomcat node behind an Apache server and use Tomcat Connector (mod_jk) as a load balancer. When looking at the hosts running the Apache node, the corresponding sockets are closed. To us it seems that Tomcat has not properly closed the underlying socket when Apache/mod_jk initiated a FIN and is instead continuously trying to read from the socket even though it has been closed on the Apache side?
>>>
>>> I know that we are one version behind the latest release of Tomcat 8.0.x but I checked the changelog and I could not see any fixed issues matching this problem. Unfortunately we are only seeing this in production and we have not been able to reproduce in test. Has anyone experienced a similar issue like this before ?
>>>
>>> For reference our connector configuration in server.xml looks like this:
>>>
>>> <Connector port="48317"
>>>                protocol="AJP/1.3"
>>>                maxThreads="200"
>>>                connectionTimeout="600000"
>>>                xpoweredBy="false"
>>>                allowTrace="false"
>>>                URIEncoding="UTF-8"
>>>                requiredSecret="xxx"
>>>                secure="true"
>>>                scheme="https"
>>>                proxyPort="443" />
>>>
>>> And our Tomcat Connector configuration for Apache in workers.properties looks like this:
>>>
>>> worker.list = status,app
>>>
>>> # Create workers for app
>>> worker.app-1.type=ajp13
>>> worker.app-1.host=host8128.corpdomain.internal
>>> worker.app-1.port=48317
>>> worker.app-1.ping_mode=A
>>> worker.app-1.connection_pool_timeout=600
>>> worker.app-1.secret=xxx
>>> worker.app-2.type=ajp13
>>> worker.app-2.host=host8129.corpdomain.internal
>>> worker.app-2.port=48317
>>> worker.app-2.ping_mode=A
>>> worker.app-2.connection_pool_timeout=600
>>> worker.app-2.secret=xxx
>>>
>>> # Create a loadbalancer for app
>>> worker.app.type=lb
>>> worker.app.sticky_session=1
>>> worker.app.balance_workers=app-1, app-2
>>>
>>> Best regards,
>>> Andreas
>>>
>>>
>>> Andreas Toom
>>> Systemutvecklare/IT
>>> TEL: +46 702 03 04 98
>>> [Volvofinans Bank AB]<http://www.volvofinans.se/>
>>>
>>>
>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
>> For additional commands, e-mail: users-help@tomcat.apache.org
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
>> For additional commands, e-mail: users-help@tomcat.apache.org
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu

Posted by Mark Thomas <ma...@apache.org>.
On 16/01/18 23:21, Mark Thomas wrote:
> On 16/01/18 09:21, Toom Andreas wrote:
>> Hi Mark,
>>
>> We pulled out a CPU Call Tree using JVisualVM instead of YourKit and I have uploaded a screenshot of it here: https://imgur.com/a/mqYxn
>>
>> There is not much extra information compared to the java thread dump in my initial post but it highlights in which code branch/method call most of the cpu time is spent. Perhaps it will give some additional clues ? Let me know if you think that it is worthwhile to use YourKit instead of JVisualVM and I will try to get that for you instead.
> 
> Thanks. This has given me a couple of ideas. I need to do some more
> testing. I hope to be able to get this done in the next day or so.

Unfortunately my ideas didn't pan out.

What levels of CPU usage are you observing?

Is it always the same threads generating the load or does it move
between threads?

Can you reproduce this on a system you can attach a remote debugger to?

What we really need to do is understand the code path that is being
taken and why. I can see the likely code path in the data we have so far
but nothing that looks abnormal.

Mark


> 
> Mark
> 
> 
>>
>> Best regards,
>> Andreas 
>>
>> -----Original Message-----
>> From: Mark Thomas [mailto:markt@apache.org] 
>> Sent: 12 January 2018 11:27
>> To: Tomcat Users List <us...@tomcat.apache.org>
>> Subject: Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu
>>
>> On 12/01/18 06:58, Toom Andreas wrote:
>>> Hi,
>>>
>>> We are having an issue with an application running Apache Tomcat 8.0.47 using Oracle Jvm 1.8.0.151 on Linux (RHEL 7). The Tomcat process is consuming  cpu at a constant high level even though there is a low amount of incoming traffic. When inspecting the process health using JMX /JVisualVM CPU Sampler we see that there are 4 ajp-nio-exec threads together with a NioBlockingSelector.BlockPoller thread that consume most of the cpu.
>>
>> Are you able to determine the code path of the loop? A single stack trace can point us in the right direction but often the root cause can still be tricky to track down.
>>
>> A profiler such as YourKit in CPU tracing mode should be able to provide enough info to figure this out (and won't require any code changes to your system).
>>
>> Remote debugging is the other option but that is less ideal in production.
>>
>> Mark
>>
>>
>>>
>>> A stack trace of one of the ajp-nio-exec threads looks like this:
>>>
>>> "ajp-nio-48317-exec-14233" - Thread t@201195
>>>    java.lang.Thread.State: TIMED_WAITING
>>>                              at sun.misc.Unsafe.park(Native Method)
>>>                              - parking to wait for <342fab60> (a java.util.concurrent.CountDownLatch$Sync)
>>>                              at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>>>                              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>>>                              at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>>>                              at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>>>                              at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitLatch(NioEndpoint.java:1400)
>>>                              at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitReadLatch(NioEndpoint.java:1402)
>>>                              at org.apache.tomcat.util.net.NioBlockingSelector.read(NioBlockingSelector.java:185)
>>>                              at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:250)
>>>                              at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:231)
>>>                              at org.apache.coyote.ajp.AjpNioProcessor.readSocket(AjpNioProcessor.java:194)
>>>                              at org.apache.coyote.ajp.AjpNioProcessor.read(AjpNioProcessor.java:160)
>>>                              at org.apache.coyote.ajp.AbstractAjpProcessor.readMessage(AbstractAjpProcessor.java:1091)
>>>                              at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:804)
>>>                              at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
>>>                              at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1539)
>>>                              at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1495)
>>>                              - locked <279cc7f7> (a org.apache.tomcat.util.net.NioChannel)
>>>                              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>                              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>                              at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>>>                              at java.lang.Thread.run(Thread.java:748)
>>>
>>>    Locked ownable synchronizers:
>>>                              - locked <10e1f474> (a 
>>> java.util.concurrent.ThreadPoolExecutor$Worker)
>>>
>>> A stack trace of the NioBlockingSelector.BlockPoller thread looks like this:
>>>
>>> "NioBlockingSelector.BlockPoller-2" - Thread t@17
>>>    java.lang.Thread.State: RUNNABLE
>>>                              at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>>                              at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>>>                              at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>>>                              at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>>>                              - locked <648895e4> (a sun.nio.ch.Util$3)
>>>                              - locked <17921532> (a java.util.Collections$UnmodifiableSet)
>>>                              - locked <67762111> (a sun.nio.ch.EPollSelectorImpl)
>>>                              at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:105)
>>>                              at 
>>> org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBloc
>>> kingSelector.java:339)
>>>
>>>    Locked ownable synchronizers:
>>>                              - None
>>>
>>> It seems like the worker thread is trying to read from the underlying socket. When we inspect the OS tcp/ip statistics using the ss-command (ss -i src :<ajp-port>)  the following connections are suspicious:
>>>
>>> State      Recv-Q Send-Q Local Address:Port                 Peer Address:Port
>>> CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:53724
>>>                               cubic wscale:7,7 rto:202 rtt:1.724/2.424 ato:40 mss:1448 cwnd:10 ssthresh:31 bytes_acked:343869507 bytes_received:22259372 segs_out:302687 segs_in:141943 send 67.2Mbps lastsnd:1290791171 lastrcv:1290786778 lastack:1290786767 pacing_rate 134.4Mbps retrans:0/433 rcv_rtt:1.875 rcv_space:28960
>>> CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:41990
>>>                               cubic wscale:7,7 rto:202 rtt:1.132/0.899 ato:40 mss:1448 cwnd:10 ssthresh:22 bytes_acked:37347221 bytes_received:3346200 segs_out:35273 segs_in:20608 send 102.3Mbps lastsnd:289840681 lastrcv:289730802 lastack:289720796 pacing_rate 204.5Mbps retrans:0/13 reordering:24 rcv_rtt:1.875 rcv_space:28960
>>> CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:60068
>>>                               cubic wscale:7,7 rto:209 rtt:8.757/14.317 ato:40 mss:1448 cwnd:10 ssthresh:36 bytes_acked:180903401 bytes_received:16401965 segs_out:170017 segs_in:100392 send 13.2Mbps lastsnd:395602314 lastrcv:395602267 lastack:395602257 pacing_rate 26.5Mbps retrans:0/106 rcv_rtt:1.875 rcv_space:28960
>>> CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:50628
>>>                               cubic wscale:7,7 rto:203 rtt:2.992/4.252 
>>> ato:40 mss:1448 cwnd:13 ssthresh:25 bytes_acked:4185335 
>>> bytes_received:471220 segs_out:4193 segs_in:2181 send 50.3Mbps 
>>> lastsnd:1036916135 lastrcv:1036871377 lastack:1036871367 pacing_rate 
>>> 100.6Mbps retrans:0/1 reordering:26 rcv_rtt:6.375 rcv_space:28960
>>>
>>> We are running our Tomcat node behind an Apache server and use Tomcat Connector (mod_jk) as a load balancer. When looking at the hosts running the Apache node, the corresponding sockets are closed. To us it seems that Tomcat has not properly closed the underlying socket when Apache/mod_jk initiated a FIN and is instead continuously trying to read from the socket even though it has been closed on the Apache side?
>>>
>>> I know that we are one version behind the latest release of Tomcat 8.0.x but I checked the changelog and I could not see any fixed issues matching this problem. Unfortunately we are only seeing this in production and we have not been able to reproduce in test. Has anyone experienced a similar issue like this before ?
>>>
>>> For reference our connector configuration in server.xml looks like this:
>>>
>>> <Connector port="48317"
>>>                protocol="AJP/1.3"
>>>                maxThreads="200"
>>>                connectionTimeout="600000"
>>>                xpoweredBy="false"
>>>                allowTrace="false"
>>>                URIEncoding="UTF-8"
>>>                requiredSecret="xxx"
>>>                secure="true"
>>>                scheme="https"
>>>                proxyPort="443" />
>>>
>>> And our Tomcat Connector configuration for Apache in workers.properties looks like this:
>>>
>>> worker.list = status,app
>>>
>>> # Create workers for app
>>> worker.app-1.type=ajp13
>>> worker.app-1.host=host8128.corpdomain.internal
>>> worker.app-1.port=48317
>>> worker.app-1.ping_mode=A
>>> worker.app-1.connection_pool_timeout=600
>>> worker.app-1.secret=xxx
>>> worker.app-2.type=ajp13
>>> worker.app-2.host=host8129.corpdomain.internal
>>> worker.app-2.port=48317
>>> worker.app-2.ping_mode=A
>>> worker.app-2.connection_pool_timeout=600
>>> worker.app-2.secret=xxx
>>>
>>> # Create a loadbalancer for app
>>> worker.app.type=lb
>>> worker.app.sticky_session=1
>>> worker.app.balance_workers=app-1, app-2
>>>
>>> Best regards,
>>> Andreas
>>>
>>>
>>> Andreas Toom
>>> Systemutvecklare/IT
>>> TEL: +46 702 03 04 98
>>> [Volvofinans Bank AB]<http://www.volvofinans.se/>
>>>
>>>
>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
>> For additional commands, e-mail: users-help@tomcat.apache.org
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
>> For additional commands, e-mail: users-help@tomcat.apache.org
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu

Posted by Mark Thomas <ma...@apache.org>.
On 16/01/18 09:21, Toom Andreas wrote:
> Hi Mark,
> 
> We pulled out a CPU Call Tree using JVisualVM instead of YourKit and I have uploaded a screenshot of it here: https://imgur.com/a/mqYxn
> 
> There is not much extra information compared to the java thread dump in my initial post but it highlights in which code branch/method call most of the cpu time is spent. Perhaps it will give some additional clues ? Let me know if you think that it is worthwhile to use YourKit instead of JVisualVM and I will try to get that for you instead.

Thanks. This has given me a couple of ideas. I need to do some more
testing. I hope to be able to get this done in the next day or so.

Mark


> 
> Best regards,
> Andreas 
> 
> -----Original Message-----
> From: Mark Thomas [mailto:markt@apache.org] 
> Sent: 12 January 2018 11:27
> To: Tomcat Users List <us...@tomcat.apache.org>
> Subject: Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu
> 
> On 12/01/18 06:58, Toom Andreas wrote:
>> Hi,
>>
>> We are having an issue with an application running Apache Tomcat 8.0.47 using Oracle Jvm 1.8.0.151 on Linux (RHEL 7). The Tomcat process is consuming  cpu at a constant high level even though there is a low amount of incoming traffic. When inspecting the process health using JMX /JVisualVM CPU Sampler we see that there are 4 ajp-nio-exec threads together with a NioBlockingSelector.BlockPoller thread that consume most of the cpu.
> 
> Are you able to determine the code path of the loop? A single stack trace can point us in the right direction but often the root cause can still be tricky to track down.
> 
> A profiler such as YourKit in CPU tracing mode should be able to provide enough info to figure this out (and won't require any code changes to your system).
> 
> Remote debugging is the other option but that is less ideal in production.
> 
> Mark
> 
> 
>>
>> A stack trace of one of the ajp-nio-exec threads looks like this:
>>
>> "ajp-nio-48317-exec-14233" - Thread t@201195
>>    java.lang.Thread.State: TIMED_WAITING
>>                              at sun.misc.Unsafe.park(Native Method)
>>                              - parking to wait for <342fab60> (a java.util.concurrent.CountDownLatch$Sync)
>>                              at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>>                              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>>                              at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>>                              at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>>                              at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitLatch(NioEndpoint.java:1400)
>>                              at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitReadLatch(NioEndpoint.java:1402)
>>                              at org.apache.tomcat.util.net.NioBlockingSelector.read(NioBlockingSelector.java:185)
>>                              at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:250)
>>                              at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:231)
>>                              at org.apache.coyote.ajp.AjpNioProcessor.readSocket(AjpNioProcessor.java:194)
>>                              at org.apache.coyote.ajp.AjpNioProcessor.read(AjpNioProcessor.java:160)
>>                              at org.apache.coyote.ajp.AbstractAjpProcessor.readMessage(AbstractAjpProcessor.java:1091)
>>                              at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:804)
>>                              at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
>>                              at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1539)
>>                              at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1495)
>>                              - locked <279cc7f7> (a org.apache.tomcat.util.net.NioChannel)
>>                              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>                              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>                              at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>>                              at java.lang.Thread.run(Thread.java:748)
>>
>>    Locked ownable synchronizers:
>>                              - locked <10e1f474> (a 
>> java.util.concurrent.ThreadPoolExecutor$Worker)
>>
>> A stack trace of the NioBlockingSelector.BlockPoller thread looks like this:
>>
>> "NioBlockingSelector.BlockPoller-2" - Thread t@17
>>    java.lang.Thread.State: RUNNABLE
>>                              at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>                              at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>>                              at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>>                              at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>>                              - locked <648895e4> (a sun.nio.ch.Util$3)
>>                              - locked <17921532> (a java.util.Collections$UnmodifiableSet)
>>                              - locked <67762111> (a sun.nio.ch.EPollSelectorImpl)
>>                              at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:105)
>>                              at 
>> org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBloc
>> kingSelector.java:339)
>>
>>    Locked ownable synchronizers:
>>                              - None
>>
>> It seems like the worker thread is trying to read from the underlying socket. When we inspect the OS tcp/ip statistics using the ss-command (ss -i src :<ajp-port>)  the following connections are suspicious:
>>
>> State      Recv-Q Send-Q Local Address:Port                 Peer Address:Port
>> CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:53724
>>                               cubic wscale:7,7 rto:202 rtt:1.724/2.424 ato:40 mss:1448 cwnd:10 ssthresh:31 bytes_acked:343869507 bytes_received:22259372 segs_out:302687 segs_in:141943 send 67.2Mbps lastsnd:1290791171 lastrcv:1290786778 lastack:1290786767 pacing_rate 134.4Mbps retrans:0/433 rcv_rtt:1.875 rcv_space:28960
>> CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:41990
>>                               cubic wscale:7,7 rto:202 rtt:1.132/0.899 ato:40 mss:1448 cwnd:10 ssthresh:22 bytes_acked:37347221 bytes_received:3346200 segs_out:35273 segs_in:20608 send 102.3Mbps lastsnd:289840681 lastrcv:289730802 lastack:289720796 pacing_rate 204.5Mbps retrans:0/13 reordering:24 rcv_rtt:1.875 rcv_space:28960
>> CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:60068
>>                               cubic wscale:7,7 rto:209 rtt:8.757/14.317 ato:40 mss:1448 cwnd:10 ssthresh:36 bytes_acked:180903401 bytes_received:16401965 segs_out:170017 segs_in:100392 send 13.2Mbps lastsnd:395602314 lastrcv:395602267 lastack:395602257 pacing_rate 26.5Mbps retrans:0/106 rcv_rtt:1.875 rcv_space:28960
>> CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:50628
>>                               cubic wscale:7,7 rto:203 rtt:2.992/4.252 
>> ato:40 mss:1448 cwnd:13 ssthresh:25 bytes_acked:4185335 
>> bytes_received:471220 segs_out:4193 segs_in:2181 send 50.3Mbps 
>> lastsnd:1036916135 lastrcv:1036871377 lastack:1036871367 pacing_rate 
>> 100.6Mbps retrans:0/1 reordering:26 rcv_rtt:6.375 rcv_space:28960
>>
>> We are running our Tomcat node behind an Apache server and use Tomcat Connector (mod_jk) as a load balancer. When looking at the hosts running the Apache node, the corresponding sockets are closed. To us it seems that Tomcat has not properly closed the underlying socket when Apache/mod_jk initiated a FIN and is instead continuously trying to read from the socket even though it has been closed on the Apache side?
>>
>> I know that we are one version behind the latest release of Tomcat 8.0.x but I checked the changelog and I could not see any fixed issues matching this problem. Unfortunately we are only seeing this in production and we have not been able to reproduce in test. Has anyone experienced a similar issue like this before ?
>>
>> For reference our connector configuration in server.xml looks like this:
>>
>> <Connector port="48317"
>>                protocol="AJP/1.3"
>>                maxThreads="200"
>>                connectionTimeout="600000"
>>                xpoweredBy="false"
>>                allowTrace="false"
>>                URIEncoding="UTF-8"
>>                requiredSecret="xxx"
>>                secure="true"
>>                scheme="https"
>>                proxyPort="443" />
>>
>> And our Tomcat Connector configuration for Apache in workers.properties looks like this:
>>
>> worker.list = status,app
>>
>> # Create workers for app
>> worker.app-1.type=ajp13
>> worker.app-1.host=host8128.corpdomain.internal
>> worker.app-1.port=48317
>> worker.app-1.ping_mode=A
>> worker.app-1.connection_pool_timeout=600
>> worker.app-1.secret=xxx
>> worker.app-2.type=ajp13
>> worker.app-2.host=host8129.corpdomain.internal
>> worker.app-2.port=48317
>> worker.app-2.ping_mode=A
>> worker.app-2.connection_pool_timeout=600
>> worker.app-2.secret=xxx
>>
>> # Create a loadbalancer for app
>> worker.app.type=lb
>> worker.app.sticky_session=1
>> worker.app.balance_workers=app-1, app-2
>>
>> Best regards,
>> Andreas
>>
>>
>> Andreas Toom
>> Systemutvecklare/IT
>> TEL: +46 702 03 04 98
>> [Volvofinans Bank AB]<http://www.volvofinans.se/>
>>
>>
>>
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


RE: Ajp Nio-thread stuck in loop and consuming a lot of cpu

Posted by Toom Andreas <An...@volvofinans.se>.
Hi Mark,

We pulled out a CPU Call Tree using JVisualVM instead of YourKit and I have uploaded a screenshot of it here: https://imgur.com/a/mqYxn

There is not much extra information compared to the java thread dump in my initial post but it highlights in which code branch/method call most of the cpu time is spent. Perhaps it will give some additional clues ? Let me know if you think that it is worthwhile to use YourKit instead of JVisualVM and I will try to get that for you instead.

Best regards,
Andreas 

-----Original Message-----
From: Mark Thomas [mailto:markt@apache.org] 
Sent: 12 January 2018 11:27
To: Tomcat Users List <us...@tomcat.apache.org>
Subject: Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu

On 12/01/18 06:58, Toom Andreas wrote:
> Hi,
> 
> We are having an issue with an application running Apache Tomcat 8.0.47 using Oracle Jvm 1.8.0.151 on Linux (RHEL 7). The Tomcat process is consuming  cpu at a constant high level even though there is a low amount of incoming traffic. When inspecting the process health using JMX /JVisualVM CPU Sampler we see that there are 4 ajp-nio-exec threads together with a NioBlockingSelector.BlockPoller thread that consume most of the cpu.

Are you able to determine the code path of the loop? A single stack trace can point us in the right direction but often the root cause can still be tricky to track down.

A profiler such as YourKit in CPU tracing mode should be able to provide enough info to figure this out (and won't require any code changes to your system).

Remote debugging is the other option but that is less ideal in production.

Mark


> 
> A stack trace of one of the ajp-io-exec threads looks like this:
> 
> "ajp-nio-48317-exec-14233" - Thread t@201195
>    java.lang.Thread.State: TIMED_WAITING
>                              at sun.misc.Unsafe.park(Native Method)
>                              - parking to wait for <342fab60> (a java.util.concurrent.CountDownLatch$Sync)
>                              at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>                              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>                              at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>                              at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>                              at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitLatch(NioEndpoint.java:1400)
>                              at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitReadLatch(NioEndpoint.java:1402)
>                              at org.apache.tomcat.util.net.NioBlockingSelector.read(NioBlockingSelector.java:185)
>                              at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:250)
>                              at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:231)
>                              at org.apache.coyote.ajp.AjpNioProcessor.readSocket(AjpNioProcessor.java:194)
>                              at org.apache.coyote.ajp.AjpNioProcessor.read(AjpNioProcessor.java:160)
>                              at org.apache.coyote.ajp.AbstractAjpProcessor.readMessage(AbstractAjpProcessor.java:1091)
>                              at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:804)
>                              at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
>                              at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1539)
>                              at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1495)
>                              - locked <279cc7f7> (a org.apache.tomcat.util.net.NioChannel)
>                              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>                              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>                              at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>                              at java.lang.Thread.run(Thread.java:748)
> 
>    Locked ownable synchronizers:
>                              - locked <10e1f474> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> 
> A stack trace of the NioBlockingSelector.BlockPoller thread looks like this:
> 
> "NioBlockingSelector.BlockPoller-2" - Thread t@17
>    java.lang.Thread.State: RUNNABLE
>                              at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>                              at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>                              at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>                              at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>                              - locked <648895e4> (a sun.nio.ch.Util$3)
>                              - locked <17921532> (a java.util.Collections$UnmodifiableSet)
>                              - locked <67762111> (a sun.nio.ch.EPollSelectorImpl)
>                              at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:105)
>                              at 
> org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBloc
> kingSelector.java:339)
> 
>    Locked ownable synchronizers:
>                              - None
> 
> It seems like the worker thread is trying to read from the underlying socket. When we inspect the OS tcp/ip statistics using the ss-command (ss -i src :<ajp-port>)  the following connections are suspicious:
> 
> State      Recv-Q Send-Q Local Address:Port                 Peer Address:Port
> CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:53724
>                               cubic wscale:7,7 rto:202 rtt:1.724/2.424 ato:40 mss:1448 cwnd:10 ssthresh:31 bytes_acked:343869507 bytes_received:22259372 segs_out:302687 segs_in:141943 send 67.2Mbps lastsnd:1290791171 lastrcv:1290786778 lastack:1290786767 pacing_rate 134.4Mbps retrans:0/433 rcv_rtt:1.875 rcv_space:28960
> CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:41990
>                               cubic wscale:7,7 rto:202 rtt:1.132/0.899 ato:40 mss:1448 cwnd:10 ssthresh:22 bytes_acked:37347221 bytes_received:3346200 segs_out:35273 segs_in:20608 send 102.3Mbps lastsnd:289840681 lastrcv:289730802 lastack:289720796 pacing_rate 204.5Mbps retrans:0/13 reordering:24 rcv_rtt:1.875 rcv_space:28960
> CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:60068
>                               cubic wscale:7,7 rto:209 rtt:8.757/14.317 ato:40 mss:1448 cwnd:10 ssthresh:36 bytes_acked:180903401 bytes_received:16401965 segs_out:170017 segs_in:100392 send 13.2Mbps lastsnd:395602314 lastrcv:395602267 lastack:395602257 pacing_rate 26.5Mbps retrans:0/106 rcv_rtt:1.875 rcv_space:28960
> CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:50628
>                               cubic wscale:7,7 rto:203 rtt:2.992/4.252 
> ato:40 mss:1448 cwnd:13 ssthresh:25 bytes_acked:4185335 
> bytes_received:471220 segs_out:4193 segs_in:2181 send 50.3Mbps 
> lastsnd:1036916135 lastrcv:1036871377 lastack:1036871367 pacing_rate 
> 100.6Mbps retrans:0/1 reordering:26 rcv_rtt:6.375 rcv_space:28960
> 
> We are running our Tomcat node behind and Apache server and use Tomcat Connector (mod_jk) as a load balancer. When looking at the hosts running the Apache node the corresponding sockets are closed. To us it seems that Tomcat has not properly closed the underlying socket when Apache/mod_jk initiated a FIN and is instead continuously trying to read from the socket even though it has been closed on the Apache side ?
> 
> I know that we are one version behind the latest release of Tomcat 8.0.x but I checked the changelog and I could not see any fixed issues matching this problem. Unfortunately we are only seeing this in production and we have not been able to reproduce in test. Has anyone experienced a similar issue like this before ?
> 
> For reference our connector configuration in server.xml looks like this:
> 
> <Connector port="48317"
>                protocol="AJP/1.3"
>                maxThreads="200"
>                connectionTimeout="600000"
>                xpoweredBy="false"
>                allowTrace="false"
>                URIEncoding="UTF-8"
>                requiredSecret="xxx"
>                secure="true"
>                scheme="https"
>                proxyPort="443" />
> 
> And our Tomcat Connector configuration for Apache in workers.properties looks like this:
> 
> worker.list = status,app
> 
> # Create workers for app
> worker.app-1.type=ajp13
> worker.app-1.host=host8128.corpdomain.internal
> worker.app-1.port=48317
> worker.app-1.ping_mode=A
> worker.app-1.connection_pool_timeout=600
> worker.app-1.secret=xxx
> worker.app-2.type=ajp13
> worker.app-2.host=host8129.corpdomain.internal
> worker.app-2.port=48317
> worker.app-2.ping_mode=A
> worker.app-2.connection_pool_timeout=600
> worker.app-2.secret=xxx
> 
> # Create a loadbalancer for app
> worker.app.type=lb
> worker.app.sticky_session=1
> worker.app.balance_workers=app-1, app-2
> 
> Best regards,
> Andreas
> 
> 
> Andreas Toom
> Systemutvecklare/IT
> TEL: +46 702 03 04 98
> [Volvofinans Bank AB]<http://www.volvofinans.se/>
> 
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


RE: Ajp Nio-thread stuck in loop and consuming a lot of cpu

Posted by Toom Andreas <An...@volvofinans.se>.
Hi Mark,

Thanks for the reply!

We will try to get some additional info and get back to you as soon as possible!

Best regards,
Andreas

-----Original Message-----
From: Mark Thomas [mailto:markt@apache.org] 
Sent: den 12 januari 2018 11:27
To: Tomcat Users List <us...@tomcat.apache.org>
Subject: Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu

On 12/01/18 06:58, Toom Andreas wrote:
> Hi,
> 
> We are having an issue with an application running Apache Tomcat 8.0.47 using Oracle Jvm 1.8.0.151 on Linux (RHEL 7). The Tomcat process is consuming  cpu at a constant high level even though there is a low amount of incoming traffic. When inspecting the process health using JMX /JVisualVM CPU Sampler we see that there are 4 ajp-nio-exec threads together with a NioBlockingSelector.BlockPoller thread that consume most of the cpu.

Are you able to determine the code path of the loop? A single stack trace can point us in the right direction but often the root cause can still be tricky to track down.

A profiler such as YourKit in CPU tracing mode should be able to provide enough info to figure this out (and won't require any code changes to your system).

Remote debugging is the other option but that is less ideal in production.

Mark


> 
> A stack trace of one of the ajp-io-exec threads looks like this:
> 
> "ajp-nio-48317-exec-14233" - Thread t@201195
>    java.lang.Thread.State: TIMED_WAITING
>                              at sun.misc.Unsafe.park(Native Method)
>                              - parking to wait for <342fab60> (a java.util.concurrent.CountDownLatch$Sync)
>                              at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>                              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>                              at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>                              at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>                              at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitLatch(NioEndpoint.java:1400)
>                              at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitReadLatch(NioEndpoint.java:1402)
>                              at org.apache.tomcat.util.net.NioBlockingSelector.read(NioBlockingSelector.java:185)
>                              at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:250)
>                              at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:231)
>                              at org.apache.coyote.ajp.AjpNioProcessor.readSocket(AjpNioProcessor.java:194)
>                              at org.apache.coyote.ajp.AjpNioProcessor.read(AjpNioProcessor.java:160)
>                              at org.apache.coyote.ajp.AbstractAjpProcessor.readMessage(AbstractAjpProcessor.java:1091)
>                              at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:804)
>                              at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
>                              at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1539)
>                              at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1495)
>                              - locked <279cc7f7> (a org.apache.tomcat.util.net.NioChannel)
>                              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>                              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>                              at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>                              at java.lang.Thread.run(Thread.java:748)
> 
>    Locked ownable synchronizers:
>                              - locked <10e1f474> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> 
> A stack trace of the NioBlockingSelector.BlockPoller thread looks like this:
> 
> "NioBlockingSelector.BlockPoller-2" - Thread t@17
>    java.lang.Thread.State: RUNNABLE
>                              at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>                              at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>                              at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>                              at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>                              - locked <648895e4> (a sun.nio.ch.Util$3)
>                              - locked <17921532> (a java.util.Collections$UnmodifiableSet)
>                              - locked <67762111> (a sun.nio.ch.EPollSelectorImpl)
>                              at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:105)
>                              at 
> org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBloc
> kingSelector.java:339)
> 
>    Locked ownable synchronizers:
>                              - None
> 
> It seems like the worker thread is trying to read from the underlying socket. When we inspect the OS tcp/ip statistics using the ss-command (ss -i src :<ajp-port>)  the following connections are suspicious:
> 
> State      Recv-Q Send-Q Local Address:Port                 Peer Address:Port
> CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:53724
>                               cubic wscale:7,7 rto:202 rtt:1.724/2.424 ato:40 mss:1448 cwnd:10 ssthresh:31 bytes_acked:343869507 bytes_received:22259372 segs_out:302687 segs_in:141943 send 67.2Mbps lastsnd:1290791171 lastrcv:1290786778 lastack:1290786767 pacing_rate 134.4Mbps retrans:0/433 rcv_rtt:1.875 rcv_space:28960
> CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:41990
>                               cubic wscale:7,7 rto:202 rtt:1.132/0.899 ato:40 mss:1448 cwnd:10 ssthresh:22 bytes_acked:37347221 bytes_received:3346200 segs_out:35273 segs_in:20608 send 102.3Mbps lastsnd:289840681 lastrcv:289730802 lastack:289720796 pacing_rate 204.5Mbps retrans:0/13 reordering:24 rcv_rtt:1.875 rcv_space:28960
> CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:60068
>                               cubic wscale:7,7 rto:209 rtt:8.757/14.317 ato:40 mss:1448 cwnd:10 ssthresh:36 bytes_acked:180903401 bytes_received:16401965 segs_out:170017 segs_in:100392 send 13.2Mbps lastsnd:395602314 lastrcv:395602267 lastack:395602257 pacing_rate 26.5Mbps retrans:0/106 rcv_rtt:1.875 rcv_space:28960
> CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:50628
>                               cubic wscale:7,7 rto:203 rtt:2.992/4.252 
> ato:40 mss:1448 cwnd:13 ssthresh:25 bytes_acked:4185335 
> bytes_received:471220 segs_out:4193 segs_in:2181 send 50.3Mbps 
> lastsnd:1036916135 lastrcv:1036871377 lastack:1036871367 pacing_rate 
> 100.6Mbps retrans:0/1 reordering:26 rcv_rtt:6.375 rcv_space:28960
> 
> We are running our Tomcat node behind and Apache server and use Tomcat Connector (mod_jk) as a load balancer. When looking at the hosts running the Apache node the corresponding sockets are closed. To us it seems that Tomcat has not properly closed the underlying socket when Apache/mod_jk initiated a FIN and is instead continuously trying to read from the socket even though it has been closed on the Apache side ?
> 
> I know that we are one version behind the latest release of Tomcat 8.0.x but I checked the changelog and I could not see any fixed issues matching this problem. Unfortunately we are only seeing this in production and we have not been able to reproduce in test. Has anyone experienced a similar issue like this before ?
> 
> For reference our connector configuration in server.xml looks like this:
> 
> <Connector port="48317"
>                protocol="AJP/1.3"
>                maxThreads="200"
>                connectionTimeout="600000"
>                xpoweredBy="false"
>                allowTrace="false"
>                URIEncoding="UTF-8"
>                requiredSecret="xxx"
>                secure="true"
>                scheme="https"
>                proxyPort="443" />
> 
> And our Tomcat Connector configuration for Apache in workers.properties looks like this:
> 
> worker.list = status,app
> 
> # Create workers for app
> worker.app-1.type=ajp13
> worker.app-1.host=host8128.corpdomain.internal
> worker.app-1.port=48317
> worker.app-1.ping_mode=A
> worker.app-1.connection_pool_timeout=600
> worker.app-1.secret=xxx
> worker.app-2.type=ajp13
> worker.app-2.host=host8129.corpdomain.internal
> worker.app-2.port=48317
> worker.app-2.ping_mode=A
> worker.app-2.connection_pool_timeout=600
> worker.app-2.secret=xxx
> 
> # Create a loadbalancer for app
> worker.app.type=lb
> worker.app.sticky_session=1
> worker.app.balance_workers=app-1, app-2
> 
> Best regards,
> Andreas
> 
> 
> Andreas Toom
> Systemutvecklare/IT
> TEL: +46 702 03 04 98
> [Volvofinans Bank AB]<http://www.volvofinans.se/>
> 
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu

Posted by Mark Thomas <ma...@apache.org>.
On 12/01/18 06:58, Toom Andreas wrote:
> Hi,
> 
> We are having an issue with an application running Apache Tomcat 8.0.47 using Oracle Jvm 1.8.0.151 on Linux (RHEL 7). The Tomcat process is consuming  cpu at a constant high level even though there is a low amount of incoming traffic. When inspecting the process health using JMX /JVisualVM CPU Sampler we see that there are 4 ajp-nio-exec threads together with a NioBlockingSelector.BlockPoller thread that consume most of the cpu.

Are you able to determine the code path of the loop? A single stack
trace can point us in the right direction but often the root cause can
still be tricky to track down.

A profiler such as YourKit in CPU tracing mode should be able to provide
enough info to figure this out (and won't require any code changes to
your system).

Remote debugging is the other option but that is less ideal in production.
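
If attaching a profiler to production is not possible either, repeated stack samples of just
the hot threads are often enough to spot the loop. Below is a rough, untested sketch using
only the standard java.lang.management API; it is not Tomcat code, it has to run inside the
same JVM (for example from a temporary diagnostic JSP), and the thread-name prefixes are
simply taken from the dumps quoted below.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

public class AjpThreadSampler {

    // Print the stack of every AJP worker / block-poller thread several times
    // in a row; the looping code path shows up by repetition across samples.
    public static void sample(int rounds, long intervalMillis) throws InterruptedException {
        for (int i = 0; i < rounds; i++) {
            for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
                String name = info.getThreadName();
                if (name.startsWith("ajp-nio-") || name.startsWith("NioBlockingSelector")) {
                    System.out.println(name + " [" + info.getThreadState() + "]");
                    for (StackTraceElement frame : info.getStackTrace()) {
                        System.out.println("    at " + frame);
                    }
                }
            }
            Thread.sleep(intervalMillis);
        }
    }
}

Comparing a handful of samples taken a few hundred milliseconds apart usually makes the
repeating frames obvious.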

Mark


> 
> A stack trace of one of the ajp-io-exec threads looks like this:
> 
> "ajp-nio-48317-exec-14233" - Thread t@201195
>    java.lang.Thread.State: TIMED_WAITING
>                              at sun.misc.Unsafe.park(Native Method)
>                              - parking to wait for <342fab60> (a java.util.concurrent.CountDownLatch$Sync)
>                              at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>                              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>                              at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>                              at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>                              at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitLatch(NioEndpoint.java:1400)
>                              at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitReadLatch(NioEndpoint.java:1402)
>                              at org.apache.tomcat.util.net.NioBlockingSelector.read(NioBlockingSelector.java:185)
>                              at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:250)
>                              at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:231)
>                              at org.apache.coyote.ajp.AjpNioProcessor.readSocket(AjpNioProcessor.java:194)
>                              at org.apache.coyote.ajp.AjpNioProcessor.read(AjpNioProcessor.java:160)
>                              at org.apache.coyote.ajp.AbstractAjpProcessor.readMessage(AbstractAjpProcessor.java:1091)
>                              at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:804)
>                              at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
>                              at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1539)
>                              at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1495)
>                              - locked <279cc7f7> (a org.apache.tomcat.util.net.NioChannel)
>                              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>                              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>                              at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>                              at java.lang.Thread.run(Thread.java:748)
> 
>    Locked ownable synchronizers:
>                              - locked <10e1f474> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> 
> A stack trace of the NioBlockingSelector.BlockPoller thread looks like this:
> 
> "NioBlockingSelector.BlockPoller-2" - Thread t@17
>    java.lang.Thread.State: RUNNABLE
>                              at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>                              at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>                              at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>                              at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>                              - locked <648895e4> (a sun.nio.ch.Util$3)
>                              - locked <17921532> (a java.util.Collections$UnmodifiableSet)
>                              - locked <67762111> (a sun.nio.ch.EPollSelectorImpl)
>                              at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:105)
>                              at org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBlockingSelector.java:339)
> 
>    Locked ownable synchronizers:
>                              - None
> 
> It seems like the worker thread is trying to read from the underlying socket. When we inspect the OS tcp/ip statistics using the ss-command (ss -i src :<ajp-port>)  the following connections are suspicious:
> 
> State      Recv-Q Send-Q Local Address:Port                 Peer Address:Port
> CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:53724
>                               cubic wscale:7,7 rto:202 rtt:1.724/2.424 ato:40 mss:1448 cwnd:10 ssthresh:31 bytes_acked:343869507 bytes_received:22259372 segs_out:302687 segs_in:141943 send 67.2Mbps lastsnd:1290791171 lastrcv:1290786778 lastack:1290786767 pacing_rate 134.4Mbps retrans:0/433 rcv_rtt:1.875 rcv_space:28960
> CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:41990
>                               cubic wscale:7,7 rto:202 rtt:1.132/0.899 ato:40 mss:1448 cwnd:10 ssthresh:22 bytes_acked:37347221 bytes_received:3346200 segs_out:35273 segs_in:20608 send 102.3Mbps lastsnd:289840681 lastrcv:289730802 lastack:289720796 pacing_rate 204.5Mbps retrans:0/13 reordering:24 rcv_rtt:1.875 rcv_space:28960
> CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:60068
>                               cubic wscale:7,7 rto:209 rtt:8.757/14.317 ato:40 mss:1448 cwnd:10 ssthresh:36 bytes_acked:180903401 bytes_received:16401965 segs_out:170017 segs_in:100392 send 13.2Mbps lastsnd:395602314 lastrcv:395602267 lastack:395602257 pacing_rate 26.5Mbps retrans:0/106 rcv_rtt:1.875 rcv_space:28960
> CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:50628
>                               cubic wscale:7,7 rto:203 rtt:2.992/4.252 ato:40 mss:1448 cwnd:13 ssthresh:25 bytes_acked:4185335 bytes_received:471220 segs_out:4193 segs_in:2181 send 50.3Mbps lastsnd:1036916135 lastrcv:1036871377 lastack:1036871367 pacing_rate 100.6Mbps retrans:0/1 reordering:26 rcv_rtt:6.375 rcv_space:28960
> 
> We are running our Tomcat node behind and Apache server and use Tomcat Connector (mod_jk) as a load balancer. When looking at the hosts running the Apache node the corresponding sockets are closed. To us it seems that Tomcat has not properly closed the underlying socket when Apache/mod_jk initiated a FIN and is instead continuously trying to read from the socket even though it has been closed on the Apache side ?
> 
> I know that we are one version behind the latest release of Tomcat 8.0.x but I checked the changelog and I could not see any fixed issues matching this problem. Unfortunately we are only seeing this in production and we have not been able to reproduce in test. Has anyone experienced a similar issue like this before ?
> 
> For reference our connector configuration in server.xml looks like this:
> 
> <Connector port="48317"
>                protocol="AJP/1.3"
>                maxThreads="200"
>                connectionTimeout="600000"
>                xpoweredBy="false"
>                allowTrace="false"
>                URIEncoding="UTF-8"
>                requiredSecret="xxx"
>                secure="true"
>                scheme="https"
>                proxyPort="443" />
> 
> And our Tomcat Connector configuration for Apache in workers.properties looks like this:
> 
> worker.list = status,app
> 
> # Create workers for app
> worker.app-1.type=ajp13
> worker.app-1.host=host8128.corpdomain.internal
> worker.app-1.port=48317
> worker.app-1.ping_mode=A
> worker.app-1.connection_pool_timeout=600
> worker.app-1.secret=xxx
> worker.app-2.type=ajp13
> worker.app-2.host=host8129.corpdomain.internal
> worker.app-2.port=48317
> worker.app-2.ping_mode=A
> worker.app-2.connection_pool_timeout=600
> worker.app-2.secret=xxx
> 
> # Create a loadbalancer for app
> worker.app.type=lb
> worker.app.sticky_session=1
> worker.app.balance_workers=app-1, app-2
> 
> Best regards,
> Andreas
> 
> 
> Andreas Toom
> Systemutvecklare/IT
> TEL: +46 702 03 04 98
> [Volvofinans Bank AB]<http://www.volvofinans.se/>
> 
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu

Posted by "André Warnier (tomcat)" <aw...@ice-sa.com>.
Hi.

On 12.01.2018 11:06, Toom Andreas wrote:
> Hi André,
>
> Thanks for your reply!
>
> The Tomcat Connector connection_pool_timeout  is specified in seconds and the Tomcat ajp connector connectionTimeout is specified in milliseconds so both sides have the same value. This is the recommended setting. See quote from the link that your provided: https://tomcat.apache.org/connectors-doc/common_howto/timeouts.html
>
> "We generally recommend values around 10 minutes, so setting connection_pool_timeout to 600 (seconds). If you use this attribute, please also set the attribute keepAliveTimeout (if it is set explicitly) or connectionTimeout in the AJP Connector element of your Tomcat server.xml configuration file to an analogous value. Caution: keepAliveTimeout and connectionTimeout must be given in milliseconds. So if you set JK connection_pool_timeout to 600, you should set Tomcat keepAliveTimeout or connectionTimeout to 600000."
>

Yes, I understand this, and I did also read it in the documentation.
But
1) it is always possible that the documentation is not perfect, or is somewhat outdated
(considering, for example, the various kinds of Connector implementations),
and
2) I was wondering what happens if the two sides (mod_jk on one side, and the Connector on
the other side), because of a long period of inactivity, happen to close their own end of
the connection at (almost) exactly the same time.
Maybe there is some edge case there, where one side "misses a signal" from the other side,
and starts some cpu-intensive wait for a connection state which will never occur on the
socket it is waiting on.
I admit that this is a wild guess; but just changing the timeout of one of the two sides,
by a couple of seconds, may show something interesting (at little cost in terms of
re-configuration or testing).
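
Purely as an illustration of that experiment (590 is an arbitrary example value, not a
recommendation), it would mean leaving mod_jk alone and nudging only the Tomcat side, so
that the two timeouts can no longer expire at the same instant:

In workers.properties (seconds), unchanged:

  worker.app-1.connection_pool_timeout=600
  worker.app-2.connection_pool_timeout=600

In the AJP <Connector> in server.xml (milliseconds), only this attribute changed:

  connectionTimeout="590000"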





> Best regards,
> Andreas
>
> -----Original Message-----
> From: André Warnier (tomcat) [mailto:aw@ice-sa.com]
> Sent: den 12 januari 2018 09:57
> To: users@tomcat.apache.org
> Subject: Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu
>
> On 12.01.2018 07:58, Toom Andreas wrote:
>> Hi,
>>
>> We are having an issue with an application running Apache Tomcat 8.0.47 using Oracle Jvm 1.8.0.151 on Linux (RHEL 7). The Tomcat process is consuming  cpu at a constant high level even though there is a low amount of incoming traffic. When inspecting the process health using JMX /JVisualVM CPU Sampler we see that there are 4 ajp-nio-exec threads together with a NioBlockingSelector.BlockPoller thread that consume most of the cpu.
>>
>> A stack trace of one of the ajp-io-exec threads looks like this:
>>
>> "ajp-nio-48317-exec-14233" - Thread t@201195
>>      java.lang.Thread.State: TIMED_WAITING
>>                                at sun.misc.Unsafe.park(Native Method)
>>                                - parking to wait for <342fab60> (a java.util.concurrent.CountDownLatch$Sync)
>>                                at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>>                                at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>>                                at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>>                                at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>>                                at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitLatch(NioEndpoint.java:1400)
>>                                at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitReadLatch(NioEndpoint.java:1402)
>>                                at org.apache.tomcat.util.net.NioBlockingSelector.read(NioBlockingSelector.java:185)
>>                                at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:250)
>>                                at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:231)
>>                                at org.apache.coyote.ajp.AjpNioProcessor.readSocket(AjpNioProcessor.java:194)
>>                                at org.apache.coyote.ajp.AjpNioProcessor.read(AjpNioProcessor.java:160)
>>                                at org.apache.coyote.ajp.AbstractAjpProcessor.readMessage(AbstractAjpProcessor.java:1091)
>>                                at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:804)
>>                                at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
>>                                at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1539)
>>                                at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1495)
>>                                - locked <279cc7f7> (a org.apache.tomcat.util.net.NioChannel)
>>                                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>                                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>                                at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>>                                at java.lang.Thread.run(Thread.java:748)
>>
>>      Locked ownable synchronizers:
>>                                - locked <10e1f474> (a
>> java.util.concurrent.ThreadPoolExecutor$Worker)
>>
>> A stack trace of the NioBlockingSelector.BlockPoller thread looks like this:
>>
>> "NioBlockingSelector.BlockPoller-2" - Thread t@17
>>      java.lang.Thread.State: RUNNABLE
>>                                at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>                                at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>>                                at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>>                                at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>>                                - locked <648895e4> (a sun.nio.ch.Util$3)
>>                                - locked <17921532> (a java.util.Collections$UnmodifiableSet)
>>                                - locked <67762111> (a sun.nio.ch.EPollSelectorImpl)
>>                                at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:105)
>>                                at
>> org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBloc
>> kingSelector.java:339)
>>
>>      Locked ownable synchronizers:
>>                                - None
>>
>> It seems like the worker thread is trying to read from the underlying socket. When we inspect the OS tcp/ip statistics using the ss-command (ss -i src :<ajp-port>)  the following connections are suspicious:
>>
>> State      Recv-Q Send-Q Local Address:Port                 Peer Address:Port
>> CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:53724
>>                                 cubic wscale:7,7 rto:202 rtt:1.724/2.424 ato:40 mss:1448 cwnd:10 ssthresh:31 bytes_acked:343869507 bytes_received:22259372 segs_out:302687 segs_in:141943 send 67.2Mbps lastsnd:1290791171 lastrcv:1290786778 lastack:1290786767 pacing_rate 134.4Mbps retrans:0/433 rcv_rtt:1.875 rcv_space:28960
>> CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:41990
>>                                 cubic wscale:7,7 rto:202 rtt:1.132/0.899 ato:40 mss:1448 cwnd:10 ssthresh:22 bytes_acked:37347221 bytes_received:3346200 segs_out:35273 segs_in:20608 send 102.3Mbps lastsnd:289840681 lastrcv:289730802 lastack:289720796 pacing_rate 204.5Mbps retrans:0/13 reordering:24 rcv_rtt:1.875 rcv_space:28960
>> CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:60068
>>                                 cubic wscale:7,7 rto:209 rtt:8.757/14.317 ato:40 mss:1448 cwnd:10 ssthresh:36 bytes_acked:180903401 bytes_received:16401965 segs_out:170017 segs_in:100392 send 13.2Mbps lastsnd:395602314 lastrcv:395602267 lastack:395602257 pacing_rate 26.5Mbps retrans:0/106 rcv_rtt:1.875 rcv_space:28960
>> CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:50628
>>                                 cubic wscale:7,7 rto:203
>> rtt:2.992/4.252 ato:40 mss:1448 cwnd:13 ssthresh:25
>> bytes_acked:4185335 bytes_received:471220 segs_out:4193 segs_in:2181
>> send 50.3Mbps lastsnd:1036916135 lastrcv:1036871377 lastack:1036871367
>> pacing_rate 100.6Mbps retrans:0/1 reordering:26 rcv_rtt:6.375
>> rcv_space:28960
>>
>> We are running our Tomcat node behind and Apache server and use Tomcat Connector (mod_jk) as a load balancer. When looking at the hosts running the Apache node the corresponding sockets are closed. To us it seems that Tomcat has not properly closed the underlying socket when Apache/mod_jk initiated a FIN and is instead continuously trying to read from the socket even though it has been closed on the Apache side ?
>>
>> I know that we are one version behind the latest release of Tomcat 8.0.x but I checked the changelog and I could not see any fixed issues matching this problem. Unfortunately we are only seeing this in production and we have not been able to reproduce in test. Has anyone experienced a similar issue like this before ?
>>
>> For reference our connector configuration in server.xml looks like this:
>>
>> <Connector port="48317"
>>                  protocol="AJP/1.3"
>>                  maxThreads="200"
>>                  connectionTimeout="600000"
>>                  xpoweredBy="false"
>>                  allowTrace="false"
>>                  URIEncoding="UTF-8"
>>                  requiredSecret="xxx"
>>                  secure="true"
>>                  scheme="https"
>>                  proxyPort="443" />
>>
>> And our Tomcat Connector configuration for Apache in workers.properties looks like this:
>>
>> worker.list = status,app
>>
>> # Create workers for app
>> worker.app-1.type=ajp13
>> worker.app-1.host=host8128.corpdomain.internal
>> worker.app-1.port=48317
>> worker.app-1.ping_mode=A
>> worker.app-1.connection_pool_timeout=600
>> worker.app-1.secret=xxx
>> worker.app-2.type=ajp13
>> worker.app-2.host=host8129.corpdomain.internal
>> worker.app-2.port=48317
>> worker.app-2.ping_mode=A
>> worker.app-2.connection_pool_timeout=600
>> worker.app-2.secret=xxx
>>
>> # Create a loadbalancer for app
>> worker.app.type=lb
>> worker.app.sticky_session=1
>> worker.app.balance_workers=app-1, app-2
>>
>> Best regards,
>> Andreas
>>
>>
>> Andreas Toom
>> Systemutvecklare/IT
>> TEL: +46 702 03 04 98
>> [Volvofinans Bank AB]<http://www.volvofinans.se/>
>>
>
> Hi.
> First, thanks for the very complete issue description.
>
> This page may be of help :
> https://tomcat.apache.org/connectors-doc/common_howto/timeouts.html
> (which considering the above, you probably have read already).
>
> I did not really analyse in detail your description, nor that page.
> But I am just wondering if in this case, setting both the
> worker.app-x.connection_pool_timeout=600
> and the
> <Connector .. connectionTimeout="600000" ..> (to exactly the same value in seconds) is not in the end counter-productive.
>
> This is just a hunch, because determining the exact interaction and side-effects of these multiple timeout parameters is somewhat of a black art, but it is maybe worth changing one of them (for example back to the default value), and check what happens then.
> (You say that traffic is low, so you probably do hit one or more of these timeouts at some
> point)
>
> That is not to say that there may not be some bug somewhere in the code. But doing the experiment above and reporting the result may provide some more clues for someone to analyse your issue in detail.
>
> You may also want to try swapping the AJP Connector, using the following attribute :
> http://tomcat.apache.org/tomcat-8.5-doc/config/ajp.html#Common_Attributes
> --> protocol
>
> Of course, doing such changes one at a time facilitates the analysis.
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


RE: Ajp Nio-thread stuck in loop and consuming a lot of cpu

Posted by Toom Andreas <An...@volvofinans.se>.
Hi André,

Thanks for your reply!

The Tomcat Connector (mod_jk) connection_pool_timeout is specified in seconds and the Tomcat AJP connector connectionTimeout in milliseconds, so both sides are already set to the same effective value. This is the recommended setting; see this quote from the link that you provided: https://tomcat.apache.org/connectors-doc/common_howto/timeouts.html

"We generally recommend values around 10 minutes, so setting connection_pool_timeout to 600 (seconds). If you use this attribute, please also set the attribute keepAliveTimeout (if it is set explicitly) or connectionTimeout in the AJP Connector element of your Tomcat server.xml configuration file to an analogous value. Caution: keepAliveTimeout and connectionTimeout must be given in milliseconds. So if you set JK connection_pool_timeout to 600, you should set Tomcat keepAliveTimeout or connectionTimeout to 600000."

Best regards,
Andreas

-----Original Message-----
From: André Warnier (tomcat) [mailto:aw@ice-sa.com] 
Sent: den 12 januari 2018 09:57
To: users@tomcat.apache.org
Subject: Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu

On 12.01.2018 07:58, Toom Andreas wrote:
> Hi,
>
> We are having an issue with an application running Apache Tomcat 8.0.47 using Oracle Jvm 1.8.0.151 on Linux (RHEL 7). The Tomcat process is consuming  cpu at a constant high level even though there is a low amount of incoming traffic. When inspecting the process health using JMX /JVisualVM CPU Sampler we see that there are 4 ajp-nio-exec threads together with a NioBlockingSelector.BlockPoller thread that consume most of the cpu.
>
> A stack trace of one of the ajp-io-exec threads looks like this:
>
> "ajp-nio-48317-exec-14233" - Thread t@201195
>     java.lang.Thread.State: TIMED_WAITING
>                               at sun.misc.Unsafe.park(Native Method)
>                               - parking to wait for <342fab60> (a java.util.concurrent.CountDownLatch$Sync)
>                               at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>                               at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>                               at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>                               at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>                               at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitLatch(NioEndpoint.java:1400)
>                               at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitReadLatch(NioEndpoint.java:1402)
>                               at org.apache.tomcat.util.net.NioBlockingSelector.read(NioBlockingSelector.java:185)
>                               at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:250)
>                               at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:231)
>                               at org.apache.coyote.ajp.AjpNioProcessor.readSocket(AjpNioProcessor.java:194)
>                               at org.apache.coyote.ajp.AjpNioProcessor.read(AjpNioProcessor.java:160)
>                               at org.apache.coyote.ajp.AbstractAjpProcessor.readMessage(AbstractAjpProcessor.java:1091)
>                               at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:804)
>                               at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
>                               at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1539)
>                               at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1495)
>                               - locked <279cc7f7> (a org.apache.tomcat.util.net.NioChannel)
>                               at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>                               at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>                               at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>                               at java.lang.Thread.run(Thread.java:748)
>
>     Locked ownable synchronizers:
>                               - locked <10e1f474> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
>
> A stack trace of the NioBlockingSelector.BlockPoller thread looks like this:
>
> "NioBlockingSelector.BlockPoller-2" - Thread t@17
>     java.lang.Thread.State: RUNNABLE
>                               at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>                               at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>                               at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>                               at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>                               - locked <648895e4> (a sun.nio.ch.Util$3)
>                               - locked <17921532> (a java.util.Collections$UnmodifiableSet)
>                               - locked <67762111> (a sun.nio.ch.EPollSelectorImpl)
>                               at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:105)
>                               at 
> org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBloc
> kingSelector.java:339)
>
>     Locked ownable synchronizers:
>                               - None
>
> It seems like the worker thread is trying to read from the underlying socket. When we inspect the OS tcp/ip statistics using the ss-command (ss -i src :<ajp-port>)  the following connections are suspicious:
>
> State      Recv-Q Send-Q Local Address:Port                 Peer Address:Port
> CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:53724
>                                cubic wscale:7,7 rto:202 rtt:1.724/2.424 ato:40 mss:1448 cwnd:10 ssthresh:31 bytes_acked:343869507 bytes_received:22259372 segs_out:302687 segs_in:141943 send 67.2Mbps lastsnd:1290791171 lastrcv:1290786778 lastack:1290786767 pacing_rate 134.4Mbps retrans:0/433 rcv_rtt:1.875 rcv_space:28960
> CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:41990
>                                cubic wscale:7,7 rto:202 rtt:1.132/0.899 ato:40 mss:1448 cwnd:10 ssthresh:22 bytes_acked:37347221 bytes_received:3346200 segs_out:35273 segs_in:20608 send 102.3Mbps lastsnd:289840681 lastrcv:289730802 lastack:289720796 pacing_rate 204.5Mbps retrans:0/13 reordering:24 rcv_rtt:1.875 rcv_space:28960
> CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:60068
>                                cubic wscale:7,7 rto:209 rtt:8.757/14.317 ato:40 mss:1448 cwnd:10 ssthresh:36 bytes_acked:180903401 bytes_received:16401965 segs_out:170017 segs_in:100392 send 13.2Mbps lastsnd:395602314 lastrcv:395602267 lastack:395602257 pacing_rate 26.5Mbps retrans:0/106 rcv_rtt:1.875 rcv_space:28960
> CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:50628
>                                cubic wscale:7,7 rto:203 
> rtt:2.992/4.252 ato:40 mss:1448 cwnd:13 ssthresh:25 
> bytes_acked:4185335 bytes_received:471220 segs_out:4193 segs_in:2181 
> send 50.3Mbps lastsnd:1036916135 lastrcv:1036871377 lastack:1036871367 
> pacing_rate 100.6Mbps retrans:0/1 reordering:26 rcv_rtt:6.375 
> rcv_space:28960
>
> We are running our Tomcat node behind and Apache server and use Tomcat Connector (mod_jk) as a load balancer. When looking at the hosts running the Apache node the corresponding sockets are closed. To us it seems that Tomcat has not properly closed the underlying socket when Apache/mod_jk initiated a FIN and is instead continuously trying to read from the socket even though it has been closed on the Apache side ?
>
> I know that we are one version behind the latest release of Tomcat 8.0.x but I checked the changelog and I could not see any fixed issues matching this problem. Unfortunately we are only seeing this in production and we have not been able to reproduce in test. Has anyone experienced a similar issue like this before ?
>
> For reference our connector configuration in server.xml looks like this:
>
> <Connector port="48317"
>                 protocol="AJP/1.3"
>                 maxThreads="200"
>                 connectionTimeout="600000"
>                 xpoweredBy="false"
>                 allowTrace="false"
>                 URIEncoding="UTF-8"
>                 requiredSecret="xxx"
>                 secure="true"
>                 scheme="https"
>                 proxyPort="443" />
>
> And our Tomcat Connector configuration for Apache in workers.properties looks like this:
>
> worker.list = status,app
>
> # Create workers for app
> worker.app-1.type=ajp13
> worker.app-1.host=host8128.corpdomain.internal
> worker.app-1.port=48317
> worker.app-1.ping_mode=A
> worker.app-1.connection_pool_timeout=600
> worker.app-1.secret=xxx
> worker.app-2.type=ajp13
> worker.app-2.host=host8129.corpdomain.internal
> worker.app-2.port=48317
> worker.app-2.ping_mode=A
> worker.app-2.connection_pool_timeout=600
> worker.app-2.secret=xxx
>
> # Create a loadbalancer for app
> worker.app.type=lb
> worker.app.sticky_session=1
> worker.app.balance_workers=app-1, app-2
>
> Best regards,
> Andreas
>
>
> Andreas Toom
> Systemutvecklare/IT
> TEL: +46 702 03 04 98
> [Volvofinans Bank AB]<http://www.volvofinans.se/>
>

Hi.
First, thanks for the very complete issue description.

This page may be of help :
https://tomcat.apache.org/connectors-doc/common_howto/timeouts.html
(which considering the above, you probably have read already).

I did not really analyse your description, nor that page, in detail.
But I am wondering whether, in this case, setting both
worker.app-x.connection_pool_timeout=600
and
<Connector .. connectionTimeout="600000" ..>
to exactly the same value (600 seconds) is not, in the end, counter-productive.

This is just a hunch, because determining the exact interaction and side-effects of these multiple timeout parameters is somewhat of a black art, but it may be worth changing one of them (for example back to the default value) and checking what happens then.
(You say that traffic is low, so you probably do hit one or more of these timeouts at some point.)

That is not to say that there may not be some bug somewhere in the code. But doing the experiment above and reporting the result may provide some more clues for someone to analyse your issue in detail.

You may also want to try swapping the AJP Connector implementation, using the following attribute:
http://tomcat.apache.org/tomcat-8.5-doc/config/ajp.html#Common_Attributes
--> protocol
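
For instance, switching away from the NIO implementation that shows up in the stack traces
would mean naming the implementation class explicitly in the protocol attribute (class names
as listed in the connector documentation above; check that the one you pick is present in
your 8.0.x build, and the APR one additionally requires the tcnative library):

  protocol="org.apache.coyote.ajp.AjpNio2Protocol"
or
  protocol="org.apache.coyote.ajp.AjpAprProtocol"

instead of the current protocol="AJP/1.3", leaving the rest of the <Connector> element unchanged.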

Of course, doing such changes one at a time facilitates the analysis.


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Ajp Nio-thread stuck in loop and consuming a lot of cpu

Posted by "André Warnier (tomcat)" <aw...@ice-sa.com>.
On 12.01.2018 07:58, Toom Andreas wrote:
> Hi,
>
> We are having an issue with an application running Apache Tomcat 8.0.47 using Oracle Jvm 1.8.0.151 on Linux (RHEL 7). The Tomcat process is consuming  cpu at a constant high level even though there is a low amount of incoming traffic. When inspecting the process health using JMX /JVisualVM CPU Sampler we see that there are 4 ajp-nio-exec threads together with a NioBlockingSelector.BlockPoller thread that consume most of the cpu.
>
> A stack trace of one of the ajp-io-exec threads looks like this:
>
> "ajp-nio-48317-exec-14233" - Thread t@201195
>     java.lang.Thread.State: TIMED_WAITING
>                               at sun.misc.Unsafe.park(Native Method)
>                               - parking to wait for <342fab60> (a java.util.concurrent.CountDownLatch$Sync)
>                               at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>                               at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>                               at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>                               at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>                               at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitLatch(NioEndpoint.java:1400)
>                               at org.apache.tomcat.util.net.NioEndpoint$KeyAttachment.awaitReadLatch(NioEndpoint.java:1402)
>                               at org.apache.tomcat.util.net.NioBlockingSelector.read(NioBlockingSelector.java:185)
>                               at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:250)
>                               at org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:231)
>                               at org.apache.coyote.ajp.AjpNioProcessor.readSocket(AjpNioProcessor.java:194)
>                               at org.apache.coyote.ajp.AjpNioProcessor.read(AjpNioProcessor.java:160)
>                               at org.apache.coyote.ajp.AbstractAjpProcessor.readMessage(AbstractAjpProcessor.java:1091)
>                               at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:804)
>                               at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:684)
>                               at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1539)
>                               at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1495)
>                               - locked <279cc7f7> (a org.apache.tomcat.util.net.NioChannel)
>                               at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>                               at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>                               at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>                               at java.lang.Thread.run(Thread.java:748)
>
>     Locked ownable synchronizers:
>                               - locked <10e1f474> (a java.util.concurrent.ThreadPoolExecutor$Worker)
>
> A stack trace of the NioBlockingSelector.BlockPoller thread looks like this:
>
> "NioBlockingSelector.BlockPoller-2" - Thread t@17
>     java.lang.Thread.State: RUNNABLE
>                               at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>                               at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>                               at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>                               at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>                               - locked <648895e4> (a sun.nio.ch.Util$3)
>                               - locked <17921532> (a java.util.Collections$UnmodifiableSet)
>                               - locked <67762111> (a sun.nio.ch.EPollSelectorImpl)
>                               at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:105)
>                               at org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBlockingSelector.java:339)
>
>     Locked ownable synchronizers:
>                               - None
>
> It seems like the worker thread is trying to read from the underlying socket. When we inspect the OS tcp/ip statistics using the ss-command (ss -i src :<ajp-port>)  the following connections are suspicious:
>
> State      Recv-Q Send-Q Local Address:Port                 Peer Address:Port
> CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:53724
>                                cubic wscale:7,7 rto:202 rtt:1.724/2.424 ato:40 mss:1448 cwnd:10 ssthresh:31 bytes_acked:343869507 bytes_received:22259372 segs_out:302687 segs_in:141943 send 67.2Mbps lastsnd:1290791171 lastrcv:1290786778 lastack:1290786767 pacing_rate 134.4Mbps retrans:0/433 rcv_rtt:1.875 rcv_space:28960
> CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:41990
>                                cubic wscale:7,7 rto:202 rtt:1.132/0.899 ato:40 mss:1448 cwnd:10 ssthresh:22 bytes_acked:37347221 bytes_received:3346200 segs_out:35273 segs_in:20608 send 102.3Mbps lastsnd:289840681 lastrcv:289730802 lastack:289720796 pacing_rate 204.5Mbps retrans:0/13 reordering:24 rcv_rtt:1.875 rcv_space:28960
> CLOSE-WAIT 1      0      10.x.x.x:48317                10.x.x.12:60068
>                                cubic wscale:7,7 rto:209 rtt:8.757/14.317 ato:40 mss:1448 cwnd:10 ssthresh:36 bytes_acked:180903401 bytes_received:16401965 segs_out:170017 segs_in:100392 send 13.2Mbps lastsnd:395602314 lastrcv:395602267 lastack:395602257 pacing_rate 26.5Mbps retrans:0/106 rcv_rtt:1.875 rcv_space:28960
> CLOSE-WAIT 2      0      10.x.x.x:48317                10.x.x.11:50628
>                                cubic wscale:7,7 rto:203 rtt:2.992/4.252 ato:40 mss:1448 cwnd:13 ssthresh:25 bytes_acked:4185335 bytes_received:471220 segs_out:4193 segs_in:2181 send 50.3Mbps lastsnd:1036916135 lastrcv:1036871377 lastack:1036871367 pacing_rate 100.6Mbps retrans:0/1 reordering:26 rcv_rtt:6.375 rcv_space:28960
>
> We are running our Tomcat node behind an Apache server and use Tomcat Connector (mod_jk) as a load balancer. On the hosts running Apache, the corresponding sockets are already closed. To us it seems that Tomcat did not properly close the underlying socket when Apache/mod_jk initiated a FIN, and is instead continuously trying to read from a socket that has already been closed on the Apache side?
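>
> For completeness, a command along these lines should list just the CLOSE-WAIT sockets on the AJP port together with their timer information (a sketch only; the exact filter syntax may vary between iproute2 versions, and 48317 is our connector port):
>
> ss -t -o state close-wait '( sport = :48317 )'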
>
> I know that we are one version behind the latest release of Tomcat 8.0.x, but I checked the changelog and could not see any fixed issues matching this problem. Unfortunately we only see this in production and have not been able to reproduce it in test. Has anyone experienced a similar issue before?
>
> For reference our connector configuration in server.xml looks like this:
>
> <Connector port="48317"
>                 protocol="AJP/1.3"
>                 maxThreads="200"
>                 connectionTimeout="600000"
>                 xpoweredBy="false"
>                 allowTrace="false"
>                 URIEncoding="UTF-8"
>                 requiredSecret="xxx"
>                 secure="true"
>                 scheme="https"
>                 proxyPort="443" />
>
> And our Tomcat Connector configuration for Apache in workers.properties looks like this:
>
> worker.list = status,app
>
> # Create workers for app
> worker.app-1.type=ajp13
> worker.app-1.host=host8128.corpdomain.internal
> worker.app-1.port=48317
> worker.app-1.ping_mode=A
> worker.app-1.connection_pool_timeout=600
> worker.app-1.secret=xxx
> worker.app-2.type=ajp13
> worker.app-2.host=host8129.corpdomain.internal
> worker.app-2.port=48317
> worker.app-2.ping_mode=A
> worker.app-2.connection_pool_timeout=600
> worker.app-2.secret=xxx
>
> # Create a loadbalancer for app
> worker.app.type=lb
> worker.app.sticky_session=1
> worker.app.balance_workers=app-1, app-2
>
> Best regards,
> Andreas
>
>
> Andreas Toom
> Systems Developer / IT
> TEL: +46 702 03 04 98
> Volvofinans Bank AB - http://www.volvofinans.se/
>

Hi.
First, thanks for the very complete issue description.

This page may be of help:
https://tomcat.apache.org/connectors-doc/common_howto/timeouts.html
(which, considering the above, you have probably read already).

I did not really analyse your description, nor that page, in detail.
But I am wondering whether, in this case, setting both the
worker.app-x.connection_pool_timeout=600
and the
<Connector .. connectionTimeout="600000" ..>
(to exactly the same value, 600 seconds)
is not counter-productive in the end.

This is just a hunch, because determining the exact interaction and side-effects of these 
multiple timeout parameters is somewhat of a black art, but it may be worth changing one 
of them (for example, back to its default value) and checking what happens then.
(You say that traffic is low, so you probably do hit one or more of these timeouts at some 
point.)
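
For example, one variation (just a sketch, and only one of several possible combinations) would be to leave the mod_jk side as it is and remove connectionTimeout from the Connector, so that it falls back to the AJP default (-1, i.e. infinite, if I remember correctly) and only mod_jk's connection_pool_timeout drives the closing of idle connections:

<Connector port="48317"
                protocol="AJP/1.3"
                maxThreads="200"
                xpoweredBy="false"
                allowTrace="false"
                URIEncoding="UTF-8"
                requiredSecret="xxx"
                secure="true"
                scheme="https"
                proxyPort="443" />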

That is not to say that there is no bug somewhere in the code. But doing the 
experiment above and reporting the result may give someone more clues with which to 
analyse your issue in detail.

You may also want to try swapping the AJP connector implementation, using the following attribute:
http://tomcat.apache.org/tomcat-8.5-doc/config/ajp.html#Common_Attributes
--> protocol
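
For example (a sketch; if I remember correctly, Tomcat 8.0 still ships the blocking org.apache.coyote.ajp.AjpProtocol alongside AjpNioProtocol and AjpAprProtocol, and your stack traces show the NIO one), you could force the blocking implementation and see whether the spinning threads disappear:

<Connector port="48317"
                protocol="org.apache.coyote.ajp.AjpProtocol"
                .. />

(keeping the other Connector attributes as they are)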

Of course, doing such changes one at a time facilitates the analysis.


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org