Posted to dev@tomcat.apache.org by Martin Grigorov <mg...@apache.org> on 2020/06/04 15:33:00 UTC

Usage of SynchronizedStack/Queue

Hi team,

I am profiling our application and I've found that Tomcat uses its own
Synchronized Queue/Stack implementations which synchronize at the instance
level, i.e. their methods are 'synchronized'.

The javadoc for those two classes says that they are preferred because they
are GC-friendlier than java.util.concurrent.ConcurrentLinkedQueue.
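
For illustration, a minimal sketch (not Tomcat's actual code) of the two
approaches being compared: an instance-locked, array-backed stack in the
spirit of SynchronizedStack, and a lock-free Treiber stack of the kind the
java.util.concurrent collections build on. The class names are made up; the
point is the trade-off - the locked version reuses its backing array and
allocates nothing per operation, while the lock-free version avoids the
monitor but allocates a Node per push.

import java.util.concurrent.atomic.AtomicReference;

// Instance-level locking: every push/pop contends on the same monitor,
// but the backing array is reused, so no garbage is created per operation.
class LockedStack<T> {
    private final Object[] stack = new Object[128];
    private int index = -1;

    public synchronized boolean push(T obj) {
        if (index == stack.length - 1) {
            return false; // full (the real class grows up to a limit)
        }
        stack[++index] = obj;
        return true;
    }

    @SuppressWarnings("unchecked")
    public synchronized T pop() {
        if (index == -1) {
            return null;
        }
        T result = (T) stack[index];
        stack[index--] = null; // drop the reference so it can be collected
        return result;
    }
}

// Lock-free Treiber stack: push/pop are CAS retry loops with no monitor,
// but each push allocates a Node - the GC cost the javadoc warns about.
class TreiberStack<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    public void push(T value) {
        Node<T> newHead = new Node<>(value);
        Node<T> oldHead;
        do {
            oldHead = head.get();
            newHead.next = oldHead;
        } while (!head.compareAndSet(oldHead, newHead));
    }

    public T pop() {
        Node<T> oldHead;
        do {
            oldHead = head.get();
            if (oldHead == null) {
                return null;
            }
        } while (!head.compareAndSet(oldHead, oldHead.next));
        return oldHead.value;
    }
}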

I use Async Profiler (https://github.com/jvm-profiling-tools/async-profiler)
to profile CPU, memory, locks, and a few lower-level event types (from perf).
Issuing "./profiler.sh -e lock -d 60 PID" against 9.0.x shows such reports:

Started [lock] profiling

--- Execution profile ---
Total samples       : 713834

Frame buffer usage  : 0.0288%

--- 1106165618177 ns (58.97%), 395609 samples
  [ 0] org.apache.tomcat.util.collections.SynchronizedStack
  [ 1] org.apache.tomcat.util.collections.SynchronizedStack.push
  [ 2] org.apache.tomcat.util.net.NioBlockingSelector.write
  [ 3] org.apache.tomcat.util.net.NioSelectorPool.write
  [ 4] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite
  [ 5] org.apache.tomcat.util.net.SocketWrapperBase.doWrite
  [ 6] org.apache.tomcat.util.net.SocketWrapperBase.flushBlocking
  [ 7] org.apache.tomcat.util.net.SocketWrapperBase.flush
  [ 8] org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.end
  [ 9] org.apache.coyote.http11.filters.IdentityOutputFilter.end
  [10] org.apache.coyote.http11.Http11OutputBuffer.end
  [11] org.apache.coyote.http11.Http11Processor.finishResponse
  [12] org.apache.coyote.AbstractProcessor.action
  [13] org.apache.coyote.Response.action
  [14] org.apache.catalina.connector.OutputBuffer.close
  [15] org.apache.catalina.connector.Response.finishResponse
  [16] org.apache.catalina.connector.CoyoteAdapter.service
  [17] org.apache.coyote.http11.Http11Processor.service
  [18] org.apache.coyote.AbstractProcessorLight.process
  [19] org.apache.coyote.AbstractProtocol$ConnectionHandler.process
  [20] org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun
  [21] org.apache.tomcat.util.net.SocketProcessorBase.run
  [22] java.util.concurrent.ThreadPoolExecutor.runWorker
  [23] java.util.concurrent.ThreadPoolExecutor$Worker.run
  [24] org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run
  [25] java.lang.Thread.run

--- 769519847169 ns (41.03%), 314384 samples
  [ 0] org.apache.tomcat.util.collections.SynchronizedStack
  [ 1] org.apache.tomcat.util.collections.SynchronizedStack.pop
  [ 2] org.apache.tomcat.util.net.NioBlockingSelector.write
  [ 3] org.apache.tomcat.util.net.NioSelectorPool.write
  [ 4] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite
  [ 5] org.apache.tomcat.util.net.SocketWrapperBase.doWrite
  [ 6] org.apache.tomcat.util.net.SocketWrapperBase.flushBlocking
  [ 7] org.apache.tomcat.util.net.SocketWrapperBase.flush
  [ 8] org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.end
  [ 9] org.apache.coyote.http11.filters.IdentityOutputFilter.end
  [10] org.apache.coyote.http11.Http11OutputBuffer.end
  [11] org.apache.coyote.http11.Http11Processor.finishResponse
   ....


I've made the following changes -
https://github.com/apache/tomcat/compare/9.0.x...martin-g:feature/lock-free-queue-and-stack
and ran the test suite - all green!
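
The gist of the change (the authoritative diff is in the branch linked
above) is to put a JDK lock-free collection behind the same push/pop
surface. A hedged guess at the shape, with a made-up class name:

import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical sketch, not the actual patch: ConcurrentLinkedDeque used as
// a LIFO stack. Operations are CAS-based, with no monitor to contend on,
// and pop() returns null when empty, like SynchronizedStack.pop().
class LockFreeStack<T> {
    private final ConcurrentLinkedDeque<T> deque = new ConcurrentLinkedDeque<>();

    public boolean push(T obj) {
        deque.addFirst(obj);
        return true;
    }

    public T pop() {
        return deque.pollFirst();
    }
}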

Now the output from the locks profiling looks like:


--- 72354435197 ns (47.98%), 4008 samples
  [ 0] java.util.concurrent.locks.ReentrantLock$NonfairSync
  [ 1] jdk.internal.misc.Unsafe.park
  [ 2] java.util.concurrent.locks.LockSupport.park
  [ 3] java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire
  [ 4] java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly
  [ 5] java.util.concurrent.locks.ReentrantLock$Sync.lockInterruptibly
  [ 6] java.util.concurrent.locks.ReentrantLock.lockInterruptibly
  [ 7] java.util.concurrent.LinkedBlockingQueue.poll
  [ 8] org.apache.tomcat.util.threads.TaskQueue.poll
  [ 9] org.apache.tomcat.util.threads.TaskQueue.poll
  [10] java.util.concurrent.ThreadPoolExecutor.getTask
  [11] java.util.concurrent.ThreadPoolExecutor.runWorker
  [12] java.util.concurrent.ThreadPoolExecutor$Worker.run
  [13] org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run
  [14] java.lang.Thread.run

--- 58413748017 ns (38.74%), 10179 samples
  [ 0] java.lang.Object
  [ 1] org.apache.catalina.core.StandardContext$ContextFilterMaps.asArray
  [ 2] org.apache.catalina.core.StandardContext.findFilterMaps
  [ 3] org.apache.catalina.core.ApplicationFilterFactory.createFilterChain
  [ 4] org.apache.catalina.core.StandardWrapperValve.invoke
  [ 5] org.apache.catalina.core.StandardContextValve.invoke
  [ 6] org.apache.catalina.authenticator.AuthenticatorBase.invoke
  [ 7] org.apache.catalina.core.StandardHostValve.invoke
  [ 8] org.apache.catalina.valves.ErrorReportValve.invoke
  [ 9] org.apache.catalina.core.StandardEngineValve.invoke
  [10] org.apache.catalina.connector.CoyoteAdapter.service
  [11] org.apache.coyote.http11.Http11Processor.service
  [12] org.apache.coyote.AbstractProcessorLight.process
  [13] org.apache.coyote.AbstractProtocol$ConnectionHandler.process
  [14] org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun
  [15] org.apache.tomcat.util.net.SocketProcessorBase.run
  [16] java.util.concurrent.ThreadPoolExecutor.runWorker
  [17] java.util.concurrent.ThreadPoolExecutor$Worker.run
  [18] org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run
  [19] java.lang.Thread.run

--- 5856040617 ns (3.88%), 11395 samples
  [ 0] org.apache.tomcat.util.collections.SynchronizedQueue
  [ 1] org.apache.tomcat.util.collections.SynchronizedQueue.offer
  [ 2] org.apache.tomcat.util.net.NioEndpoint$Poller.addEvent
  [ 3] org.apache.tomcat.util.net.NioEndpoint$Poller.add
  [ 4] org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.registerReadInterest
  [ 5] org.apache.coyote.AbstractProtocol$ConnectionHandler.process
  ...


I monitor the GC both with -Xloggc:gc.log and with JMX, and the patterns
are the same: no new spikes.
JDK 14 with G1GC.

I load the application with wrk (https://github.com/wg/wrk/) and it sustains
70-80K requests per second.

I guess there is a reason to use these custom synchronized data structures,
but so far I don't see any drawbacks after moving to the lock-free
implementations.

Regards,
Martin

Re: Usage of SynchronizedStack/Queue

Posted by Martin Grigorov <mg...@apache.org>.
On Fri, Jun 5, 2020 at 6:29 PM Mark Thomas <ma...@apache.org> wrote:

> On 05/06/2020 14:08, Martin Grigorov wrote:
> > For load testing it I use wrk: wrk -c96 -t8 -d60s https://host:port/test
> > The GC JVM arguments are: -Xloggc:./gc.log -XX:+PrintGCDetails
> -verbose:gc
> > JDK 1.8.0 b252
> >
> > The GC logs are:
> > - Tomcat 9.0.x (uses
> > SynchronizedQueue/Stack):
> https://gist.github.com/martin-g/d2570e7a6896e4d094ce548ceea3adb6
> > - Tomcat 9.0.x with my
> > changes:
> https://gist.github.com/martin-g/52c7d3a883b37e9bcd11ad6430800852
> > I've uploaded them to https://gceasy.io/ and the charts are similar.
>
> lock-free has marginally higher throughput.
>
> It has a longer GC pause but that might be a result of running longer
> (~x2 longer).
>
> Average creation rate and average promotion rate are lower.
>
> Given this was with Java 8 I'm +1 to switching Tomcat 10 and also +1 to
> switching Tomcat 9 if we can do it without impacting the API.
>

I've noticed org.apache.tomcat.util.collections.TesterPerformanceSynchronizedStack
and org.apache.tomcat.util.collections.TesterPerformanceSynchronizedQueue
in the test/ folder.
Those give 1.5-3 times better results for SynchronizedQueue/Stack than the
Java Collections implementations.
I am not so sure my proposal is a good one anymore.
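
For reference, a sketch of that kind of micro-benchmark (hypothetical code,
not the Tester* classes themselves): a few threads hammer offer()/poll()
pairs and the elapsed wall time is compared between implementations.

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class QueuePushPopBenchmark {
    private static final int THREADS = 4;
    private static final int ITERATIONS = 1_000_000;

    public static void main(String[] args) throws InterruptedException {
        // Re-run with Tomcat's SynchronizedQueue (same offer/poll methods,
        // though it does not implement java.util.Queue) to compare.
        Queue<Object> queue = new ConcurrentLinkedQueue<>();
        Thread[] threads = new Thread[THREADS];
        long start = System.nanoTime();
        for (int i = 0; i < THREADS; i++) {
            threads[i] = new Thread(() -> {
                Object payload = new Object();
                for (int j = 0; j < ITERATIONS; j++) {
                    queue.offer(payload);
                    queue.poll();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("Elapsed ms: " + (System.nanoTime() - start) / 1_000_000);
    }
}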


>
> I'm neutral on making the change in Tomcat 8.5 and -0.5 on Tomcat 7.
>
> Mark
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: dev-help@tomcat.apache.org
>
>

Re: Usage of SynchronizedStack/Queue

Posted by Mark Thomas <ma...@apache.org>.
On 05/06/2020 14:08, Martin Grigorov wrote:
> For load testing it I use wrk: wrk -c96 -t8 -d60s https://host:port/test
> The GC JVM arguments are: -Xloggc:./gc.log -XX:+PrintGCDetails -verbose:gc
> JDK 1.8.0 b252
> 
> The GC logs are:
> - Tomcat 9.0.x (uses
> SynchronizedQueue/Stack): https://gist.github.com/martin-g/d2570e7a6896e4d094ce548ceea3adb6
> - Tomcat 9.0.x with my
> changes: https://gist.github.com/martin-g/52c7d3a883b37e9bcd11ad6430800852
> I've uploaded them to https://gceasy.io/ and the charts are similar.

lock-free has marginally higher throughput.

It has a longer GC pause but that might be a result of running longer
(~x2 longer).

Average creation rate and average promotion rate are lower.

Given this was with Java 8 I'm +1 to switching Tomcat 10 and also +1 to
switching Tomcat 9 if we can do it without impacting the API.

I'm neutral on making the change in Tomcat 8.5 and -0.5 on Tomcat 7.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Usage of SynchronizedStack/Queue

Posted by Rémy Maucherat <re...@apache.org>.
On Fri, Jun 5, 2020 at 3:08 PM Martin Grigorov <mg...@apache.org> wrote:

> Hi,
>
> On Thu, Jun 4, 2020 at 9:28 PM Rémy Maucherat <re...@apache.org> wrote:
>
>> On Thu, Jun 4, 2020 at 6:16 PM Mark Thomas <ma...@apache.org> wrote:
>>
>>> On 04/06/2020 16:33, Martin Grigorov wrote:
>>> > Hi team,
>>> >
>>> > I am profiling our application and I've found that Tomcat uses its own
>>> > Synchronized Queue/Stack implementations which synchronize at the instance
>>> > level, i.e. their methods are 'synchronized'.
>>>
>>
>> Can you work off 10 (master) first ?
>>
>
> The application is based on Spring Boot and my attempts to migrate it to
> Jakarta EE were not successful.
>
> But I've created a very simple Servlet and I will try it on Tomcat 10.x
> soon:
>
> [PlainTextServlet code snipped; see the full message further down]
>
> For load testing it I use wrk: wrk -c96 -t8 -d60s https://host:port/test
> The GC JVM arguments are: -Xloggc:./gc.log -XX:+PrintGCDetails -verbose:gc
> JDK 1.8.0 b252
>
> The GC logs are:
> - Tomcat 9.0.x (uses SynchronizedQueue/Stack):
> https://gist.github.com/martin-g/d2570e7a6896e4d094ce548ceea3adb6
> - Tomcat 9.0.x with my changes:
> https://gist.github.com/martin-g/52c7d3a883b37e9bcd11ad6430800852
> I've uploaded them to https://gceasy.io/ and the charts are similar.
>
> The summary from wrk with SynchronizedQueue/Stack is:
>
> Running 1m test @ http://192.168.0.206:8080
>   8 threads and 96 connections
>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>     Latency     1.24ms  788.15us  19.46ms   91.44%
>     Req/Sec    10.04k     1.64k   12.45k    79.48%
>   4797193 requests in 1.00m, 600.19MB read
> Requests/sec:  79834.45
> Transfer/sec:      9.99MB
>
> I didn't preserve the summary from the lock-free run, but it was similar:
> the throughput is around 80K/sec, plus or minus a few thousand.
> I.e. the throughput does not improve after my changes!
> Only the lock profiling results get better (Async Profiler:
> ./profiler.sh -e lock -d50s -f lock-results.txt PID)
>

I've rarely gotten any gains when making an optimization lately. But this
sounds good [or not bad, at least].

Rémy


>
> Martin
>
>
>>
>>>
>>> ~8 years ago, when I was profiling Tomcat and wrote those classes,
>>> having something targeted at Tomcat's specific use case made a
>>> measurable improvement.
>>>
>>
>> Yes, the SyncStack was the go-to structure for all reuse. Not sure if
>> it's the best.
>>
>> I already removed some of the usage in 10:
>> - No more block poller and its pooling/recycling [I had to pull its
>> removal out of 9 due to an unfortunate regression, this could be brought
>> back now that it seems to work]
>> - No recycling of trivial objects
>>
>> The main use should now be the recycled processors and the recycled
>> channels (it would be bad to not reuse them since they may use direct
>> buffers ...).
>>
>>
>>>
>>> I am more than prepared to believe that things have moved on since then
>>> but I'd be interested in seeing the same numbers with the latest Java 8
>>> from Adopt OpenJDK rather than Java 14.
>>>
>>
>> +1, finally there could be some differences between JDKs, like now they
>> have started to actually improve NIO 1 again.
>>
>> Rémy
>>
>>
>>>
>>> I'd also like to see overall throughput numbers and the GC frequency for
>>> each test scenario.
>>>
>>> Mark
>>>
>>>
>>>
>>> > [original message quoted in full; snipped]

Re: Usage of SynchronizedStack/Queue

Posted by Martin Grigorov <mg...@apache.org>.
Hi,

On Thu, Jun 4, 2020 at 9:28 PM Rémy Maucherat <re...@apache.org> wrote:

> On Thu, Jun 4, 2020 at 6:16 PM Mark Thomas <ma...@apache.org> wrote:
>
>> On 04/06/2020 16:33, Martin Grigorov wrote:
>> > Hi team,
>> >
>> > I am profiling our application and I've found that Tomcat uses its own
>> > Synchronized Queue/Stack implementations which synchronize at the instance
>> > level, i.e. their methods are 'synchronized'.
>>
>
> Can you work off 10 (master) first ?
>

The application is based on Spring Boot and my attempts to migrate it to
Jakarta EE were not successful.

But I've created a very simple Servlet and I will try it on Tomcat 10.x
soon:

// javax.servlet.* works for Tomcat 9.x; on Tomcat 10.x these become jakarta.servlet.*
import java.io.IOException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/test")
public class PlainTextServlet extends HttpServlet {

    public static final String CONTENT = "Hello world!";
    public static final int CONTENT_LENGTH = CONTENT.length();

    @Override
    protected void doGet(final HttpServletRequest req, final HttpServletResponse resp)
            throws IOException {
        resp.setContentType("text/plain");
        resp.setContentLength(CONTENT_LENGTH);
        resp.getWriter().println(CONTENT);
    }
}

For load testing it I use wrk: wrk -c96 -t8 -d60s https://host:port/test
The GC JVM arguments are: -Xloggc:./gc.log -XX:+PrintGCDetails -verbose:gc
JDK 1.8.0 b252

The GC logs are:
- Tomcat 9.0.x (uses SynchronizedQueue/Stack):
https://gist.github.com/martin-g/d2570e7a6896e4d094ce548ceea3adb6
- Tomcat 9.0.x with my changes:
https://gist.github.com/martin-g/52c7d3a883b37e9bcd11ad6430800852
I've uploaded them to https://gceasy.io/ and the charts are similar.

The summary from wrk with SynchronizedQueue/Stack is:

Running 1m test @ http://192.168.0.206:8080
  8 threads and 96 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.24ms  788.15us  19.46ms   91.44%
    Req/Sec    10.04k     1.64k   12.45k    79.48%
  4797193 requests in 1.00m, 600.19MB read
Requests/sec:  79834.45
Transfer/sec:      9.99MB

I didn't preserve the summary from the lock-free run, but it was similar:
the throughput is around 80K/sec, plus or minus a few thousand.
I.e. the throughput does not improve after my changes!
Only the lock profiling results get better (Async Profiler:
./profiler.sh -e lock -d50s -f lock-results.txt PID)

Martin


>
>>
>> ~8 years ago, when I was profiling Tomcat and wrote those classes,
>> having something targeted at Tomcat's specific use case made a
>> measurable improvement.
>>
>
> Yes, the SyncStack was the go-to structure for all reuse. Not sure if it's
> the best.
>
> I already removed some of the usage in 10:
> - No more block poller and its pooling/recycling [I had to pull its
> removal out of 9 due to an unfortunate regression, this could be brought
> back now that it seems to work]
> - No recycling of trivial objects
>
> The main use should now be the recycled processors and the recycled
> channels (it would be bad to not reuse them since they may use direct
> buffers ...).
>
>
>>
>> I am more than prepared to believe that things have moved on since then
>> but I'd be interested in seeing the same numbers with the latest Java 8
>> from Adopt OpenJDK rather than Java 14.
>>
>
> +1, finally there could be some differences between JDKs, like now they
> have started to actually improve NIO 1 again.
>
> Rémy
>
>
>>
>> I'd also like to see overall throughput numbers and the GC frequency for
>> each test scenario.
>>
>> Mark
>>
>>
>>
>> > [original message quoted in full; snipped]

Re: Usage of SynchronizedStack/Queue

Posted by Rémy Maucherat <re...@apache.org>.
On Thu, Jun 4, 2020 at 6:16 PM Mark Thomas <ma...@apache.org> wrote:

> On 04/06/2020 16:33, Martin Grigorov wrote:
> > Hi team,
> >
> > I am profiling our application and I've found that Tomcat uses its own
> > Synchronized Queue/Stack implementations which synchronize at the instance
> > level, i.e. their methods are 'synchronized'.
>

Can you work off 10 (master) first ?


>
> ~8 years ago, when I was profiling Tomcat and wrote those classes,
> having something targeted at Tomcat's specific use case made a
> measurable improvement.
>

Yes, the SyncStack was the go-to structure for all reuse. Not sure if it's
the best.

I already removed some of the usage in 10:
- No more block poller and its pooling/recycling [I had to pull its removal
out of 9 due to an unfortunate regression, this could be brought back now
that it seems to work]
- No recycling of trivial objects

The main use should now be the recycled processors and the recycled
channels (it would be bad to not reuse them since they may use direct
buffers ...).
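
A minimal sketch of that recycling pattern (made-up names, not Tomcat's
actual Processor code): objects holding direct buffers are parked on a
stack between requests instead of being left to the GC.

import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedDeque;

// A direct buffer is expensive to allocate and is reclaimed lazily by the
// GC, so the object wrapping it is worth reusing across requests.
class BufferHolder {
    final ByteBuffer buffer = ByteBuffer.allocateDirect(8192);
}

class BufferHolderPool {
    private final ConcurrentLinkedDeque<BufferHolder> cache = new ConcurrentLinkedDeque<>();

    BufferHolder acquire() {
        BufferHolder holder = cache.pollFirst();
        return holder != null ? holder : new BufferHolder();
    }

    void release(BufferHolder holder) {
        holder.buffer.clear();    // reset state before reuse
        cache.offerFirst(holder); // LIFO keeps recently used objects warm
    }
}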


>
> I am more than prepared to believe that things have moved on since then
> but I'd be interested in seeing the same numbers with the latest Java 8
> from Adopt OpenJDK rather than Java 14.
>

+1, finally there could be some differences between JDKs, like now they
have started to actually improve NIO 1 again.

Rémy


>
> I'd also like to see overall throughput numbers and the GC frequency for
> each test scenario.
>
> Mark
>
>
>
> > [original message quoted in full; snipped]

Re: Usage of SynchronizedStack/Queue

Posted by Mark Thomas <ma...@apache.org>.
On 04/06/2020 16:33, Martin Grigorov wrote:
> Hi team,
> 
> I am profiling our application and I've found that Tomcat uses its own
> Synchronized Queue/Stack implementations which synchronize at the instance
> level, i.e. their methods are 'synchronized'.

~8 years ago, when I was profiling Tomcat and wrote those classes,
having something targeted at Tomcat's specific use case made a
measurable improvement.

I am more than prepared to believe that things have moved on since then
but I'd be interested in seeing the same numbers with the latest Java 8
from Adopt OpenJDK rather than Java 14.

I'd also like to see overall throughput numbers and the GC frequency for
each test scenario.

Mark



> [original message quoted in full; snipped]


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org