You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@qpid.apache.org by "Alex Rudyy (JIRA)" <ji...@apache.org> on 2018/10/10 10:10:00 UTC
[jira] [Comment Edited] (QPID-8238) [Broker-J] Improve performance of asynchronous publishing of transient messages into topic exchange having queues bound using non-overlapping selectors

    [ https://issues.apache.org/jira/browse/QPID-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16644766#comment-16644766 ] 

Alex Rudyy edited comment on QPID-8238 at 10/10/18 10:09 AM:
-------------------------------------------------------------

Apart from lower throughput of NIO layer, there is another aspect to the performance issues I had failed to notice before. When running my performance tests for longer amount of time I started to see long GC pauses on cleaning of JVM Metaspace:

{noformat}
2018-10-10T05:32:11.250-0400: 1918.070: [GC (Allocation Failure) 2018-10-10T05:32:11.250-0400: 1918.071: [ParNew: 3145728K->3145728K(3145728K), 0.0000665 secs]2018-10-10T05:32:11.250-0400: 1918.071: [CMS2018-10-10T05:32:11.363-0400: 1918.183: [CMS-concurrent-mark: 1.696/1.700 secs] [Times: user=51.70 sys=1.93, real=1.70 secs]
 (concurrent mode failure): 6418793K->5067015K(6990528K), 43.9299868 secs] 9564521K->5067015K(10136256K), [Metaspace: 37679K->37679K(1083392K)], 43.9303498 secs] [Times: user=45.39 sys=0.00, real=43.92 secs]
2018-10-10T05:32:55.181-0400: 1962.001: Total time for which application threads were stopped: 43.9323704 seconds, Stopping threads took: 0.0004930 seconds

2018-10-10T05:33:01.804-0400: 1968.624: [GC (Allocation Failure) 2018-10-10T05:33:01.804-0400: 1968.624: [ParNew: 3145728K->349504K(3145728K), 1.1281568 secs] 8881780K->6564365K(10136256K), 1.1284601 secs] [Times: user=47.24 sys=0.01, real=1.13 secs]
2018-10-10T05:33:02.933-0400: 1969.753: Total time for which application threads were stopped: 1.1302133 seconds, Stopping threads took: 0.0003888 seconds

2018-10-10T05:33:04.515-0400: 1971.335: [GC (Allocation Failure) 2018-10-10T05:33:04.515-0400: 1971.335: [ParNew: 3145728K->349504K(3145728K), 1.1778553 secs] 9360589K->7056409K(10136256K), 1.1781560 secs] [Times: user=50.64 sys=0.00, real=1.18 secs]
2018-10-10T05:33:05.693-0400: 1972.513: Total time for which application threads were stopped: 1.1804099 seconds, Stopping threads took: 0.0007907 seconds

2018-10-10T05:33:07.300-0400: 1974.120: [GC (Allocation Failure) 2018-10-10T05:33:07.300-0400: 1974.120: [ParNew: 3145728K->3145728K(3145728K), 0.0000554 secs]2018-10-10T05:33:07.300-0400: 1974.120: [CMS2018-10-10T05:33:13.021-0400: 1979.841: [CMS-concurrent-preclean: 8.160/9.850 secs] [Times: user=110.86 sys=3.14, real=9.85 secs]
 (concurrent mode failure): 6706905K->5608432K(6990528K), 48.4980083 secs] 9852633K->5608432K(10136256K), [Metaspace: 37689K->37689K(1083392K)], 48.4983623 secs] [Times: user=48.53 sys=0.00, real=48.49 secs]
2018-10-10T05:33:55.798-0400: 2022.619: Total time for which application threads were stopped: 48.5008915 seconds, Stopping threads took: 0.0008321 seconds

2018-10-10T05:33:59.596-0400: 2026.417: [GC (Allocation Failure) 2018-10-10T05:33:59.596-0400: 2026.417: [ParNew: 3145728K->349504K(3145728K), 1.2127988 secs] 8892795K->6665263K(10136256K), 1.2130993 secs] [Times: user=46.20 sys=0.10, real=1.21 secs]
2018-10-10T05:34:00.810-0400: 2027.630: Total time for which application threads were stopped: 1.2151635 seconds, Stopping threads took: 0.0006465 seconds

2018-10-10T05:34:02.374-0400: 2029.194: [GC (Allocation Failure) 2018-10-10T05:34:02.374-0400: 2029.194: [ParNew: 3145728K->3145728K(3145728K), 0.0000684 secs]2018-10-10T05:34:02.374-0400: 2029.195: [CMS2018-10-10T05:34:04.462-0400: 2031.282: [CMS-concurrent-mark: 3.616/3.626 secs] [Times: user=63.40 sys=3.03, real=3.62 secs]
 (concurrent mode failure): 6315759K->5859983K(6990528K), 44.3028251 secs] 9461487K->5859983K(10136256K), [Metaspace: 37695K->37695K(1083392K)], 44.3032096 secs] [Times: user=60.53 sys=0.71, real=44.30 secs]
2018-10-10T05:34:46.678-0400: 2073.498: Total time for which application threads were stopped: 44.3052052 seconds, Stopping threads took: 0.0005032 seconds

{noformat}

The nature of the pauses is not clear to me at the moment. I need to investigate this further.


was (Author: alex.rufous):
Apart from lover throughput of NIO layer, there is another aspect to the performance issues I had failed to notice before. When running my performance tests for longer amount of time I started to see long GC pauses on cleaning of JVM Metaspace:

{noformat}
2018-10-10T05:32:11.250-0400: 1918.070: [GC (Allocation Failure) 2018-10-10T05:32:11.250-0400: 1918.071: [ParNew: 3145728K->3145728K(3145728K), 0.0000665 secs]2018-10-10T05:32:11.250-0400: 1918.071: [CMS2018-10-10T05:32:11.363-0400: 1918.183: [CMS-concurrent-mark: 1.696/1.700 secs] [Times: user=51.70 sys=1.93, real=1.70 secs]
 (concurrent mode failure): 6418793K->5067015K(6990528K), 43.9299868 secs] 9564521K->5067015K(10136256K), [Metaspace: 37679K->37679K(1083392K)], 43.9303498 secs] [Times: user=45.39 sys=0.00, real=43.92 secs]
2018-10-10T05:32:55.181-0400: 1962.001: Total time for which application threads were stopped: 43.9323704 seconds, Stopping threads took: 0.0004930 seconds

2018-10-10T05:33:01.804-0400: 1968.624: [GC (Allocation Failure) 2018-10-10T05:33:01.804-0400: 1968.624: [ParNew: 3145728K->349504K(3145728K), 1.1281568 secs] 8881780K->6564365K(10136256K), 1.1284601 secs] [Times: user=47.24 sys=0.01, real=1.13 secs]
2018-10-10T05:33:02.933-0400: 1969.753: Total time for which application threads were stopped: 1.1302133 seconds, Stopping threads took: 0.0003888 seconds

2018-10-10T05:33:04.515-0400: 1971.335: [GC (Allocation Failure) 2018-10-10T05:33:04.515-0400: 1971.335: [ParNew: 3145728K->349504K(3145728K), 1.1778553 secs] 9360589K->7056409K(10136256K), 1.1781560 secs] [Times: user=50.64 sys=0.00, real=1.18 secs]
2018-10-10T05:33:05.693-0400: 1972.513: Total time for which application threads were stopped: 1.1804099 seconds, Stopping threads took: 0.0007907 seconds

2018-10-10T05:33:07.300-0400: 1974.120: [GC (Allocation Failure) 2018-10-10T05:33:07.300-0400: 1974.120: [ParNew: 3145728K->3145728K(3145728K), 0.0000554 secs]2018-10-10T05:33:07.300-0400: 1974.120: [CMS2018-10-10T05:33:13.021-0400: 1979.841: [CMS-concurrent-preclean: 8.160/9.850 secs] [Times: user=110.86 sys=3.14, real=9.85 secs]
 (concurrent mode failure): 6706905K->5608432K(6990528K), 48.4980083 secs] 9852633K->5608432K(10136256K), [Metaspace: 37689K->37689K(1083392K)], 48.4983623 secs] [Times: user=48.53 sys=0.00, real=48.49 secs]
2018-10-10T05:33:55.798-0400: 2022.619: Total time for which application threads were stopped: 48.5008915 seconds, Stopping threads took: 0.0008321 seconds

2018-10-10T05:33:59.596-0400: 2026.417: [GC (Allocation Failure) 2018-10-10T05:33:59.596-0400: 2026.417: [ParNew: 3145728K->349504K(3145728K), 1.2127988 secs] 8892795K->6665263K(10136256K), 1.2130993 secs] [Times: user=46.20 sys=0.10, real=1.21 secs]
2018-10-10T05:34:00.810-0400: 2027.630: Total time for which application threads were stopped: 1.2151635 seconds, Stopping threads took: 0.0006465 seconds

2018-10-10T05:34:02.374-0400: 2029.194: [GC (Allocation Failure) 2018-10-10T05:34:02.374-0400: 2029.194: [ParNew: 3145728K->3145728K(3145728K), 0.0000684 secs]2018-10-10T05:34:02.374-0400: 2029.195: [CMS2018-10-10T05:34:04.462-0400: 2031.282: [CMS-concurrent-mark: 3.616/3.626 secs] [Times: user=63.40 sys=3.03, real=3.62 secs]
 (concurrent mode failure): 6315759K->5859983K(6990528K), 44.3028251 secs] 9461487K->5859983K(10136256K), [Metaspace: 37695K->37695K(1083392K)], 44.3032096 secs] [Times: user=60.53 sys=0.71, real=44.30 secs]
2018-10-10T05:34:46.678-0400: 2073.498: Total time for which application threads were stopped: 44.3052052 seconds, Stopping threads took: 0.0005032 seconds

{noformat}

The nature of the pauses is not clear to me at the moment. I need to investigate this further.

> [Broker-J] Improve performance of asynchronous publishing of transient messages into topic exchange having queues bound using non-overlapping selectors 
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: QPID-8238
>                 URL: https://issues.apache.org/jira/browse/QPID-8238
>             Project: Qpid
>          Issue Type: Improvement
>          Components: Broker-J
>    Affects Versions: qpid-java-broker-7.0.6
>            Reporter: Alex Rudyy
>            Priority: Major
>             Fix For: qpid-java-broker-7.1.0, qpid-java-broker-7.0.7
>
>         Attachments: 0001-QPID-8238-Use-java.lang.String-for-keys-and-values-i.patch
>
>
> The performance of asynchronous publishing of transient messages into topic exchange which routes messages into queues bound using non-overlapping selectors is 2-3 times slower than performance of 0.32 broker. The performance degradation is observed with AMQP 0.9, though, I suspect that the AMQP 0-10 protocol could be affected as well.
> I was running tests with 10 concurrent producers publishing messages  on separate connections using the same routing key into 10 different queues (subscribers queues) bound to the exchange using non-overlapping selectors.
> My testing showed that performance of 7.0 broker for this particular use case was 2-3 times worse than performance of 0.32 broker.
> The following factors contributed to degradation of performance:
> •	Copying data from direct memory into heap memory whilst decoding message headers. Due to this factor, the decoding of message headers is around twice slower. It seems it contributes around 70% to total performance degradation
> •	The message routing algorithm is slower due to need to support a new feature to route messages into bound exchanges (in addition to queues) using replacement routing key.
> •	AMQ short strings caching contributes 5-10% to total performance degradation. The caching was added to manage heap space more efficiently.
> The numbers provided here could be inaccurate due instrumentation overhead whilst profiling the issue.
> Potentially, caching can be turned off but that will not improve performance much.
> On other hand, adding of additional caching of strings to amqp-short-strings would improve the performance a bit. Whilst evaluating selectors, the fields used in selector expressions are represented as java strings but they get converted every time into amqp-short-strings when looking up for message header values. If 10 queues are bound to the exchange using the same binding key, the selector expression is evaluated 10 times for the incoming message. Thus, all selector field names are get converted into amqp-short-strings 10 times as well. It seems adding caching here can improve the performance. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org