Posted to dev@qpid.apache.org by "Ken Giusti (Created) (JIRA)" <ji...@apache.org> on 2012/03/07 19:56:57 UTC

[jira] [Created] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Improve performance by reducing the use of the Queue's message lock
-------------------------------------------------------------------

                 Key: QPID-3890
                 URL: https://issues.apache.org/jira/browse/QPID-3890
             Project: Qpid
          Issue Type: Improvement
          Components: C++ Broker
    Affects Versions: 0.16
            Reporter: Ken Giusti
            Assignee: Ken Giusti
            Priority: Minor
             Fix For: Future


The mutex that protects the Queue's message container is held longer than it really needs to be (e.g., across statistics updates).

Reducing the amount of code executed while the lock is held will improve the broker's performance on multi-core systems.
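
The general shape of such a change, sketched minimally (modern C++, invented names, not the broker's actual Queue class): keep only the container mutation inside the critical section, and move bookkeeping such as statistics updates outside it.

    #include <atomic>
    #include <cstdint>
    #include <deque>
    #include <mutex>
    #include <string>

    class Queue {
        std::mutex messageLock;                      // protects 'messages' only
        std::deque<std::string> messages;
        std::atomic<std::uint64_t> enqueueCount{0};  // stats no longer need the lock
    public:
        void push(const std::string& msg) {
            {
                std::lock_guard<std::mutex> l(messageLock);  // narrow critical section
                messages.push_back(msg);
            }
            // Statistics are updated after the lock has been released.
            enqueueCount.fetch_add(1, std::memory_order_relaxed);
        }
    };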

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225220#comment-13225220 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------



bq.  On 2012-03-07 21:30:28, Andrew Stitcher wrote:
bq.  > This is great work.
bq.  > 
bq.  > The most important thing on my mind, though, is how do we ensure that work like this maintains correctness?
bq.  > 
bq.  > Do we have some rules/guidelines for the use of the Queue structs that we can use to inspect the code and satisfy ourselves of its correctness? A simple rule would be "take the blah lock before capturing or changing the foo state". It looks like we might have had rules like that, but these changes are so broad that it's difficult for me to inspect the code and reconstruct what rules now apply (if any).
bq.  > 
bq.  > On another point - I see you've added the use of an RWLock - it's always good to benchmark with a plain mutex before using one of these, as they generally have higher overhead; unless the usage pattern is "read lock mostly, write lock seldom", a plain mutex can win.
bq.  
bq.  Kenneth Giusti wrote:
bq.      Re: locking rules - good point Andrew.  Most (all?) of these changes I made simply by manually reviewing the existing code's locking behavior, and trying to identify what can be safely moved outside the locks.  I'll update this patch with comments that explicitly call out those data members & actions that must hold the lock.
bq.      
bq.      The RWLock - hmm.. that may be an unintentional change: at first I had a separate lock for the queue observer container, which was rarely written to.  I abandoned that change and put the queue observers under the queue lock, since otherwise the order of events (as seen by the observers) could be re-arranged.
bq.  
bq.  Kenneth Giusti wrote:
bq.      Doh - sorry, replied with way too low a Caffeine/Blood ratio.  You're referring to the lock in the listener - yes.  Gordon points out that this may be unsafe, so I need to think about it a bit more.

The order of events as seen by observers can be re-arranged with the patch as it stands (at least those for acquires and dequeues; enqueues would still be ordered correctly with respect to each other).

Regarding the listeners access, I think you could fix it by, e.g., having an atomic counter that is incremented after pushing the message but before populating the notification set, checking that count before trying to acquire the message on the consume side, and then rechecking it before adding the consumer to the listeners if there was no message. If the count changes between the last two checks, you would retry the acquisition.
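
A rough sketch of the counter-based recheck described above (all names invented for illustration; this is not the actual patch):

    #include <atomic>
    #include <cstdint>
    #include <deque>
    #include <mutex>
    #include <string>
    #include <vector>

    struct Consumer;

    class Queue {
        std::atomic<std::uint64_t> enqueueTick{0};
        std::mutex messageLock;
        std::deque<std::string> messages;
        std::mutex listenerLock;
        std::vector<Consumer*> listeners;

        bool tryAcquire(std::string& out) {
            std::lock_guard<std::mutex> l(messageLock);
            if (messages.empty()) return false;
            out = messages.front();
            messages.pop_front();
            return true;
        }
    public:
        void push(const std::string& msg) {
            {
                std::lock_guard<std::mutex> l(messageLock);
                messages.push_back(msg);
            }
            // Bump the counter after the message is visible but before the
            // notification set is populated, so a racing consumer can detect it.
            enqueueTick.fetch_add(1);
            // ... build the notification set and notify listeners ...
        }

        bool consume(Consumer* c, std::string& out) {
            std::uint64_t seen = enqueueTick.load();   // check before acquiring
            if (tryAcquire(out)) return true;
            std::lock_guard<std::mutex> l(listenerLock);
            if (enqueueTick.load() != seen)            // a push raced us:
                return tryAcquire(out);                // retry the acquisition
            listeners.push_back(c);                    // otherwise wait to be notified
            return false;
        }
    };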


- Gordon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review5691
-----------------------------------------------------------


On 2012-03-07 19:23:36, Kenneth Giusti wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4232/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-07 19:23:36)
bq.  
bq.  
bq.  Review request for qpid, Andrew Stitcher, Alan Conway, Gordon Sim, and Ted Ross.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  In order to improve our parallelism across threads, this patch reduces the scope over which the Queue's message lock is held.
bq.  
bq.  I've still got more testing to do, but the simple test cases below show pretty good performance improvements on my multi-core box.  I'd like to get some early feedback, as there are a lot of little changes to the queue locking that will need vetting.
bq.  
bq.  As far as performance is concerned - I've run the following tests on an 8-core Xeon X5550 2.67GHz box.  I used qpid-send and qpid-receive to generate traffic.  All traffic was generated in parallel.
bq.  All numbers are in messages/second - sorted by best overall throughput.
bq.  
bq.  Test 1: Funnel Test (shared queue, 2 producers, 1 consumer)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  73144 :: 72524 :: 112914      91120 :: 83278 :: 133730
bq.  70299 :: 68878 :: 110905      91079 :: 87418 :: 133649
bq.  70002 :: 69882 :: 110241      83957 :: 83600 :: 131617
bq.  70523 :: 68681 :: 108372      83911 :: 82845 :: 128513
bq.  68135 :: 68022 :: 107949      85999 :: 81863 :: 125575
bq.  
bq.  
bq.  Test 2: Quad Client Shared Test (1 queue, 2 senders x 2 Receivers)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=4
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                               patched:
bq.  tp(m/s)                              tp(m/s)
bq.  
bq.  S1    :: S2    :: R1    :: R2        S1    :: S2    :: R1    :: R2
bq.  52826 :: 52364 :: 52386 :: 52275     64826 :: 64780 :: 64811 :: 64638
bq.  51457 :: 51269 :: 51399 :: 51213     64630 :: 64157 :: 64562 :: 64157
bq.  50557 :: 50432 :: 50468 :: 50366     64023 :: 63832 :: 64092 :: 63748
bq.  
bq.  **Note: pre-patch, qpidd ran steadily at about 270% CPU during this test.  With the patch, qpidd's CPU utilization was steady at 300%.
bq.  
bq.  
bq.  Test 3: Federation Funnel (two federated brokers, 2 producers upstream, 1 consumer downstream)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3 -d
bq.  qpidd --auth no -p 9999 --no-data-dir --worker-threads=2 -d
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $SRC_HOST add queue srcQ2 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add queue dstQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add exchange fanout dstX1
bq.  qpid-config -b $DST_HOST bind dstX1 dstQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ2
bq.  qpid-receive -b $DST_HOST -a dstQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ2 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  86150 :: 84146 :: 87665       90877 :: 88534 :: 108418
bq.  89188 :: 88319 :: 82469       87014 :: 86753 :: 107386
bq.  88221 :: 85298 :: 82455       89790 :: 88573 :: 104119
bq.  
bq.  Still TBD:
bq.    - fix message group errors
bq.    - verify/fix cluster
bq.  
bq.  
bq.  This addresses bug qpid-3890.
bq.      https://issues.apache.org/jira/browse/qpid-3890
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.cpp 1297185 
bq.  
bq.  Diff: https://reviews.apache.org/r/4232/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  unit test (non-cluster)
bq.  performance tests as described above.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Kenneth
bq.  
bq.


[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225235#comment-13225235 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review5723
-----------------------------------------------------------


Looks good overall, some questions & comments in line.


/trunk/qpid/cpp/src/qpid/broker/Queue.h
<https://reviews.apache.org/r/4232/#comment12458>

    We need to make it clear what state each lock covers. My preference would be to make 3 structs, each with a lock and the state that lock covers. At a minimum we should indicate this with comments.
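
As an illustration of the struct-per-lock idea (hypothetical names and state, not a proposal for the actual Queue layout):

    #include <cstddef>
    #include <cstdint>
    #include <deque>
    #include <mutex>
    #include <string>

    // Each struct pairs a lock with exactly the state it guards, so the
    // ownership rule is visible in the type rather than only in a comment.
    struct MessageState {
        std::mutex lock;
        std::deque<std::string> messages;
    };

    struct ConsumerState {
        std::mutex lock;
        std::size_t consumerCount = 0;
    };

    struct StatsState {
        std::mutex lock;
        std::uint64_t enqueued = 0;
        std::uint64_t dequeued = 0;
    };

    class Queue {
        MessageState messageState;
        ConsumerState consumerState;
        StatsState statsState;
        // Rule: touch a struct's members only while holding that struct's lock.
    };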



/trunk/qpid/cpp/src/qpid/broker/Queue.h
<https://reviews.apache.org/r/4232/#comment12457>

    I prefer the ScopedLock& convention over the LH convention but I guess that's cosmetic.



/trunk/qpid/cpp/src/qpid/broker/Queue.cpp
<https://reviews.apache.org/r/4232/#comment12464>

    why did this change from consumerLock?



/trunk/qpid/cpp/src/qpid/broker/Queue.cpp
<https://reviews.apache.org/r/4232/#comment12465>

    Does the snapshot have to be atomic with setting deleted = true?


- Alan




[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225247#comment-13225247 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------



bq.  On 2012-03-08 11:00:25, Gordon Sim wrote:
bq.  > /trunk/qpid/cpp/src/qpid/broker/Queue.cpp, line 957
bq.  > <https://reviews.apache.org/r/4232/diff/1/?file=88780#file88780line957>
bq.  >
bq.  >     Switching from the consumerLock to the messageLock may simplify the code but obviously won't improve the concurrency. The lock previously used here was dedicated to protecting an entirely distinct set of data. However, I would agree that in reality there is likely to be little gained.
bq.  
bq.  Kenneth Giusti wrote:
bq.      This may be overkill on my part: the reason for dropping the consumerLock and using the messageLock was simply locking consistency across the observer interface - using the same lock across all observer callouts.  Maybe not necessary, but I believe the observers currently assume the queue is providing the locking, and that implies using the same lock.

Ah, I get it now. This is part of the addition of two new consumer-related observer events? If so, I think the change to the lock used is perfectly reasonable, but it may be worth breaking that out as a separate patch since it is new functionality.


- Gordon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review5717
-----------------------------------------------------------




[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232723#comment-13232723 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------



bq.  On 2012-03-08 14:52:43, Alan Conway wrote:
bq.  > /trunk/qpid/cpp/src/qpid/broker/Queue.h, line 148
bq.  > <https://reviews.apache.org/r/4232/diff/1/?file=88779#file88779line148>
bq.  >
bq.  >     I prefer the ScopedLock& convention over the LH convention but I guess that's cosmetic.
bq.  
bq.  Kenneth Giusti wrote:
bq.      What is the "ScopedLock convention"?  Is that when we pass the held lock as an argument?  I like that approach better, since the compiler can enforce it.

Yes. According to the diff above, your patch changed a bunch of observer functions from having a ScopedLock& argument to having LH suffixes. Did you pick up some unintended changes?
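
For reference, a minimal illustration of the two conventions being compared (using std types rather than the broker's own ScopedLock):

    #include <mutex>

    using ScopedLock = std::lock_guard<std::mutex>;

    class Queue {
        std::mutex messageLock;

        // "LH" suffix convention: the name says the caller must already hold
        // messageLock, but nothing enforces it.
        void observeEnqueueLH() {}

        // ScopedLock& convention: the caller passes the lock it holds, so a
        // call site that forgot to lock does not compile.
        void observeEnqueue(const ScopedLock&) {}

    public:
        void push() {
            ScopedLock l(messageLock);
            observeEnqueueLH();   // compiler cannot tell whether the lock is held
            observeEnqueue(l);    // requires a live ScopedLock at the call site
        }
    };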


- Alan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review5723
-----------------------------------------------------------




[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225243#comment-13225243 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------



bq.  On 2012-03-08 14:52:43, Alan Conway wrote:
bq.  > /trunk/qpid/cpp/src/qpid/broker/Queue.h, line 148
bq.  > <https://reviews.apache.org/r/4232/diff/1/?file=88779#file88779line148>
bq.  >
bq.  >     I prefer the ScopedLock& convention over the LH convention but I guess that's cosmetic.

What is the "ScopedLock convention"?  Is that when we pass the held lock as an argument?  I like that approach better, since the compiler can enforce it.


bq.  On 2012-03-08 14:52:43, Alan Conway wrote:
bq.  > /trunk/qpid/cpp/src/qpid/broker/Queue.cpp, line 537
bq.  > <https://reviews.apache.org/r/4232/diff/1/?file=88780#file88780line537>
bq.  >
bq.  >     why did this change from consumerLock?

See above re: Gordon's comment.


bq.  On 2012-03-08 14:52:43, Alan Conway wrote:
bq.  > /trunk/qpid/cpp/src/qpid/broker/Queue.cpp, line 1414
bq.  > <https://reviews.apache.org/r/4232/diff/1/?file=88780#file88780line1414>
bq.  >
bq.  >     Does the snapshot have to be atomic with setting deleted = true?

I'm going to try putting the listener back under the lock - I doubt moving it out gained us much at all, and it seems racy.
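
One common shape for that (a hypothetical sketch, not the actual code): take the listener snapshot and set the deleted flag under the same lock, then make the callouts outside it.

    #include <mutex>
    #include <vector>

    struct Consumer { void notify() {} };

    class Queue {
        std::mutex messageLock;
        bool deleted = false;
        std::vector<Consumer*> listeners;
    public:
        void destroyed() {
            std::vector<Consumer*> snapshot;
            {
                std::lock_guard<std::mutex> l(messageLock);
                deleted = true;                 // flag and snapshot change together
                snapshot.swap(listeners);       // atomic with respect to the lock
            }
            for (Consumer* c : snapshot) c->notify();  // callouts outside the lock
        }
    };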


- Kenneth


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review5723
-----------------------------------------------------------




[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232841#comment-13232841 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------



bq.  On 2012-03-08 14:52:43, Alan Conway wrote:
bq.  > /trunk/qpid/cpp/src/qpid/broker/Queue.h, line 148
bq.  > <https://reviews.apache.org/r/4232/diff/1/?file=88779#file88779line148>
bq.  >
bq.  >     I prefer the ScopedLock& convention over the LH convention but I guess that's cosmetic.
bq.  
bq.  Kenneth Giusti wrote:
bq.      What is the "ScopedLock convention"?  Is that when we pass the held lock as an argument?  I like that approach better, since the compiler can enforce it.
bq.  
bq.  Alan Conway wrote:
bq.      Yes. According to the diff above, your patch changed a bunch of observer functions from having a ScopedLock& argument to having LH suffixes. Did you pick up some unintended changes?

Oops!  At one point I was trying to limit the lock scope to within the observe* calls, which introduced ordering issues.  I'll revert to the original - thanks.


- Kenneth


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review5723
-----------------------------------------------------------




[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225249#comment-13225249 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------



bq.  On 2012-03-07 21:30:28, Andrew Stitcher wrote:
bq.  > This is great work.
bq.  > 
bq.  > The most important thing on my mind though is how do we ensure that work like this maintains correctness?
bq.  > 
bq.  > Do we have some rules/guidelines for the use of the Queue structs that we can use to inspect the code to satisfy its correctness? A simple rule like this would be "take the blah lock before capturing or changing the foo state". It looks like we might have had rules like that, but these changes are so broad it's difficult for me to inspect the code and reconstruct what rules now apply (if any).
bq.  > 
bq.  > On another point - I see you've added the use of a RWLock - it's always good to benchmark with a plain mutex before using one of these as they are generally higher overhead and unless the use is "read lock mostly write lock seldom" plain mutexes can win.
bq.  
bq.  Kenneth Giusti wrote:
bq.      Re: locking rules - good point Andrew.  Most (all?) of these changes I made simply by manually reviewing the existing code's locking behavior, and trying to identify what can be safely moved outside the locks.  I'll update this patch with comments that explicitly call out those data members & actions that must hold the lock.
bq.      
bq.      The RWLock - hmm.. that may be an unintentional change: at first I had a separate lock for the queue observer container which was rarely written to.  I abandoned that change and put the queue observers under the queue lock since otherwise the order of events (as seen by the observers) could be re-arranged.
bq.  
bq.  Kenneth Giusti wrote:
bq.      Doh - sorry, replied with way too low a Caffeine/Blood ratio.  You're referring to the lock in the listener - yes.  Gordon points out that this may be unsafe, so I need to think about it a bit more.
bq.  
bq.  Gordon Sim wrote:
bq.      The order of events as seen by observers can be re-arranged with the patch as it stands (at least those for acquires and dequeues; enqueues would still be ordered correctly wrt each other).
bq.      
bq.      Regarding the listeners access, I think you could fix it by e.g. having an atomic counter that was incremented after pushing the message but before populating the notification set, checking that count before trying to acquire the message on the consume side and then rechecking it before adding the consumer to listeners if there was no message. If the number changes between the last two checks you would retry the acquisition.
bq.  
bq.  Kenneth Giusti wrote:
bq.      Re: listeners - makes sense.  I thought the consistency of the listener could be decoupled from the queue, but clearly that doesn't make sense (and probably buys us very little to begin with).
bq.      
bq.      Re: ordering of events as seen by observers.  I want to guarantee that the order of events is preserved WRT the observers - otherwise stuff will break as it stands (like flow control, which relies on correct ordering).  Gordon - can you explain the acquire/dequeues reordering case?  Does this patch introduce that reordering, or has that been possible all along?

My mistake, I misread the patch, sorry! I think you are right that the ordering is preserved.
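
For illustration, here is a rough sketch of the counter-based retry suggested above, in generic C++11. All names (SketchQueue, enqueueEpoch, tryAcquire, addListener, notifyListeners) are invented and the container/listener details are elided; it also registers the listener before the final recheck, a slight variant of the wording above, so that a racing push() either bumps the counter (forcing a retry) or finds the listener already registered when it populates the notification set.

#include <atomic>
#include <mutex>

// Sketch only: invented names, not the broker's actual Queue/QueueListeners code.
class SketchQueue {
    std::mutex messageLock;
    std::atomic<unsigned> enqueueEpoch{0};   // bumped after push(), before populating listeners
    // message container, listeners and allocator elided...

    bool tryAcquire()      { std::lock_guard<std::mutex> l(messageLock); /* pop if possible */ return false; }
    void addListener()     { /* register this consumer for notification */ }
    void notifyListeners() { /* populate the notification set and notify, outside messageLock */ }

public:
    void push(/* message */) {
        { std::lock_guard<std::mutex> l(messageLock); /* enqueue the message */ }
        enqueueEpoch.fetch_add(1, std::memory_order_release);
        notifyListeners();
    }

    bool consumeNextMessage() {
        for (;;) {
            unsigned before = enqueueEpoch.load(std::memory_order_acquire);
            if (tryAcquire())
                return true;                      // got a message
            addListener();                        // nothing available: register for notification
            if (enqueueEpoch.load(std::memory_order_acquire) == before)
                return false;                     // no push() raced us; safe to wait for the notify
            // A push() slipped in between the two epoch reads: retry the acquisition.
            // (The stale listener entry at worst causes a harmless extra notification.)
        }
    }
};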


- Gordon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review5691
-----------------------------------------------------------


On 2012-03-07 19:23:36, Kenneth Giusti wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4232/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-07 19:23:36)
bq.  
bq.  
bq.  Review request for qpid, Andrew Stitcher, Alan Conway, Gordon Sim, and Ted Ross.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  In order to improve our parallelism across threads, this patch reduces the scope over which the Queue's message lock is held.
bq.  
bq.  I've still got more testing to do, but the simple test cases below show pretty good performance improvements on my multi-core box.  I'd like to get some early feedback, as there are a lot of little changes to the queue locking that will need vetting.
bq.  
bq.  As far as performance is concerned - I've run the following tests on an 8 core Xeon X5550 2.67Ghz box.  I used qpid-send and qpid-receive to generate traffic.  All traffic generated in parallel.
bq.  All numbers are in messages/second - sorted by best overall throughput.
bq.  
bq.  Test 1: Funnel Test (shared queue, 2 producers, 1 consumer)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  73144 :: 72524 :: 112914      91120 :: 83278 :: 133730
bq.  70299 :: 68878 :: 110905      91079 :: 87418 :: 133649
bq.  70002 :: 69882 :: 110241      83957 :: 83600 :: 131617
bq.  70523 :: 68681 :: 108372      83911 :: 82845 :: 128513
bq.  68135 :: 68022 :: 107949      85999 :: 81863 :: 125575
bq.  
bq.  
bq.  Test 2: Quad Client Shared Test (1 queue, 2 senders x 2 Receivers)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=4
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                               patched:
bq.  tp(m/s)                              tp(m/s)
bq.  
bq.  S1    :: S2    :: R1    :: R2        S1    :: S2    :: R1    :: R2
bq.  52826 :: 52364 :: 52386 :: 52275     64826 :: 64780 :: 64811 :: 64638
bq.  51457 :: 51269 :: 51399 :: 51213     64630 :: 64157 :: 64562 :: 64157
bq.  50557 :: 50432 :: 50468 :: 50366     64023 :: 63832 :: 64092 :: 63748
bq.  
bq.  **Note: pre-patch, qpidd ran steadily at about 270% cpu during this test.  With the patch, qpidd's cpu utilization was steady at 300%
bq.  
bq.  
bq.  Test 3: Federation Funnel (two federated brokers, 2 producers upstream, 1 consumer downstream)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3 -d
bq.  qpidd --auth no -p 9999 --no-data-dir --worker-threads=2 -d
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $SRC_HOST add queue srcQ2 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add queue dstQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add exchange fanout dstX1
bq.  qpid-config -b $DST_HOST bind dstX1 dstQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ2
bq.  qpid-receive -b $DST_HOST -a dstQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ2 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  86150 :: 84146 :: 87665       90877 :: 88534 :: 108418
bq.  89188 :: 88319 :: 82469       87014 :: 86753 :: 107386
bq.  88221 :: 85298 :: 82455       89790 :: 88573 :: 104119
bq.  
bq.  Still TBD:
bq.    - fix message group errors
bq.    - verify/fix cluster
bq.  
bq.  
bq.  This addresses bug qpid-3890.
bq.      https://issues.apache.org/jira/browse/qpid-3890
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.cpp 1297185 
bq.  
bq.  Diff: https://reviews.apache.org/r/4232/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  unit test (non-cluster)
bq.  performance tests as described above.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Kenneth
bq.  
bq.


                
> Improve performance by reducing the use of the Queue's message lock
> -------------------------------------------------------------------
>
>                 Key: QPID-3890
>                 URL: https://issues.apache.org/jira/browse/QPID-3890
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.16
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>            Priority: Minor
>             Fix For: Future
>
>
> The mutex that protects the Queue's message container is held longer than it really should be (e.g. across statistics updates, etc).
> Reducing the amount of code executed while the lock is held will improve the performance of the broker in a multi-process environment.



[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225130#comment-13225130 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review5717
-----------------------------------------------------------


The numbers are encouraging! As mentioned below I think this still needs some work to be safe however.


/trunk/qpid/cpp/src/qpid/broker/Queue.cpp
<https://reviews.apache.org/r/4232/#comment12449>

    As it stands, I believe this is unsafe. If you have an empty queue, with a concurrent consumeNextMessage() from a consumer and a push() from a publisher, then it is possible the consumer does not get a message *and* does not get notified that there is one.
    
    Moving the listeners.addListener() and the listeners.populate() (from push()) outside the lock scope means that the consuming thread can fail to get a message from the allocator; the publishing thread then adds a message and, before the consuming thread adds the consumer to the set of listeners, the publishing thread populates its set (which will be empty) for notification.
    
    Thus the message remains on the queue without being delivered to an interested consumer until something else happens to retrigger a notification.
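
To make the interleaving concrete, here is a minimal, single-threaded simulation of the scenario described above. Every name in it (SimpleQueue, the step methods, etc.) is invented purely for illustration - this is not the broker's actual Queue/QueueListeners API - and the comments note which thread would perform each step in the real race.

#include <deque>
#include <iostream>
#include <set>
#include <string>

// Invented stand-in for the real Queue: just enough state to show the lost wakeup.
struct SimpleQueue {
    std::deque<std::string> messages;    // guarded by the message lock in the real code
    std::set<std::string>   listeners;   // consumers waiting to be notified

    bool tryAcquire() { return !messages.empty(); }                   // consumer, under lock
    void addListener(const std::string& c) { listeners.insert(c); }   // consumer, now OUTSIDE the lock
    void push(const std::string& m) { messages.push_back(m); }        // publisher, under lock
    std::set<std::string> populate() { return listeners; }            // publisher, now OUTSIDE the lock
};

int main() {
    SimpleQueue q;
    bool got = q.tryAcquire();                        // consumer: queue empty, acquisition fails
    q.push("m1");                                     // publisher: enqueues a message
    std::set<std::string> toNotify = q.populate();    // publisher: snapshots listeners -> empty!
    if (!got) q.addListener("consumer-1");            // consumer: registers too late
    // A message is queued and a consumer is waiting, but the notification set was empty,
    // so nothing wakes the consumer until some later event retriggers a notification.
    std::cout << "queued=" << q.messages.size()
              << " listeners=" << q.listeners.size()
              << " notified=" << toNotify.size() << std::endl;  // queued=1 listeners=1 notified=0
    return 0;
}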



/trunk/qpid/cpp/src/qpid/broker/Queue.cpp
<https://reviews.apache.org/r/4232/#comment12451>

    Switching from the consumerLock to messageLock may simplify the code but obviously won't improve the concurrency. The lock previously used here was dedicated to protecting an entirely distinct set of data. However, I would agree that in reality there is likely to be little gained.



/trunk/qpid/cpp/src/qpid/broker/Queue.cpp
<https://reviews.apache.org/r/4232/#comment12450>

    This looks like a subtle change to the logic that is perhaps unintentional? Previously, if ctxt was non-null, the isEnqueued() test would still be executed and, if false, the method would return. That test is now skipped for the transactional case.
    
    (If I'm right, then a transactional dequeue attempt on a persistent message that has already been removed, say due to a ring queue policy limit being reached, will result in a second dequeue attempt on the store that will most likely fail.)
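
For clarity, a schematic of the control-flow difference being described here, using stand-in types and helper names rather than the actual Queue::dequeue() code:

// Schematic only: invented stand-in types/helpers, not the broker's API.
struct TransactionContext {};
struct QueuedMessage {};
bool isEnqueued(const QueuedMessage&);
bool dequeueTransactional(TransactionContext*, const QueuedMessage&);
bool dequeueNonTransactional(const QueuedMessage&);

// Before the patch: the isEnqueued() guard runs for both paths.
bool dequeue_before(TransactionContext* ctxt, const QueuedMessage& msg) {
    if (!isEnqueued(msg)) return false;                  // always checked first
    if (ctxt) return dequeueTransactional(ctxt, msg);    // transactional path
    return dequeueNonTransactional(msg);
}

// After the patch as posted: a non-null ctxt bypasses the guard, so a message already
// removed (e.g. by a ring-queue limit) can hit the store a second time.
bool dequeue_after(TransactionContext* ctxt, const QueuedMessage& msg) {
    if (ctxt) return dequeueTransactional(ctxt, msg);    // guard skipped here
    if (!isEnqueued(msg)) return false;
    return dequeueNonTransactional(msg);
}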


- Gordon


On 2012-03-07 19:23:36, Kenneth Giusti wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4232/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-07 19:23:36)
bq.  
bq.  
bq.  Review request for qpid, Andrew Stitcher, Alan Conway, Gordon Sim, and Ted Ross.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  In order to improve our parallelism across threads, this patch reduces the scope over which the Queue's message lock is held.
bq.  
bq.  I've still got more testing to do, but the simple test cases below show pretty good performance improvements on my multi-core box.  I'd like to get some early feedback, as there are a lot of little changes to the queue locking that will need vetting.
bq.  
bq.  As far as performance is concerned - I've run the following tests on an 8 core Xeon X5550 2.67Ghz box.  I used qpid-send and qpid-receive to generate traffic.  All traffic generated in parallel.
bq.  All numbers are in messages/second - sorted by best overall throughput.
bq.  
bq.  Test 1: Funnel Test (shared queue, 2 producers, 1 consumer)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  73144 :: 72524 :: 112914      91120 :: 83278 :: 133730
bq.  70299 :: 68878 :: 110905      91079 :: 87418 :: 133649
bq.  70002 :: 69882 :: 110241      83957 :: 83600 :: 131617
bq.  70523 :: 68681 :: 108372      83911 :: 82845 :: 128513
bq.  68135 :: 68022 :: 107949      85999 :: 81863 :: 125575
bq.  
bq.  
bq.  Test 2: Quad Client Shared Test (1 queue, 2 senders x 2 Receivers)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=4
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                               patched:
bq.  tp(m/s)                              tp(m/s)
bq.  
bq.  S1    :: S2    :: R1    :: R2        S1    :: S2    :: R1    :: R2
bq.  52826 :: 52364 :: 52386 :: 52275     64826 :: 64780 :: 64811 :: 64638
bq.  51457 :: 51269 :: 51399 :: 51213     64630 :: 64157 :: 64562 :: 64157
bq.  50557 :: 50432 :: 50468 :: 50366     64023 :: 63832 :: 64092 :: 63748
bq.  
bq.  **Note: pre-patch, qpidd ran steadily at about 270% cpu during this test.  With the patch, qpidd's cpu utilization was steady at 300%
bq.  
bq.  
bq.  Test 3: Federation Funnel (two federated brokers, 2 producers upstream, 1 consumer downstream)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3 -d
bq.  qpidd --auth no -p 9999 --no-data-dir --worker-threads=2 -d
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $SRC_HOST add queue srcQ2 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add queue dstQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add exchange fanout dstX1
bq.  qpid-config -b $DST_HOST bind dstX1 dstQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ2
bq.  qpid-receive -b $DST_HOST -a dstQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ2 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  86150 :: 84146 :: 87665       90877 :: 88534 :: 108418
bq.  89188 :: 88319 :: 82469       87014 :: 86753 :: 107386
bq.  88221 :: 85298 :: 82455       89790 :: 88573 :: 104119
bq.  
bq.  Still TBD:
bq.    - fix message group errors
bq.    - verify/fix cluster
bq.  
bq.  
bq.  This addresses bug qpid-3890.
bq.      https://issues.apache.org/jira/browse/qpid-3890
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.cpp 1297185 
bq.  
bq.  Diff: https://reviews.apache.org/r/4232/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  unit test (non-cluster)
bq.  performance tests as described above.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Kenneth
bq.  
bq.


                
> Improve performance by reducing the use of the Queue's message lock
> -------------------------------------------------------------------
>
>                 Key: QPID-3890
>                 URL: https://issues.apache.org/jira/browse/QPID-3890
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.16
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>            Priority: Minor
>             Fix For: Future
>
>
> The mutex that protects the Queue's message container is held longer than it really should be (e.g. across statistics updates, etc).
> Reducing the amount of code executed while the lock is held will improve the performance of the broker in a multi-process environment.



[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225218#comment-13225218 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------



bq.  On 2012-03-07 21:30:28, Andrew Stitcher wrote:
bq.  > This is great work.
bq.  > 
bq.  > The most important thing on my mind though is how do we ensure that work like this maintains correctness?
bq.  > 
bq.  > Do we have some rules/guidelines for the use of the Queue structs that we can use to inspect the code to satisfy its correctness? A simple rule like this would be "take the blah lock before capturing or changing the foo state". It looks like we might have had rules like that, but these changes are so broad it's difficult for me to inspect the code and reconstruct what rules now apply (if any).
bq.  > 
bq.  > On another point - I see you've added the use of a RWLock - it's always good to benchmark with a plain mutex before using one of these as they are generally higher overhead and unless the use is "read lock mostly write lock seldom" plain mutexes can win.
bq.  
bq.  Kenneth Giusti wrote:
bq.      Re: locking rules - good point Andrew.  Most (all?) of these changes I made simply by manually reviewing the existing code's locking behavior, and trying to identify what can be safely moved outside the locks.  I'll update this patch with comments that explicitly call out those data members & actions that must hold the lock.
bq.      
bq.      The RWLock - hmm.. that may be an unintentional change: at first I had a separate lock for the queue observer container which was rarely written to.  I abandoned that change and put the queue observers under the queue lock since otherwise the order of events (as seen by the observers) could be re-arranged.

Doh - sorry, replied with way too low a Caffeine/Blood ratio.  You're referring to the lock in the listener - yes.  Gordon points out that this may be unsafe, so I need to think about it a bit more.


- Kenneth


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review5691
-----------------------------------------------------------


On 2012-03-07 19:23:36, Kenneth Giusti wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4232/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-07 19:23:36)
bq.  
bq.  
bq.  Review request for qpid, Andrew Stitcher, Alan Conway, Gordon Sim, and Ted Ross.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  In order to improve our parallelism across threads, this patch reduces the scope over which the Queue's message lock is held.
bq.  
bq.  I've still got more testing to do, but the simple test cases below show pretty good performance improvements on my multi-core box.  I'd like to get some early feedback, as there are a lot of little changes to the queue locking that will need vetting.
bq.  
bq.  As far as performance is concerned - I've run the following tests on an 8 core Xeon X5550 2.67Ghz box.  I used qpid-send and qpid-receive to generate traffic.  All traffic generated in parallel.
bq.  All numbers are in messages/second - sorted by best overall throughput.
bq.  
bq.  Test 1: Funnel Test (shared queue, 2 producers, 1 consumer)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  73144 :: 72524 :: 112914      91120 :: 83278 :: 133730
bq.  70299 :: 68878 :: 110905      91079 :: 87418 :: 133649
bq.  70002 :: 69882 :: 110241      83957 :: 83600 :: 131617
bq.  70523 :: 68681 :: 108372      83911 :: 82845 :: 128513
bq.  68135 :: 68022 :: 107949      85999 :: 81863 :: 125575
bq.  
bq.  
bq.  Test 2: Quad Client Shared Test (1 queue, 2 senders x 2 Receivers)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=4
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                               patched:
bq.  tp(m/s)                              tp(m/s)
bq.  
bq.  S1    :: S2    :: R1    :: R2        S1    :: S2    :: R1    :: R2
bq.  52826 :: 52364 :: 52386 :: 52275     64826 :: 64780 :: 64811 :: 64638
bq.  51457 :: 51269 :: 51399 :: 51213     64630 :: 64157 :: 64562 :: 64157
bq.  50557 :: 50432 :: 50468 :: 50366     64023 :: 63832 :: 64092 :: 63748
bq.  
bq.  **Note: pre-patch, qpidd ran steadily at about 270% cpu during this test.  With the patch, qpidd's cpu utilization was steady at 300%
bq.  
bq.  
bq.  Test 3: Federation Funnel (two federated brokers, 2 producers upstream, 1 consumer downstream)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3 -d
bq.  qpidd --auth no -p 9999 --no-data-dir --worker-threads=2 -d
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $SRC_HOST add queue srcQ2 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add queue dstQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add exchange fanout dstX1
bq.  qpid-config -b $DST_HOST bind dstX1 dstQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ2
bq.  qpid-receive -b $DST_HOST -a dstQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ2 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  86150 :: 84146 :: 87665       90877 :: 88534 :: 108418
bq.  89188 :: 88319 :: 82469       87014 :: 86753 :: 107386
bq.  88221 :: 85298 :: 82455       89790 :: 88573 :: 104119
bq.  
bq.  Still TBD:
bq.    - fix message group errors
bq.    - verify/fix cluster
bq.  
bq.  
bq.  This addresses bug qpid-3890.
bq.      https://issues.apache.org/jira/browse/qpid-3890
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.cpp 1297185 
bq.  
bq.  Diff: https://reviews.apache.org/r/4232/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  unit test (non-cluster)
bq.  performance tests as described above.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Kenneth
bq.  
bq.


                
> Improve performance by reducing the use of the Queue's message lock
> -------------------------------------------------------------------
>
>                 Key: QPID-3890
>                 URL: https://issues.apache.org/jira/browse/QPID-3890
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.16
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>            Priority: Minor
>             Fix For: Future
>
>
> The mutex that protects the Queue's message container is held longer than it really should be (e.g. across statistics updates, etc).
> Reducing the amount of code executed while the lock is held will improve the performance of the broker in a multi-process environment.



[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225233#comment-13225233 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------



bq.  On 2012-03-08 11:00:25, Gordon Sim wrote:
bq.  > /trunk/qpid/cpp/src/qpid/broker/Queue.cpp, line 957
bq.  > <https://reviews.apache.org/r/4232/diff/1/?file=88780#file88780line957>
bq.  >
bq.  >     Switching from the consumerLock to messageLock may simplify the code but obviously won't improve the concurrency. The lock previously used here was dedicated to protecting an entirely distinct set of data. However, I would agree that in reality there is likely to be little gained.

This may be overkill on my behalf: The reason for losing the consumerLock and using the messageLock was simply for locking consistency across the observer interface - using the same lock across all observer callouts.  Maybe not necessary, but I believe the observers currently assume the queue is providing locking and that implies using the same lock.
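
As a generic sketch of that point (invented names, not the actual QueueObserver interface): keeping the observer callouts inside the same critical section that mutates the message container is what preserves the order of events as the observers see them.

#include <functional>
#include <mutex>
#include <vector>

// Illustration only; not the broker's Queue/QueueObserver code.
class SketchQueue {
    std::mutex messageLock;                            // the single queue lock
    std::vector<int> messages;
    std::vector<std::function<void(int)>> observers;   // stand-in for QueueObserver
public:
    void addObserver(std::function<void(int)> o) {
        std::lock_guard<std::mutex> l(messageLock);
        observers.push_back(o);
    }
    void push(int m) {
        std::lock_guard<std::mutex> l(messageLock);
        messages.push_back(m);
        for (auto& o : observers)
            o(m);   // callout under the same lock: observers see enqueues in order
    }
};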


bq.  On 2012-03-08 11:00:25, Gordon Sim wrote:
bq.  > /trunk/qpid/cpp/src/qpid/broker/Queue.cpp, line 411
bq.  > <https://reviews.apache.org/r/4232/diff/1/?file=88780#file88780line411>
bq.  >
bq.  >     As it stands, I believe this is unsafe. If you have an empty queue, with a concurrent consumeNextMessage() from a consumer and a push() from a publisher, then it is possible the consumer does not get a message *and* does not get notified that there is one.
bq.  >     
bq.  >     Moving the listeners.addListener() and the listeners.populate() (from push()) outside the lock scope means that the consuming thread can fail to get a message from the allocator; the publishing thread then adds a message and, before the consuming thread adds the consumer to the set of listeners, the publishing thread populates its set (which will be empty) for notification.
bq.  >     
bq.  >     Thus the message remains on the queue without being delivered to an interested consumer until something else happens to retrigger a notification.

+1000 thanks - I see the problem now.


bq.  On 2012-03-08 11:00:25, Gordon Sim wrote:
bq.  > /trunk/qpid/cpp/src/qpid/broker/Queue.cpp, line 1071
bq.  > <https://reviews.apache.org/r/4232/diff/1/?file=88780#file88780line1071>
bq.  >
bq.  >     This looks like a subtle change to the logic that is perhaps unintentional? Previously, if ctxt was non-null, the isEnqueued() test would still be executed and, if false, the method would return. That test is now skipped for the transactional case.
bq.  >     
bq.  >     (If I'm right, then a transactional dequeue attempt on a persistent message that has already been removed, say due to a ring queue policy limit being reached, will result in a second dequeue attempt on the store that will most likely fail.)

What the hell did I do?  Thanks for catching that - totally unintended.


- Kenneth


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review5717
-----------------------------------------------------------


On 2012-03-07 19:23:36, Kenneth Giusti wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4232/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-07 19:23:36)
bq.  
bq.  
bq.  Review request for qpid, Andrew Stitcher, Alan Conway, Gordon Sim, and Ted Ross.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  In order to improve our parallelism across threads, this patch reduces the scope over which the Queue's message lock is held.
bq.  
bq.  I've still got more testing to do, but the simple test cases below show pretty good performance improvements on my multi-core box.  I'd like to get some early feedback, as there are a lot of little changes to the queue locking that will need vetting.
bq.  
bq.  As far as performance is concerned - I've run the following tests on an 8 core Xeon X5550 2.67Ghz box.  I used qpid-send and qpid-receive to generate traffic.  All traffic generated in parallel.
bq.  All numbers are in messages/second - sorted by best overall throughput.
bq.  
bq.  Test 1: Funnel Test (shared queue, 2 producers, 1 consumer)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  73144 :: 72524 :: 112914      91120 :: 83278 :: 133730
bq.  70299 :: 68878 :: 110905      91079 :: 87418 :: 133649
bq.  70002 :: 69882 :: 110241      83957 :: 83600 :: 131617
bq.  70523 :: 68681 :: 108372      83911 :: 82845 :: 128513
bq.  68135 :: 68022 :: 107949      85999 :: 81863 :: 125575
bq.  
bq.  
bq.  Test 2: Quad Client Shared Test (1 queue, 2 senders x 2 Receivers)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=4
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                               patched:
bq.  tp(m/s)                              tp(m/s)
bq.  
bq.  S1    :: S2    :: R1    :: R2        S1    :: S2    :: R1    :: R2
bq.  52826 :: 52364 :: 52386 :: 52275     64826 :: 64780 :: 64811 :: 64638
bq.  51457 :: 51269 :: 51399 :: 51213     64630 :: 64157 :: 64562 :: 64157
bq.  50557 :: 50432 :: 50468 :: 50366     64023 :: 63832 :: 64092 :: 63748
bq.  
bq.  **Note: pre-patch, qpidd ran steadily at about 270% cpu during this test.  With the patch, qpidd's cpu utilization was steady at 300%
bq.  
bq.  
bq.  Test 3: Federation Funnel (two federated brokers, 2 producers upstream, 1 consumer downstream)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3 -d
bq.  qpidd --auth no -p 9999 --no-data-dir --worker-threads=2 -d
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $SRC_HOST add queue srcQ2 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add queue dstQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add exchange fanout dstX1
bq.  qpid-config -b $DST_HOST bind dstX1 dstQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ2
bq.  qpid-receive -b $DST_HOST -a dstQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ2 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  86150 :: 84146 :: 87665       90877 :: 88534 :: 108418
bq.  89188 :: 88319 :: 82469       87014 :: 86753 :: 107386
bq.  88221 :: 85298 :: 82455       89790 :: 88573 :: 104119
bq.  
bq.  Still TBD:
bq.    - fix message group errors
bq.    - verify/fix cluster
bq.  
bq.  
bq.  This addresses bug qpid-3890.
bq.      https://issues.apache.org/jira/browse/qpid-3890
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.cpp 1297185 
bq.  
bq.  Diff: https://reviews.apache.org/r/4232/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  unit test (non-cluster)
bq.  performance tests as described above.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Kenneth
bq.  
bq.


                
> Improve performance by reducing the use of the Queue's message lock
> -------------------------------------------------------------------
>
>                 Key: QPID-3890
>                 URL: https://issues.apache.org/jira/browse/QPID-3890
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.16
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>            Priority: Minor
>             Fix For: Future
>
>
> The mutex that protects the Queue's message container is held longer than it really should be (e.g. across statistics updates, etc).
> Reducing the amount of code executed while the lock is held will improve the performance of the broker in a multi-process environment.



[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233419#comment-13233419 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review6116
-----------------------------------------------------------

Ship it!


I would still like to see a comment in Queue.h that states explicitly which member variables are covered by which locks - while it's fresh in your mind :)
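
For example, such a note in Queue.h might look something like the sketch below; the list of members is illustrative only and the real one would have to be taken from the final patch.

class Queue {
    // ...
    /** messageLock protects:
     *    - the message container and every enqueue/acquire/dequeue operation on it
     *    - QueueListeners registration and population of the notification set
     *    - the QueueObserver callouts, so observers see events in queue order
     *  Management statistics and store calls are updated outside this lock.
     */
    mutable qpid::sys::Mutex messageLock;
    // ...
};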

- Alan


On 2012-03-19 22:22:42, Kenneth Giusti wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4232/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-19 22:22:42)
bq.  
bq.  
bq.  Review request for qpid, Andrew Stitcher, Alan Conway, Gordon Sim, and Ted Ross.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  In order to improve our parallelism across threads, this patch reduces the scope over which the Queue's message lock is held.
bq.  
bq.  I've still got more testing to do, but the simple test cases below show pretty good performance improvements on my multi-core box.  I'd like to get some early feedback, as there are a lot of little changes to the queue locking that will need vetting.
bq.  
bq.  As far as performance is concerned - I've run the following tests on an 8 core Xeon X5550 2.67Ghz box.  I used qpid-send and qpid-receive to generate traffic.  All traffic generated in parallel.
bq.  All numbers are in messages/second - sorted by best overall throughput.
bq.  
bq.  Test 1: Funnel Test (shared queue, 2 producers, 1 consumer)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  73144 :: 72524 :: 112914      91120 :: 83278 :: 133730
bq.  70299 :: 68878 :: 110905      91079 :: 87418 :: 133649
bq.  70002 :: 69882 :: 110241      83957 :: 83600 :: 131617
bq.  70523 :: 68681 :: 108372      83911 :: 82845 :: 128513
bq.  68135 :: 68022 :: 107949      85999 :: 81863 :: 125575
bq.  
bq.  
bq.  Test 2: Quad Client Shared Test (1 queue, 2 senders x 2 Receivers)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=4
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                               patched:
bq.  tp(m/s)                              tp(m/s)
bq.  
bq.  S1    :: S2    :: R1    :: R2        S1    :: S2    :: R1    :: R2
bq.  52826 :: 52364 :: 52386 :: 52275     64826 :: 64780 :: 64811 :: 64638
bq.  51457 :: 51269 :: 51399 :: 51213     64630 :: 64157 :: 64562 :: 64157
bq.  50557 :: 50432 :: 50468 :: 50366     64023 :: 63832 :: 64092 :: 63748
bq.  
bq.  **Note: pre-patch, qpidd ran steadily at about 270% cpu during this test.  With the patch, qpidd's cpu utilization was steady at 300%
bq.  
bq.  
bq.  Test 3: Federation Funnel (two federated brokers, 2 producers upstream, 1 consumer downstream)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3 -d
bq.  qpidd --auth no -p 9999 --no-data-dir --worker-threads=2 -d
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $SRC_HOST add queue srcQ2 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add queue dstQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add exchange fanout dstX1
bq.  qpid-config -b $DST_HOST bind dstX1 dstQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ2
bq.  qpid-receive -b $DST_HOST -a dstQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ2 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  86150 :: 84146 :: 87665       90877 :: 88534 :: 108418
bq.  89188 :: 88319 :: 82469       87014 :: 86753 :: 107386
bq.  88221 :: 85298 :: 82455       89790 :: 88573 :: 104119
bq.  
bq.  Still TBD:
bq.    - fix message group errors
bq.    - verify/fix cluster
bq.  
bq.  
bq.  This addresses bug qpid-3890.
bq.      https://issues.apache.org/jira/browse/qpid-3890
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.h 1302689 
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1302689 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.h 1302689 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.cpp 1302689 
bq.  
bq.  Diff: https://reviews.apache.org/r/4232/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  unit test (non-cluster)
bq.  performance tests as described above.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Kenneth
bq.  
bq.


                
> Improve performance by reducing the use of the Queue's message lock
> -------------------------------------------------------------------
>
>                 Key: QPID-3890
>                 URL: https://issues.apache.org/jira/browse/QPID-3890
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.16
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>            Priority: Minor
>             Fix For: Future
>
>
> The mutex that protects the Queue's message container is held longer than it really should be (e.g. across statistics updates, etc).
> Reducing the amount of code executed while the lock is held will improve the performance of the broker in a multi-process environment.



[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225251#comment-13225251 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------



bq.  On 2012-03-07 21:30:28, Andrew Stitcher wrote:
bq.  > This is great work.
bq.  > 
bq.  > The most important thing on my mind though is how do we ensure that work like this maintains correctness?
bq.  > 
bq.  > Do we have some rules/guidelines for the use of the Queue structs that we can use to inspect the code to satisfy its correctness? A simple rule like this would be "take the blah lock before capturing or changing the foo state". It looks like we might have had rules like that, but these changes are so broad it's difficult for me to inspect the code and reconstruct what rules now apply (if any).
bq.  > 
bq.  > On another point - I see you've added the use of a RWLock - it's always good to benchmark with a plain mutex before using one of these as they are generally higher overhead and unless the use is "read lock mostly write lock seldom" plain mutexes can win.
bq.  
bq.  Kenneth Giusti wrote:
bq.      Re: locking rules - good point Andrew.  Most (all?) of these changes I made simply by manually reviewing the existing code's locking behavior, and trying to identify what can be safely moved outside the locks.  I'll update this patch with comments that explicitly call out those data members & actions that must hold the lock.
bq.      
bq.      The RWLock - hmm.. that may be an unintentional change: at first I had a separate lock for the queue observer container which was rarely written to.  I abandoned that change and put the queue observers under the queue lock since otherwise the order of events (as seen by the observers) could be re-arranged.
bq.  
bq.  Kenneth Giusti wrote:
bq.      Doh - sorry, replied with way too low a Caffeine/Blood ratio.  You're referring to the lock in the listener - yes.  Gordon points out that this may be unsafe, so I need to think about it a bit more.
bq.  
bq.  Gordon Sim wrote:
bq.      The order of events as seen by observers can be re-arranged with the patch as it stands (at least those for acquires and dequeues; enqueues would still be ordered correctly wrt each other).
bq.      
bq.      Regarding the listeners access, I think you could fix it by e.g. having an atomic counter that was incremented after pushing the message but before populating the notification set, checking that count before trying to acquire the message on the consume side and then rechecking it before adding the consumer to listeners if there was no message. If the number changes between the last two checks you would retry the acquisition.
bq.  
bq.  Kenneth Giusti wrote:
bq.      Re: listeners - makes sense.  I thought the consistency of the listener could be decoupled from the queue, but clearly that doesn't make sense (and probably buys us very little to begin with).
bq.      
bq.      Re: ordering of events as seen by observers.  I want to guarantee that the order of events is preserved WRT the observers - otherwise stuff will break as it stands (like flow control, which relies on correct ordering).  Gordon - can you explain the acquire/dequeues reordering case?  Does this patch introduce that reordering, or has that been possible all along?
bq.  
bq.  Gordon Sim wrote:
bq.      My mistake, I misread the patch, sorry! I think you are right that the ordering is preserved.

re "I thought the consistency of the listener could be decoupled from the queue, but clearly that doesn't make sense (and probably buys us very little to begin with)." I would have thought that this was one of the more significant change, no? (Along with moving the mgmt stats outside the lock).


- Gordon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review5691
-----------------------------------------------------------


On 2012-03-07 19:23:36, Kenneth Giusti wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4232/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-07 19:23:36)
bq.  
bq.  
bq.  Review request for qpid, Andrew Stitcher, Alan Conway, Gordon Sim, and Ted Ross.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  In order to improve our parallelism across threads, this patch reduces the scope over which the Queue's message lock is held.
bq.  
bq.  I've still got more testing to do, but the simple test cases below show pretty good performance improvements on my multi-core box.  I'd like to get some early feedback, as there are a lot of little changes to the queue locking that will need vetting.
bq.  
bq.  As far as performance is concerned - I've run the following tests on an 8 core Xeon X5550 2.67Ghz box.  I used qpid-send and qpid-receive to generate traffic.  All traffic generated in parallel.
bq.  All numbers are in messages/second - sorted by best overall throughput.
bq.  
bq.  Test 1: Funnel Test (shared queue, 2 producers, 1 consumer)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  73144 :: 72524 :: 112914      91120 :: 83278 :: 133730
bq.  70299 :: 68878 :: 110905      91079 :: 87418 :: 133649
bq.  70002 :: 69882 :: 110241      83957 :: 83600 :: 131617
bq.  70523 :: 68681 :: 108372      83911 :: 82845 :: 128513
bq.  68135 :: 68022 :: 107949      85999 :: 81863 :: 125575
bq.  
bq.  
bq.  Test 2: Quad Client Shared Test (1 queue, 2 senders x 2 Receivers)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=4
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                               patched:
bq.  tp(m/s)                              tp(m/s)
bq.  
bq.  S1    :: S2    :: R1    :: R2        S1    :: S2    :: R1    :: R2
bq.  52826 :: 52364 :: 52386 :: 52275     64826 :: 64780 :: 64811 :: 64638
bq.  51457 :: 51269 :: 51399 :: 51213     64630 :: 64157 :: 64562 :: 64157
bq.  50557 :: 50432 :: 50468 :: 50366     64023 :: 63832 :: 64092 :: 63748
bq.  
bq.  **Note: pre-patch, qpidd ran steadily at about 270% cpu during this test.  With the patch, qpidd's cpu utilization was steady at 300%
bq.  
bq.  
bq.  Test 3: Federation Funnel (two federated brokers, 2 producers upstream, 1 consumer downstream)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3 -d
bq.  qpidd --auth no -p 9999 --no-data-dir --worker-threads=2 -d
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $SRC_HOST add queue srcQ2 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add queue dstQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add exchange fanout dstX1
bq.  qpid-config -b $DST_HOST bind dstX1 dstQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ2
bq.  qpid-receive -b $DST_HOST -a dstQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ2 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  86150 :: 84146 :: 87665       90877 :: 88534 :: 108418
bq.  89188 :: 88319 :: 82469       87014 :: 86753 :: 107386
bq.  88221 :: 85298 :: 82455       89790 :: 88573 :: 104119
bq.  
bq.  Still TBD:
bq.    - fix message group errors
bq.    - verify/fix cluster
bq.  
bq.  
bq.  This addresses bug qpid-3890.
bq.      https://issues.apache.org/jira/browse/qpid-3890
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.cpp 1297185 
bq.  
bq.  Diff: https://reviews.apache.org/r/4232/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  unit test (non-cluster)
bq.  performance tests as described above.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Kenneth
bq.  
bq.


                
> Improve performance by reducing the use of the Queue's message lock
> -------------------------------------------------------------------
>
>                 Key: QPID-3890
>                 URL: https://issues.apache.org/jira/browse/QPID-3890
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.16
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>            Priority: Minor
>             Fix For: Future
>
>
> The mutex that protects the Queue's message container is held longer than it really should be (e.g. across statistics updates, etc).
> Reducing the amount of code executed while the lock is held will improve the performance of the broker in a multi-process environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225398#comment-13225398 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------



bq.  On 2012-03-07 21:30:28, Andrew Stitcher wrote:
bq.  > This is great work.
bq.  > 
bq.  > The most important thing on my mind though is: how do we ensure that work like this maintains correctness?
bq.  > 
bq.  > Do we have some rules/guidelines for the use of the Queue structs that we can use to inspect the code to satisfy its correctness? A simple rule like this would be "take the blah lock before capturing or changing the foo state". It looks like we might have had rules like that, but these changes are so broad it's difficult for me to inspect the code and reconstruct what rules now apply (if any).
bq.  > 
bq.  > On another point - I see you've added the use of a RWLock - it's always good to benchmark with a plain mutex before using one of these as they are generally higher overhead and unless the use is "read lock mostly write lock seldom" plain mutexes can win.
bq.  
bq.  Kenneth Giusti wrote:
bq.      Re: locking rules - good point Andrew.  Most (all?) of these changes I made simply by manually reviewing the existing code's locking behavior, and trying to identify what can be safely moved outside the locks.  I'll update this patch with comments that explicitly call out those data members & actions that must hold the lock.
bq.      
bq.      The RWLock - hmm.. that may be an unintentional change: at first I had a separate lock for the queue observer container which was rarely written to.  I abandoned that change and put the queue observers under the queue lock since otherwise the order of events (as seen by the observers) could be re-arranged.
bq.  
bq.  Kenneth Giusti wrote:
bq.      Doh - sorry, replied with way too low Caffeine/Blood ratio.  You're referring to the lock in the listener - yes.  Gordon points out that this may be unsafe, so I need to think about it a bit more.
bq.  
bq.  Gordon Sim wrote:
bq.      The order of events as seen by observers can be re-arranged with the patch as it stands (at least those for acquires and dequeues, enqueues would still be ordered correctly wrt each other).
bq.      
bq.      Regarding the listeners access, I think you could fix it by e.g. having an atomic counter that was incremented after pushing the message but before populating notification set, checking that count before trying to acquire the message on the consume side and then rechecking it before adding the consumer to listeners if there was no message. If the number changes between the last two checks you would retry the acquisition.
bq.  
bq.  Kenneth Giusti wrote:
bq.      Re: listeners - makes sense.  I thought the consistency of the listener could be decoupled from the queue, but clearly that doesn't make sense (and probably buys us very little to begin with).
bq.      
bq.      Re: ordering of events as seen by observers.  I want to guarantee that the order of events is preserved WRT the observers - otherwise stuff will break as it stands (like flow control, which relies on correct ordering).  Gordon - can you explain the acquire/dequeues reordering case?  Does this patch introduce that reordering, or has that been possible all along?
bq.  
bq.  Gordon Sim wrote:
bq.      My mistake, I misread the patch, sorry! I think you are right that the ordering is preserved.
bq.  
bq.  Gordon Sim wrote:
bq.      re "I thought the consistency of the listener could be decoupled from the queue, but clearly that doesn't make sense (and probably buys us very little to begin with)." I would have thought that this was one of the more significant change, no? (Along with moving the mgmt stats outside the lock).

I've moved the listeners.populate() calls back under the lock.  There is a performance degradation, but there's still an overall improvement:
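
To make the shape of patch-2 concrete, here is a rough toy sketch of the push path (simplified, hypothetical names such as ToyQueue, not the actual Queue.cpp code): the message container and the listener notification set are handled under the message lock, while the stats update and the actual wake-ups happen after the lock is released.

    // Toy sketch only: hypothetical names, not the real qpid::broker::Queue.
    #include <atomic>
    #include <deque>
    #include <mutex>
    #include <string>
    #include <vector>

    struct ToyConsumer { void notify() {} };

    struct ToyQueue {
        std::mutex messageLock;               // guards 'messages' and 'listeners'
        std::deque<std::string> messages;
        std::vector<ToyConsumer*> listeners;  // consumers blocked waiting for a message
        std::atomic<long> enqueues{0};        // stand-in for a mgmt counter

        void push(const std::string& msg) {
            std::vector<ToyConsumer*> toNotify;
            {
                std::lock_guard<std::mutex> l(messageLock);
                messages.push_back(msg);      // message container: updated under the lock
                toNotify.swap(listeners);     // notification set populated under the same lock
            }
            ++enqueues;                       // mgmt stats update: stays outside the lock
            for (ToyConsumer* c : toNotify)   // wake waiting consumers outside the lock
                c->notify();
        }
    };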

Funnel Test:  (3 worker threads)

trunk:                        patched:                       patch-2 (locked listeners):
tp(m/s)                       tp(m/s)                        tp(m/s)

S1    :: S2    :: R1          S1    :: S2    :: R1          S1    :: S2    :: R1
73144 :: 72524 :: 112914      91120 :: 83278 :: 133730      91857 :: 87240 :: 133878
70299 :: 68878 :: 110905      91079 :: 87418 :: 133649      84184 :: 82794 :: 126980
70002 :: 69882 :: 110241      83957 :: 83600 :: 131617      81846 :: 79640 :: 124192
70523 :: 68681 :: 108372      83911 :: 82845 :: 128513      82942 :: 80433 :: 123311
68135 :: 68022 :: 107949      85999 :: 81863 :: 125575      80533 :: 79706 :: 120317


Quad Shared Test (1 queue, 2 senders x 2 Receivers,  4 worker threads)

trunk:                               patched:                              patch-2 (locked listeners):
tp(m/s)                              tp(m/s)                               tp(m/s)

S1    :: S2    :: R1    :: R2        S1    :: S2    :: R1    :: R2         S1    :: S2    :: R1    :: R2
52826 :: 52364 :: 52386 :: 52275     64826 :: 64780 :: 64811 :: 64638      64184 :: 64170 :: 64247 :: 63992
51457 :: 51269 :: 51399 :: 51213     64630 :: 64157 :: 64562 :: 64157      63589 :: 63524 :: 63628 :: 63428
50557 :: 50432 :: 50468 :: 50366     64023 :: 63832 :: 64092 :: 63748      63718 :: 63685 :: 63772 :: 63555


Federation Funnel (worker threads = 3 upstream, 2 down)

trunk:                        patched:                        patch-2 (locked listeners):
tp(m/s)                       tp(m/s)                         tp(m/s)

S1    :: S2    :: R1          S1    :: S2    :: R1          S1    :: S2    :: R1
86150 :: 84146 :: 87665       90877 :: 88534 :: 108418      90953 :: 90269 :: 105194
89188 :: 88319 :: 82469       87014 :: 86753 :: 107386      86468 :: 85530 :: 100796
88221 :: 85298 :: 82455       89790 :: 88573 :: 104119      89169 :: 88839 ::  92910


- Kenneth


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review5691
-----------------------------------------------------------


On 2012-03-07 19:23:36, Kenneth Giusti wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4232/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-07 19:23:36)
bq.  
bq.  
bq.  Review request for qpid, Andrew Stitcher, Alan Conway, Gordon Sim, and Ted Ross.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  In order to improve our parallelism across threads, this patch reduces the scope over which the Queue's message lock is held.
bq.  
bq.  I've still got more testing to do, but the simple test cases below show pretty good performance improvements on my multi-core box.  I'd like to get some early feedback, as there are a lot of little changes to the queue locking that will need vetting.
bq.  
bq.  As far as performance is concerned - I've run the following tests on an 8 core Xeon X5550 2.67Ghz box.  I used qpid-send and qpid-receive to generate traffic.  All traffic generated in parallel.
bq.  All numbers are in messages/second - sorted by best overall throughput.
bq.  
bq.  Test 1: Funnel Test (shared queue, 2 producers, 1 consumer)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  73144 :: 72524 :: 112914      91120 :: 83278 :: 133730
bq.  70299 :: 68878 :: 110905      91079 :: 87418 :: 133649
bq.  70002 :: 69882 :: 110241      83957 :: 83600 :: 131617
bq.  70523 :: 68681 :: 108372      83911 :: 82845 :: 128513
bq.  68135 :: 68022 :: 107949      85999 :: 81863 :: 125575
bq.  
bq.  
bq.  Test 2: Quad Client Shared Test (1 queue, 2 senders x 2 Receivers)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=4
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                               patched:
bq.  tp(m/s)                              tp(m/s)
bq.  
bq.  S1    :: S2    :: R1    :: R2        S1    :: S2    :: R1    :: R2
bq.  52826 :: 52364 :: 52386 :: 52275     64826 :: 64780 :: 64811 :: 64638
bq.  51457 :: 51269 :: 51399 :: 51213     64630 :: 64157 :: 64562 :: 64157
bq.  50557 :: 50432 :: 50468 :: 50366     64023 :: 63832 :: 64092 :: 63748
bq.  
bq.  **Note: pre-patch, qpidd ran steadily at about 270% cpu during this test.  With the patch, qpidd's cpu utilization was steady at 300%
bq.  
bq.  
bq.  Test 3: Federation Funnel (two federated brokers, 2 producers upstream, 1 consumer downstream)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3 -d
bq.  qpidd --auth no -p 9999 --no-data-dir --worker-threads=2 -d
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $SRC_HOST add queue srcQ2 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add queue dstQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add exchange fanout dstX1
bq.  qpid-config -b $DST_HOST bind dstX1 dstQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ2
bq.  qpid-receive -b $DST_HOST -a dstQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ2 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  86150 :: 84146 :: 87665       90877 :: 88534 :: 108418
bq.  89188 :: 88319 :: 82469       87014 :: 86753 :: 107386
bq.  88221 :: 85298 :: 82455       89790 :: 88573 :: 104119
bq.  
bq.  Still TBD:
bq.    - fix message group errors
bq.    - verify/fix cluster
bq.  
bq.  
bq.  This addresses bug qpid-3890.
bq.      https://issues.apache.org/jira/browse/qpid-3890
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.cpp 1297185 
bq.  
bq.  Diff: https://reviews.apache.org/r/4232/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  unit test (non-cluster)
bq.  performance tests as described above.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Kenneth
bq.  
bq.


                
> Improve performance by reducing the use of the Queue's message lock
> -------------------------------------------------------------------
>
>                 Key: QPID-3890
>                 URL: https://issues.apache.org/jira/browse/QPID-3890
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.16
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>            Priority: Minor
>             Fix For: Future
>
>
> The mutex that protects the Queue's message container is held longer than it really should be (e.g. across statistics updates, etc).
> Reducing the amount of code executed while the lock is held will improve the performance of the broker in a multi-process environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224620#comment-13224620 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/
-----------------------------------------------------------

Review request for qpid, Andrew Stitcher, Alan Conway, Gordon Sim, and Ted Ross.


Summary
-------

In order to improve our parallelism across threads, this patch reduces the scope over which the Queue's message lock is held.

I've still got more testing to do, but the simple test cases below show pretty good performance improvements on my multi-core box.  I'd like to get some early feedback, as there are a lot of little changes to the queue locking that will need vetting.

As far as performance is concerned - I've run the following tests on an 8 core Xeon X5550 2.67Ghz box.  I used qpid-send and qpid-receive to generate traffic.  All traffic generated in parallel.
All numbers are in messages/second - sorted by best overall throughput.

Test 1: Funnel Test (shared queue, 2 producers, 1 consumer)

test setup and execute info:
qpidd --auth no -p 8888 --no-data-dir --worker-threads=3
qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
qpid-receive -b $SRC_HOST -a srcQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &

trunk:                        patched:
tp(m/s)                       tp(m/s)

S1    :: S2    :: R1          S1    :: S2    :: R1
73144 :: 72524 :: 112914      91120 :: 83278 :: 133730
70299 :: 68878 :: 110905      91079 :: 87418 :: 133649
70002 :: 69882 :: 110241      83957 :: 83600 :: 131617
70523 :: 68681 :: 108372      83911 :: 82845 :: 128513
68135 :: 68022 :: 107949      85999 :: 81863 :: 125575


Test 2: Quad Client Shared Test (1 queue, 2 senders x 2 Receivers)

test setup and execute info:
qpidd --auth no -p 8888 --no-data-dir --worker-threads=4
qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &

trunk:                               patched:
tp(m/s)                              tp(m/s)

S1    :: S2    :: R1    :: R2        S1    :: S2    :: R1    :: R2
52826 :: 52364 :: 52386 :: 52275     64826 :: 64780 :: 64811 :: 64638
51457 :: 51269 :: 51399 :: 51213     64630 :: 64157 :: 64562 :: 64157
50557 :: 50432 :: 50468 :: 50366     64023 :: 63832 :: 64092 :: 63748

**Note: pre-patch, qpidd ran steadily at about 270% cpu during this test.  With the patch, qpidd's cpu utilization was steady at 300%


Test 3: Federation Funnel (two federated brokers, 2 producers upstream, 1 consumer downstream)

test setup and execute info:
qpidd --auth no -p 8888 --no-data-dir --worker-threads=3 -d
qpidd --auth no -p 9999 --no-data-dir --worker-threads=2 -d
qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
qpid-config -b $SRC_HOST add queue srcQ2 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
qpid-config -b $DST_HOST add queue dstQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
qpid-config -b $DST_HOST add exchange fanout dstX1
qpid-config -b $DST_HOST bind dstX1 dstQ1
qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ1
qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ2
qpid-receive -b $DST_HOST -a dstQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
qpid-send -b $SRC_HOST -a srcQ2 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &


trunk:                        patched:
tp(m/s)                       tp(m/s)

S1    :: S2    :: R1          S1    :: S2    :: R1
86150 :: 84146 :: 87665       90877 :: 88534 :: 108418
89188 :: 88319 :: 82469       87014 :: 86753 :: 107386
88221 :: 85298 :: 82455       89790 :: 88573 :: 104119

Still TBD:
  - fix message group errors
  - verify/fix cluster


This addresses bug qpid-3890.
    https://issues.apache.org/jira/browse/qpid-3890


Diffs
-----

  /trunk/qpid/cpp/src/qpid/broker/Queue.h 1297185 
  /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1297185 
  /trunk/qpid/cpp/src/qpid/broker/QueueListeners.h 1297185 
  /trunk/qpid/cpp/src/qpid/broker/QueueListeners.cpp 1297185 

Diff: https://reviews.apache.org/r/4232/diff


Testing
-------

unit test (non-cluster)
performance tests as described above.


Thanks,

Kenneth


                
> Improve performance by reducing the use of the Queue's message lock
> -------------------------------------------------------------------
>
>                 Key: QPID-3890
>                 URL: https://issues.apache.org/jira/browse/QPID-3890
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.16
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>            Priority: Minor
>             Fix For: Future
>
>
> The mutex that protects the Queue's message container is held longer than it really should be (e.g. across statistics updates, etc).
> Reducing the amount of code executed while the lock is held will improve the performance of the broker in a multi-process environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224730#comment-13224730 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review5691
-----------------------------------------------------------


This is great work.

The most important thing on my mind though is: how do we ensure that work like this maintains correctness?

Do we have some rules/guidelines for the use of the Queue structs that we can use to inspect the code to satisfy its correctness? A simple rule like this would be "take the blah lock before capturing or changing the foo state". It looks like we might have had rules like that, but these changes are so broad it's difficult for me to inspect the code and reconstruct what rules now apply (if any).
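
For example, something along these lines (purely illustrative, with hypothetical names, not the actual Queue.h) would make the rules inspectable when reviewing a change like this:

    // Purely illustrative: a toy header fragment showing the kind of rule being asked for.
    #include <atomic>
    #include <deque>
    #include <mutex>
    #include <string>
    #include <vector>

    class Consumer;
    class Observer;

    class ToyQueue {
        // Rule: messageLock must be held when touching any member marked "guarded by messageLock".
        std::mutex messageLock;

        std::deque<std::string> messages;   // guarded by messageLock (the message container)
        std::vector<Consumer*> listeners;   // guarded by messageLock (blocked consumers)
        std::vector<Observer*> observers;   // guarded by messageLock (notified in queue-event order)

        std::atomic<long> enqueues{0};      // NOT guarded: safe to update after the lock is dropped
    };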

On another point - I see you've added the use of a RWLock - it's always good to benchmark with a plain mutex before using one of these as they are generally higher overhead and unless the use is "read lock mostly write lock seldom" plain mutexes can win.

- Andrew


On 2012-03-07 19:23:36, Kenneth Giusti wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4232/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-07 19:23:36)
bq.  
bq.  
bq.  Review request for qpid, Andrew Stitcher, Alan Conway, Gordon Sim, and Ted Ross.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  In order to improve our parallelism across threads, this patch reduces the scope over which the Queue's message lock is held.
bq.  
bq.  I've still got more testing to do, but the simple test cases below show pretty good performance improvements on my multi-core box.  I'd like to get some early feedback, as there are a lot of little changes to the queue locking that will need vetting.
bq.  
bq.  As far as performance is concerned - I've run the following tests on an 8 core Xeon X5550 2.67Ghz box.  I used qpid-send and qpid-receive to generate traffic.  All traffic generated in parallel.
bq.  All numbers are in messages/second - sorted by best overall throughput.
bq.  
bq.  Test 1: Funnel Test (shared queue, 2 producers, 1 consumer)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  73144 :: 72524 :: 112914      91120 :: 83278 :: 133730
bq.  70299 :: 68878 :: 110905      91079 :: 87418 :: 133649
bq.  70002 :: 69882 :: 110241      83957 :: 83600 :: 131617
bq.  70523 :: 68681 :: 108372      83911 :: 82845 :: 128513
bq.  68135 :: 68022 :: 107949      85999 :: 81863 :: 125575
bq.  
bq.  
bq.  Test 2: Quad Client Shared Test (1 queue, 2 senders x 2 Receivers)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=4
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                               patched:
bq.  tp(m/s)                              tp(m/s)
bq.  
bq.  S1    :: S2    :: R1    :: R2        S1    :: S2    :: R1    :: R2
bq.  52826 :: 52364 :: 52386 :: 52275     64826 :: 64780 :: 64811 :: 64638
bq.  51457 :: 51269 :: 51399 :: 51213     64630 :: 64157 :: 64562 :: 64157
bq.  50557 :: 50432 :: 50468 :: 50366     64023 :: 63832 :: 64092 :: 63748
bq.  
bq.  **Note: pre-patch, qpidd ran steadily at about 270% cpu during this test.  With the patch, qpidd's cpu utilization was steady at 300%
bq.  
bq.  
bq.  Test 3: Federation Funnel (two federated brokers, 2 producers upstream, 1 consumer downstream)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3 -d
bq.  qpidd --auth no -p 9999 --no-data-dir --worker-threads=2 -d
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $SRC_HOST add queue srcQ2 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add queue dstQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add exchange fanout dstX1
bq.  qpid-config -b $DST_HOST bind dstX1 dstQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ2
bq.  qpid-receive -b $DST_HOST -a dstQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ2 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  86150 :: 84146 :: 87665       90877 :: 88534 :: 108418
bq.  89188 :: 88319 :: 82469       87014 :: 86753 :: 107386
bq.  88221 :: 85298 :: 82455       89790 :: 88573 :: 104119
bq.  
bq.  Still TBD:
bq.    - fix message group errors
bq.    - verify/fix cluster
bq.  
bq.  
bq.  This addresses bug qpid-3890.
bq.      https://issues.apache.org/jira/browse/qpid-3890
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.cpp 1297185 
bq.  
bq.  Diff: https://reviews.apache.org/r/4232/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  unit test (non-cluster)
bq.  performance tests as described above.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Kenneth
bq.  
bq.


                
> Improve performance by reducing the use of the Queue's message lock
> -------------------------------------------------------------------
>
>                 Key: QPID-3890
>                 URL: https://issues.apache.org/jira/browse/QPID-3890
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.16
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>            Priority: Minor
>             Fix For: Future
>
>
> The mutex that protects the Queue's message container is held longer than it really should be (e.g. across statistics updates, etc).
> Reducing the amount of code executed while the lock is held will improve the performance of the broker in a multi-process environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225234#comment-13225234 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------



bq.  On 2012-03-07 21:30:28, Andrew Stitcher wrote:
bq.  > This is great work.
bq.  > 
bq.  > The most important thing on my mind though is: how do we ensure that work like this maintains correctness?
bq.  > 
bq.  > Do we have some rules/guidelines for the use of the Queue structs that we can use to inspect the code to satisfy its correctness? A simple rule like this would be "take the blah lock before capturing or changing the foo state". It looks like we might have had rules like that, but these changes are so broad it's difficult for me to inspect the code and reconstruct what rules now apply (if any).
bq.  > 
bq.  > On another point - I see you've added the use of a RWLock - it's always good to benchmark with a plain mutex before using one of these as they are generally higher overhead and unless the use is "read lock mostly write lock seldom" plain mutexes can win.
bq.  
bq.  Kenneth Giusti wrote:
bq.      Re: locking rules - good point Andrew.  Most (all?) of these changes I made simply by manually reviewing the existing code's locking behavior, and trying to identify what can be safely moved outside the locks.  I'll update this patch with comments that explicitly call out those data members & actions that must hold the lock.
bq.      
bq.      The RWLock - hmm.. that may be an unintentional change: at first I had a separate lock for the queue observer container which was rarely written to.  I abandoned that change and put the queue observers under the queue lock since otherwise the order of events (as seen by the observers) could be re-arranged.
bq.  
bq.  Kenneth Giusti wrote:
bq.      Doh - sorry, replied with way too low Caffeine/Blood ratio.  You're referring to the lock in the listener - yes.  Gordon points out that this may be unsafe, so I need to think about it a bit more.
bq.  
bq.  Gordon Sim wrote:
bq.      The order of events as seen by observers can be re-arranged with the patch as it stands (at least those for acquires and dequeues, enqueues would still be ordered correctly wrt each other).
bq.      
bq.      Regarding the listeners access, I think you could fix it by e.g. having an atomic counter that was incremented after pushing the message but before populating notification set, checking that count before trying to acquire the message on the consume side and then rechecking it before adding the consumer to listeners if there was no message. If the number changes between the last two checks you would retry the acquisition.

Re: listeners - makes sense.  I thought the consistency of the listener could be decoupled from the queue, but clearly that doesn't make sense (and probably buys us very little to begin with).
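
For reference, here is roughly how I read the suggested counter/retry scheme (a toy, untested sketch with hypothetical names, not code from the patch):

    // Toy, untested sketch of the counter/retry idea described above.
    #include <atomic>
    #include <deque>
    #include <mutex>
    #include <string>
    #include <vector>

    struct ToyConsumer { void notify() {} };

    struct ToyQueue {
        std::mutex messageLock;                 // guards the message container
        std::mutex listenerLock;                // guards the listener list (decoupled from the queue)
        std::deque<std::string> messages;
        std::vector<ToyConsumer*> listeners;
        std::atomic<unsigned> enqueueCount{0};  // bumped after a push, before notification

        void push(const std::string& m) {
            {
                std::lock_guard<std::mutex> l(messageLock);
                messages.push_back(m);
            }
            ++enqueueCount;                     // incremented after pushing the message...
            std::vector<ToyConsumer*> toNotify;
            {
                std::lock_guard<std::mutex> l(listenerLock);
                toNotify.swap(listeners);       // ...but before populating the notification set
            }
            for (ToyConsumer* c : toNotify) c->notify();
        }

        // Returns true and fills 'out' if a message was acquired; otherwise registers the
        // consumer as a listener and returns false (the caller then waits to be notified).
        bool getNext(ToyConsumer* c, std::string& out) {
            for (;;) {
                unsigned seen = enqueueCount.load();       // check the count before acquiring
                {
                    std::lock_guard<std::mutex> l(messageLock);
                    if (!messages.empty()) {
                        out = messages.front();
                        messages.pop_front();
                        return true;
                    }
                }
                std::lock_guard<std::mutex> l(listenerLock);
                if (enqueueCount.load() == seen) {         // recheck before adding to listeners
                    listeners.push_back(c);
                    return false;
                }
                // count changed between the two checks: a push raced us, retry the acquire
            }
        }
    };

If the count moves between the two checks the consumer just retries the acquire instead of blocking, which is the window the scheme is meant to close.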

Re: ordering of events as seen by observers.  I want to guarantee that the order of events is preserved WRT the observers - otherwise stuff will break as it stands (like flow control, which relies on correct ordering).  Gordon - can you explain the acquire/dequeues reordering case?  Does this patch introduce that reordering, or has that been possible all along?


- Kenneth


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review5691
-----------------------------------------------------------


On 2012-03-07 19:23:36, Kenneth Giusti wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4232/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-07 19:23:36)
bq.  
bq.  
bq.  Review request for qpid, Andrew Stitcher, Alan Conway, Gordon Sim, and Ted Ross.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  In order to improve our parallelism across threads, this patch reduces the scope over which the Queue's message lock is held.
bq.  
bq.  I've still got more testing to do, but the simple test cases below show pretty good performance improvements on my multi-core box.  I'd like to get some early feedback, as there are a lot of little changes to the queue locking that will need vetting.
bq.  
bq.  As far as performance is concerned - I've run the following tests on an 8 core Xeon X5550 2.67Ghz box.  I used qpid-send and qpid-receive to generate traffic.  All traffic generated in parallel.
bq.  All numbers are in messages/second - sorted by best overall throughput.
bq.  
bq.  Test 1: Funnel Test (shared queue, 2 producers, 1 consumer)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  73144 :: 72524 :: 112914      91120 :: 83278 :: 133730
bq.  70299 :: 68878 :: 110905      91079 :: 87418 :: 133649
bq.  70002 :: 69882 :: 110241      83957 :: 83600 :: 131617
bq.  70523 :: 68681 :: 108372      83911 :: 82845 :: 128513
bq.  68135 :: 68022 :: 107949      85999 :: 81863 :: 125575
bq.  
bq.  
bq.  Test 2: Quad Client Shared Test (1 queue, 2 senders x 2 Receivers)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=4
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                               patched:
bq.  tp(m/s)                              tp(m/s)
bq.  
bq.  S1    :: S2    :: R1    :: R2        S1    :: S2    :: R1    :: R2
bq.  52826 :: 52364 :: 52386 :: 52275     64826 :: 64780 :: 64811 :: 64638
bq.  51457 :: 51269 :: 51399 :: 51213     64630 :: 64157 :: 64562 :: 64157
bq.  50557 :: 50432 :: 50468 :: 50366     64023 :: 63832 :: 64092 :: 63748
bq.  
bq.  **Note: pre-patch, qpidd ran steadily at about 270% cpu during this test.  With the patch, qpidd's cpu utilization was steady at 300%
bq.  
bq.  
bq.  Test 3: Federation Funnel (two federated brokers, 2 producers upstream, 1 consumer downstream)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3 -d
bq.  qpidd --auth no -p 9999 --no-data-dir --worker-threads=2 -d
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $SRC_HOST add queue srcQ2 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add queue dstQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add exchange fanout dstX1
bq.  qpid-config -b $DST_HOST bind dstX1 dstQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ2
bq.  qpid-receive -b $DST_HOST -a dstQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ2 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  86150 :: 84146 :: 87665       90877 :: 88534 :: 108418
bq.  89188 :: 88319 :: 82469       87014 :: 86753 :: 107386
bq.  88221 :: 85298 :: 82455       89790 :: 88573 :: 104119
bq.  
bq.  Still TBD:
bq.    - fix message group errors
bq.    - verify/fix cluster
bq.  
bq.  
bq.  This addresses bug qpid-3890.
bq.      https://issues.apache.org/jira/browse/qpid-3890
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.cpp 1297185 
bq.  
bq.  Diff: https://reviews.apache.org/r/4232/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  unit test (non-cluster)
bq.  performance tests as described above.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Kenneth
bq.  
bq.


                
> Improve performance by reducing the use of the Queue's message lock
> -------------------------------------------------------------------
>
>                 Key: QPID-3890
>                 URL: https://issues.apache.org/jira/browse/QPID-3890
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.16
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>            Priority: Minor
>             Fix For: Future
>
>
> The mutex that protects the Queue's message container is held longer than it really should be (e.g. across statistics updates, etc).
> Reducing the amount of code executed while the lock is held will improve the performance of the broker in a multi-process environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232964#comment-13232964 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/
-----------------------------------------------------------

(Updated 2012-03-19 22:22:42.524210)


Review request for qpid, Andrew Stitcher, Alan Conway, Gordon Sim, and Ted Ross.


Changes
-------

I've applied most of the review recommendations, reverting some of the changes that led to race conditions.  The result is a bit slower than the original, but shows promise especially in the multi-consumer/multi-producer case.

Rerun of the tests given the latest changes (against latest trunk):

Test1
trunk                              qpid-3890
tp(m/s)                            tp(m/s)

S1    :: S2     :: R1              S1    :: S2     :: R1
70907 :: 70776  :: 112450          81417 :: 75767  :: 115527
73190 :: 71805  :: 107750          81132 :: 75067  :: 114962
67628 :: 66340  :: 106461          75638 :: 73906  :: 110000
64801 :: 64055  :: 104864          77083 :: 75184  :: 106368
69654 :: 66441  :: 103480          72933 :: 72518  :: 105773


Test2
trunk                                   qpid-3890
tp(m/s)                                 tp(m/s)

S1    :: S2     :: R1     :: R2         S1    :: S2    :: R1    :: R2
49921 :: 49761  :: 49724  :: 49674      60263 :: 60234 :: 60211 :: 60111
49672 :: 49384  :: 49372  :: 49279      59781 :: 59545 :: 59770 :: 59529
48138 :: 48107  :: 48097  :: 48056      59792 :: 59506 :: 59571 :: 59426
48228 :: 48216  :: 48061  :: 48009      59002 :: 58847 :: 58972 :: 58779


Test3
trunk                              qpid-3890
tp(m/s)                            tp(m/s)

S1    :: S2     :: R1              S1    :: S2    :: R1
84372 :: 84320  :: 92689           81960 :: 80475 :: 96154
88671 :: 84633  :: 85234           83841 :: 82777 :: 86037
81302 :: 81073  :: 79898           83740 :: 81476 :: 85572


I'd like to put this on trunk now that we've branched - the earlier the better.

Opinions?


Summary
-------

In order to improve our parallelism across threads, this patch reduces the scope over which the Queue's message lock is held.

I've still got more testing to do, but the simple test cases below show pretty good performance improvements on my multi-core box.  I'd like to get some early feedback, as there are a lot of little changes to the queue locking that will need vetting.

As far as performance is concerned - I've run the following tests on an 8 core Xeon X5550 2.67Ghz box.  I used qpid-send and qpid-receive to generate traffic.  All traffic generated in parallel.
All numbers are in messages/second - sorted by best overall throughput.

Test 1: Funnel Test (shared queue, 2 producers, 1 consumer)

test setup and execute info:
qpidd --auth no -p 8888 --no-data-dir --worker-threads=3
qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
qpid-receive -b $SRC_HOST -a srcQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &

trunk:                        patched:
tp(m/s)                       tp(m/s)

S1    :: S2    :: R1          S1    :: S2    :: R1
73144 :: 72524 :: 112914      91120 :: 83278 :: 133730
70299 :: 68878 :: 110905      91079 :: 87418 :: 133649
70002 :: 69882 :: 110241      83957 :: 83600 :: 131617
70523 :: 68681 :: 108372      83911 :: 82845 :: 128513
68135 :: 68022 :: 107949      85999 :: 81863 :: 125575


Test 2: Quad Client Shared Test (1 queue, 2 senders x 2 Receivers)

test setup and execute info:
qpidd --auth no -p 8888 --no-data-dir --worker-threads=4
qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &

trunk:                               patched:
tp(m/s)                              tp(m/s)

S1    :: S2    :: R1    :: R2        S1    :: S2    :: R1    :: R2
52826 :: 52364 :: 52386 :: 52275     64826 :: 64780 :: 64811 :: 64638
51457 :: 51269 :: 51399 :: 51213     64630 :: 64157 :: 64562 :: 64157
50557 :: 50432 :: 50468 :: 50366     64023 :: 63832 :: 64092 :: 63748

**Note: pre-patch, qpidd ran steadily at about 270% cpu during this test.  With the patch, qpidd's cpu utilization was steady at 300%


Test 3: Federation Funnel (two federated brokers, 2 producers upstream, 1 consumer downstream)

test setup and execute info:
qpidd --auth no -p 8888 --no-data-dir --worker-threads=3 -d
qpidd --auth no -p 9999 --no-data-dir --worker-threads=2 -d
qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
qpid-config -b $SRC_HOST add queue srcQ2 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
qpid-config -b $DST_HOST add queue dstQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
qpid-config -b $DST_HOST add exchange fanout dstX1
qpid-config -b $DST_HOST bind dstX1 dstQ1
qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ1
qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ2
qpid-receive -b $DST_HOST -a dstQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
qpid-send -b $SRC_HOST -a srcQ2 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &


trunk:                        patched:
tp(m/s)                       tp(m/s)

S1    :: S2    :: R1          S1    :: S2    :: R1
86150 :: 84146 :: 87665       90877 :: 88534 :: 108418
89188 :: 88319 :: 82469       87014 :: 86753 :: 107386
88221 :: 85298 :: 82455       89790 :: 88573 :: 104119

Still TBD:
  - fix message group errors
  - verify/fix cluster


This addresses bug qpid-3890.
    https://issues.apache.org/jira/browse/qpid-3890


Diffs (updated)
-----

  /trunk/qpid/cpp/src/qpid/broker/Queue.h 1302689 
  /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1302689 
  /trunk/qpid/cpp/src/qpid/broker/QueueListeners.h 1302689 
  /trunk/qpid/cpp/src/qpid/broker/QueueListeners.cpp 1302689 

Diff: https://reviews.apache.org/r/4232/diff


Testing
-------

unit test (non-cluster)
performance tests as described above.


Thanks,

Kenneth


                
> Improve performance by reducing the use of the Queue's message lock
> -------------------------------------------------------------------
>
>                 Key: QPID-3890
>                 URL: https://issues.apache.org/jira/browse/QPID-3890
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.16
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>            Priority: Minor
>             Fix For: Future
>
>
> The mutex that protects the Queue's message container is held longer than it really should be (e.g. across statistics updates, etc).
> Reducing the amount of code executed while the lock is held will improve the performance of the broker in a multi-process environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226110#comment-13226110 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------



bq.  On 2012-03-07 21:30:28, Andrew Stitcher wrote:
bq.  > This is great work.
bq.  > 
bq.  > The most important thing on my mind though is: how do we ensure that work like this maintains correctness?
bq.  > 
bq.  > Do we have some rules/guidelines for the use of the Queue structs that we can use to inspect the code to satisfy its correctness? A simple rule like this would be "take the blah lock before capturing or changing the foo state". It looks like we might have had rules like that, but these changes are so broad it's difficult for me to inspect the code and reconstruct what rules now apply (if any).
bq.  > 
bq.  > On another point - I see you've added the use of a RWLock - it's always good to benchmark with a plain mutex before using one of these as they are generally higher overhead and unless the use is "read lock mostly write lock seldom" plain mutexes can win.
bq.  
bq.  Kenneth Giusti wrote:
bq.      Re: locking rules - good point Andrew.  Most (all?) of these changes I made simply by manually reviewing the existing code's locking behavior, and trying to identify what can be safely moved outside the locks.  I'll update this patch with comments that explicitly call out those data members & actions that must hold the lock.
bq.      
bq.      The RWLock - hmm.. that may be an unintentional change: at first I had a separate lock for the queue observer container which was rarely written to.  I abandoned that change and put the queue observers under the queue lock since otherwise the order of events (as seen by the observers) could be re-arranged.
bq.  
bq.  Kenneth Giusti wrote:
bq.      Doh - sorry, replied with way too low Caffeine/Blood ratio.  You're referring to the lock in the listener - yes.  Gordon points out that this may be unsafe, so I need to think about it a bit more.
bq.  
bq.  Gordon Sim wrote:
bq.      The order of events as seen by observers can be re-arranged with the patch as it stands (at least those for acquires and dequeues, enqueues would still be ordered correctly wrt each other).
bq.      
bq.      Regarding the listeners access, I think you could fix it by e.g. having an atomic counter that was incremented after pushing the message but before populating notification set, checking that count before trying to acquire the message on the consume side and then rechecking it before adding the consumer to listeners if there was no message. If the number changes between the last two checks you would retry the acquisition.
bq.  
bq.  Kenneth Giusti wrote:
bq.      Re: listeners - makes sense.  I thought the consistency of the listener could be decoupled from the queue, but clearly that doesn't make sense (and probably buys us very little to begin with).
bq.      
bq.      Re: ordering of events as seen by observers.  I want to guarantee that the order of events is preserved WRT the observers - otherwise stuff will break as it stands (like flow control, which relies on correct ordering).  Gordon - can you explain the acquire/dequeues reordering case?  Does this patch introduce that reordering, or has that been possible all along?
bq.  
bq.  Gordon Sim wrote:
bq.      My mistake, I misread the patch, sorry! I think you are right that the ordering is preserved.
bq.  
bq.  Gordon Sim wrote:
bq.      re "I thought the consistency of the listener could be decoupled from the queue, but clearly that doesn't make sense (and probably buys us very little to begin with)." I would have thought that this was one of the more significant change, no? (Along with moving the mgmt stats outside the lock).
bq.  
bq.  Kenneth Giusti wrote:
bq.      I've moved the listeners.populate() calls back under the lock.  There is a performance degradation, but there's still an overall improvement:
bq.      
bq.      Funnel Test:  (3 worker threads)
bq.      
bq.      trunk:                        patched:                       patch-2 (locked listeners):
bq.      tp(m/s)                       tp(m/s)                        tp(m/s)
bq.      
bq.      S1    :: S2    :: R1          S1    :: S2    :: R1          S1    :: S2    :: R1
bq.      73144 :: 72524 :: 112914      91120 :: 83278 :: 133730      91857 :: 87240 :: 133878
bq.      70299 :: 68878 :: 110905      91079 :: 87418 :: 133649      84184 :: 82794 :: 126980
bq.      70002 :: 69882 :: 110241      83957 :: 83600 :: 131617      81846 :: 79640 :: 124192
bq.      70523 :: 68681 :: 108372      83911 :: 82845 :: 128513      82942 :: 80433 :: 123311
bq.      68135 :: 68022 :: 107949      85999 :: 81863 :: 125575      80533 :: 79706 :: 120317
bq.      
bq.      
bq.      
bq.      Quad Shared Test (1 queue, 2 senders x 2 Receivers,  4 worker threads)
bq.      
bq.      trunk:                               patched:                              patch-2 (locked listeners):
bq.      tp(m/s)                              tp(m/s)                               tp(m/s)
bq.      
bq.      S1    :: S2    :: R1    :: R2        S1    :: S2    :: R1    :: R2         S1    :: S2    :: R1    :: R2
bq.      52826 :: 52364 :: 52386 :: 52275     64826 :: 64780 :: 64811 :: 64638      64184 :: 64170 :: 64247 :: 63992
bq.      51457 :: 51269 :: 51399 :: 51213     64630 :: 64157 :: 64562 :: 64157      63589 :: 63524 :: 63628 :: 63428
bq.      50557 :: 50432 :: 50468 :: 50366     64023 :: 63832 :: 64092 :: 63748      63718 :: 63685 :: 63772 :: 63555
bq.      
bq.      
bq.      
bq.      Federation Funnel (worker threads = 3 upstream, 2 down)
bq.      
bq.      trunk:                        patched:                        patch-2 (locked listeners):
bq.      tp(m/s)                       tp(m/s)                         tp(m/s)
bq.      
bq.      S1    :: S2    :: R1          S1    :: S2    :: R1          S1    :: S2    :: R1
bq.      86150 :: 84146 :: 87665       90877 :: 88534 :: 108418      90953 :: 90269 :: 105194
bq.      89188 :: 88319 :: 82469       87014 :: 86753 :: 107386      86468 :: 85530 :: 100796
bq.      88221 :: 85298 :: 82455       89790 :: 88573 :: 104119      89169 :: 88839 ::  92910
bq.

That's very interesting. There are only two changes I can see on the critical path for these tests:

  (i) moving the update of mgmt stats outside the lock (very sensible, as we already know that adds some overhead and the locking is not required)

  (ii) the Consumer::filter() and Consumer::accept() calls are also moved outside the lock on the consumer side; as filter() currently always returns true, I think this boils down to the credit check in accept()

It would be interesting to know the relative impact of these.
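
For illustration, change (i) amounts to the pattern below -- a minimal, generic C++11 sketch with hypothetical names, not the actual qpid::broker::Queue code.  The same idea applies to change (ii), where the credit check in accept() would run before the lock is taken on the consume side.

#include <atomic>
#include <cstdint>
#include <deque>
#include <mutex>

struct Message {};

class QueueSketch {
    std::mutex messageLock;                 // protects only the container
    std::deque<Message> messages;           // the queue's message container
    std::atomic<uint64_t> enqueueCount{0};  // management statistic

public:
    void push(const Message& msg) {
        {
            std::lock_guard<std::mutex> guard(messageLock);
            messages.push_back(msg);        // only the container update is locked
        }
        ++enqueueCount;                     // stats updated after the unlock
    }
};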


- Gordon
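
For reference, the counter-based listener fix described earlier in this thread ("an atomic counter that is incremented after pushing the message but before populating the notification set") could look roughly like the sketch below.  This is generic C++11 under assumptions, with hypothetical names; it is not the code from the patch under review.

#include <atomic>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <vector>

struct Message {};
using Consumer = std::function<void()>;    // stand-in for a consumer callback

class ListenerSketch {
    std::atomic<uint64_t> enqueueCounter{0};
    std::mutex messageLock;
    std::deque<Message> messages;
    std::mutex listenerLock;
    std::vector<Consumer> listeners;       // consumers waiting for a message

    bool tryAcquire(Message& out) {
        std::lock_guard<std::mutex> guard(messageLock);
        if (messages.empty()) return false;
        out = messages.front();
        messages.pop_front();
        return true;
    }

    void notifyListeners() {
        std::vector<Consumer> toNotify;
        {
            std::lock_guard<std::mutex> guard(listenerLock);
            toNotify.swap(listeners);
        }
        for (const auto& c : toNotify) c();   // callbacks invoked outside the lock
    }

public:
    // Producer side: push, bump the counter, then notify waiting consumers.
    void push(const Message& msg) {
        {
            std::lock_guard<std::mutex> guard(messageLock);
            messages.push_back(msg);
        }
        enqueueCounter.fetch_add(1);       // incremented after the push...
        notifyListeners();                 // ...but before the notification set is built
    }

    // Consumer side: acquire a message or register as a listener.
    bool consume(Message& out, const Consumer& me) {
        for (;;) {
            const uint64_t before = enqueueCounter.load();
            if (tryAcquire(out)) return true;
            {
                std::lock_guard<std::mutex> guard(listenerLock);
                if (enqueueCounter.load() == before) {
                    listeners.push_back(me);   // no enqueue was missed
                    return false;              // will be notified later
                }
            }
            // Counter changed: a message may have arrived; retry the acquire.
        }
    }
};

The recheck happens under listenerLock so that the "did I miss an enqueue?" test and the registration are atomic with respect to the producer's notification pass, which is what closes the lost-wakeup window.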


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review5691
-----------------------------------------------------------


On 2012-03-07 19:23:36, Kenneth Giusti wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4232/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-07 19:23:36)
bq.  
bq.  
bq.  Review request for qpid, Andrew Stitcher, Alan Conway, Gordon Sim, and Ted Ross.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  In order to improve our parallelism across threads, this patch reduces the scope over which the Queue's message lock is held.
bq.  
bq.  I've still got more testing to do, but the simple test cases below show pretty good performance improvements on my multi-core box.  I'd like to get some early feedback, as there are a lot of little changes to the queue locking that will need vetting.
bq.  
bq.  As far as performance is concerned - I've run the following tests on an 8-core Xeon X5550 2.67GHz box.  I used qpid-send and qpid-receive to generate traffic.  All traffic was generated in parallel.
bq.  All numbers are in messages/second - sorted by best overall throughput.
bq.  
bq.  Test 1: Funnel Test (shared queue, 2 producers, 1 consumer)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  73144 :: 72524 :: 112914      91120 :: 83278 :: 133730
bq.  70299 :: 68878 :: 110905      91079 :: 87418 :: 133649
bq.  70002 :: 69882 :: 110241      83957 :: 83600 :: 131617
bq.  70523 :: 68681 :: 108372      83911 :: 82845 :: 128513
bq.  68135 :: 68022 :: 107949      85999 :: 81863 :: 125575
bq.  
bq.  
bq.  Test 2: Quad Client Shared Test (1 queue, 2 senders x 2 Receivers)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=4
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                               patched:
bq.  tp(m/s)                              tp(m/s)
bq.  
bq.  S1    :: S2    :: R1    :: R2        S1    :: S2    :: R1    :: R2
bq.  52826 :: 52364 :: 52386 :: 52275     64826 :: 64780 :: 64811 :: 64638
bq.  51457 :: 51269 :: 51399 :: 51213     64630 :: 64157 :: 64562 :: 64157
bq.  50557 :: 50432 :: 50468 :: 50366     64023 :: 63832 :: 64092 :: 63748
bq.  
bq.  **Note: pre-patch, qpidd ran steadily at about 270% CPU during this test.  With the patch, qpidd's CPU utilization was steady at 300%.
bq.  
bq.  
bq.  Test 3: Federation Funnel (two federated brokers, 2 producers upstream, 1 consumer downstream)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3 -d
bq.  qpidd --auth no -p 9999 --no-data-dir --worker-threads=2 -d
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $SRC_HOST add queue srcQ2 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add queue dstQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add exchange fanout dstX1
bq.  qpid-config -b $DST_HOST bind dstX1 dstQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ2
bq.  qpid-receive -b $DST_HOST -a dstQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ2 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  86150 :: 84146 :: 87665       90877 :: 88534 :: 108418
bq.  89188 :: 88319 :: 82469       87014 :: 86753 :: 107386
bq.  88221 :: 85298 :: 82455       89790 :: 88573 :: 104119
bq.  
bq.  Still TBD:
bq.    - fix message group errors
bq.    - verify/fix cluster
bq.  
bq.  
bq.  This addresses bug qpid-3890.
bq.      https://issues.apache.org/jira/browse/qpid-3890
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.cpp 1297185 
bq.  
bq.  Diff: https://reviews.apache.org/r/4232/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  unit test (non-cluster)
bq.  performance tests as described above.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Kenneth
bq.  
bq.


                
> Improve performance by reducing the use of the Queue's message lock
> -------------------------------------------------------------------
>
>                 Key: QPID-3890
>                 URL: https://issues.apache.org/jira/browse/QPID-3890
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.16
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>            Priority: Minor
>             Fix For: Future
>
>
> The mutex that protects the Queue's message container is held longer than it really should be (e.g. across statistics updates, etc).
> Reducing the amount of code executed while the lock is held will improve the performance of the broker in a multi-process environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


[jira] [Commented] (QPID-3890) Improve performance by reducing the use of the Queue's message lock

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/QPID-3890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225215#comment-13225215 ] 

jiraposter@reviews.apache.org commented on QPID-3890:
-----------------------------------------------------



bq.  On 2012-03-07 21:30:28, Andrew Stitcher wrote:
bq.  > This is great work.
bq.  > 
bq.  > The most important thing on my mind, though, is how do we ensure that work like this maintains correctness?
bq.  > 
bq.  > Do we have some rules/guidelines for the use of the Queue structs that we can use when inspecting the code to satisfy ourselves of its correctness? A simple rule like this would be "take the blah lock before capturing or changing the foo state". It looks like we might have had rules like that, but these changes are so broad it's difficult for me to inspect the code and reconstruct what rules now apply (if any).
bq.  > 
bq.  > On another point - I see you've added the use of a RWLock - it's always good to benchmark with a plain mutex before using one of these, as they generally have higher overhead; unless the use is "read lock mostly, write lock seldom", a plain mutex can win.

Re: locking rules - good point Andrew.  Most (all?) of these changes I made simply by manually reviewing the existing code's locking behavior, and trying to identify what can be safely moved outside the locks.  I'll update this patch with comments that explicitly call out those data members & actions that must hold the lock.

The RWLock - hmm... that may be an unintentional change: at first I had a separate lock for the queue observer container, which was rarely written to.  I abandoned that change and put the queue observers under the queue lock, since otherwise the order of events (as seen by the observers) could be re-arranged.
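
To illustrate the kind of annotation meant above, here is a hypothetical sketch of the commenting convention (not the actual Queue.h): each data member states which lock, if any, guards it.

#include <atomic>
#include <cstdint>
#include <deque>
#include <mutex>

struct Message {};

class QueueLockingSketch {
    mutable std::mutex messageLock;

    // Protected by messageLock -- take the lock before reading or writing:
    std::deque<Message> messages;     // the message container itself
    uint32_t consumerCount = 0;       // consumers currently attached

    // Safe to access without messageLock:
    std::atomic<uint64_t> totalEnqueues{0};   // management statistic
};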


- Kenneth


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4232/#review5691
-----------------------------------------------------------


On 2012-03-07 19:23:36, Kenneth Giusti wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4232/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-07 19:23:36)
bq.  
bq.  
bq.  Review request for qpid, Andrew Stitcher, Alan Conway, Gordon Sim, and Ted Ross.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  In order to improve our parallelism across threads, this patch reduces the scope over which the Queue's message lock is held.
bq.  
bq.  I've still got more testing to do, but the simple test cases below show pretty good performance improvements on my multi-core box.  I'd like to get some early feedback, as there are a lot of little changes to the queue locking that will need vetting.
bq.  
bq.  As far as performance is concerned - I've run the following tests on an 8-core Xeon X5550 2.67GHz box.  I used qpid-send and qpid-receive to generate traffic.  All traffic was generated in parallel.
bq.  All numbers are in messages/second - sorted by best overall throughput.
bq.  
bq.  Test 1: Funnel Test (shared queue, 2 producers, 1 consumer)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  73144 :: 72524 :: 112914      91120 :: 83278 :: 133730
bq.  70299 :: 68878 :: 110905      91079 :: 87418 :: 133649
bq.  70002 :: 69882 :: 110241      83957 :: 83600 :: 131617
bq.  70523 :: 68681 :: 108372      83911 :: 82845 :: 128513
bq.  68135 :: 68022 :: 107949      85999 :: 81863 :: 125575
bq.  
bq.  
bq.  Test 2: Quad Client Shared Test (1 queue, 2 senders x 2 Receivers)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=4
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-receive -b $SRC_HOST -a srcQ1 -f -m 2000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  trunk:                               patched:
bq.  tp(m/s)                              tp(m/s)
bq.  
bq.  S1    :: S2    :: R1    :: R2        S1    :: S2    :: R1    :: R2
bq.  52826 :: 52364 :: 52386 :: 52275     64826 :: 64780 :: 64811 :: 64638
bq.  51457 :: 51269 :: 51399 :: 51213     64630 :: 64157 :: 64562 :: 64157
bq.  50557 :: 50432 :: 50468 :: 50366     64023 :: 63832 :: 64092 :: 63748
bq.  
bq.  **Note: pre-patch, qpidd ran steadily at about 270% CPU during this test.  With the patch, qpidd's CPU utilization was steady at 300%.
bq.  
bq.  
bq.  Test 3: Federation Funnel (two federated brokers, 2 producers upstream, 1 consumer downstream)
bq.  
bq.  test setup and execute info:
bq.  qpidd --auth no -p 8888 --no-data-dir --worker-threads=3 -d
bq.  qpidd --auth no -p 9999 --no-data-dir --worker-threads=2 -d
bq.  qpid-config -b $SRC_HOST add queue srcQ1 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $SRC_HOST add queue srcQ2 --max-queue-size=600000000  --max-queue-count=2000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add queue dstQ1 --max-queue-size=1200000000 --max-queue-count=4000000 --flow-stop-size=0 --flow-stop-count=0
bq.  qpid-config -b $DST_HOST add exchange fanout dstX1
bq.  qpid-config -b $DST_HOST bind dstX1 dstQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ1
bq.  qpid-route --ack=1000 queue add $DST_HOST $SRC_HOST dstX1 srcQ2
bq.  qpid-receive -b $DST_HOST -a dstQ1 -f -m 4000000 --capacity 2000 --ack-frequency 1000 --print-content no --report-total &
bq.  qpid-send -b $SRC_HOST -a srcQ1 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  qpid-send -b $SRC_HOST -a srcQ2 -m 2000000 --content-size 300 --capacity 2000 --report-total --sequence no --timestamp no &
bq.  
bq.  
bq.  trunk:                        patched:
bq.  tp(m/s)                       tp(m/s)
bq.  
bq.  S1    :: S2    :: R1          S1    :: S2    :: R1
bq.  86150 :: 84146 :: 87665       90877 :: 88534 :: 108418
bq.  89188 :: 88319 :: 82469       87014 :: 86753 :: 107386
bq.  88221 :: 85298 :: 82455       89790 :: 88573 :: 104119
bq.  
bq.  Still TBD:
bq.    - fix message group errors
bq.    - verify/fix cluster
bq.  
bq.  
bq.  This addresses bug qpid-3890.
bq.      https://issues.apache.org/jira/browse/qpid-3890
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.h 1297185 
bq.    /trunk/qpid/cpp/src/qpid/broker/QueueListeners.cpp 1297185 
bq.  
bq.  Diff: https://reviews.apache.org/r/4232/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  unit test (non-cluster)
bq.  performance tests as described above.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Kenneth
bq.  
bq.


                
> Improve performance by reducing the use of the Queue's message lock
> -------------------------------------------------------------------
>
>                 Key: QPID-3890
>                 URL: https://issues.apache.org/jira/browse/QPID-3890
>             Project: Qpid
>          Issue Type: Improvement
>          Components: C++ Broker
>    Affects Versions: 0.16
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>            Priority: Minor
>             Fix For: Future
>
>
> The mutex that protects the Queue's message container is held longer than it really should be (e.g. across statistics updates, etc).
> Reducing the amount of code executed while the lock is held will improve the performance of the broker in a multi-process environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org