Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2020/07/09 18:57:45 UTC

[kudu-CR] [tests] add same tablet concurrent writes test

Hello Kudu Jenkins, Grant Henke, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16060

to look at the new patch set (#4).

Change subject: [tests] add same_tablet_concurrent_writes test
......................................................................

[tests] add same_tablet_concurrent_writes test

Added SameTabletConcurrentWritesTest.InsertsOnly test scenario.
The scenario exercises concurrent inserts from multiple clients
into the same tablet.

The purpose of the newly introduced test is to check for lock contention
when running multiple write operations on the same tablet concurrently.
Threads pushing Raft consensus updates interact with RPC worker threads
serving write requests, and the test pinpoints the contention over the
lock primitives used in RaftConsensus.

To validate the results reported by the test, I verified that RPC queue
overflows happen far less often (1898 vs. 50 in the measurements below)
when using the lock-free implementation of
RaftConsensus::CheckLeadershipAndBindTerm() from the patch posted here:
  https://gerrit.cloudera.org/#/c/16034/
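
As a rough illustration of why a lock-free check reduces contention, the
sketch below contrasts a mutex-guarded leadership check with one that
reads a single atomic word. This is not the code from 16034: the class
ToyConsensus, its methods, and the packing of the term and the leader
flag into one 64-bit value are all hypothetical and chosen only to keep
the example small.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Toy stand-in for consensus state. All names are hypothetical; this is
// not the RaftConsensus class from Kudu.
class ToyConsensus {
 public:
  // Writer path (e.g. a term change): takes the lock, then publishes the
  // new state as one 64-bit word for the lock-free readers.
  void SetState(int64_t term, bool is_leader) {
    assert(term >= 0);
    std::lock_guard<std::mutex> l(lock_);
    term_ = term;
    is_leader_ = is_leader;
    packed_.store((static_cast<uint64_t>(term) << 1) | (is_leader ? 1u : 0u),
                  std::memory_order_release);
  }

  // Locked check: every caller serializes on lock_, so RPC workers can
  // end up waiting behind a thread that holds the lock for longer.
  bool CheckLeaderLocked(int64_t* term) {
    std::lock_guard<std::mutex> l(lock_);
    if (!is_leader_) return false;
    *term = term_;
    return true;
  }

  // Lock-free check: a single atomic load, no waiting on lock_.
  bool CheckLeaderLockFree(int64_t* term) const {
    const uint64_t v = packed_.load(std::memory_order_acquire);
    if ((v & 1u) == 0) return false;
    *term = static_cast<int64_t>(v >> 1);
    return true;
  }

 private:
  std::mutex lock_;
  int64_t term_ = 0;
  bool is_leader_ = false;
  std::atomic<uint64_t> packed_{0};
};
```

Packing both values into one word lets a reader see a consistent
(term, leader) pair without holding the lock; writers still use the lock
to order their updates.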

The rate of successful write operations was the same in both cases,
and that's expected since the bottleneck is the WAL (where additional
static delays are introduced for each fsync).  However, the number of
messages from spinlock_profiling.cc like
  Waited 190 ms on lock 0x237acd4 ...
dropped significantly after applying patch 16034 on top.  Less
contention is good news because the freed CPU resources can be spent
on something useful, like handling another RPC request from the queue
(which isn't overflowing and is able to accommodate extra requests).

Below are snippets of various measurements taken for this new test
before and after applying the patch from review item 16034 on top.

========================================================================

Without 16034 patch:
 Performance counter stats for './bin/same_tablet_concurrent_writes-itest':

       7449.425640 task-clock                #    0.504 CPUs utilized
            47,882 context-switches          #    0.006 M/sec
             3,454 cpu-migrations            #    0.464 K/sec
            28,592 page-faults               #    0.004 M/sec
    10,211,586,270 cycles                    #    1.371 GHz
    10,647,306,766 instructions              #    1.04  insns per cycle
     1,861,229,149 branches                  #  249.849 M/sec
        25,370,590 branch-misses             #    1.36% of all branches

      14.767762000 seconds time elapsed

With 16034 patch:
 Performance counter stats for './bin/same_tablet_concurrent_writes-itest':

       5646.543970 task-clock                #    0.394 CPUs utilized
            39,194 context-switches          #    0.007 M/sec
             3,715 cpu-migrations            #    0.658 K/sec
            30,090 page-faults               #    0.005 M/sec
     8,543,832,082 cycles                    #    1.513 GHz
     9,301,870,856 instructions              #    1.09  insns per cycle
     1,590,579,357 branches                  #  281.691 M/sec
        18,563,203 branch-misses             #    1.17% of all branches

      14.339274728 seconds time elapsed

========================================================================

------------------------------------------------------------------------
                           | Without 16034 patch   |  With 16034 patch
------------------------------------------------------------------------
  write RPC request rate   | 15.8 req/sec          | 16 req/sec
  RPC queue overflows      | 1898                  | 50
  spinlock_contention_time | 22966310              | 9161557
------------------------------------------------------------------------
  rpc_incoming_queue_time  |                       |
                           | Count: 82             | Count: 82
                           | Mean: 199704          | Mean: 1037.87
                           | Percentiles:          | Percentiles:
                           |   0%  (min) = 35      |   0%  (min) = 21
                           |  25%        = 5844    |  25%        = 51
                           |  50%  (med) = 196096  |  50%  (med) = 67
                           |  75%        = 388352  |  75%        = 1608
                           |  95%        = 400640  |  95%        = 3334
                           |  99%        = 598016  |  99%        = 9960
                           |  99.9%      = 599552  |  99.9%      = 10064
                           |  99.99%     = 599552  |  99.99%     = 10064
                           |  100% (max) = 600048  |  100% (max) = 10066
------------------------------------------------------------------------
  op_apply_run_time        |                       |
                           | Count: 79             | Count: 80
                           | Mean: 99377.1         | Mean: 80796.3
                           | Percentiles:          | Percentiles:
                           |   0%  (min) = 429     |   0%  (min) = 575
                           |  25%        = 940     |  25%        = 828
                           |  50%  (med) = 1336    |  50%  (med) = 1064
                           |  75%        = 200704  |  75%        = 200704
                           |  95%        = 391168  |  95%        = 200704
                           |  99%        = 401408  |  99%        = 200704
                           |  99.9%      = 401408  |  99.9%      = 399360
                           |  99.99%     = 401408  |  99.99%     = 399360
                           |  100% (max) = 401432  |  100% (max) = 399703
------------------------------------------------------------------------
  handler_latency_kudu_tserver_TabletServerService_Write:
                           | Count: 49             | Count: 45
                           | Mean: 3.11688e+06     | Mean: 3.08435e+06
                           | Percentiles:          | Percentiles:
                           |   0%  (min) = 611494  |   0%  (min) = 636928
                           |  25%        = 1802240 |  25%        = 2392064
                           |  50%  (med) = 3391488 |  50%  (med) = 3178496
                           |  75%        = 4390912 |  75%        = 3997696
                           |  95%        = 4816896 |  95%        = 4587520
                           |  99%        = 5013504 |  99%        = 4587520
                           |  99.9%      = 5013504 |  99.9%      = 4587520
                           |  99.99%     = 5013504 |  99.99%     = 4587520
                           |  100% (max) = 5023673 |  100% (max) = 4616174
------------------------------------------------------------------------
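
To put the before/after numbers on a common scale, the relative
reduction can be computed as (before - after) / before. The helper below
is only a sketch of that arithmetic: the function names are made up for
this example, and the inputs are copied from the measurements above.

```cpp
#include <cassert>
#include <cstdio>

// Fractional reduction from `before` to `after`; positive means a drop.
double RelativeReduction(double before, double after) {
  assert(before > 0);
  return (before - after) / before;
}

// Prints the reductions for a few of the measurements reported above.
void PrintReductions() {
  std::printf("task-clock:               %.1f%%\n",
              100 * RelativeReduction(7449.425640, 5646.543970));
  std::printf("RPC queue overflows:      %.1f%%\n",
              100 * RelativeReduction(1898, 50));
  std::printf("spinlock_contention_time: %.1f%%\n",
              100 * RelativeReduction(22966310, 9161557));
}
```

By this measure the task-clock drops by roughly a quarter, the spinlock
contention time by about 60%, and the RPC queue overflows by more than
97%, while the write request rate stays essentially flat.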

Change-Id: I7eef6e46e7685450354473cee9d804c5054723eb
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/same_tablet_concurrent_writes-itest.cc
2 files changed, 341 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/16060/4
-- 
To view, visit http://gerrit.cloudera.org:8080/16060

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7eef6e46e7685450354473cee9d804c5054723eb
Gerrit-Change-Number: 16060
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>