Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2020/07/13 23:43:59 UTC

[kudu-CR] WIP: tablet: acquire/release locks in batches

Hello Tidy Bot, Alexey Serbin, Andrew Wong, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16169

to look at the new patch set (#2).

Change subject: WIP: tablet: acquire/release locks in batches
......................................................................

WIP: tablet: acquire/release locks in batches

WIP because there are a few TODOs to fix

Prior to this patch, each row operation required a separate call into
the LockManager to acquire or release its corresponding row lock. Each
call performed atomic operations on the LockManager, which bounced the
cacheline containing the spinlock(s) between the prepare thread
(acquiring locks) and the apply threads (releasing locks). Atomic
operations on cachelines owned by remote cores are quite expensive, so
this accounted for a large share of CPU usage (locking-related methods
took more CPU than the actual work on the prepare/apply threads).
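
As a rough illustration of that pre-patch shape (the types and names below
are simplified stand-ins for illustration, not the actual Kudu
LockManager/ScopedRowLock declarations):

  #include <cstdint>
  #include <mutex>
  #include <vector>

  // Stand-in for the LockManager: the single mutex models the spinlock
  // cacheline that ping-pongs between prepare and apply threads.
  struct FakeLockManager {
    std::mutex lock;
    std::vector<uint64_t> held_keys;
  };

  // Pre-patch shape: every row does its own round trip through the lock
  // manager, i.e. one atomic RMW on the shared cacheline per row.
  void PrepareRows(FakeLockManager* lm, const std::vector<uint64_t>& row_keys) {
    for (uint64_t key : row_keys) {
      std::lock_guard<std::mutex> guard(lm->lock);  // contended once per row
      lm->held_keys.push_back(key);
    }
  }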

This patch changes the LockManager API somewhat to allow
acquiring/releasing the locks in bulk. WriteOp now contains a single
ScopedRowLock object holding all of the required locks, instead of a
separate ScopedRowLock per row. The acquire/release paths are also
optimized to get better instruction-level parallelism in the hashtable
lookups -- prefetches are used to bring the appropriate cache lines
into the CPU cache prior to reading them.
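
A minimal sketch of the batched, prefetch-then-probe idea, assuming a
power-of-two table of lock buckets and GCC/Clang's __builtin_prefetch
(again, illustrative names, not the actual Kudu code):

  #include <cstddef>
  #include <cstdint>
  #include <mutex>
  #include <vector>

  struct Bucket { uint64_t key_hash; void* holder; };

  struct FakeLockManager {
    std::mutex lock;               // single manager-wide lock
    std::vector<Bucket> buckets;   // size assumed to be a power of two
  };

  // Bulk acquire: the first pass issues prefetches so the bucket cache lines
  // are already in flight, the second pass probes them; the whole batch is
  // covered by a single acquisition of the manager-wide lock.
  void AcquireBatch(FakeLockManager* lm, const std::vector<uint64_t>& hashes) {
    const size_t mask = lm->buckets.size() - 1;
    std::lock_guard<std::mutex> guard(lm->lock);
    for (uint64_t h : hashes) {
      __builtin_prefetch(&lm->buckets[h & mask]);  // overlap the memory loads
    }
    for (uint64_t h : hashes) {
      Bucket* b = &lm->buckets[h & mask];
      b->key_hash = h;                             // "take" the row lock
    }
  }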

This patch also simplifies the locking to use a single LockManager-wide lock
instead of per-bucket locks. Per-bucket locks would still be possible, but
they don't appear to be beneficial and would complicate the codepath quite a
bit. Removing them also saves significant memory in the LockManager hash
tables.
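
To make the memory argument concrete, compare two hypothetical bucket
layouts (these are not the real Kudu structs, just an illustration of the
per-slot overhead that per-bucket locking would add):

  #include <atomic>
  #include <cstdint>

  // Per-bucket locking: every bucket carries its own lock word, so the hash
  // table pays for a lock (plus alignment padding) in every slot.
  struct PerBucketLockedSlot {
    std::atomic<uint32_t> lock;
    uint64_t key_hash;
    void* holder;
  };

  // Single manager-wide lock: buckets hold only their payload, and one lock
  // elsewhere in the LockManager guards the whole table.
  struct Slot {
    uint64_t key_hash;
    void* holder;
  };

  static_assert(sizeof(Slot) < sizeof(PerBucketLockedSlot),
                "dropping per-bucket locks shrinks every hash table slot");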

Benchmarked by running:

  $ perf stat ./build/release/bin/kudu tserver run -fs-wal-dir /tmp/ts \
    -enable_maintenance_manager=0 -unlock-unsafe-flags

and

  $ kudu perf loadgen localhost -num_rows_per_thread=10000000 -num_threads=8

This workload ends up somewhat client-bound on my machine, so wallclock time
isn't improved much, but the tserver CPU consumption is reduced by about 23%:

Before:

 Performance counter stats for './build/release/bin/kudu tserver run -fs-wal-dir /tmp/ts -enable_maintenance_manager=0 -unlock-unsafe-flags':

         177786.37 msec task-clock                #    7.648 CPUs utilized
            215350      context-switches          #    0.001 M/sec
             55653      cpu-migrations            #    0.313 K/sec
           3219417      page-faults               #    0.018 M/sec
      724369004287      cycles                    #    4.074 GHz                      (83.32%)
      142799406201      stalled-cycles-frontend   #   19.71% frontend cycles idle     (83.36%)
      109903159567      stalled-cycles-backend    #   15.17% backend cycles idle      (83.40%)
      719706215568      instructions              #    0.99  insn per cycle
                                                  #    0.20  stalled cycles per insn  (83.30%)
      131445739053      branches                  #  739.347 M/sec                    (83.31%)
         479779584      branch-misses             #    0.37% of all branches          (83.32%)

     165.187526000 seconds user
      12.866637000 seconds sys

After:

 Performance counter stats for './build/release/bin/kudu tserver run -fs-wal-dir /tmp/ts -enable_maintenance_manager=0 -unlock-unsafe-flags':

         145986.70 msec task-clock                #    6.063 CPUs utilized
            202600      context-switches          #    0.001 M/sec
             51442      cpu-migrations            #    0.352 K/sec
           3368490      page-faults               #    0.023 M/sec
      597915142173      cycles                    #    4.096 GHz                      (83.51%)
       58333772996      stalled-cycles-frontend   #    9.76% frontend cycles idle     (83.35%)
      104785221789      stalled-cycles-backend    #   17.53% backend cycles idle      (83.18%)
      690655964982      instructions              #    1.16  insn per cycle
                                                  #    0.15  stalled cycles per insn  (83.38%)
      126988529873      branches                  #  869.864 M/sec                    (83.40%)
         469031328      branch-misses             #    0.37% of all branches          (83.17%)

     134.072172000 seconds user
      12.192747000 seconds sys

Change-Id: I3cb724e953ecdf188a35181c2f91b721b3416524
---
M src/kudu/tablet/lock_manager-test.cc
M src/kudu/tablet/lock_manager.cc
M src/kudu/tablet/lock_manager.h
M src/kudu/tablet/ops/write_op.cc
M src/kudu/tablet/ops/write_op.h
M src/kudu/tablet/row_op.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
8 files changed, 241 insertions(+), 180 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/16169/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16169
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3cb724e953ecdf188a35181c2f91b721b3416524
Gerrit-Change-Number: 16169
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)