Posted to reviews@kudu.apache.org by "Yifan Zhang (Code Review)" <ge...@cloudera.org> on 2020/07/29 12:14:56 UTC

[kudu-CR] [maintenance] use workload statistics to scale perf score of flushes/compactions

Hello Yingchun Lai, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15995

to look at the new patch set (#12).

Change subject: [maintenance] use workload statistics to scale perf score of flushes/compactions
......................................................................

[maintenance] use workload statistics to scale perf score of flushes/compactions

When we consider the performance improvement brought by maintenance
operations, we can use workload statistics to measure how 'hot' a
tablet has been over the last few minutes and give priority to
maintenance ops for 'hot' tablets. This patch uses the recent
read/write rate of a tablet as its workload score and calculates a
final perf score from an op's raw perf_improvement, the tablet's
workload score, and the table's priority, so maintenance ops for a
'hot' tablet are more likely to launch.
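
For illustration only, here is a minimal C++ sketch (not the patch's
actual code) of how such a combined score could be computed. The
function name, the rate-to-score mapping, and the 2^priority factor
are assumptions; only the inputs and the workload_score_upper_bound
clamp come from the description above.

  #include <algorithm>
  #include <cmath>
  #include <cstdint>

  // Hypothetical stand-in for the --workload_score_upper_bound flag.
  static double FLAGS_workload_score_upper_bound = 10.0;

  // Combine an op's raw perf_improvement with how 'hot' the tablet is
  // and with the table's priority, so ops on busy, high-priority
  // tablets sort first in the maintenance queue.
  double AdjustedPerfScore(double raw_perf_improvement,
                           double recent_read_write_rate,  // ops/sec
                           int32_t table_priority) {
    // Map the recent read/write rate into [1.0, upper_bound]: an idle
    // tablet keeps its raw score, a hot tablet gets boosted. The
    // /1000.0 scaling is an arbitrary choice for this sketch.
    double workload_score = std::min(
        1.0 + recent_read_write_rate / 1000.0,
        FLAGS_workload_score_upper_bound);
    // Fold in the table priority; 2^priority is an assumed mapping
    // chosen so that priority 0 leaves the score unchanged.
    return raw_perf_improvement * workload_score *
           std::pow(2.0, static_cast<double>(table_priority));
  }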

In our use case, there is insert/update/delete traffic all the time,
but some tables may have more read traffic at some time, so we want to
dynamically adjust priorities of compaction/flush ops for different tables.

We tested this on a 6-node cluster with tservers configured with:
-maintenance_manager_num_threads=1,
-workload_score_upper_bound=10,
and ran the workloads with enable_workload_score_for_perf_improvement_ops
set to false and then to true to see whether the change improves
performance.
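
For reference, a tserver could be started with these flags roughly as
follows (a sketch; data directories and other required flags are
omitted, and enable_workload_score_for_perf_improvement_ops is the
flag this patch introduces):

  kudu-tserver \
    --maintenance_manager_num_threads=1 \
    --workload_score_upper_bound=10 \
    --enable_workload_score_for_perf_improvement_ops=true \
    ...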

We first inserted 5,000,000,000 rows into table-C (256 tablets), and
then inserted 200,000,000 rows into table-A (8 tablets) and table-B
(8 tablets) at the same time. Next we ran different YCSB workloads on
table-A and table-B; at this point all the tablets had some
uncompacted rowsets, but there was no ongoing workload on table-C.

workload for table-A: update-heavy workload, scan/update ratio 50/50
  operationcount=10,000,000
  requestdistribution=zipfian
  maxscanlength=10

workload for table-B: scan-mostly workload, scan/insert ratio 80/20
(see the YCSB properties sketch below)
  operationcount=10,000,000
  requestdistribution=zipfian
  maxscanlength=10000
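
As a rough illustration, the table-A workload could be expressed as a
YCSB CoreWorkload properties file along these lines. This is a sketch:
the proportions follow the 50/50 scan/update split above, recordcount
matches the 200,000,000 rows loaded earlier, and everything else is an
assumption.

  # Sketch of the table-A workload (scan/update 50/50).
  workload=com.yahoo.ycsb.workloads.CoreWorkload
  recordcount=200000000
  operationcount=10000000
  readproportion=0
  insertproportion=0
  updateproportion=0.5
  scanproportion=0.5
  requestdistribution=zipfian
  maxscanlength=10

The table-B workload would be analogous, with scanproportion=0.8,
insertproportion=0.2, updateproportion=0, and maxscanlength=10000.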

Results:
measurement                                 Before change     After change
[table-A:UPDATE]AverageLatency(us)          9.46              3.84
[table-A:UPDATE]95thPercentileLatency(us)   12                7
[table-A:UPDATE]99thPercentileLatency(us)   19                13
[table-A:SCAN]AverageLatency(us)            2317              1419
[table-A:SCAN]95thPercentileLatency(us)     4847              2939
[table-A:SCAN]99thPercentileLatency(us)     10815             5703
[table-B:INSERT]AverageLatency(us)          16.11             16.54
[table-B:INSERT]95thPercentileLatency(us)   35                35
[table-B:INSERT]99thPercentileLatency(us)   58                56
[table-B:SCAN]AverageLatency(us)            6417              5545
[table-B:SCAN]95thPercentileLatency(us)     12463             10063
[table-B:SCAN]99thPercentileLatency(us)     18095             13511

We ran these workloads 5 times and saw a 10%-30% reduction in the scan
latency of table-B and a 38%-60% reduction in the scan latency of table-A.

This patch also adds 'Workload score' to the tserver
/maintenance-manager page [1] so that we can adjust runtime flags
based on the current state.

[1] http://ww1.sinaimg.cn/large/9b7ebaddly1gh83qh2bkfj21eb0d3goy.jpg

Change-Id: Ie3afcc359002d1392164ba2fda885f8930ef8696
---
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_mm_ops.cc
M src/kudu/tablet/tablet_replica_mm_ops.cc
M src/kudu/tserver/tablet_server-test-base.cc
M src/kudu/tserver/tablet_server-test-base.h
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tserver_path_handlers.cc
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
M src/kudu/util/maintenance_manager.proto
M www/maintenance-manager.mustache
13 files changed, 283 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/15995/12
-- 
To view, visit http://gerrit.cloudera.org:8080/15995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3afcc359002d1392164ba2fda885f8930ef8696
Gerrit-Change-Number: 15995
Gerrit-PatchSet: 12
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>