Posted to reviews@kudu.apache.org by "Yifan Zhang (Code Review)" <ge...@cloudera.org> on 2020/07/29 12:14:56 UTC
[kudu-CR] [maintenance] use workload statistics to scale perf score of flushes/compactions
Hello Yingchun Lai, Kudu Jenkins, Andrew Wong,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/15995
to look at the new patch set (#12).
Change subject: [maintenance] use workload statistics to scale perf score of flushes/compactions
......................................................................
[maintenance] use workload statistics to scale perf score of flushes/compactions
When we consider the performance improvement brought by maintenance
operations, we can use workload statistics to find how 'hot' a
tablet has been in the last few minutes and preferentially perform
maintenance ops for 'hot' tablets. This patch uses the recent
read/write rate of a tablet as a workload score and calculates a final
perf score based on an op's raw perf_improvement, the tablet's workload
score, and the table's priority, so maintenance ops for a 'hot' tablet
are more likely to launch.
In our use case, there is insert/update/delete traffic all the time,
but some tables may have more read traffic at some time, so we want to
dynamically adjust priorities of compaction/flush ops for different tables.
We tested this on a 6-node cluster with the tservers configured with:
-maintenance_manager_num_threads=1
-workload_score_upper_bound=10
and ran the workloads with enable_workload_score_for_perf_improvement_ops
set to false and then to true to see whether the change improves
performance.
We first inserted 5,000,000,000 rows into table-C (256 tablets), and then
inserted 200,000,000 rows into table-A (8 tablets) and table-B (8 tablets)
at the same time. Next we ran different YCSB workloads on table-A and
table-B; at this point all the tablets had some uncompacted rowsets, but
there was no ongoing workload on table-C.
workload for table-A: update-heavy workload, scan/update ratio is 50/50
  operationcount=10,000,000
  requestdistribution=zipfian
  maxscanlength=10
workload for table-B: scan-mostly workload, scan/insert ratio is 80/20
  operationcount=10,000,000
  requestdistribution=zipfian
  maxscanlength=10000
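For reference, the table-A workload above corresponds roughly to a YCSB
core-workload properties file like the one below. Only the parameters
listed above come from the original; the workload class name is the
standard YCSB CoreWorkload, and the proportion fields are inferred from
the stated 50/50 scan/update ratio.

  # Sketch of the table-A YCSB workload (inferred, not from the patch)
  workload=com.yahoo.ycsb.workloads.CoreWorkload
  scanproportion=0.5
  updateproportion=0.5
  operationcount=10000000
  requestdistribution=zipfian
  maxscanlength=10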
result:
measurement                                Before change   After change
[table-A:UPDATE]AverageLatency(us)                  9.46           3.84
[table-A:UPDATE]95thPercentileLatency(us)             12              7
[table-A:UPDATE]99thPercentileLatency(us)             19             13
[table-A:SCAN]AverageLatency(us)                    2317           1419
[table-A:SCAN]95thPercentileLatency(us)             4847           2939
[table-A:SCAN]99thPercentileLatency(us)            10815           5703
[table-B:INSERT]AverageLatency(us)                 16.11          16.54
[table-B:INSERT]95thPercentileLatency(us)             35             35
[table-B:INSERT]99thPercentileLatency(us)             58             56
[table-B:SCAN]AverageLatency(us)                    6417           5545
[table-B:SCAN]95thPercentileLatency(us)            12463          10063
[table-B:SCAN]99thPercentileLatency(us)            18095          13511
We ran these workloads 5 times and saw a 10%-30% reduction in the scan
latency of table-B and a 38%-60% reduction in the scan latency of table-A.
This patch also adds a 'Workload score' column to the tserver
/maintenance-manager page[1] so that runtime flags can be adjusted based
on the current state.
[1] http://ww1.sinaimg.cn/large/9b7ebaddly1gh83qh2bkfj21eb0d3goy.jpg
Change-Id: Ie3afcc359002d1392164ba2fda885f8930ef8696
---
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_mm_ops.cc
M src/kudu/tablet/tablet_replica_mm_ops.cc
M src/kudu/tserver/tablet_server-test-base.cc
M src/kudu/tserver/tablet_server-test-base.h
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tserver_path_handlers.cc
M src/kudu/util/maintenance_manager-test.cc
M src/kudu/util/maintenance_manager.cc
M src/kudu/util/maintenance_manager.h
M src/kudu/util/maintenance_manager.proto
M www/maintenance-manager.mustache
13 files changed, 283 insertions(+), 29 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/15995/12
--
To view, visit http://gerrit.cloudera.org:8080/15995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3afcc359002d1392164ba2fda885f8930ef8696
Gerrit-Change-Number: 15995
Gerrit-PatchSet: 12
Gerrit-Owner: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <40...@qq.com>