You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2016/03/01 22:54:23 UTC
[kudu-CR] Submit ProbeStat metrics only once per batch
Hello David Ribeiro Alves, Jean-Daniel Cryans,
I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/2377
to review the following change.
Change subject: Submit ProbeStat metrics only once per batch
......................................................................
Submit ProbeStat metrics only once per batch
This is a small optimization to the gathering of the probe statistics for write
operations. We previously collected these metrics on every operation, which
cost us several atomic instructions for incrementing the metrics, as well as a
TRACE entry for each.
Now, we collect the stats for each operation into an arena-allocated array, and
then batch them together before submitting to the tablet metrics. We also now
only log a single trace entry for the probe stats rather than one per row,
making the traces much more manageable.
The new code uses the transaction state's arena for allocation to avoid adding
any extra allocator contention, which is already fairly measurable in write
workloads.
To measure the impact, I temporarily added an 'exit(0)' after the insert threads
finish in full_stack-insert-scan-test and ran:
$ KUDU_ALLOW_SLOW_TESTS=1 perf record \
./build/latest/bin/full_stack-insert-scan-test \
--gtest_filter=\*WithDiskStress\* --inserts_per_client=2000000 -rows_per_batch=1000
before and after the patch. The patch had the following effect:
- functions related to tracing (SubstituteToBuffer, SubstitutedSize,
SubstituteAndTrace) reduced from 3.83% of cycles to 0.05% of cycles
- HdrHistogram::IncrementBy(...) reduced from 1.4% of cycles to 0.03% of cycles
- LongAdder::IncrementBy(...) reduced from 1.52% of cycles to 0.65% of cycles
'perf stat' shows an even better reduction in cycles compared to summing up the above,
probably due to fewer cross-CPU cache invalidations, allocator pressure from the traces,
etc:
without patch:
347644.057631 task-clock (msec) # 6.549 CPUs utilized
341,767 context-switches # 0.983 K/sec
57,828 cpu-migrations # 0.166 K/sec
506,239 page-faults # 0.001 M/sec
779,116,782,135 cycles # 2.241 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
745,063,382,648 instructions # 0.96 insns per cycle
134,030,182,890 branches # 385.539 M/sec
689,148,045 branch-misses # 0.51% of all branches
53.079492439 seconds time elapsed
with patch:
244940.029112 task-clock (msec) # 5.706 CPUs utilized
290,294 context-switches # 0.001 M/sec
54,321 cpu-migrations # 0.222 K/sec
818,060 page-faults # 0.003 M/sec
637,474,442,303 cycles # 2.603 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
609,976,573,012 instructions # 0.96 insns per cycle
106,824,660,658 branches # 436.126 M/sec
383,546,319 branch-misses # 0.36% of all branches
42.928344981 seconds time elapsed
Change-Id: I9609ea01375be745a82105f845a21cd7829b3a45
---
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
5 files changed, 95 insertions(+), 53 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/2377/1
--
To view, visit http://gerrit.cloudera.org:8080/2377
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9609ea01375be745a82105f845a21cd7829b3a45
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans