You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2017/11/01 20:02:58 UTC
[kudu-CR] compaction policy: out-of-line inessential fields in RowSetInfo
Hello zhen.zhang, Adar Dembo,
I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/8446
to review the following change.
Change subject: compaction_policy: out-of-line inessential fields in RowSetInfo
......................................................................
compaction_policy: out-of-line inessential fields in RowSetInfo
Based on analysis using toplev.py from pmu-tools[1] I found that the
compaction algorithm was memory-bandwidth bound for the benchmarks in
compaction_policy-test.
The core algorithms of the compaction policy only access a subset of the
fields in the object, so an easy fix to reduce memory bandwidth usage is
to out-of-line the infrequently accessed fields to a separate object. In
particular, there were two strings in RowSetInfo that are used rarely,
and sizeof(std::string) is 32 bytes. So moving them out of line reduces
sizeof(RowSetInfo) quite a bit.
Summary of improvements:
- 12% improvement in wall time
- L1 cache: 36% reduction in misses
- L2 cache: 66% reduction in misses
- L3 cache: 59% reduction in misses
Before:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*TinyOver*' (10 runs):
55,364,354 mem_load_retired.l1_miss ( +- 0.83% ) (66.24%)
1,201,317,760 mem_load_retired.l1_hit ( +- 0.32% ) (66.71%)
44,942,323 mem_load_retired.l2_miss ( +- 1.02% ) (67.08%)
10,873,016 mem_load_retired.l2_hit ( +- 3.48% ) (67.31%)
42,525 mem_load_retired.l3_miss ( +- 10.29% ) (67.00%)
44,167,947 mem_load_retired.l3_hit ( +- 1.49% ) (66.29%)
0.406484613 seconds time elapsed ( +- 1.22% )
After:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*TinyOver*' (10 runs):
35,557,941 mem_load_retired.l1_miss ( +- 0.69% ) (66.58%)
1,262,199,262 mem_load_retired.l1_hit ( +- 0.13% ) (66.58%)
15,424,777 mem_load_retired.l2_miss ( +- 4.25% ) (66.58%)
18,029,572 mem_load_retired.l2_hit ( +- 3.31% ) (67.04%)
17,235 mem_load_retired.l3_miss ( +- 7.93% ) (67.18%)
16,544,671 mem_load_retired.l3_hit ( +- 4.02% ) (66.73%)
0.359402346 seconds time elapsed ( +- 0.28% )
As this is the third in a set of patches for optimizations that were
concentrating on the "TinyOverlap" test case, I also verified before/after of
the "YCSB" compaction test case. This also showed about a 14% improvement
overall from this patch series.
ycsb at start:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*Ycsb*' (10 runs):
795.924340 task-clock (msec) # 0.998 CPUs utilized ( +- 0.53% )
27 context-switches # 0.034 K/sec ( +- 60.21% )
0 cpu-migrations # 0.000 K/sec ( +-100.00% )
2,672 page-faults # 0.003 M/sec ( +- 1.36% )
2,998,040,105 cycles # 3.767 GHz ( +- 0.35% )
7,457,801,099 instructions # 2.49 insn per cycle ( +- 0.00% )
1,212,318,353 branches # 1523.158 M/sec ( +- 0.00% )
9,741,331 branch-misses # 0.80% of all branches ( +- 0.03% )
0.797407974 seconds time elapsed ( +- 0.58% )
ycsb at end:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*Ycsb*' (10 runs):
686.208936 task-clock (msec) # 0.999 CPUs utilized ( +- 0.48% )
9 context-switches # 0.013 K/sec ( +- 54.78% )
0 cpu-migrations # 0.000 K/sec ( +-100.00% )
2,470 page-faults # 0.004 M/sec ( +- 1.21% )
2,567,417,451 cycles # 3.741 GHz ( +- 0.22% )
7,271,965,987 instructions # 2.83 insn per cycle ( +- 0.00% )
1,211,559,802 branches # 1765.584 M/sec ( +- 0.00% )
9,759,994 branch-misses # 0.81% of all branches ( +- 0.03% )
0.686853500 seconds time elapsed ( +- 0.49% )
[1] https://github.com/andikleen/pmu-tools/wiki/toplev-manual
Change-Id: I148b8c976577cbe0020ed887f3bc4c7a2cb2c3ae
---
M src/kudu/tablet/rowset_info.cc
M src/kudu/tablet/rowset_info.h
2 files changed, 32 insertions(+), 23 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/8446/1
--
To view, visit http://gerrit.cloudera.org:8080/8446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I148b8c976577cbe0020ed887f3bc4c7a2cb2c3ae
Gerrit-Change-Number: 8446
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: zhen.zhang <zh...@xiaomi.com>
[kudu-CR] compaction policy: out-of-line inessential fields in RowSetInfo
Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8446 )
Change subject: compaction_policy: out-of-line inessential fields in RowSetInfo
......................................................................
Patch Set 1:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/8446/1/src/kudu/tablet/rowset_info.h
File src/kudu/tablet/rowset_info.h:
http://gerrit.cloudera.org:8080/#/c/8446/1/src/kudu/tablet/rowset_info.h@103
PS1, Line 103: // True if the RowSet has known bounds.
: // MemRowSets in particular do not.
This comment belongs above has_bounds.
--
To view, visit http://gerrit.cloudera.org:8080/8446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I148b8c976577cbe0020ed887f3bc4c7a2cb2c3ae
Gerrit-Change-Number: 8446
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: zhen.zhang <zh...@xiaomi.com>
Gerrit-Comment-Date: Wed, 01 Nov 2017 20:53:09 +0000
Gerrit-HasComments: Yes
[kudu-CR] compaction policy: out-of-line inessential fields in RowSetInfo
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8446 )
Change subject: compaction_policy: out-of-line inessential fields in RowSetInfo
......................................................................
Patch Set 1:
I wanted to try out the "efficiency sanitizer" so I ran the tests prior to this fix with -fsanitize=efficiency-cache-frag and got the following results:
==8130== # 0: offset = 0, size = 8, count = 20000, type = %"class.kudu::tablet::RowSet"*
==8130== # 1: offset = 8, size = 4, count = 39998, type = i32
==8130== # 2: offset = 12, size = 4, count = 107698980, type = i32
==8130== # 3: offset = 16, size = 1, count = 39998, type = i8
==8130== # 4: offset = 24, size = 32, count = 210001, type = %"class.std::__cxx11::basic_string" = type { %"struct.std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider", i64, %union.anon }
==8130== # 5: offset = 56, size = 32, count = 210001, type = %"class.std::__cxx11::basic_string" = type { %"struct.std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider", i64, %union.anon }
==8130== # 6: offset = 88, size = 8, count = 300940510, type = double
==8130== # 7: offset = 96, size = 8, count = 200930507, type = double
==8130== # 8: offset = 104, size = 8, count = 89700226, type = double
From this it seems like the byte size int32 is also infrequently accessed and might be a candidate for out-of-lining.
--
To view, visit http://gerrit.cloudera.org:8080/8446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I148b8c976577cbe0020ed887f3bc4c7a2cb2c3ae
Gerrit-Change-Number: 8446
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: zhen.zhang <zh...@xiaomi.com>
Gerrit-Comment-Date: Wed, 01 Nov 2017 21:37:46 +0000
Gerrit-HasComments: No
[kudu-CR] compaction policy: out-of-line inessential fields in RowSetInfo
Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8446 )
Change subject: compaction_policy: out-of-line inessential fields in RowSetInfo
......................................................................
Patch Set 3: Code-Review+2
--
To view, visit http://gerrit.cloudera.org:8080/8446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I148b8c976577cbe0020ed887f3bc4c7a2cb2c3ae
Gerrit-Change-Number: 8446
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: zhen.zhang <zh...@xiaomi.com>
Gerrit-Comment-Date: Wed, 01 Nov 2017 23:31:30 +0000
Gerrit-HasComments: No
[kudu-CR] compaction policy: out-of-line inessential fields in RowSetInfo
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8446 )
Change subject: compaction_policy: out-of-line inessential fields in RowSetInfo
......................................................................
compaction_policy: out-of-line inessential fields in RowSetInfo
Based on analysis using toplev.py from pmu-tools[1] I found that the
compaction algorithm was memory-bandwidth bound for the benchmarks in
compaction_policy-test.
The core algorithms of the compaction policy only access a subset of
the fields in the object, meaning that some fields are accessed far
more frequently than others. This can also be illustrated by building
with -fsanitize=cache-efficiency-frag which generates the following
report for the RowSetInfo struct:
==8130== # 0: offset = 0, size = 8, count = 20000, type = %"class.kudu::tablet::RowSet"*
==8130== # 1: offset = 8, size = 4, count = 39998, type = i32
==8130== # 2: offset = 12, size = 4, count = 107698980, type = i32
==8130== # 3: offset = 16, size = 1, count = 39998, type = i8
==8130== # 4: offset = 24, size = 32, count = 210001, type = %"class.std::__cxx11::basic_string" = type { %"struct.std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider", i64, %union.anon }
==8130== # 5: offset = 56, size = 32, count = 210001, type = %"class.std::__cxx11::basic_string" = type { %"struct.std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider", i64, %union.anon }
==8130== # 6: offset = 88, size = 8, count = 300940510, type = double
==8130== # 7: offset = 96, size = 8, count = 200930507, type = double
==8130== # 8: offset = 104, size = 8, count = 89700226, type = double
Here we can see that the second i32 (size_mb) as well as the three
doubles (cdf_min, cdf_max, and density) are accessed orders of
magnitude more than the other fields.
An easy fix for this is to out-of-line the other fields into an ancillary
struct elsewhere on the heap, and store just a pointer to the struct.
This ensures that the frequently-accessed fields fit into a smaller
memory footprint and thus involve far fewer memory accesses.
I benchmarked using both the 'TinyOverlap' case and the 'Ycsb' case,
measuring CPU cache misses at each level along with total runtime.
TinyOverlap case
------------------
Before:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*TinyOver*' (10 runs):
55,773,441 mem_load_retired.l1_miss ( +- 0.91% ) (66.20%)
1,200,071,033 mem_load_retired.l1_hit ( +- 0.19% ) (66.51%)
44,324,227 mem_load_retired.l2_miss ( +- 1.21% ) (67.02%)
11,525,953 mem_load_retired.l2_hit ( +- 3.75% ) (67.34%)
42,298 mem_load_retired.l3_miss ( +- 8.56% ) (67.07%)
44,461,423 mem_load_retired.l3_hit ( +- 1.16% ) (66.44%)
0.399781701 seconds time elapsed ( +- 0.84% )
After:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*TinyOver*' (10 runs):
24,864,656 mem_load_retired.l1_miss ( +- 1.37% ) (66.51%)
1,261,166,288 mem_load_retired.l1_hit ( +- 0.13% ) (66.73%)
5,694,166 mem_load_retired.l2_miss ( +- 4.37% ) (66.75%)
18,221,204 mem_load_retired.l2_hit ( +- 1.13% ) (66.97%)
28,574 mem_load_retired.l3_miss ( +- 16.39% ) (67.10%)
5,829,783 mem_load_retired.l3_hit ( +- 5.19% ) (66.63%)
0.360860837 seconds time elapsed ( +- 0.35% )
Summary of improvements:
- ~10% improvement in wall time
- L1 cache: 45% reduction in misses
- L2 cache: 93% reduction in misses
- L3 cache: 68% reduction in misses
YCSB test case
---------------
Before:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*Ycsb*' (10 runs):
97,654,120 mem_load_retired.l1_miss ( +- 1.13% ) (66.41%)
2,205,803,278 mem_load_retired.l1_hit ( +- 0.13% ) (66.57%)
80,632,313 mem_load_retired.l2_miss ( +- 1.68% ) (66.73%)
16,742,677 mem_load_retired.l2_hit ( +- 3.77% ) (66.99%)
60,874 mem_load_retired.l3_miss ( +- 5.64% ) (67.02%)
82,209,502 mem_load_retired.l3_hit ( +- 1.08% ) (66.61%)
0.747246686 seconds time elapsed ( +- 0.49% )
After:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*Ycsb*' (10 runs):
39,568,838 mem_load_retired.l1_miss ( +- 0.94% ) (66.49%)
2,328,627,974 mem_load_retired.l1_hit ( +- 0.12% ) (66.67%)
6,673,404 mem_load_retired.l2_miss ( +- 3.58% ) (66.76%)
31,882,992 mem_load_retired.l2_hit ( +- 1.11% ) (66.95%)
59,963 mem_load_retired.l3_miss ( +- 10.28% ) (66.95%)
6,378,458 mem_load_retired.l3_hit ( +- 3.19% ) (66.58%)
0.676610184 seconds time elapsed ( +- 0.44% )
Summary:
- ~10% improvement in wall time
- L1 cache: 59% reduction in misses
- L2 cache: 92% reduction in misses
- L3 cache: 1% reduction (apparently this workload always fit well
in L3)
As this is the third in a set of patches for optimizations that were
concentrating on the "TinyOverlap" test case, I also verified
before/after of the "YCSB" compaction test case since the beginning of
series. This also showed about a 14% improvement:
ycsb at start:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*Ycsb*' (10 runs):
795.924340 task-clock (msec) # 0.998 CPUs utilized ( +- 0.53% )
27 context-switches # 0.034 K/sec ( +- 60.21% )
0 cpu-migrations # 0.000 K/sec ( +-100.00% )
2,672 page-faults # 0.003 M/sec ( +- 1.36% )
2,998,040,105 cycles # 3.767 GHz ( +- 0.35% )
7,457,801,099 instructions # 2.49 insn per cycle ( +- 0.00% )
1,212,318,353 branches # 1523.158 M/sec ( +- 0.00% )
9,741,331 branch-misses # 0.80% of all branches ( +- 0.03% )
0.797407974 seconds time elapsed ( +- 0.58% )
ycsb at end:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*Ycsb*' (10 runs):
681.047998 task-clock (msec) # 0.999 CPUs utilized ( +- 0.52% )
2 context-switches # 0.003 K/sec ( +- 46.55% )
0 cpu-migrations # 0.000 K/sec
2,474 page-faults # 0.004 M/sec ( +- 1.22% )
2,581,500,560 cycles # 3.790 GHz ( +- 0.42% )
7,267,792,155 instructions # 2.82 insn per cycle ( +- 0.00% )
1,210,494,552 branches # 1777.400 M/sec ( +- 0.00% )
9,790,631 branch-misses # 0.81% of all branches ( +- 0.02% )
0.681425636 seconds time elapsed ( +- 0.52% )
[1] https://github.com/andikleen/pmu-tools/wiki/toplev-manual
Change-Id: I148b8c976577cbe0020ed887f3bc4c7a2cb2c3ae
Reviewed-on: http://gerrit.cloudera.org:8080/8446
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
---
M src/kudu/tablet/rowset_info.cc
M src/kudu/tablet/rowset_info.h
M src/kudu/tablet/rowset_tree.cc
3 files changed, 46 insertions(+), 30 deletions(-)
Approvals:
Kudu Jenkins: Verified
Adar Dembo: Looks good to me, approved
--
To view, visit http://gerrit.cloudera.org:8080/8446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I148b8c976577cbe0020ed887f3bc4c7a2cb2c3ae
Gerrit-Change-Number: 8446
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: zhen.zhang <zh...@xiaomi.com>
[kudu-CR] compaction policy: out-of-line inessential fields in RowSetInfo
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Dan Burkert, Kudu Jenkins, zhen.zhang, Adar Dembo,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/8446
to look at the new patch set (#3).
Change subject: compaction_policy: out-of-line inessential fields in RowSetInfo
......................................................................
compaction_policy: out-of-line inessential fields in RowSetInfo
Based on analysis using toplev.py from pmu-tools[1] I found that the
compaction algorithm was memory-bandwidth bound for the benchmarks in
compaction_policy-test.
The core algorithms of the compaction policy only access a subset of
the fields in the object, meaning that some fields are accessed far
more frequently than others. This can also be illustrated by building
with -fsanitize=cache-efficiency-frag which generates the following
report for the RowSetInfo struct:
==8130== # 0: offset = 0, size = 8, count = 20000, type = %"class.kudu::tablet::RowSet"*
==8130== # 1: offset = 8, size = 4, count = 39998, type = i32
==8130== # 2: offset = 12, size = 4, count = 107698980, type = i32
==8130== # 3: offset = 16, size = 1, count = 39998, type = i8
==8130== # 4: offset = 24, size = 32, count = 210001, type = %"class.std::__cxx11::basic_string" = type { %"struct.std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider", i64, %union.anon }
==8130== # 5: offset = 56, size = 32, count = 210001, type = %"class.std::__cxx11::basic_string" = type { %"struct.std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider", i64, %union.anon }
==8130== # 6: offset = 88, size = 8, count = 300940510, type = double
==8130== # 7: offset = 96, size = 8, count = 200930507, type = double
==8130== # 8: offset = 104, size = 8, count = 89700226, type = double
Here we can see that the second i32 (size_mb) as well as the three
doubles (cdf_min, cdf_max, and density) are accessed orders of
magnitude more than the other fields.
An easy fix for this is to out-of-line the other fields into an ancillary
struct elsewhere on the heap, and store just a pointer to the struct.
This ensures that the frequently-accessed fields fit into a smaller
memory footprint and thus involve far fewer memory accesses.
I benchmarked using both the 'TinyOverlap' case and the 'Ycsb' case,
measuring CPU cache misses at each level along with total runtime.
TinyOverlap case
------------------
Before:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*TinyOver*' (10 runs):
55,773,441 mem_load_retired.l1_miss ( +- 0.91% ) (66.20%)
1,200,071,033 mem_load_retired.l1_hit ( +- 0.19% ) (66.51%)
44,324,227 mem_load_retired.l2_miss ( +- 1.21% ) (67.02%)
11,525,953 mem_load_retired.l2_hit ( +- 3.75% ) (67.34%)
42,298 mem_load_retired.l3_miss ( +- 8.56% ) (67.07%)
44,461,423 mem_load_retired.l3_hit ( +- 1.16% ) (66.44%)
0.399781701 seconds time elapsed ( +- 0.84% )
After:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*TinyOver*' (10 runs):
24,864,656 mem_load_retired.l1_miss ( +- 1.37% ) (66.51%)
1,261,166,288 mem_load_retired.l1_hit ( +- 0.13% ) (66.73%)
5,694,166 mem_load_retired.l2_miss ( +- 4.37% ) (66.75%)
18,221,204 mem_load_retired.l2_hit ( +- 1.13% ) (66.97%)
28,574 mem_load_retired.l3_miss ( +- 16.39% ) (67.10%)
5,829,783 mem_load_retired.l3_hit ( +- 5.19% ) (66.63%)
0.360860837 seconds time elapsed ( +- 0.35% )
Summary of improvements:
- ~10% improvement in wall time
- L1 cache: 45% reduction in misses
- L2 cache: 93% reduction in misses
- L3 cache: 68% reduction in misses
YCSB test case
---------------
Before:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*Ycsb*' (10 runs):
97,654,120 mem_load_retired.l1_miss ( +- 1.13% ) (66.41%)
2,205,803,278 mem_load_retired.l1_hit ( +- 0.13% ) (66.57%)
80,632,313 mem_load_retired.l2_miss ( +- 1.68% ) (66.73%)
16,742,677 mem_load_retired.l2_hit ( +- 3.77% ) (66.99%)
60,874 mem_load_retired.l3_miss ( +- 5.64% ) (67.02%)
82,209,502 mem_load_retired.l3_hit ( +- 1.08% ) (66.61%)
0.747246686 seconds time elapsed ( +- 0.49% )
After:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*Ycsb*' (10 runs):
39,568,838 mem_load_retired.l1_miss ( +- 0.94% ) (66.49%)
2,328,627,974 mem_load_retired.l1_hit ( +- 0.12% ) (66.67%)
6,673,404 mem_load_retired.l2_miss ( +- 3.58% ) (66.76%)
31,882,992 mem_load_retired.l2_hit ( +- 1.11% ) (66.95%)
59,963 mem_load_retired.l3_miss ( +- 10.28% ) (66.95%)
6,378,458 mem_load_retired.l3_hit ( +- 3.19% ) (66.58%)
0.676610184 seconds time elapsed ( +- 0.44% )
Summary:
- ~10% improvement in wall time
- L1 cache: 59% reduction in misses
- L2 cache: 92% reduction in misses
- L3 cache: 1% reduction (apparently this workload always fit well
in L3)
As this is the third in a set of patches for optimizations that were
concentrating on the "TinyOverlap" test case, I also verified
before/after of the "YCSB" compaction test case since the beginning of
series. This also showed about a 14% improvement:
ycsb at start:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*Ycsb*' (10 runs):
795.924340 task-clock (msec) # 0.998 CPUs utilized ( +- 0.53% )
27 context-switches # 0.034 K/sec ( +- 60.21% )
0 cpu-migrations # 0.000 K/sec ( +-100.00% )
2,672 page-faults # 0.003 M/sec ( +- 1.36% )
2,998,040,105 cycles # 3.767 GHz ( +- 0.35% )
7,457,801,099 instructions # 2.49 insn per cycle ( +- 0.00% )
1,212,318,353 branches # 1523.158 M/sec ( +- 0.00% )
9,741,331 branch-misses # 0.80% of all branches ( +- 0.03% )
0.797407974 seconds time elapsed ( +- 0.58% )
ycsb at end:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*Ycsb*' (10 runs):
681.047998 task-clock (msec) # 0.999 CPUs utilized ( +- 0.52% )
2 context-switches # 0.003 K/sec ( +- 46.55% )
0 cpu-migrations # 0.000 K/sec
2,474 page-faults # 0.004 M/sec ( +- 1.22% )
2,581,500,560 cycles # 3.790 GHz ( +- 0.42% )
7,267,792,155 instructions # 2.82 insn per cycle ( +- 0.00% )
1,210,494,552 branches # 1777.400 M/sec ( +- 0.00% )
9,790,631 branch-misses # 0.81% of all branches ( +- 0.02% )
0.681425636 seconds time elapsed ( +- 0.52% )
[1] https://github.com/andikleen/pmu-tools/wiki/toplev-manual
Change-Id: I148b8c976577cbe0020ed887f3bc4c7a2cb2c3ae
---
M src/kudu/tablet/rowset_info.cc
M src/kudu/tablet/rowset_info.h
M src/kudu/tablet/rowset_tree.cc
3 files changed, 46 insertions(+), 30 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/8446/3
--
To view, visit http://gerrit.cloudera.org:8080/8446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I148b8c976577cbe0020ed887f3bc4c7a2cb2c3ae
Gerrit-Change-Number: 8446
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: zhen.zhang <zh...@xiaomi.com>
[kudu-CR] compaction policy: out-of-line inessential fields in RowSetInfo
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Dan Burkert, Kudu Jenkins, zhen.zhang, Adar Dembo,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/8446
to look at the new patch set (#2).
Change subject: compaction_policy: out-of-line inessential fields in RowSetInfo
......................................................................
compaction_policy: out-of-line inessential fields in RowSetInfo
Based on analysis using toplev.py from pmu-tools[1] I found that the
compaction algorithm was memory-bandwidth bound for the benchmarks in
compaction_policy-test.
The core algorithms of the compaction policy only access a subset of
the fields in the object, meaning that some fields are accessed far
more frequently than others. This can also be illustrated by building
with -fsanitize=cache-efficiency-frag which generates the following
report for the RowSetInfo struct:
==8130== # 0: offset = 0, size = 8, count = 20000, type = %"class.kudu::tablet::RowSet"*
==8130== # 1: offset = 8, size = 4, count = 39998, type = i32
==8130== # 2: offset = 12, size = 4, count = 107698980, type = i32
==8130== # 3: offset = 16, size = 1, count = 39998, type = i8
==8130== # 4: offset = 24, size = 32, count = 210001, type = %"class.std::__cxx11::basic_string" = type { %"struct.std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider", i64, %union.anon }
==8130== # 5: offset = 56, size = 32, count = 210001, type = %"class.std::__cxx11::basic_string" = type { %"struct.std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider", i64, %union.anon }
==8130== # 6: offset = 88, size = 8, count = 300940510, type = double
==8130== # 7: offset = 96, size = 8, count = 200930507, type = double
==8130== # 8: offset = 104, size = 8, count = 89700226, type = double
Here we can see that the second i32 (size_mb) as well as the three
doubles (cdf_min, cdf_max, and density) are accessed orders of
magnitude more than the other fields.
An easy fix for this is to out-of-line the other fields into an ancillary
struct elsewhere on the heap, and store just a pointer to the struct.
This ensures that the frequently-accessed fields fit into a smaller
memory footprint and thus involve far fewer memory accesses.
I benchmarked using both the 'TinyOverlap' case and the 'Ycsb' case,
measuring CPU cache misses at each level along with total runtime.
TinyOverlap case
------------------
Before:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*TinyOver*' (10 runs):
55,773,441 mem_load_retired.l1_miss ( +- 0.91% ) (66.20%)
1,200,071,033 mem_load_retired.l1_hit ( +- 0.19% ) (66.51%)
44,324,227 mem_load_retired.l2_miss ( +- 1.21% ) (67.02%)
11,525,953 mem_load_retired.l2_hit ( +- 3.75% ) (67.34%)
42,298 mem_load_retired.l3_miss ( +- 8.56% ) (67.07%)
44,461,423 mem_load_retired.l3_hit ( +- 1.16% ) (66.44%)
0.399781701 seconds time elapsed ( +- 0.84% )
After:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*TinyOver*' (10 runs):
24,864,656 mem_load_retired.l1_miss ( +- 1.37% ) (66.51%)
1,261,166,288 mem_load_retired.l1_hit ( +- 0.13% ) (66.73%)
5,694,166 mem_load_retired.l2_miss ( +- 4.37% ) (66.75%)
18,221,204 mem_load_retired.l2_hit ( +- 1.13% ) (66.97%)
28,574 mem_load_retired.l3_miss ( +- 16.39% ) (67.10%)
5,829,783 mem_load_retired.l3_hit ( +- 5.19% ) (66.63%)
0.360860837 seconds time elapsed ( +- 0.35% )
Summary of improvements:
- ~10% improvement in wall time
- L1 cache: 45% reduction in misses
- L2 cache: 93% reduction in misses
- L3 cache: 68% reduction in misses
YCSB test case
---------------
Before:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*Ycsb*' (10 runs):
97,654,120 mem_load_retired.l1_miss ( +- 1.13% ) (66.41%)
2,205,803,278 mem_load_retired.l1_hit ( +- 0.13% ) (66.57%)
80,632,313 mem_load_retired.l2_miss ( +- 1.68% ) (66.73%)
16,742,677 mem_load_retired.l2_hit ( +- 3.77% ) (66.99%)
60,874 mem_load_retired.l3_miss ( +- 5.64% ) (67.02%)
82,209,502 mem_load_retired.l3_hit ( +- 1.08% ) (66.61%)
0.747246686 seconds time elapsed ( +- 0.49% )
After:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*Ycsb*' (10 runs):
39,568,838 mem_load_retired.l1_miss ( +- 0.94% ) (66.49%)
2,328,627,974 mem_load_retired.l1_hit ( +- 0.12% ) (66.67%)
6,673,404 mem_load_retired.l2_miss ( +- 3.58% ) (66.76%)
31,882,992 mem_load_retired.l2_hit ( +- 1.11% ) (66.95%)
59,963 mem_load_retired.l3_miss ( +- 10.28% ) (66.95%)
6,378,458 mem_load_retired.l3_hit ( +- 3.19% ) (66.58%)
0.676610184 seconds time elapsed ( +- 0.44% )
Summary:
- ~10% improvement in wall time
- L1 cache: 59% reduction in misses
- L2 cache: 92% reduction in misses
- L3 cache: 1% reduction (apparently this workload always fit well
in L3)
As this is the third in a set of patches for optimizations that were
concentrating on the "TinyOverlap" test case, I also verified
before/after of the "YCSB" compaction test case since the beginning of
series. This also showed about a 14% improvement:
ycsb at start:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*Ycsb*' (10 runs):
795.924340 task-clock (msec) # 0.998 CPUs utilized ( +- 0.53% )
27 context-switches # 0.034 K/sec ( +- 60.21% )
0 cpu-migrations # 0.000 K/sec ( +-100.00% )
2,672 page-faults # 0.003 M/sec ( +- 1.36% )
2,998,040,105 cycles # 3.767 GHz ( +- 0.35% )
7,457,801,099 instructions # 2.49 insn per cycle ( +- 0.00% )
1,212,318,353 branches # 1523.158 M/sec ( +- 0.00% )
9,741,331 branch-misses # 0.80% of all branches ( +- 0.03% )
0.797407974 seconds time elapsed ( +- 0.58% )
ycsb at end:
Performance counter stats for 'build/latest/bin/compaction_policy-test --gtest_filter=*Ycsb*' (10 runs):
681.047998 task-clock (msec) # 0.999 CPUs utilized ( +- 0.52% )
2 context-switches # 0.003 K/sec ( +- 46.55% )
0 cpu-migrations # 0.000 K/sec
2,474 page-faults # 0.004 M/sec ( +- 1.22% )
2,581,500,560 cycles # 3.790 GHz ( +- 0.42% )
7,267,792,155 instructions # 2.82 insn per cycle ( +- 0.00% )
1,210,494,552 branches # 1777.400 M/sec ( +- 0.00% )
9,790,631 branch-misses # 0.81% of all branches ( +- 0.02% )
0.681425636 seconds time elapsed ( +- 0.52% )
[1] https://github.com/andikleen/pmu-tools/wiki/toplev-manual
Change-Id: I148b8c976577cbe0020ed887f3bc4c7a2cb2c3ae
---
M src/kudu/tablet/rowset_info.cc
M src/kudu/tablet/rowset_info.h
M src/kudu/tablet/rowset_tree.cc
3 files changed, 46 insertions(+), 30 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/8446/2
--
To view, visit http://gerrit.cloudera.org:8080/8446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I148b8c976577cbe0020ed887f3bc4c7a2cb2c3ae
Gerrit-Change-Number: 8446
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: zhen.zhang <zh...@xiaomi.com>
[kudu-CR] compaction policy: out-of-line inessential fields in RowSetInfo
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8446 )
Change subject: compaction_policy: out-of-line inessential fields in RowSetInfo
......................................................................
Patch Set 1:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/8446/1/src/kudu/tablet/rowset_info.h
File src/kudu/tablet/rowset_info.h:
http://gerrit.cloudera.org:8080/#/c/8446/1/src/kudu/tablet/rowset_info.h@109
PS1, Line 109: const std::shared_ptr<ExtraData> extra_;
> Why not unique_ptr, which is only 8 bytes and no atomic ops?
I tried that first, but we actually copy RowSetInfo objects at some point in the algorithm so that we end up with two separate arrays (one sorted by min_key and one by max_key). Despite doubling memory footprint it results in a more sequential access pattern so I think it's a win to do those copies (though I didn't try changing to pointers there)
--
To view, visit http://gerrit.cloudera.org:8080/8446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I148b8c976577cbe0020ed887f3bc4c7a2cb2c3ae
Gerrit-Change-Number: 8446
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: zhen.zhang <zh...@xiaomi.com>
Gerrit-Comment-Date: Wed, 01 Nov 2017 21:21:50 +0000
Gerrit-HasComments: Yes
[kudu-CR] compaction policy: out-of-line inessential fields in RowSetInfo
Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/8446 )
Change subject: compaction_policy: out-of-line inessential fields in RowSetInfo
......................................................................
Patch Set 2:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/8446/2/src/kudu/tablet/rowset_info.cc
File src/kudu/tablet/rowset_info.cc:
http://gerrit.cloudera.org:8080/#/c/8446/2/src/kudu/tablet/rowset_info.cc@266
PS2, Line 266: size_bytes()
Nit: use extra_->size_bytes here, to avoid having to jump back and forth between the header and this file.
I figured it's OK since the args to GetBounds() are specified directly instead of by calling accessors.
--
To view, visit http://gerrit.cloudera.org:8080/8446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I148b8c976577cbe0020ed887f3bc4c7a2cb2c3ae
Gerrit-Change-Number: 8446
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: zhen.zhang <zh...@xiaomi.com>
Gerrit-Comment-Date: Wed, 01 Nov 2017 22:27:30 +0000
Gerrit-HasComments: Yes
[kudu-CR] compaction policy: out-of-line inessential fields in RowSetInfo
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8446 )
Change subject: compaction_policy: out-of-line inessential fields in RowSetInfo
......................................................................
Patch Set 1:
(1 comment)
This might be a fun benchmark to play with the new EfficiencySanitizer tool in llvm as well
http://gerrit.cloudera.org:8080/#/c/8446/1/src/kudu/tablet/rowset_info.h
File src/kudu/tablet/rowset_info.h:
http://gerrit.cloudera.org:8080/#/c/8446/1/src/kudu/tablet/rowset_info.h@109
PS1, Line 109: const std::shared_ptr<ExtraData> extra_;
perhaps we could use RefCounted here instead of shared_ptr, since iirc shared_ptr is 16 bytes whereas RefCounted is only 8 (and we'd avoid any atomic ops)
--
To view, visit http://gerrit.cloudera.org:8080/8446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I148b8c976577cbe0020ed887f3bc4c7a2cb2c3ae
Gerrit-Change-Number: 8446
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: zhen.zhang <zh...@xiaomi.com>
Gerrit-Comment-Date: Wed, 01 Nov 2017 20:07:40 +0000
Gerrit-HasComments: Yes
[kudu-CR] compaction policy: out-of-line inessential fields in RowSetInfo
Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/8446 )
Change subject: compaction_policy: out-of-line inessential fields in RowSetInfo
......................................................................
Patch Set 1:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/8446/1/src/kudu/tablet/rowset_info.h
File src/kudu/tablet/rowset_info.h:
http://gerrit.cloudera.org:8080/#/c/8446/1/src/kudu/tablet/rowset_info.h@109
PS1, Line 109: const std::shared_ptr<ExtraData> extra_;
> perhaps we could use RefCounted here instead of shared_ptr, since iirc sha
Why not unique_ptr, which is only 8 bytes and no atomic ops?
--
To view, visit http://gerrit.cloudera.org:8080/8446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I148b8c976577cbe0020ed887f3bc4c7a2cb2c3ae
Gerrit-Change-Number: 8446
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: zhen.zhang <zh...@xiaomi.com>
Gerrit-Comment-Date: Wed, 01 Nov 2017 21:08:18 +0000
Gerrit-HasComments: Yes