Posted to commits@kudu.apache.org by to...@apache.org on 2016/01/22 02:19:03 UTC
[1/3] incubator-kudu git commit: More mods to the YCM config file
Repository: incubator-kudu
Updated Branches:
refs/heads/master ed01385a7 -> 173e1fde4
More mods to the YCM config file
We were missing the non-installed headers, like gflags
Change-Id: Id22ad5309ff36d5c65433e37f0389c1dbf1cf918
Reviewed-on: http://gerrit.cloudera.org:8080/1850
Reviewed-by: Todd Lipcon <to...@apache.org>
Tested-by: Mike Percy <mp...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/incubator-kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-kudu/commit/21a61646
Tree: http://git-wip-us.apache.org/repos/asf/incubator-kudu/tree/21a61646
Diff: http://git-wip-us.apache.org/repos/asf/incubator-kudu/diff/21a61646
Branch: refs/heads/master
Commit: 21a61646af60832c33f472986c2699cc3eeb2bc6
Parents: ed01385
Author: Mike Percy <mp...@cloudera.com>
Authored: Wed Jan 20 17:10:14 2016 -0800
Committer: Mike Percy <mp...@cloudera.com>
Committed: Fri Jan 22 00:15:49 2016 +0000
----------------------------------------------------------------------
.ycm_extra_conf.py | 2 ++
1 file changed, 2 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/21a61646/.ycm_extra_conf.py
----------------------------------------------------------------------
diff --git a/.ycm_extra_conf.py b/.ycm_extra_conf.py
index 7925b1b..c4a3c89 100644
--- a/.ycm_extra_conf.py
+++ b/.ycm_extra_conf.py
@@ -65,6 +65,8 @@ flags = [
'src',
'-isystem',
'thirdparty/installed/include',
+'-isystem',
+'thirdparty/installed-deps/include',
]
# Set this to the absolute path to the folder (NOT the file!) containing the
[3/3] incubator-kudu git commit: cbtree: some optimizations to win back ground lost since gcc 4.4
Posted by to...@apache.org.
cbtree: some optimizations to win back ground lost since gcc 4.4
Benchmarks indicated that cbtree-test got slower when we upgraded to gcc
4.9:
4.4 (pre-C++11):
I0120 14:04:21.391079 1982 cbtree-test.cc:730] Time spent Insert 4000000 keys: real 2.934s user 2.868s sys 0.064s
I0120 14:04:25.186092 1982 cbtree-test.cc:730] Time spent Insert 4000000 keys: real 2.856s user 2.852s sys 0.004s
I0120 14:04:28.971114 1982 cbtree-test.cc:730] Time spent Insert 4000000 keys: real 2.852s user 2.856s sys 0.000s
I0120 14:04:32.786329 1982 cbtree-test.cc:730] Time spent Insert 4000000 keys: real 2.851s user 2.856s sys 0.000s
I0120 14:04:36.601415 1982 cbtree-test.cc:730] Time spent Insert 4000000 keys: real 2.850s user 2.852s sys 0.000s
4.9.3 (C++11):
I0120 14:31:10.460209 9809 cbtree-test.cc:732] Time spent Insert 4000000 keys: real 3.266s user 3.212s sys 0.028s
I0120 14:31:14.625602 9809 cbtree-test.cc:732] Time spent Insert 4000000 keys: real 3.191s user 3.168s sys 0.000s
I0120 14:31:18.799840 9809 cbtree-test.cc:732] Time spent Insert 4000000 keys: real 3.199s user 3.176s sys 0.004s
I0120 14:31:23.010079 9809 cbtree-test.cc:732] Time spent Insert 4000000 keys: real 3.198s user 3.180s sys 0.000s
I0120 14:31:27.220559 9809 cbtree-test.cc:732] Time spent Insert 4000000 keys: real 3.198s user 3.172s sys 0.004s
I0120 14:31:31.434557 9809 cbtree-test.cc:732] Time spent Insert 4000000 keys: real 3.203s user 3.184s sys 0.000s
I0120 14:31:35.648149 9809 cbtree-test.cc:732] Time spent Insert 4000000 keys: real 3.201s user 3.176s sys 0.004s
Looking at the new assembly code, I found a few easy improvements to win back
some of the performance. With this patch:
I0120 14:31:59.956502 9851 cbtree-test.cc:732] Time spent Insert 4000000 keys: real 3.188s user 3.120s sys 0.044s
I0120 14:32:04.061563 9851 cbtree-test.cc:732] Time spent Insert 4000000 keys: real 3.125s user 3.104s sys 0.000s
I0120 14:32:08.173352 9851 cbtree-test.cc:732] Time spent Insert 4000000 keys: real 3.136s user 3.104s sys 0.012s
I0120 14:32:12.290326 9851 cbtree-test.cc:732] Time spent Insert 4000000 keys: real 3.141s user 3.116s sys 0.004s
I0120 14:32:16.396093 9851 cbtree-test.cc:732] Time spent Insert 4000000 keys: real 3.131s user 3.104s sys 0.004s
I0120 14:32:20.510488 9851 cbtree-test.cc:732] Time spent Insert 4000000 keys: real 3.141s user 3.120s sys 0.000s
I0120 14:32:24.769479 9851 cbtree-test.cc:732] Time spent Insert 4000000 keys: real 3.278s user 3.256s sys 0.000s
I0120 14:32:28.866739 9851 cbtree-test.cc:732] Time spent Insert 4000000 keys: real 3.125s user 3.104s sys 0.000s
I0120 14:32:33.185539 9851 cbtree-test.cc:732] Time spent Insert 4000000 keys: real 3.346s user 3.320s sys 0.000s
I0120 14:32:37.284078 9851 cbtree-test.cc:732] Time spent Insert 4000000 keys: real 3.124s user 3.120s sys 0.000s
which is a few percent gain, but still leaves us short of the gcc 4.4 numbers.
Change-Id: I63afea81ce9d4ab605ccb27d1d9345637613c06d
Reviewed-on: http://gerrit.cloudera.org:8080/1848
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Internal Jenkins
Project: http://git-wip-us.apache.org/repos/asf/incubator-kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-kudu/commit/173e1fde
Tree: http://git-wip-us.apache.org/repos/asf/incubator-kudu/tree/173e1fde
Diff: http://git-wip-us.apache.org/repos/asf/incubator-kudu/diff/173e1fde
Branch: refs/heads/master
Commit: 173e1fde4ea9f25a3d3aa9dc39706f0ad31834d0
Parents: 1c26565
Author: Todd Lipcon <to...@apache.org>
Authored: Wed Jan 20 14:50:33 2016 -0800
Committer: Todd Lipcon <to...@apache.org>
Committed: Fri Jan 22 01:17:55 2016 +0000
----------------------------------------------------------------------
src/kudu/tablet/concurrent_btree.h | 6 +++---
src/kudu/util/inline_slice.h | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/173e1fde/src/kudu/tablet/concurrent_btree.h
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/concurrent_btree.h b/src/kudu/tablet/concurrent_btree.h
index 916b833..1d3be26 100644
--- a/src/kudu/tablet/concurrent_btree.h
+++ b/src/kudu/tablet/concurrent_btree.h
@@ -118,7 +118,7 @@ struct VersionField {
static AtomicVersion StableVersion(volatile AtomicVersion *version) {
for (int loop_count = 0; true; loop_count++) {
AtomicVersion v_acq = base::subtle::Acquire_Load(version);
- if (!IsLocked(v_acq)) {
+ if (PREDICT_TRUE(!IsLocked(v_acq))) {
return v_acq;
}
boost::detail::yield(loop_count++);
@@ -130,9 +130,9 @@ struct VersionField {
while (true) {
AtomicVersion v_acq = base::subtle::Acquire_Load(version);
- if (!IsLocked(v_acq)) {
+ if (PREDICT_TRUE(!IsLocked(v_acq))) {
AtomicVersion v_locked = SetLockBit(v_acq, 1);
- if (base::subtle::Acquire_CompareAndSwap(version, v_acq, v_locked) == v_acq) {
+ if (PREDICT_TRUE(base::subtle::Acquire_CompareAndSwap(version, v_acq, v_locked) == v_acq)) {
return;
}
}
http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/173e1fde/src/kudu/util/inline_slice.h
----------------------------------------------------------------------
diff --git a/src/kudu/util/inline_slice.h b/src/kudu/util/inline_slice.h
index 648862f..aeca370 100644
--- a/src/kudu/util/inline_slice.h
+++ b/src/kudu/util/inline_slice.h
@@ -75,7 +75,7 @@ class InlineSlice {
InlineSlice() {
}
- const Slice as_slice() const {
+ inline const Slice as_slice() const ATTRIBUTE_ALWAYS_INLINE {
DiscriminatedPointer dptr = LoadValue();
if (dptr.is_indirect()) {
[2/3] incubator-kudu git commit: codegen: force inlining of utility
functions in JIT code
Posted by to...@apache.org.
codegen: force inlining of utility functions in JIT code
In investigating why some benchmarks got slower with the switch to C++11, I
looked at some of the LLVM-generated code. We switched from LLVM 3.4 to 3.7
with the switch to C++11, so I thought we might have accidentally broken
optimizations.
It turns out we didn't accidentally disable optimizations, but llvm's inlining
heuristics must have changed. Looking at the generated code, I noticed that
we were not inlining the small utility functions from precompiled.cc, but instead
generating indirect calls.
I figured out that adding the 'always_inline' attribute to these functions
fixes the inlining and gives a pretty big performance improvement.
Before:
I0120 12:57:22.352180 25902 full_stack-insert-scan-test.cc:403] Time spent empty projection, 0 col: real 0.418s user 0.016s sys 0.048s
I0120 12:57:22.889552 25902 full_stack-insert-scan-test.cc:403] Time spent key scan, 1 col: real 0.479s user 0.020s sys 0.008s
I0120 12:57:24.727413 25902 full_stack-insert-scan-test.cc:403] Time spent full schema scan, 10 col: real 1.770s user 0.092s sys 0.016s
I0120 12:57:25.588093 25902 full_stack-insert-scan-test.cc:403] Time spent String projection, 1 col: real 0.788s user 0.064s sys 0.000s
I0120 12:57:26.288879 25902 full_stack-insert-scan-test.cc:403] Time spent Int32 projection, 4 col: real 0.639s user 0.028s sys 0.000s
I0120 12:57:27.042276 25902 full_stack-insert-scan-test.cc:403] Time spent Int64 projection, 4 col: real 0.693s user 0.028s sys 0.008s
After:
I0120 12:56:01.236634 25319 full_stack-insert-scan-test.cc:310] Doing test scans on table of 10000000 rows.
I0120 12:56:01.965637 25319 full_stack-insert-scan-test.cc:403] Time spent empty projection, 0 col: real 0.366s user 0.032s sys 0.004s
I0120 12:56:02.491664 25319 full_stack-insert-scan-test.cc:403] Time spent key scan, 1 col: real 0.459s user 0.024s sys 0.004s
I0120 12:56:03.834020 25319 full_stack-insert-scan-test.cc:403] Time spent full schema scan, 10 col: real 1.250s user 0.096s sys 0.008s
I0120 12:56:04.681128 25319 full_stack-insert-scan-test.cc:403] Time spent String projection, 1 col: real 0.780s user 0.060s sys 0.000s
I0120 12:56:05.294754 25319 full_stack-insert-scan-test.cc:403] Time spent Int32 projection, 4 col: real 0.547s user 0.028s sys 0.000s
I0120 12:56:05.988930 25319 full_stack-insert-scan-test.cc:403] Time spent Int64 projection, 4 col: real 0.633s user 0.036s sys 0.000s
Comparing the machine code before and after for the projection with 4 int64s
shows the effect of inlining:
LLVM 3.4 (prior to our C++11 upgrade):
movq (%rsi), %rax
movq 8(%rsi), %rcx
movq 192(%rax), %rdx
movq (%rdx), %rdx
movq 40(%rdi), %rsi
movq %rsi, (%rdx,%rcx,8)
movq 192(%rax), %rdx
movq 8(%rdx), %rdx
movq 48(%rdi), %rsi
movq %rsi, (%rdx,%rcx,8)
movq 192(%rax), %rdx
movq 16(%rdx), %rdx
movq 56(%rdi), %rsi
movq %rsi, (%rdx,%rcx,8)
movq 192(%rax), %rax
movq 24(%rax), %rax
movq 64(%rdi), %rdx
movq %rdx, (%rax,%rcx,8)
movb $1, %al
ret
After upgrade to LLVM 3.7:
pushq %r15
pushq %r14
pushq %rbx
movq %rsi, %r14
movq %rdi, %rbx
leaq 40(%rbx), %rdi
movabsq $140426221318144, %r15 <-- address of the 'PrecompiledCopyCell' function
xorl %edx, %edx
callq *%r15 <-- indirect call for each cell (#1)
leaq 48(%rbx), %rdi
movl $1, %edx
movq %r14, %rsi
callq *%r15 <-- #2
leaq 56(%rbx), %rdi
movl $2, %edx
movq %r14, %rsi
callq *%r15 <-- #3
addq $64, %rbx
movl $3, %edx
movq %rbx, %rdi
movq %r14, %rsi
callq *%r15 <-- #4
movb $1, %al
popq %rbx
popq %r14
popq %r15
retq
After this patch:
movq (%rsi), %rax
movq 8(%rsi), %rcx
movq 192(%rax), %rdx
movq (%rdx), %rdx
movq 40(%rdi), %rsi
movq %rsi, (%rdx,%rcx,8)
movq 192(%rax), %rdx
movq 8(%rdx), %rdx
movq 48(%rdi), %rsi
movq %rsi, (%rdx,%rcx,8)
movq 192(%rax), %rdx
movq 16(%rdx), %rdx
movq 56(%rdi), %rsi
movq %rsi, (%rdx,%rcx,8)
movq 192(%rax), %rax
movq 24(%rax), %rax
movq 64(%rdi), %rdx
movq %rdx, (%rax,%rcx,8)
movb $1, %al
retq
It looks like the 'after' still has some redundant loads which might be preventable,
but this should restore our previous performance.
Change-Id: I49c0eec360818c160db25bac7a91b5d00fff57ba
Reviewed-on: http://gerrit.cloudera.org:8080/1845
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Internal Jenkins
Project: http://git-wip-us.apache.org/repos/asf/incubator-kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-kudu/commit/1c265653
Tree: http://git-wip-us.apache.org/repos/asf/incubator-kudu/tree/1c265653
Diff: http://git-wip-us.apache.org/repos/asf/incubator-kudu/diff/1c265653
Branch: refs/heads/master
Commit: 1c26565341d603e97956c95e7865036b7509c475
Parents: 21a6164
Author: Todd Lipcon <to...@apache.org>
Authored: Wed Jan 20 12:56:46 2016 -0800
Committer: Todd Lipcon <to...@apache.org>
Committed: Fri Jan 22 01:17:48 2016 +0000
----------------------------------------------------------------------
src/kudu/codegen/precompiled.cc | 31 ++++++++++++++++++++++---------
1 file changed, 22 insertions(+), 9 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-kudu/blob/1c265653/src/kudu/codegen/precompiled.cc
----------------------------------------------------------------------
diff --git a/src/kudu/codegen/precompiled.cc b/src/kudu/codegen/precompiled.cc
index 7f2e20b..b073176 100644
--- a/src/kudu/codegen/precompiled.cc
+++ b/src/kudu/codegen/precompiled.cc
@@ -44,10 +44,22 @@
// IR_BUILD because we use a fake static library target to workaround a cmake
// dependencies bug. See 'ir_fake_target' in CMakeLists.txt.
#ifdef IR_BUILD
+
+// This file uses the 'always_inline' attribute on a bunch of functions to force
+// the LLVM optimizer at runtime to inline them where it otherwise might not.
+// Because the functions themselves aren't marked 'inline', gcc is unhappy with this.
+// But, we can't mark them 'inline' or else they'll get optimized away and not even
+// included in the .ll file. So, instead, we just mark them as always_inline in
+// the IR_BUILD context.
+#define IR_ALWAYS_INLINE __attribute__((always_inline))
+
// Workaround for an MCJIT deficiency where we see a link error when trying
// to load the JITted library. See the following LLVM bug and suggested workaround.
// https://llvm.org/bugs/show_bug.cgi?id=18062
extern "C" void *__dso_handle __attribute__((__visibility__("hidden"))) = NULL;
+
+#else
+#define IR_ALWAYS_INLINE
#endif
namespace kudu {
@@ -55,8 +67,8 @@ namespace kudu {
// Returns whether copy was successful (fails iff slice relocation fails,
// which can only occur if is_string is true).
// If arena is NULL, then no relocation occurs.
-static bool BasicCopyCell(uint64_t size, uint8_t* src, uint8_t* dst,
- bool is_string, Arena* arena) {
+IR_ALWAYS_INLINE static bool BasicCopyCell(
+ uint64_t size, uint8_t* src, uint8_t* dst, bool is_string, Arena* arena) {
// Relocate indirect data
if (is_string) {
if (PREDICT_TRUE(arena != nullptr)) {
@@ -99,8 +111,9 @@ extern "C" {
// If arena is NULL then only the direct copy will occur.
// Returns whether successful. If not, out-of-memory during relocation of
// slices has occured, which can only happen if is_string is true.
-bool _PrecompiledCopyCellToRowBlock(uint64_t size, uint8_t* src, RowBlockRow* dst,
- uint64_t col, bool is_string, Arena* arena) {
+IR_ALWAYS_INLINE bool _PrecompiledCopyCellToRowBlock(
+ uint64_t size, uint8_t* src, RowBlockRow* dst,
+ uint64_t col, bool is_string, Arena* arena) {
// We manually compute the destination cell pointer here, rather than
// using dst->cell_ptr(), since we statically know the size of the column
@@ -125,9 +138,9 @@ bool _PrecompiledCopyCellToRowBlock(uint64_t size, uint8_t* src, RowBlockRow* ds
// bitmap indicates the cell itself is non-null).
// Returns whether successful. If not, out-of-memory during relocation of
// slices has occured, which can only happen if is_string is true.
-bool _PrecompiledCopyCellToRowBlockNullable(
- uint64_t size, uint8_t* src, RowBlockRow* dst, uint64_t col, bool is_string,
- Arena* arena, uint8_t* src_bitmap, uint64_t bitmap_idx) {
+IR_ALWAYS_INLINE bool _PrecompiledCopyCellToRowBlockNullable(
+ uint64_t size, uint8_t* src, RowBlockRow* dst, uint64_t col, bool is_string,
+ Arena* arena, uint8_t* src_bitmap, uint64_t bitmap_idx) {
// Using this method implies the nullablity of the column.
// Write whether the column is nullable to the RowBlock's ColumnBlock's bitmap
bool is_null = BitmapTest(src_bitmap, bitmap_idx);
@@ -142,8 +155,8 @@ bool _PrecompiledCopyCellToRowBlockNullable(
//
// Sets the cell at column 'col' for destination RowBlockRow 'dst'
// to be marked as 'is_null' (requires the column is nullable).
-void _PrecompiledCopyCellToRowBlockSetNull(
- RowBlockRow* dst, uint64_t col, bool is_null) {
+IR_ALWAYS_INLINE void _PrecompiledCopyCellToRowBlockSetNull(
+ RowBlockRow* dst, uint64_t col, bool is_null) {
dst->cell(col).set_null(is_null);
}