Posted to commits@kudu.apache.org by to...@apache.org on 2017/09/15 05:57:21 UTC
[2/2] kudu git commit: log_block_manager: switch from google::sparse_hash_map to sparsepp
log_block_manager: switch from google::sparse_hash_map to sparsepp
sparsepp has been updated for C++11, so it supports move semantics for the
map elements. Since the block map uses ref-counted values, being able to move
them is a big win. sparsepp also claims to be generally faster, even aside
from its support for moves.
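
To illustrate why moving ref-counted values matters, here is a minimal
sketch. It is not Kudu or sparsepp code: std::shared_ptr stands in for
scoped_refptr, std::unordered_map stands in for the block map, and Block,
InsertByCopy and InsertByMove are hypothetical names. It shows only the
reference-count effect of a copy versus a move, not sparsepp's internals.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for internal::LogBlock.
struct Block {
  int id;
};

// Stand-in for scoped_refptr<internal::LogBlock>. std::shared_ptr is used
// only because it ships with the standard library.
using BlockRef = std::shared_ptr<Block>;
using BlockMap = std::unordered_map<int, BlockRef>;

// Inserts by copy: the map stores a second reference, so the shared
// reference count is incremented. Returns the count seen by the caller.
long InsertByCopy(BlockMap* map, const BlockRef& ref) {
  assert(ref != nullptr);
  map->emplace(ref->id, ref);
  return ref.use_count();
}

// Inserts by move: ownership is transferred into the map, so the reference
// count is left untouched. Returns the count seen through the map entry.
long InsertByMove(BlockMap* map, BlockRef ref) {
  assert(ref != nullptr);
  const int id = ref->id;  // Read the key before 'ref' is moved from.
  map->emplace(id, std::move(ref));
  return map->at(id).use_count();
}
```

With a pointer the caller keeps holding, InsertByCopy reports a count of 2,
while InsertByMove on a freshly created pointer reports 1. A container that
can only copy its elements pays for such a count update every time it
relocates an element internally.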
According to [1], this data structure uses ~10% more memory than
google::sparse_hash_map. However, a previous measurement indicated that 1M
blocks used about 9MB of memory, so the block map isn't a major memory
consumer as far as the overall system is concerned. It seems worth a few
extra MB of memory in order to make substantial startup time improvements.
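
As a rough check of the "few extra MB" figure, the sketch below multiplies
the numbers quoted above. It assumes that map memory scales linearly with
the number of blocks, which this message does not measure, and
ExtraMemoryMb is a hypothetical helper name.

```cpp
#include <cassert>

// Estimated additional memory in MB after the switch, assuming (not
// measured here) that map memory grows linearly with the block count.
//   num_blocks_millions:   number of blocks, in millions
//   mb_per_million_blocks: measured map memory per 1M blocks (about 9MB)
//   overhead_fraction:     relative overhead of sparsepp (about 0.10 per [1])
double ExtraMemoryMb(double num_blocks_millions,
                     double mb_per_million_blocks,
                     double overhead_fraction) {
  assert(num_blocks_millions >= 0);
  return num_blocks_millions * mb_per_million_blocks * overhead_fraction;
}
```

Under these assumptions, ExtraMemoryMb(1, 9, 0.10) gives roughly 0.9MB for
1M blocks, and ExtraMemoryMb(11, 9, 0.10) gives roughly 9.9MB for a host
with ~11M blocks.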
Despite the slightly higher memory usage, sparsepp still uses significantly
less memory than std::unordered_map, and it shares the benefit of avoiding any
large allocations. (std::unordered_map needs a contiguous allocation for the
buckets array.)
This patch alone improved startup time 7-8x on a real host with ~11M blocks:
Before:
I0907 17:23:50.748055 12507 fs_manager.cc:335] Time spent opening block manager: real 108.910s user 0.000s sys 0.001s
After:
I0907 17:20:42.277474 10021 fs_manager.cc:335] Time spent opening block manager: real 14.348s user 0.000s sys 0.001s
The LBM startup benchmark (1M blocks) improved less substantially but still noticeably:
Before:
I0907 17:16:54.899818 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.612s user 0.035s sys 0.002s
I0907 17:16:57.498205 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.598s user 0.039s sys 0.001s
I0907 17:17:00.100244 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.602s user 0.042s sys 0.000s
I0907 17:17:02.686638 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.586s user 0.042s sys 0.000s
I0907 17:17:05.284050 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.597s user 0.041s sys 0.001s
I0907 17:17:07.884395 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.600s user 0.039s sys 0.001s
I0907 17:17:10.490550 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.606s user 0.040s sys 0.001s
I0907 17:17:13.070114 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.580s user 0.039s sys 0.000s
I0907 17:17:15.667062 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.597s user 0.040s sys 0.001s
I0907 17:17:18.258447 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.591s user 0.042s sys 0.000s
After:
I0907 17:15:50.645310 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.570s user 0.034s sys 0.001s
I0907 17:15:52.195543 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.550s user 0.037s sys 0.001s
I0907 17:15:53.755209 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.560s user 0.037s sys 0.001s
I0907 17:15:55.263762 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.509s user 0.038s sys 0.001s
I0907 17:15:56.818748 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.555s user 0.037s sys 0.001s
I0907 17:15:58.379680 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.561s user 0.036s sys 0.001s
I0907 17:15:59.913751 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.534s user 0.038s sys 0.000s
I0907 17:16:01.461668 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.548s user 0.037s sys 0.001s
I0907 17:16:03.020823 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.559s user 0.037s sys 0.001s
I0907 17:16:04.549747 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.529s user 0.035s sys 0.001s
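
The speedups can be recomputed from the "real" times in the log lines above.
The following is a small sketch for doing so; Mean, Speedup, kBenchBefore
and kBenchAfter are hypothetical names, and the values are copied by hand
from the logs.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of timings.
double Mean(const std::vector<double>& xs) {
  assert(!xs.empty());
  return std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
}

// Ratio of the old wall-clock time to the new one.
double Speedup(double before_s, double after_s) {
  assert(after_s > 0);
  return before_s / after_s;
}

// "real" seconds from the LBM startup benchmark (1M blocks), before the patch.
const std::vector<double> kBenchBefore = {
    2.612, 2.598, 2.602, 2.586, 2.597, 2.600, 2.606, 2.580, 2.597, 2.591};

// "real" seconds from the same benchmark, after the patch.
const std::vector<double> kBenchAfter = {
    1.570, 1.550, 1.560, 1.509, 1.555, 1.561, 1.534, 1.548, 1.559, 1.529};
```

Speedup(108.910, 14.348) comes to about 7.6x for the real host, and
Speedup(Mean(kBenchBefore), Mean(kBenchAfter)) comes to about 1.68x for the
benchmark.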
[1] https://github.com/greg7mdp/sparsepp/blob/master/bench.md
Change-Id: I7397f9cd418782caecf8b2dae2c7bfe2c0e6215c
Reviewed-on: http://gerrit.cloudera.org:8080/8007
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/3c26cc3c
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/3c26cc3c
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/3c26cc3c
Branch: refs/heads/master
Commit: 3c26cc3c2d745afa905888576ac392f5dbc86a66
Parents: 07c3134
Author: Todd Lipcon <to...@apache.org>
Authored: Thu Sep 7 12:54:39 2017 -0700
Committer: Todd Lipcon <to...@apache.org>
Committed: Fri Sep 15 05:56:43 2017 +0000
----------------------------------------------------------------------
src/kudu/fs/log_block_manager.h | 4 ++--
thirdparty/build-definitions.sh | 7 +++++++
thirdparty/build-thirdparty.sh | 5 +++++
thirdparty/download-thirdparty.sh | 10 ++++++++++
thirdparty/vars.sh | 11 +++++++++++
5 files changed, 35 insertions(+), 2 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/kudu/blob/3c26cc3c/src/kudu/fs/log_block_manager.h
----------------------------------------------------------------------
diff --git a/src/kudu/fs/log_block_manager.h b/src/kudu/fs/log_block_manager.h
index 1f3194b..bc38e5a 100644
--- a/src/kudu/fs/log_block_manager.h
+++ b/src/kudu/fs/log_block_manager.h
@@ -28,8 +28,8 @@
#include <vector>
#include <boost/optional/optional.hpp> // IWYU pragma: keep
-#include <sparsehash/sparse_hash_map>
#include <gtest/gtest_prod.h>
+#include <sparsepp/spp.h>
#include "kudu/fs/block_id.h"
#include "kudu/fs/block_manager.h"
@@ -203,7 +203,7 @@ class LogBlockManager : public BlockManager {
// We use sparse_hash_map<> here to reduce memory overhead.
typedef MemTrackerAllocator<
std::pair<const BlockId, scoped_refptr<internal::LogBlock>>> BlockAllocator;
- typedef google::sparse_hash_map<
+ typedef spp::sparse_hash_map<
BlockId,
scoped_refptr<internal::LogBlock>,
BlockIdHash,
http://git-wip-us.apache.org/repos/asf/kudu/blob/3c26cc3c/thirdparty/build-definitions.sh
----------------------------------------------------------------------
diff --git a/thirdparty/build-definitions.sh b/thirdparty/build-definitions.sh
index c52bc25..9aff1e6 100644
--- a/thirdparty/build-definitions.sh
+++ b/thirdparty/build-definitions.sh
@@ -721,3 +721,10 @@ build_sparsehash() {
rsync -av --delete sparsehash/ $PREFIX/include/sparsehash/
popd
}
+
+build_sparsepp() {
+ # This library is header-only, so we just copy the headers
+ pushd $SPARSEPP_SOURCE
+ rsync -av --delete sparsepp/ $PREFIX/include/sparsepp/
+ popd
+}
http://git-wip-us.apache.org/repos/asf/kudu/blob/3c26cc3c/thirdparty/build-thirdparty.sh
----------------------------------------------------------------------
diff --git a/thirdparty/build-thirdparty.sh b/thirdparty/build-thirdparty.sh
index cf56ec0..962981b 100755
--- a/thirdparty/build-thirdparty.sh
+++ b/thirdparty/build-thirdparty.sh
@@ -94,6 +94,7 @@ else
"boost") F_BOOST=1 ;;
"breakpad") F_BREAKPAD=1 ;;
"sparsehash") F_SPARSEHASH=1 ;;
+ "sparsepp") F_SPARSEPP=1 ;;
*) echo "Unknown module: $arg"; exit 1 ;;
esac
done
@@ -232,6 +233,10 @@ if [ -n "$F_COMMON" -o -n "$F_SPARSEHASH" ]; then
build_sparsehash
fi
+if [ -n "$F_COMMON" -o -n "$F_SPARSEPP" ]; then
+ build_sparsepp
+fi
+
### Build C dependencies without instrumentation
PREFIX=$PREFIX_DEPS
http://git-wip-us.apache.org/repos/asf/kudu/blob/3c26cc3c/thirdparty/download-thirdparty.sh
----------------------------------------------------------------------
diff --git a/thirdparty/download-thirdparty.sh b/thirdparty/download-thirdparty.sh
index 3cba006..cce9063 100755
--- a/thirdparty/download-thirdparty.sh
+++ b/thirdparty/download-thirdparty.sh
@@ -325,5 +325,15 @@ if [ ! -d "$SPARSEHASH_SOURCE" ]; then
popd
fi
+SPARSEPP_PATCHLEVEL=0
+delete_if_wrong_patchlevel $SPARSEPP_SOURCE $SPARSEPP_PATCHLEVEL
+if [ ! -d "$SPARSEPP_SOURCE" ]; then
+ fetch_and_expand sparsepp-${SPARSEPP_VERSION}.tar.gz
+ pushd $SPARSEPP_SOURCE
+ touch patchlevel-$SPARSEPP_PATCHLEVEL
+ popd
+fi
+
+
echo "---------------"
echo "Thirdparty dependencies downloaded successfully"
http://git-wip-us.apache.org/repos/asf/kudu/blob/3c26cc3c/thirdparty/vars.sh
----------------------------------------------------------------------
diff --git a/thirdparty/vars.sh b/thirdparty/vars.sh
index 1c00d13..ce1e0e8 100644
--- a/thirdparty/vars.sh
+++ b/thirdparty/vars.sh
@@ -188,3 +188,14 @@ BREAKPAD_SOURCE=$TP_SOURCE_DIR/$BREAKPAD_NAME
SPARSEHASH_VERSION=47a55825ca3b35eab1ca22b7ab82b9544e32a9af
SPARSEHASH_NAME=sparsehash-c11-$SPARSEHASH_VERSION
SPARSEHASH_SOURCE=$TP_SOURCE_DIR/$SPARSEHASH_NAME
+
+# Hash of the sparsepp git revision to use.
+# (from https://github.com/greg7mdp/sparsepp)
+#
+# To re-build this tarball use the following in the sparsepp repo:
+# export NAME=sparsepp-$(git rev-parse HEAD)
+# git archive HEAD --prefix=$NAME/ -o /tmp/$NAME.tar.gz
+# s3cmd put -P /tmp/$NAME.tar.gz s3://cloudera-thirdparty-libs/$NAME.tar.gz
+SPARSEPP_VERSION=824860bb76893d163efbcff330734b9f62eecb17
+SPARSEPP_NAME=sparsepp-$SPARSEPP_VERSION
+SPARSEPP_SOURCE=$TP_SOURCE_DIR/$SPARSEPP_NAME