Posted to commits@kudu.apache.org by to...@apache.org on 2017/09/15 05:57:21 UTC

[2/2] kudu git commit: log_block_manager: switch from google::sparse_hash_map to sparsepp

log_block_manager: switch from google::sparse_hash_map to sparsepp

sparsepp has been updated for C++11, so it supports move semantics for the
map elements. Since the block map uses ref-counted values, being able to move
them instead of copying them is a big win. sparsepp also claims to be
generally faster, even aside from its support for moves.
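
As a rough illustration of why moving ref-counted values matters, the sketch
below uses std::shared_ptr as a stand-in for scoped_refptr and
std::unordered_map as a stand-in for the block map. Block, BlockMap,
InsertByCopy, and InsertByMove are hypothetical names for this example, not
Kudu code. Copying a ref-counted pointer bumps the reference count, while
moving it only transfers ownership.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for internal::LogBlock.
struct Block {
  long id;
};

// Hypothetical stand-in for the block map (assumption: std::shared_ptr
// plays the role of scoped_refptr here).
using BlockMap = std::unordered_map<long, std::shared_ptr<Block>>;

// Inserts by copy. The caller's pointer and the map entry both own the
// block afterwards, so the reference count had to be incremented.
long InsertByCopy(BlockMap* m, const std::shared_ptr<Block>& b) {
  assert(b != nullptr);
  (*m)[b->id] = b;
  return b.use_count();
}

// Inserts by move. Ownership is transferred into the map without touching
// the reference count; the caller's pointer is left empty.
long InsertByMove(BlockMap* m, std::shared_ptr<Block> b) {
  assert(b != nullptr);
  const long id = b->id;
  (*m)[id] = std::move(b);
  return (*m)[id].use_count();
}
```

With a fresh pointer, InsertByCopy reports a use count of 2 and InsertByMove
(called with std::move on the argument) reports 1. A map that moves its
elements internally avoids the same kind of ref-count traffic.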

According to [1], sparsepp uses ~10% more memory than
google::sparse_hash_map. However, previous measurements indicated that 1M
blocks used about 9MB of memory, so the block map isn't a major consumer as
far as the overall system is concerned. It seems worth a few extra MB of
memory in order to make substantial startup time improvements.

Despite the slightly higher memory usage, sparsepp is still significantly
better than std::unordered_map, and like google::sparse_hash_map it avoids
any large allocations (std::unordered_map needs a contiguous allocation for
its bucket array).
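
To give a feel for the size of that contiguous allocation, here is a minimal
sketch. It assumes one pointer per bucket, which matches common standard
library implementations but is not guaranteed by the C++ standard, and
EstimateBucketArrayBytes is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Returns a rough estimate, in bytes, of the bucket array that a
// std::unordered_map needs in order to hold `n` elements.
// Assumption: each bucket costs one pointer.
std::size_t EstimateBucketArrayBytes(std::size_t n) {
  std::unordered_map<long, long> m;
  // reserve() sizes the bucket array for n elements up front. With the
  // default max_load_factor of 1.0 this means at least n buckets.
  m.reserve(n);
  assert(m.bucket_count() >= n);
  return m.bucket_count() * sizeof(void*);
}
```

On a 64-bit build, EstimateBucketArrayBytes(1000000) is at least 8,000,000
bytes, and the estimate grows linearly with the number of blocks.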

This patch alone improved startup time 7-8x on a real host with ~11M blocks:

Before:
I0907 17:23:50.748055 12507 fs_manager.cc:335] Time spent opening block manager: real 108.910s  user 0.000s sys 0.001s

After:
I0907 17:20:42.277474 10021 fs_manager.cc:335] Time spent opening block manager: real 14.348s user 0.000s sys 0.001s

The LBM startup benchmark (1M blocks) improved less substantially but still noticeably (roughly 1.7x):

Before:
I0907 17:16:54.899818 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.612s      user 0.035s     sys 0.002s
I0907 17:16:57.498205 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.598s      user 0.039s     sys 0.001s
I0907 17:17:00.100244 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.602s      user 0.042s     sys 0.000s
I0907 17:17:02.686638 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.586s      user 0.042s     sys 0.000s
I0907 17:17:05.284050 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.597s      user 0.041s     sys 0.001s
I0907 17:17:07.884395 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.600s      user 0.039s     sys 0.001s
I0907 17:17:10.490550 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.606s      user 0.040s     sys 0.001s
I0907 17:17:13.070114 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.580s      user 0.039s     sys 0.000s
I0907 17:17:15.667062 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.597s      user 0.040s     sys 0.001s
I0907 17:17:18.258447 20839 log_block_manager-test.cc:799] Time spent reopening block manager: real 2.591s      user 0.042s     sys 0.000s

After:
I0907 17:15:50.645310 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.570s      user 0.034s     sys 0.001s
I0907 17:15:52.195543 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.550s      user 0.037s     sys 0.001s
I0907 17:15:53.755209 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.560s      user 0.037s     sys 0.001s
I0907 17:15:55.263762 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.509s      user 0.038s     sys 0.001s
I0907 17:15:56.818748 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.555s      user 0.037s     sys 0.001s
I0907 17:15:58.379680 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.561s      user 0.036s     sys 0.001s
I0907 17:15:59.913751 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.534s      user 0.038s     sys 0.000s
I0907 17:16:01.461668 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.548s      user 0.037s     sys 0.001s
I0907 17:16:03.020823 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.559s      user 0.037s     sys 0.001s
I0907 17:16:04.549747 20302 log_block_manager-test.cc:799] Time spent reopening block manager: real 1.529s      user 0.035s     sys 0.001s
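
For reference, the speedups can be recomputed from the "real" timings logged
above. The following is only a sketch of that arithmetic; Mean, Speedup,
kBenchBefore, and kBenchAfter are hypothetical names, and the values are
copied from the log lines in this message.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of timings, in seconds.
double Mean(const std::vector<double>& xs) {
  assert(!xs.empty());
  return std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
}

// Ratio of the old wall-clock time to the new one.
double Speedup(double before_secs, double after_secs) {
  assert(after_secs > 0.0);
  return before_secs / after_secs;
}

// "real" times from the LBM startup benchmark runs (1M blocks).
const std::vector<double> kBenchBefore = {
    2.612, 2.598, 2.602, 2.586, 2.597, 2.600, 2.606, 2.580, 2.597, 2.591};
const std::vector<double> kBenchAfter = {
    1.570, 1.550, 1.560, 1.509, 1.555, 1.561, 1.534, 1.548, 1.559, 1.529};
```

Speedup(108.910, 14.348) comes out to about 7.6 for the real host, and
Speedup(Mean(kBenchBefore), Mean(kBenchAfter)) to about 1.68 for the
benchmark.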

[1] https://github.com/greg7mdp/sparsepp/blob/master/bench.md

Change-Id: I7397f9cd418782caecf8b2dae2c7bfe2c0e6215c
Reviewed-on: http://gerrit.cloudera.org:8080/8007
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/3c26cc3c
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/3c26cc3c
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/3c26cc3c

Branch: refs/heads/master
Commit: 3c26cc3c2d745afa905888576ac392f5dbc86a66
Parents: 07c3134
Author: Todd Lipcon <to...@apache.org>
Authored: Thu Sep 7 12:54:39 2017 -0700
Committer: Todd Lipcon <to...@apache.org>
Committed: Fri Sep 15 05:56:43 2017 +0000

----------------------------------------------------------------------
 src/kudu/fs/log_block_manager.h   |  4 ++--
 thirdparty/build-definitions.sh   |  7 +++++++
 thirdparty/build-thirdparty.sh    |  5 +++++
 thirdparty/download-thirdparty.sh | 10 ++++++++++
 thirdparty/vars.sh                | 11 +++++++++++
 5 files changed, 35 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/3c26cc3c/src/kudu/fs/log_block_manager.h
----------------------------------------------------------------------
diff --git a/src/kudu/fs/log_block_manager.h b/src/kudu/fs/log_block_manager.h
index 1f3194b..bc38e5a 100644
--- a/src/kudu/fs/log_block_manager.h
+++ b/src/kudu/fs/log_block_manager.h
@@ -28,8 +28,8 @@
 #include <vector>
 
 #include <boost/optional/optional.hpp>  // IWYU pragma: keep
-#include <sparsehash/sparse_hash_map>
 #include <gtest/gtest_prod.h>
+#include <sparsepp/spp.h>
 
 #include "kudu/fs/block_id.h"
 #include "kudu/fs/block_manager.h"
@@ -203,7 +203,7 @@ class LogBlockManager : public BlockManager {
   // We use sparse_hash_map<> here to reduce memory overhead.
   typedef MemTrackerAllocator<
       std::pair<const BlockId, scoped_refptr<internal::LogBlock>>> BlockAllocator;
-  typedef google::sparse_hash_map<
+  typedef spp::sparse_hash_map<
       BlockId,
       scoped_refptr<internal::LogBlock>,
       BlockIdHash,

http://git-wip-us.apache.org/repos/asf/kudu/blob/3c26cc3c/thirdparty/build-definitions.sh
----------------------------------------------------------------------
diff --git a/thirdparty/build-definitions.sh b/thirdparty/build-definitions.sh
index c52bc25..9aff1e6 100644
--- a/thirdparty/build-definitions.sh
+++ b/thirdparty/build-definitions.sh
@@ -721,3 +721,10 @@ build_sparsehash() {
   rsync -av --delete sparsehash/ $PREFIX/include/sparsehash/
   popd
 }
+
+build_sparsepp() {
+  # This library is header-only, so we just copy the headers
+  pushd $SPARSEPP_SOURCE
+  rsync -av --delete sparsepp/ $PREFIX/include/sparsepp/
+  popd
+}

http://git-wip-us.apache.org/repos/asf/kudu/blob/3c26cc3c/thirdparty/build-thirdparty.sh
----------------------------------------------------------------------
diff --git a/thirdparty/build-thirdparty.sh b/thirdparty/build-thirdparty.sh
index cf56ec0..962981b 100755
--- a/thirdparty/build-thirdparty.sh
+++ b/thirdparty/build-thirdparty.sh
@@ -94,6 +94,7 @@ else
       "boost")        F_BOOST=1 ;;
       "breakpad")     F_BREAKPAD=1 ;;
       "sparsehash")   F_SPARSEHASH=1 ;;
+      "sparsepp")     F_SPARSEPP=1 ;;
       *)              echo "Unknown module: $arg"; exit 1 ;;
     esac
   done
@@ -232,6 +233,10 @@ if [ -n "$F_COMMON" -o -n "$F_SPARSEHASH" ]; then
   build_sparsehash
 fi
 
+if [ -n "$F_COMMON" -o -n "$F_SPARSEPP" ]; then
+  build_sparsepp
+fi
+
 ### Build C dependencies without instrumentation
 
 PREFIX=$PREFIX_DEPS

http://git-wip-us.apache.org/repos/asf/kudu/blob/3c26cc3c/thirdparty/download-thirdparty.sh
----------------------------------------------------------------------
diff --git a/thirdparty/download-thirdparty.sh b/thirdparty/download-thirdparty.sh
index 3cba006..cce9063 100755
--- a/thirdparty/download-thirdparty.sh
+++ b/thirdparty/download-thirdparty.sh
@@ -325,5 +325,15 @@ if [ ! -d "$SPARSEHASH_SOURCE" ]; then
   popd
 fi
 
+SPARSEPP_PATCHLEVEL=0
+delete_if_wrong_patchlevel $SPARSEPP_SOURCE $SPARSEPP_PATCHLEVEL
+if [ ! -d "$SPARSEPP_SOURCE" ]; then
+  fetch_and_expand sparsepp-${SPARSEPP_VERSION}.tar.gz
+  pushd $SPARSEPP_SOURCE
+  touch patchlevel-$SPARSEPP_PATCHLEVEL
+  popd
+fi
+
+
 echo "---------------"
 echo "Thirdparty dependencies downloaded successfully"

http://git-wip-us.apache.org/repos/asf/kudu/blob/3c26cc3c/thirdparty/vars.sh
----------------------------------------------------------------------
diff --git a/thirdparty/vars.sh b/thirdparty/vars.sh
index 1c00d13..ce1e0e8 100644
--- a/thirdparty/vars.sh
+++ b/thirdparty/vars.sh
@@ -188,3 +188,14 @@ BREAKPAD_SOURCE=$TP_SOURCE_DIR/$BREAKPAD_NAME
 SPARSEHASH_VERSION=47a55825ca3b35eab1ca22b7ab82b9544e32a9af
 SPARSEHASH_NAME=sparsehash-c11-$SPARSEHASH_VERSION
 SPARSEHASH_SOURCE=$TP_SOURCE_DIR/$SPARSEHASH_NAME
+
+# Hash of the sparsepp git revision to use.
+# (from https://github.com/greg7mdp/sparsepp)
+#
+# To re-build this tarball use the following in the sparsepp repo:
+#  export NAME=sparsepp-$(git rev-parse HEAD)
+#  git archive HEAD --prefix=$NAME/ -o /tmp/$NAME.tar.gz
+#  s3cmd put -P /tmp/$NAME.tar.gz s3://cloudera-thirdparty-libs/$NAME.tar.gz
+SPARSEPP_VERSION=824860bb76893d163efbcff330734b9f62eecb17
+SPARSEPP_NAME=sparsepp-$SPARSEPP_VERSION
+SPARSEPP_SOURCE=$TP_SOURCE_DIR/$SPARSEPP_NAME