Posted to commits@kudu.apache.org by gr...@apache.org on 2018/05/02 21:12:32 UTC

[1/2] kudu git commit: Fast path scanning blocks of deleted rows

Repository: kudu
Updated Branches:
  refs/heads/master 0355d373a -> e45fbd1b0


Fast path scanning blocks of deleted rows

This adds some very simple fast-paths in the case that an entire row
block consists of rows that were deleted.

The first is in the block materialization code: if delta application
results in all rows being deleted, we don't need to move forward and
materialize any columns.
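
As a rough illustration of this first fast path, here is a minimal,
self-contained sketch. The SelectionVectorSketch and RowBlockSketch
types and the MaterializeBlockSketch function are hypothetical
stand-ins, not Kudu's real classes; only the AnySelected() check
mirrors the patch itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a selection vector: one flag per row,
// true if the row is still live after delta application.
struct SelectionVectorSketch {
  std::vector<bool> selected;

  // Returns true if at least one row is still selected.
  bool AnySelected() const {
    for (bool s : selected) {
      if (s) return true;
    }
    return false;
  }
};

// Hypothetical stand-in for a row block. It only counts how many
// columns were materialized so the effect of the fast path is visible.
struct RowBlockSketch {
  SelectionVectorSketch sel;
  std::size_t columns_materialized = 0;
};

// Sketch of the materialization step: when every row in the block has
// been deleted, return before touching any column data.
std::size_t MaterializeBlockSketch(RowBlockSketch* dst,
                                   std::size_t num_columns) {
  assert(dst != nullptr);
  if (!dst->sel.AnySelected()) {
    return 0;  // Fast path: nothing is selected, so read no columns.
  }
  for (std::size_t c = 0; c < num_columns; ++c) {
    ++dst->columns_materialized;  // Placeholder for reading column c.
  }
  return dst->columns_materialized;
}
```

In this sketch, a block whose flags are all false returns without
materializing any column, while a block with at least one live row
materializes every requested column.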

The second is before serializing scan responses to the client. In this
case we don't need to loop over each column and read the selection
vector for each column. Instead we can just return immediately.
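
The second fast path can be sketched in the same spirit. The
ScanCollectorSketch type below is a hypothetical stand-in for the scan
result collector, and its column_passes counter is an assumption that
stands in for the per-column serialization work; only the
"num_selected == 0" early return mirrors the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the collector that serializes row blocks
// into a scan response.
struct ScanCollectorSketch {
  std::int64_t num_rows_returned = 0;
  std::size_t column_passes = 0;  // Placeholder for serialization work.

  // Sketch of the per-block handler: count the selected rows first and
  // return immediately when the block contributes nothing.
  void HandleRowBlock(const std::vector<bool>& selection,
                      std::size_t num_columns) {
    const std::int64_t num_selected =
        std::count(selection.begin(), selection.end(), true);
    if (num_selected == 0) return;  // Fast path: skip serialization.

    num_rows_returned += num_selected;
    // In the sketch, serialization costs one pass per column.
    column_passes += num_columns;
  }
};
```

Here an all-deleted block leaves both counters untouched, whereas a
block with selected rows adds to the row count and performs one pass
per column.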

I tested this on a table where I'd inserted a few billion rows, deleted
them all, and then re-inserted them. Before the patch, a simple 'SELECT
* FROM t LIMIT 10' took 306 seconds. With the first optimization only,
it took about 10 seconds (roughly 30x faster). With the second one as
well, it took about 2.2 seconds (roughly 140x faster than the baseline).

There are probably more general optimizations that could be done for
sparsely-populated RowBlocks (e.g. where just a few rows are selected),
but they are much more complex, and it's relatively common for users to
delete large consecutive runs of rows. For example, users may 'DELETE'
all rows in a table or partition before re-adding them, or they may
delete all data corresponding to some prefix of the PK.

Change-Id: I9fa891c0f4e857ddd1f873ad4855154d078be6b8
Reviewed-on: http://gerrit.cloudera.org:8080/10213
Tested-by: Todd Lipcon <to...@apache.org>
Reviewed-by: Will Berkeley <wd...@gmail.com>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/7201b063
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/7201b063
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/7201b063

Branch: refs/heads/master
Commit: 7201b0635a448cd244dc598158b13226f14aa0fb
Parents: 0355d37
Author: Todd Lipcon <to...@apache.org>
Authored: Wed Apr 25 15:54:45 2018 -0700
Committer: Will Berkeley <wd...@gmail.com>
Committed: Wed May 2 05:29:41 2018 +0000

----------------------------------------------------------------------
 src/kudu/common/generic_iterators.cc | 7 +++++++
 src/kudu/tserver/tablet_service.cc   | 4 ++++
 2 files changed, 11 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/7201b063/src/kudu/common/generic_iterators.cc
----------------------------------------------------------------------
diff --git a/src/kudu/common/generic_iterators.cc b/src/kudu/common/generic_iterators.cc
index 7a26e4f..4ac37d4 100644
--- a/src/kudu/common/generic_iterators.cc
+++ b/src/kudu/common/generic_iterators.cc
@@ -524,6 +524,13 @@ Status MaterializingIterator::MaterializeBlock(RowBlock *dst) {
   // been deleted.
   RETURN_NOT_OK(iter_->InitializeSelectionVector(dst->selection_vector()));
 
+  // It's relatively common to delete large sequential chunks of rows.
+  // We can fast-path that case and avoid reading any column data.
+  if (!dst->selection_vector()->AnySelected()) {
+    DVLOG(1) << "Fast path over " << dst->nrows() << " deleted rows";
+    return Status::OK();
+  }
+
   for (const auto& col_pred : col_idx_predicates_) {
     // Materialize the column itself into the row block.
     ColumnBlock dst_col(dst->column_block(get<0>(col_pred)));

http://git-wip-us.apache.org/repos/asf/kudu/blob/7201b063/src/kudu/tserver/tablet_service.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tserver/tablet_service.cc b/src/kudu/tserver/tablet_service.cc
index bf2be2b..7339dd4 100644
--- a/src/kudu/tserver/tablet_service.cc
+++ b/src/kudu/tserver/tablet_service.cc
@@ -481,6 +481,10 @@ class ScanResultCopier : public ScanResultCollector {
 
   void HandleRowBlock(Scanner* scanner, const RowBlock& row_block) override {
     int64_t num_selected = row_block.selection_vector()->CountSelected();
+    // Fast-path empty blocks (eg because the predicate didn't match any rows or
+    // all rows in the block were deleted)
+    if (num_selected == 0) return;
+
     num_rows_returned_ += num_selected;
     scanner->add_num_rows_returned(num_selected);
     SerializeRowBlock(row_block, rowblock_pb_, scanner->client_projection_schema(),


[2/2] kudu git commit: [docs] Add suggestion to mirror WAL and metadata directories

Posted by gr...@apache.org.
[docs] Add suggestion to mirror WAL and metadata directories

This adds a small suggestion to the configuration docs that
mirroring the WAL and metadata drives makes recovering from
failures easier, since the whole node won't have to be wiped.

Change-Id: Ie8ca6147f58da2fb3bedbc2679918c994c1ee5e3
Reviewed-on: http://gerrit.cloudera.org:8080/10268
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/e45fbd1b
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/e45fbd1b
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/e45fbd1b

Branch: refs/heads/master
Commit: e45fbd1b038a27df7dde866e8b992c030d2471b6
Parents: 7201b06
Author: Will Berkeley <wd...@apache.org>
Authored: Tue May 1 11:57:23 2018 -0700
Committer: Will Berkeley <wd...@gmail.com>
Committed: Wed May 2 18:50:19 2018 +0000

----------------------------------------------------------------------
 docs/configuration.adoc | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/e45fbd1b/docs/configuration.adoc
----------------------------------------------------------------------
diff --git a/docs/configuration.adoc b/docs/configuration.adoc
index 7e63588..390fc22 100644
--- a/docs/configuration.adoc
+++ b/docs/configuration.adoc
@@ -65,7 +65,11 @@ logs. The `--fs_metadata_dir` configuration indicates where Kudu will place
 metadata for each tablet. It is recommended, although not necessary, that these
 directories be placed on a high-performance drives with high bandwidth and low
 latency, e.g. solid-state drives. If `--fs_metadata_dir` is not specified,
-metadata will be placed in the directory specified by `--fs_wal_dir`.
+metadata will be placed in the directory specified by `--fs_wal_dir`. Since
+a Kudu node cannot tolerate the loss of its WAL or metadata directories, it
+may be wise to mirror the drives containing these directories in order to
+make recovering from a drive failure easier; however, mirroring may increase
+the latency of Kudu writes.
 
 The `--fs_data_dirs` configuration indicates where Kudu will write its data
 blocks. This is a comma-separated list of directories; if multiple values are