You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2016/09/01 18:55:58 UTC

[kudu-CR] Predicate evaluation pushdown

Andrew Wong has posted comments on this change.

Change subject: Predicate evaluation pushdown
......................................................................


Patch Set 7:

(97 comments)

http://gerrit.cloudera.org:8080/#/c/3990/3//COMMIT_MSG
Commit Message:

Line 16: https://github.com/anjuwong/kudu/blob/pred-pushdown/docs/decoder-eval-perf.md
> Nit: it would be nice to have the document in the Kudu source tree when it'
Will do.


Line 20: https://github.com/anjuwong/kudu/blob/sorted-dict-block/docs/design-docs/predicate-eval-pushdown.md
> Ditto.
Done


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/benchmarks/tpch/tpch_real_world.cc
File src/kudu/benchmarks/tpch/tpch_real_world.cc:

Line 87: DEFINE_string(tpch_path_to_dbgen_dir, ".",
> This does not look as a good default.  I think ".", "./tpch" or "./tpch-dbg
Done, this was not meant to be in the patch.


http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/cfile/binary_dict_block.cc
File src/kudu/cfile/binary_dict_block.cc:

Line 252: // store their min/max values. CopyNextAndEval in these blocks could
> Ditto for the asterisk (star) as for the corresponding header file.
Done


Line 259:   ctx->SetDecoderEvalSupported();
> Is this still relevant?  I think you have implemented CopyNextAndEval for p
This could be extended to other blocks. If, for instance, in integer blocks, we begin storing the the min and max of the block, we could either short-circuit or evaluate slice-by-slice.

Right now, anything that doesn't support decoder-level evaluation will just set eval_complete to false and go down the copy&evaluate path.


Line 264: 
> terminate all comments with a period.
Done


Line 289:     // been cleared, in which case we can skip evaluation.
> nit: if I'm not mistaken, the style guide mentions if/else should be format
Done


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/binary_dict_block.cc
File src/kudu/cfile/binary_dict_block.cc:

Line 191:       dict_decoder_(iter->GetDictDecoder()),
> Why not to move those into the initializer list?
Done


http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/cfile/binary_dict_block.h
File src/kudu/cfile/binary_dict_block.h:

Line 131:                          ColumnEvalContext *ctx,
> Nit: it seems in the majority of the code in this file the asterisk tends t
Done


Line 140:     data_decoder_->SeekForward(n);
> drop 'virtual' and replace OVERRIDE with override, here and above.
This didn't get picked up by the last patch, but it will be fixed.


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/binary_dict_block.h
File src/kudu/cfile/binary_dict_block.h:

Line 131:                          ColumnEvalContext *ctx,
> Style nit: the rest of this file uses the "asterisk is closer to the type" 
Done


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/binary_plain_block.cc
File src/kudu/cfile/binary_plain_block.cc:

Line 290:   Slice *out = reinterpret_cast<Slice *>(dst->data());
> If the 'i' variable is not needed outside of the cycle, why not to move it 
Done


http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/cfile/binary_plain_block.cc
File src/kudu/cfile/binary_plain_block.cc:

Line 306: Status BinaryPlainBlockDecoder::CopyNextValues(size_t *n, ColumnDataView *dst) {
> I wonder whether this function could share code with the new one using some
Done


http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/cfile/binary_plain_block.h
File src/kudu/cfile/binary_plain_block.h:

Line 110: 
> What if SeekForward returned OK only if it moved forward at the specified n
Fair point, may be confusing. An "incomplete" scan isn't an error, but rather it means we've reached the end of the decoder.
The status is not used, so it makes more sense for this to be void. Alternatively, the if HasNext() is false, it should return an error status and OK() otherwise.


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/binary_plain_block.h
File src/kudu/cfile/binary_plain_block.h:

Line 111:   void SeekForward(size_t *n) override {
> Does it make sense to implement this method in the base class instead?
I agree it'd be nice, but I'm hesitant since they are dependent on private members that may not be defined for all block encodings (e.g. dict blocks don't have a cur_idx_ member, but it is used in SeekForward).
We could just define these members in the base class and just leave them as unused when they aren't used, something to consider, definitely.
Edit: yes this can be done by using Count() and GetCurrentIdx(), but the issue is still that there needs to be functions that can modify cur_idx_, which is not a member of all blockdecoders.


Line 112:     DCHECK(HasNext());
> Why not to return non-OK status in that case instead?  I.e., always assign 
It isn't an error to not seek forward the specified number of positions. This case occurs when the batch is started near the end of a decoder. The batch may request 1000 rows, but the decoder itself may only have 100 left, in which case *n will be set to 100. This is the contract for CopyNextValues() as well.


http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/cfile/block_encodings.h
File src/kudu/cfile/block_encodings.h:

PS2, Line 150: Fetch the next values from the block a
> I think this should be up to n, right?  That's at least the contract of Cop
You're right. *n should be set to the largest value <= to itself without going past the end of the block. The contract should be the same as CopyNextValues.


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/block_encodings.h
File src/kudu/cfile/block_encodings.h:

Line 162:                                  ColumnDataView *dst) {
> What if CopyNextValues() returned non-OK code?  Would it make sense to chec
Done. Should be wrapped with RETURN_NOT_OK()


http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/cfile/block_encodings.h
File src/kudu/cfile/block_encodings.h:

Line 138:   virtual void SeekForward(size_t *n) = 0;
> I think it'd be better to use 'int' here, so you can DCHECK_GE(*n, 0) at th
I see, just to clarify, this would be used to detect cases where *n is too large and overflows.


Line 157:   // function is that the context is marked as supporting decoder eval. This
> I think there's a subtle requirement here that, given a particular encoding
Right, the current contract, although not explicitly enforced right now, is that all blocks of a block encoding will always set eval_complete to true or false. Enum state should work.


Line 162:                                  ColumnDataView *dst) {
> RETURN_NOT_OK
Done


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/bshuf_block.h
File src/kudu/cfile/bshuf_block.h:

Line 278:   void SeekForward(size_t *n) override {
> Consider returning non-OK status code (like Status::Incomplete()) instead o
See comments in other implementations of SeekForward.


http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 728:       if (ctx_->pred()->predicate_type() == PredicateType::None) {
> wrap comments at 80 columns if possible (100 max).
Done


http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 670:     }
> typo
Done


Line 728:       if (ctx_->pred()->predicate_type() == PredicateType::None) {
> I think a getter like 'ctx_->try_decoder_predicate_evaluation()' would be a
Done


Line 733:         if (ctx_->pred()->EvaluateCell(static_cast<const void *>(&cur_string))) {
> would rather add a wrapper in the test code instead of adding a compatibili
Done


Line 747: }
> should remove this
Done


Line 977: 
> again a getter would centralize these checks and make the code more readabl
Done


http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/cfile/cfile_reader.cc
File src/kudu/cfile/cfile_reader.cc:

Line 488:   ctx->SetDecoderEvalNotSupported();
> shouldn't this be done in PrepareEvalContext?
Done. This was put here to prevent Scan from being called before SetDecoderEvalNotSupported was called. This only happened in cfile-test, but can definitely be avoided in test rather than in code.


PS7, Line 489: ctx->block()
> given the number of times you use ctx->block() in this function I think it'
Done


http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

PS2, Line 385: dicate o
> predicate
Done


Line 387:   // is shared among the multiple BinaryDictBlockDecoders in a single cfile,
> This seems like a strange place to put this method; why isn't it done by th
A single vocabulary is shared among the entire cfile, which may contain multiple dictionary blocks. This means the predicate set itself is owned by the cfile, not the decoders, and this provides the decoders a way to access it.


http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

Line 24: #include "kudu/common/columnblock.h"
> forward decl is sufficient
There seem to be a fair amount of dependencies that get pretty hairy when I forward declare this.


Line 222:   // Get the ordinal index that the iterator is currently pointed to.
> would be nice to avoid this "duplicate API" thing where one set of things g
Done


Line 261:   virtual Status FinishBatch() = 0;
> see above, would be good to collapse the two APIs
Done


Line 394: 
> this isn't returning a set of predicates, though -- it's returning a set of
Done


Line 395:   struct PreparedBlock {
> should rename to something more specifically about dictionary... maybe GetC
Right, my thinking was it is a set of everything that satisfies the predicate. I see your point, will change.


Line 468: 
> rename (per above)
Done


http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:

PS7, Line 390: ;
> nit: space after ;
Done


PS7, Line 463: or>cod
> nit: space after >
Done


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/cfile/plain_block.h
File src/kudu/cfile/plain_block.h:

Line 211:   virtual void SeekForward(size_t *n) OVERRIDE {
> Does it make sense to implement this method in the base class instead?
See replies in other implementations of SeekForward. The base abstract class does not have the right private members to implement this. Am considering exposing a public API for these members.


http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/common/column_eval_context.h
File src/kudu/common/column_eval_context.h:

PS2, Line 17: 
> The style guide forbids non-const reference args and fields.  This is a lit
eval_complete is a flag that gets set at the decoder level to indicate that decoder-level evaluation is supported by the block. It tells the materializing iterator that the decoder is only smart enough to copy values and that it still needs to evaluate each cell.

It could take a pointer though, which would have the same functionality and be clearer.


PS2, Line 24: 
> ampersand should be with the type, same with the pointer * below.  I think 
Done


http://gerrit.cloudera.org:8080/#/c/3990/6/src/kudu/common/column_eval_context.h
File src/kudu/common/column_eval_context.h:

Line 36:     : col_idx_(col_idx),
> indent colon 4 spaces, and wrap list at commas
Done


http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/common/column_eval_context.h
File src/kudu/common/column_eval_context.h:

Line 43:   // Column index in the parent CFileSet.
> I think this is incorrect and it's actually the index within the projection
Done


Line 57:   // Must be greater than size 0 and must be large enough to cover the number
> isn't "greater than size 0" redundant with "large enough to cover the numbe
Done


Line 79:   void SetDecoderEvalNotSupported() {
> particularly seems like here it should be a CHECK that it's kNotSet. Otherw
Done


http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/common/column_predicate.cc
File src/kudu/common/column_predicate.cc:

Line 270: 
> move this above the anonymous namespace above, the function defined in it i
Done


Line 277:   // Going a step further we could do runtime codegen to inline the
> The IsNotNull and None branches shouldn't be necessary given the prior chec
Done


http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

Line 130:   //   selection vector to 0.
> I think this is going to be really slow, because it forces a bunch of branc
Done


http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

Line 128:   // This is evaluated as an 'AND' with the current contents of *sel:
> this comment can be more general -- i.e "when the predicate is based on a c
Done


Line 129:   // - If the predicate evaluates to false, sets the appropriate bit in the
> nit: "should not" is vague. "Do not"? What happens if they do? It seems fun
Right, I had written this comment and then changed the contract. This comment is deprecated.


Line 130:   //   selection vector to 0.
> should clarify that this is a physical_type (same below)
Done


http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

PS7, Line 522:  = ColumnEvalContext
> can just be:
Done


Line 526:     if (ctx.pred()->predicate_type() == PredicateType::None) {
> does a None predicate ever make it this far into the code? I would think th
Put this here since my unit tests cover None operations, but yes, in a real scan, this won't be reached.


Line 549:     ColumnEvalContext ctx = ColumnEvalContext(col_idx,
> same
Done


Line 553:     ctx.SetDecoderEvalNotSupported();
> why's this necessary? shouldn't the nullptr predicate be sufficient for the
Done


http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/common/generic_iterators.h
File src/kudu/common/generic_iterators.h:

Line 28: #include "kudu/common/column_predicate.h"
> this new include isn't used in this header file (include it in the .cc if i
Done


http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/common/iterator.h
File src/kudu/common/iterator.h:

Line 24: #include "kudu/common/rowblock.h"
> I think you can get away with forward decls here
Done


Line 102: 
> is this still needed? It seems like the code is now calling the other overl
Done


Line 103:   // Finish the current batch.
> need doc.
Done


http://gerrit.cloudera.org:8080/#/c/3990/6/src/kudu/common/iterator.h
File src/kudu/common/iterator.h:

Line 35: class ScanSpec;
> keep this list sorted.
Done


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/common/rowblock.h
File src/kudu/common/rowblock.h:

Line 123: // and the view will move forward by the appropriate amount. In this way, the
> It's not clear why it would be necessary to have this wrapper for nullptr S
Done


Line 125: class SelectionVectorView {
> Nit: the check for sel_vec_ is already performed by the Advance() method.
Done


Line 133:     row_offset_ += skip;
> Why to advance the row_offset_ if there is no underlying data?  Please add 
There are some underlying structural issues that are being addressed. These all stem from an issue where the scan path diverges depending on the decoder-evaluation-support. Working on merging these paths, so should be able to avoid these.


Line 146:   }
> Nit: tab/shift is missing
Done


http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/common/rowblock.h
File src/kudu/common/rowblock.h:

Line 121: // The SelectionVectorView keeps track of where in the selection vector a given
> doc
Done


PS4, Line 137: + r
> nit: space around ' + ' (here and below)
Done


Line 141:     BitmapClear(sel_vec_->mutable_bitmap(), row_offset_ + row_idx);
> all these if()s seem expensive as these are hot-path functions. Maybe we ca
The null selection vector is a result of there being two paths down which a scan can go. Collapsing them should solve this.


Line 168: // of the latter doesn't mean you _should_.
> SelectionVector * const sel_vec_;
Done


http://gerrit.cloudera.org:8080/#/c/3990/6/src/kudu/common/rowblock.h
File src/kudu/common/rowblock.h:

Line 132:     DCHECK_LE(skip, sel_vec_->nrows() - row_offset_);
> seems to me that these checks should take into account row_offset_
Right, should be comparing row_offset_+row_idx and nrows.


PS6, Line 154: 
> put * with type here and below
Done


Line 155: 
> This seems a little strange, I would have expected the method to return the
I didn't want to return a pointer to an off-word address, but I agree that the whole point of this class is to keep an moving window.
Regardless, I think we can get by by just having the clear/set functions. I don't see a case where we would want the actual view into the bitmap for any other reason than to set or clear. Even now, SetBit is unused, but I've kept it in since I think it will be useful in future features (e.g. OR evaluation).


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/common/types.h
File src/kudu/common/types.h:

PS3, Line 59: bool AreConsecutive(const void* a, const voi
> Consider introducing typedef for this functor.
This should not have been in this patch.


http://gerrit.cloudera.org:8080/#/c/3990/6/src/kudu/tablet/CMakeLists.txt
File src/kudu/tablet/CMakeLists.txt:

Line 99: ADD_KUDU_TEST(tablet-decoder-eval-test)
Not included in the patch.


http://gerrit.cloudera.org:8080/#/c/3990/6/src/kudu/tablet/cfile_set-test.cc
File src/kudu/tablet/cfile_set-test.cc:

Line 105:   }
> Insert a space above.
Done


PS6, Line 106: 
> I think this names a little confusing, since the wrapping and context are a
Done


PS6, Line 107: ator *iter,
             :                            size_t col_idx,
> wrapping
Done


http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/tablet/cfile_set-test.cc
File src/kudu/tablet/cfile_set-test.cc:

Line 112:     ctx.SetDecoderEvalNotSupported();
> same comment here as elsewhere -- if there's no predicate we shouldn't have
Done


Line 115:  private:
> nit: add blank line before 'private:'
Done


http://gerrit.cloudera.org:8080/#/c/3990/3/src/kudu/tablet/cfile_set.cc
File src/kudu/tablet/cfile_set.cc:

Line 453:       Status s = col_iters_[i]->FinishBatch();
> nit: why WARNING, not ERROR?
I don't think not preparing the column would be a fatal error (it wouldn't cause a crash), so WARNING feels right. Other areas in this code use LOG(WARNING) as well.


http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/tablet/delta_store.h
File src/kudu/tablet/delta_store.h:

Line 176:   // Returns true if there might exist deltas to be applied. It is safe to
> typo: "there there"
Done


http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/tablet/deltafile.cc
File src/kudu/tablet/deltafile.cc:

Line 821:   // TODO: change the API to take in the col_to_apply and check for deltas on
> I think this could be better done by:
By the time this is called on the scan path, PrepareBatch() should indeed have already been called, although it would be good to do this check anyway.
DCHECK(prepared_) should suffice for the third bullet, since prepared_=true implies that PrepareBatch() has just been called, so the blocks should be popped to the point where we're within the current prepared batch.
As I understand it, if all of the delta blocks are past the end of the decoder, or if there are no blocks, ApplyUpdates should be a no-op.


http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/tablet/deltamemstore.cc
File src/kudu/tablet/deltamemstore.cc:

Line 368:     return true;
> Consider using empty() method instead.
Done


Line 372:       return true;
> Consider using empty() method instead of size().
Done


http://gerrit.cloudera.org:8080/#/c/3990/2/src/kudu/tablet/memrowset.cc
File src/kudu/tablet/memrowset.cc:

Line 474: 
> Why is it necessary to have this alias?
It was there because there is a divergence in the scan-path whether or not you want the decoder-level evaluation or not. I am working on merging these paths so only one is needed (and PredPushedNextBlock will be taken out in favor of NextBlock with some flag)


http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/tablet/tablet-decoder-eval-test.cc
File src/kudu/tablet/tablet-decoder-eval-test.cc:

Line 1: // Licensed to the Apache Software Foundation (ASF) under one
> nit: license header
Done


PS4, Line 92: 
> Is this passed by a reference and then modified by the function?  If yes, t
Done


PS4, Line 100: 
             :   void FillTestTablet(size_t cardinality, size_t strlen) {
             :     if (GetParam() == LARGE && !AllowSlowTests()) {
             :       return;
             :     }
             :     size_t nrows = static_cast<size_t>(GetParam());
             :     RowBuilder rb(client_schema_);
             : 
             :     LocalTabletWriter writer(tablet().get(), &client_schema_);
             :     KuduPartialRow row(&client_schema_);
             :     // Fill tablet with repeated pattern of strings.
             :     // e.g. Cardinality: 3, strlen: 2: "00", "01", "02", "00", "01", "02", ...
             :     for (int64_t i = 0; i < nrows; i++) {
             :       CHECK_OK(row.SetInt32(0, i));
             :       CHECK_OK(row.SetInt32(1, i * 10));
             :       CHECK_OK(row.SetStringCopy(2, StringPrintf(
             :               Substitute("%0$0$1", strlen, PRId64).c_str(),
             :               i%cardinality)));
             : 
             :  
> Is it possible to re-factor this separating the common code into a class me
Done


http://gerrit.cloudera.org:8080/#/c/3990/6/src/kudu/tablet/tablet-decoder-eval-test.cc
File src/kudu/tablet/tablet-decoder-eval-test.cc:

Line 35: #else
Can be shortened.


Line 44: 
Not used.


PS6, Line 89: per) {
> here and below, prefer to use stings::Substitute instead of StringPrintf.  
Looks like Substitute can't specify the a string length in formatting, but at least it can still get away from doing concat and to_string.


Line 96:       last_chunk_count = upper - lower;
> remove?
Done


PS6, Line 128: 
> const probably isn't necessary?
Done


http://gerrit.cloudera.org:8080/#/c/3990/4/src/kudu/tablet/tablet-test-util.h
File src/kudu/tablet/tablet-test-util.h:

PS4, Line 144: inline
> Why is it necessary to have this function in-lined?  Just because there isn
This is used to benchmark the times, so to get the behavior of just the code in this function, it makes sense for this to be inlined.


PS4, Line 145: int* fetched
> Should be passed by pointer, if following google's style guide: https://goo
Done


http://gerrit.cloudera.org:8080/#/c/3990/7/src/kudu/tablet/tablet-test-util.h
File src/kudu/tablet/tablet-test-util.h:

Line 146:                                                int limit = INT_MAX) {
> 'limit' is never used, right?
No, removed.


-- 
To view, visit http://gerrit.cloudera.org:8080/3990
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I31e4cce21e99f63b089d7c84410af8ed914cb576
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Dan Burkert <da...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes