You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org> on 2020/06/05 17:13:01 UTC

[kudu-CR] WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Bankim Bhavsar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16036


Change subject: WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................

WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Column predicate evaluation can be expensive and ineffective column predicates
can waste CPU. TPCH Q9 exhibits significant regression of 50-96% on enabling
Bloom filter predicates. See KUDU-3140 for details.

This change adds simple heuristic taken from HDFS scanner in Impala
that basically checks for every 16 blocks and if a predicate has rejected
less than 10% of the rows scanned then disables the predicate.

The stats collection is enabled by default for all predicate types
but enforcement is only enabled for Bloom filter predicate type.

With Bloom filter predicate type, false positives are expected so
client is expected to do further filtering to remove false positives.
Kudu makes the decision to disable the predicate independently and doesn't
inform the client in this change which is okay for Bloom filter given
the rationale above.

TODO:
- Add/update tests
- Determine any missing iterators or code paths
- Update Bloom filter predicate client docs
- Need to determine whether these heuristics don't slow down other predicates

Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
---
M src/kudu/common/generic_iterators.cc
1 file changed, 189 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/16036/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 1
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>

[kudu-CR] [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16036 )

Change subject: [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................


Patch Set 4:

(3 comments)

Looking good! One more test suggestion, otherwise basically looks good to go.

http://gerrit.cloudera.org:8080/#/c/16036/4/src/kudu/client/predicate-test.cc
File src/kudu/client/predicate-test.cc:

http://gerrit.cloudera.org:8080/#/c/16036/4/src/kudu/client/predicate-test.cc@1513
PS4, Line 1513: TEST_F(PredicateTest, TestDisabledBloomFilterWithRepeatedStrings) {
Just to make sure we have all our iterators covered, could also add a case where the MRSs gets flushed, and when all the values are updated (i.e. in DMS or deltafiles)? IIRC those cases also exercise top-level iterators.


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators-test.cc
File src/kudu/common/generic_iterators-test.cc:

http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators-test.cc@807
PS3, Line 807:   vector<int64_t> ints;
             :   auto bf_pred = CreateBloomFilterPredi
> Done
Thanks for the explanation! Very odd.. here's a relevant stack overflow post:
https://stackoverflow.com/questions/8016780/undefined-reference-to-static-constexpr-char


http://gerrit.cloudera.org:8080/#/c/16036/4/src/kudu/common/generic_iterators-test.cc
File src/kudu/common/generic_iterators-test.cc:

http://gerrit.cloudera.org:8080/#/c/16036/4/src/kudu/common/generic_iterators-test.cc@819
PS4, Line 819:   ASSERT_EQ(0, spec.predicates().size())
             :     <
nit: hmm typically we treat ASSERTs as function calls w.r.t spacing, which the GSG mentions should be either aligned with the above line, or at 4 spaces

https://google.github.io/styleguide/cppguide.html#Function_Calls



-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 4
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 20 Jun 2020 03:29:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16036 )

Change subject: [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................


Patch Set 6:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/16036/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16036/6//COMMIT_MSG@43
PS6, Line 43: TODO: Run TPCH with latest patchset to ascertain no changes in results
            : before merging.
> I'm curious: has it been done?
Done.
https://gist.github.com/bbhavsar/0a773359b9225f014d353759a535c5be


http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/common/generic_iterators-test.cc
File src/kudu/common/generic_iterators-test.cc:

http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/common/generic_iterators-test.cc@671
PS6, Line 671: CheckColumnPredicatesAreEqual(vector<ColumnPredicate>({ c_equality, b_equality, a_range }),
             :                                   GetIteratorPredicatesForTests(iter));
> Wrap into NO_FATALS() ?
Done


http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/common/generic_iterators-test.cc@714
PS6, Line 714: bool all_values
> Does it make sense to document the semantics of this parameter?
Done


http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/common/predicate_effectiveness.h
File src/kudu/common/predicate_effectiveness.h:

http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/common/predicate_effectiveness.h@36
PS6, Line 36: IsColumnPredicateDisableable
> Might the following be a better choice for the name of the function:
The term "disableable" does sound weird but it make the purpose obvious, i.e. column predicate could be disabled.
Adaptive on one hand is a more familiar word but it's not obvious what part of column predicate is adaptive. Adaptive would have been more appropriate if the predicate becomes more restrictive or less restrictive based on some feedback, like a moving gauge.

So I'm sticking with disableable.


http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/common/predicate_effectiveness.h@44
PS6, Line 44:   explicit PredicateEffectivenessContext(const ColumnPredicate* pred) :
            :     pred(pred),
            :     rows_read(0),
            :     rows_rejected(0),
            :     enabled(true) {
            :   }
> nit: per code style guide it should be formatted a bit different
Done


http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/common/predicate_effectiveness.h@75
PS6, Line 75: return it->second;
> What happens if in non-debug build an invalid index is specified?
Done


http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/common/predicate_effectiveness.h@81
PS6, Line 81: return it->second;
> ditto
Done


http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/mini-cluster/internal_mini_cluster.cc
File src/kudu/mini-cluster/internal_mini_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/mini-cluster/internal_mini_cluster.cc@280
PS6, Line 280: void
> Why not to return Status instead of calling CHECK() if something goes not a
Done


http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/tablet/tablet_metrics.cc
File src/kudu/tablet/tablet_metrics.cc:

http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/tablet/tablet_metrics.cc@85
PS6, Line 85: disableable
> adaptive ?
Same reasoning as above about adaptive v/s disableable.



-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 6
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 25 Jun 2020 23:00:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16036 )

Change subject: WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................


Patch Set 3:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/16036/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16036/3//COMMIT_MSG@14
PS3, Line 14: for every 16 blocks 
> Do you know how large these blocks are by default? Just want to make sure w
Thanks for pointing that out.

I checked kudu by default uses block of 128 rows.
From Wenzhe, HDFS scanner in Impala uses default block of 1024 rows.

To keep the heuristic same as HDFS, the skip block count will need to be 128. (Haven't made the change)


http://gerrit.cloudera.org:8080/#/c/16036/3//COMMIT_MSG@26
PS3, Line 26: TODO:
> Would also be nice to write a test that leverages a tablet and some string 
Trying to understand the reason before I make the change.
This'll allow testing with dict encoded repeated strings and hence whether predicate evaluation is disabled at decoder level?


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators-test.cc
File src/kudu/common/generic_iterators-test.cc:

http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators-test.cc@684
PS3, Line 684:  public:
> nit: I don't really see why these all need to be a part of the same test cl
Moved the 2 public methods directly as part of the Test outside the class.
Using a class helps
- Use the parameterized test
- Keep the common static const variables confined to the relevant tests instead of making them available for entire file


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators-test.cc@701
PS3, Line 701:     google::FlagSaver saver;
> Tests have a member flagsaver already, so we don't need this. See test_util
Done


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators-test.cc@739
PS3, Line 739: << "I
> nit: spacing, here and elsewhere
Done


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators-test.cc@807
PS3, Line 807: // Despite being a static constexpr integer, this static variable needs explicit declaration
             : // to avoid linker error. Not sure why.
> Should probably either figure out why, or make it static const or somesuch 
Done


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators.cc@1282
PS3, Line 1282:     bool disableable_ineffective_predicate =
              :         effectiveness_ctx != nullptr && !effectiveness_ctx->IsPredicateEnabled();
              :     // True: a disableable predicate that is determined to be effective.
              :     // False: a non-disableable predicate or a disableable predicate that is determined to be
              :     // ineffective.
              :     bool disableable_effective_predicate =
> nit: I think the names would be a bit easier to follow as "disableable_pred
Changed the names.

Re: True/False
At first, it's tempting to simply use negation of first variable for effective case but that would yield unexpected result. So the comment helps for understanding the use. Unless there is strong opposition I'd rather have that comment.


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators.cc@1288
PS3, Line 1288: effectiveness_ctx != nullptr
> nit: can drop the != nullptr, same elsewhere
Done


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/predicate_effectiveness.h
File src/kudu/common/predicate_effectiveness.h:

http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/predicate_effectiveness.h@37
PS3, Line 37:   ColumnPredicate* pred;
> nit: could be const
Done


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/predicate_effectiveness.h@58
PS3, Line 58: track where
> nit: add a comma between these, otherwise it's easy to misread this
Done


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/predicate_effectiveness.h@87
PS3, Line 87: 16 (default) blocks, ratio of rows rejected by a predicate.
            :   // If the rejection ratio is less than 10% 
> nit: these values might change, so how about referring to these values by t
Done


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/predicate_effectiveness.cc
File src/kudu/common/predicate_effectiveness.cc:

http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/predicate_effectiveness.cc@59
PS3, Line 59: id
> nit: idx
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 3
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 12 Jun 2020 22:43:25 +0000
Gerrit-HasComments: Yes

[kudu-CR] [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16036 )

Change subject: [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................


Patch Set 6:

> Patch Set 5:
> 
> > Patch Set 5: Verified-1
> > 
> > Build Failed 
> > 
> > http://jenkins.kudu.apache.org/job/kudu-gerrit/21853/ : FAILURE
> 
> TSAN failure occurred while flushing the session even before running the scan request or flushing the tablet for the (true, true) combination.
> 
> Not sure what needs to be done.
> 
> Would reducing number of rows being inserted (from current 10k) help?
> 
>  W0622 23:54:15.658417  8665 meta_cache.cc:265] Tablet 887360f8665b479c8c97fdb0bec62d78: Replica fa6ae0706a484d558a473c865defee88 (127.5.172.65:37891) has failed: Timed out: Write RPC to 127.5.172.65:37891 timed out after 10.000s (SENT)
>  W0622 23:54:15.659335  8665 batcher.cc:435] Timed out: Failed to write batch of 4566 ops to tablet 887360f8665b479c8c97fdb0bec62d78 after 1 attempt(s): Failed to write to server: fa6ae0706a484d558a473c865defee88 (127.5.172.65:37891): Write RPC to 127.5.172.65:37891 timed out after 10.000s (SENT)
>  W0622 23:54:15.846995  8665 outbound_call.cc:285] RPC callback for RPC call kudu.tserver.TabletServerService.Write -> {remote=127.5.172.65:37891, user_credentials={real_user=slave}} blocked reactor thread for 189029us
>  W0622 23:54:16.848306  8665 meta_cache.cc:265] Tablet 887360f8665b479c8c97fdb0bec62d78: Replica fa6ae0706a484d558a473c865defee88 (127.5.172.65:37891) has failed: Timed out: Write RPC to 127.5.172.65:37891 timed out after 10.000s (SENT)
>  W0622 23:54:16.848824  8665 batcher.cc:435] Timed out: Failed to write batch of 4439 ops to tablet 887360f8665b479c8c97fdb0bec62d78 after 1 attempt(s): Failed to write to server: fa6ae0706a484d558a473c865defee88 (127.5.172.65:37891): Write RPC to 127.5.172.65:37891 timed out after 10.000s (SENT)

Timeout error is been for (true, true) case on invoking session->Flush() after updating rows.

Added intermediate session flushes and ran TSAN test in a loop 100 times. No failures were observed.


-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 6
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 23 Jun 2020 18:17:26 +0000
Gerrit-HasComments: No

[kudu-CR] WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16036

to look at the new patch set (#3).

Change subject: WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................

WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Column predicate evaluation can be expensive and ineffective column predicates
can waste CPU. TPCH Q9 exhibits significant regression of 50-96% on enabling
Bloom filter predicates. See KUDU-3140 for details.

This change adds simple heuristic taken from HDFS scanner in Impala
that basically checks for every 16 blocks and if a predicate has rejected
less than 10% of the rows scanned then disables the predicate.

The stats collection and enforcement is enabled only for disableable
predicate types, Bloom filter for now.

With Bloom filter predicate type, false positives are expected so
client is expected to do further filtering to remove false positives.
Kudu makes the decision to disable the predicate independently and doesn't
inform the client in this change which is okay for Bloom filter given
the rationale above.

TODO:
- Run TPCH with this change to determine whether this change helps Q9
that exhibits regression and doesn't regress other queries.
- Need to determine whether these heuristics don't slow down other predicates
(Only disableable predicates are tracked with latest revision so this
shouldn't be a concern)
- Update Bloom filter predicate client docs if we see improvement above.

Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
---
M src/kudu/common/CMakeLists.txt
M src/kudu/common/column_materialization_context.h
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
A src/kudu/common/predicate_effectiveness.cc
A src/kudu/common/predicate_effectiveness.h
6 files changed, 460 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/16036/3
-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 3
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16036

to look at the new patch set (#7).

Change subject: [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................

[perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Column predicate evaluation can be expensive and ineffective column predicates
can waste CPU. TPCH Q9 exhibits significant regression of 50-96% on enabling
Bloom filter predicates. See KUDU-3140 for details.

Excerpt from TPCH run exhibiting regression:
https://gist.github.com/bbhavsar/943cf8ebbab63f598353efef8f87db32
TPCH Q9 specific info:
https://gist.github.com/bbhavsar/811ccbe0cd144090f82bdabcd801f827

This change adds simple heuristic taken from HDFS scanner in Impala
that basically checks for every 16 blocks and if a predicate has rejected
less than 10% of the rows scanned then disables the predicate.

To match the equivalent number of rows in Kudu, the check is made
every 128 blocks by default.

The stats collection and enforcement is enabled only for disableable
predicate types, Bloom filter for now.

With Bloom filter predicate type, false positives are expected so
client is expected to do further filtering to remove false positives.
Kudu makes the decision to disable the predicate independently and doesn't
inform the client in this change which is okay for Bloom filter given
the rationale above. Client API docs have been updated accordingly.

Added a tablet level metric to track disabled column predicates.

Tests with PS6:
- TPCH no longer reports regression with Q9. With multiple runs,
  the delta are +1.95%, -24.67%, +2.67%, -17.09%, -14.59% with a std dev
  of 17% - 38% to report it neither as improvement nor as regression.
  https://gist.github.com/bbhavsar/0a773359b9225f014d353759a535c5be
- Improvements with other queries reported before this change remain intact.

Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
M src/kudu/client/predicate-test.cc
M src/kudu/common/CMakeLists.txt
M src/kudu/common/column_materialization_context.h
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/iterator_stats.cc
M src/kudu/common/iterator_stats.h
A src/kudu/common/predicate_effectiveness.cc
A src/kudu/common/predicate_effectiveness.h
M src/kudu/mini-cluster/internal_mini_cluster.cc
M src/kudu/mini-cluster/internal_mini_cluster.h
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 768 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/16036/7
-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 7
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16036

to look at the new patch set (#2).

Change subject: WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................

WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Column predicate evaluation can be expensive and ineffective column predicates
can waste CPU. TPCH Q9 exhibits significant regression of 50-96% on enabling
Bloom filter predicates. See KUDU-3140 for details.

This change adds simple heuristic taken from HDFS scanner in Impala
that basically checks for every 16 blocks and if a predicate has rejected
less than 10% of the rows scanned then disables the predicate.

The stats collection is enabled by default for all predicate types
but enforcement is only enabled for Bloom filter predicate type.

With Bloom filter predicate type, false positives are expected so
client is expected to do further filtering to remove false positives.
Kudu makes the decision to disable the predicate independently and doesn't
inform the client in this change which is okay for Bloom filter given
the rationale above.

TODO:
- Run TPCH with this change to determine whether this change helps Q9
that exhibits regression and doesn't regress other queries.
- Need to determine whether these heuristics don't slow down other predicates.
- Update Bloom filter predicate client docs if we see improvement above.

Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
---
M src/kudu/common/CMakeLists.txt
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
A src/kudu/common/predicate_effectiveness.cc
A src/kudu/common/predicate_effectiveness.h
5 files changed, 444 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/16036/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16036

to look at the new patch set (#5).

Change subject: [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................

[perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Column predicate evaluation can be expensive and ineffective column predicates
can waste CPU. TPCH Q9 exhibits significant regression of 50-96% on enabling
Bloom filter predicates. See KUDU-3140 for details.

Excerpt from TPCH run exhibiting regression:
https://gist.github.com/bbhavsar/943cf8ebbab63f598353efef8f87db32
TPCH Q9 specific info:
https://gist.github.com/bbhavsar/811ccbe0cd144090f82bdabcd801f827

This change adds simple heuristic taken from HDFS scanner in Impala
that basically checks for every 16 blocks and if a predicate has rejected
less than 10% of the rows scanned then disables the predicate.

To match the equivalent number of rows in Kudu, the check is made
every 128 blocks by default.

The stats collection and enforcement is enabled only for disableable
predicate types, Bloom filter for now.

With Bloom filter predicate type, false positives are expected so
client is expected to do further filtering to remove false positives.
Kudu makes the decision to disable the predicate independently and doesn't
inform the client in this change which is okay for Bloom filter given
the rationale above. Client API docs have been updated accordingly.

Added a tablet level metric to track disabled column predicates.

Tests with PS1:
- TPCH no longer reports regression with Q9. With multiple runs,
  the delta is -25% to +9% with a high std dev of 22% to report it neither as
  improvement nor as regression.
  https://gist.github.com/bbhavsar/45defc689dce31b88eb646b946a65493
- Improvements with other queries reported before this change remain intact.

TODO: Run TPCH with latest patchset to ascertain no changes in results
before merging.

Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
M src/kudu/client/predicate-test.cc
M src/kudu/common/CMakeLists.txt
M src/kudu/common/column_materialization_context.h
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/iterator_stats.cc
M src/kudu/common/iterator_stats.h
A src/kudu/common/predicate_effectiveness.cc
A src/kudu/common/predicate_effectiveness.h
M src/kudu/mini-cluster/internal_mini_cluster.cc
M src/kudu/mini-cluster/internal_mini_cluster.h
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 752 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/16036/5
-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 5
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16036 )

Change subject: WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/generic_iterators.cc@1290
PS2, Line 1290: num_rows_before
Shouldn't we be measuring this before the call to MaterializeColumn(), which might evaluate some predicates? Otherwise, wouldn't dictionary/RLE encoded cells not be accounted for, even though they still consume CPU to evaluate the BF predicate?


http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/predicate_effectiveness.h
File src/kudu/common/predicate_effectiveness.h:

http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/predicate_effectiveness.h@30
PS2, Line 30: // Per column predicate effectiveness context.
            : // Though all column predicate types are tracked, effectiveness of only Bloom filter predicate
            : // is currently enforced.
I don't agree with making this generic for all predicates. It seems like effectiveness/disabling of a predicate only makes sense in the context of a bloom filter because applications that use bloom filters typically accept false positives. That isn't the case for any other kind of predicate, where exact evaluation of the predicate is required. That makes me think this should be a bit more specialized.

Did you consider making the IteratorPredicateEffectivenessContext only track "disableable" predicates like bloom filters? eg by storing a {predicate_idx => predicate context} map instead of a vector, or storing a vector of is_disableable bools?


http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/predicate_effectiveness.h@38
PS2, Line 38:  : type(type), rows_read(0),
            :                                                                rows_rejected(0), enabled(true) {
micro-nit: would prefer putting these on a new line (at least I tend to favor narrower code where convenient).


http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/predicate_effectiveness.h@83
PS2, Line 83: CheckEffectiveness
nit: it isn't really clear from this naming that this also disables the predicates. Maybe rename to DisableIneffectivePredicates() or somesuch?

Also, why bother returning the disabled predicates? Seems like they're just used for logging -- can we push the logging into this class, eg by storing a pointer to the predicate in the effectiveness context instead of just the type?


http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/predicate_effectiveness.h@88
PS2, Line 88: co-ordinate
nit: coordinate



-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 10 Jun 2020 02:12:41 +0000
Gerrit-HasComments: Yes

[kudu-CR] [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16036 )

Change subject: [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................


Patch Set 6:

(9 comments)

few nits

http://gerrit.cloudera.org:8080/#/c/16036/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16036/6//COMMIT_MSG@43
PS6, Line 43: TODO: Run TPCH with latest patchset to ascertain no changes in results
            : before merging.
I'm curious: has it been done?


http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/common/generic_iterators-test.cc
File src/kudu/common/generic_iterators-test.cc:

http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/common/generic_iterators-test.cc@671
PS6, Line 671: CheckColumnPredicatesAreEqual(vector<ColumnPredicate>({ c_equality, b_equality, a_range }),
             :                                   GetIteratorPredicatesForTests(iter));
Wrap into NO_FATALS() ?


http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/common/generic_iterators-test.cc@714
PS6, Line 714: bool all_values
Does it make sense to document the semantics of this parameter?


http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/common/predicate_effectiveness.h
File src/kudu/common/predicate_effectiveness.h:

http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/common/predicate_effectiveness.h@36
PS6, Line 36: IsColumnPredicateDisableable
Might the following be a better choice for the name of the function:
  IsColumnPredicateAdaptive()

?


http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/common/predicate_effectiveness.h@44
PS6, Line 44:   explicit PredicateEffectivenessContext(const ColumnPredicate* pred) :
            :     pred(pred),
            :     rows_read(0),
            :     rows_rejected(0),
            :     enabled(true) {
            :   }
nit: per code style guide it should be formatted a bit different

explicit PredicateEffectivenessContext(const ColumnPredicate* pred)
    : pred(pred),
      rows_read(0),
      rows_rejected(0),
      enabled(true) {
}


http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/common/predicate_effectiveness.h@75
PS6, Line 75: return it->second;
What happens if in non-debug build an invalid index is specified?


http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/common/predicate_effectiveness.h@81
PS6, Line 81: return it->second;
ditto


http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/mini-cluster/internal_mini_cluster.cc
File src/kudu/mini-cluster/internal_mini_cluster.cc:

http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/mini-cluster/internal_mini_cluster.cc@280
PS6, Line 280: void
Why not to return Status instead of calling CHECK() if something goes not as expected?  Isn't it possible to recover from an error in such case?


http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/tablet/tablet_metrics.cc
File src/kudu/tablet/tablet_metrics.cc:

http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/tablet/tablet_metrics.cc@85
PS6, Line 85: disableable
adaptive ?



-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 6
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 24 Jun 2020 19:11:56 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16036 )

Change subject: WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16036/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16036/3//COMMIT_MSG@14
PS3, Line 14: for every 16 blocks 
> Thanks for pointing that out.
Done


http://gerrit.cloudera.org:8080/#/c/16036/3//COMMIT_MSG@26
PS3, Line 26: TODO:
> Yeah, and it'd also be nice to make sure that the iterators we think are be
Done


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators.cc@1282
PS3, Line 1282:     bool disableable_ineffective_predicate =
              :         effectiveness_ctx != nullptr && !effectiveness_ctx->IsPredicateEnabled();
              :     // True: a disableable predicate that is determined to be effective.
              :     // False: a non-disableable predicate or a disableable predicate that is determined to be
              :     // ineffective.
              :     bool disableable_effective_predicate =
> Sure, but these are quite verbose, and I think add more cognitive overhead 
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 3
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 17 Jun 2020 16:41:21 +0000
Gerrit-HasComments: Yes

[kudu-CR] WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16036 )

Change subject: WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/generic_iterators.cc@1290
PS2, Line 1290: num_rows_before
> Shouldn't we be measuring this before the call to MaterializeColumn(), whic
Thanks for pointing that out. Definitely missed it.

This would entail passing down the info that the predicate evaluation must be skipped. Tagging along DecoderEvalNotSupported() as it's already used to disable decoder level evaluation.


http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/predicate_effectiveness.h
File src/kudu/common/predicate_effectiveness.h:

http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/predicate_effectiveness.h@30
PS2, Line 30: // Per column predicate effectiveness context.
            : // Though all column predicate types are tracked, effectiveness of only Bloom filter predicate
            : // is currently enforced.
> I don't agree with making this generic for all predicates. It seems like ef
Done


http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/predicate_effectiveness.h@38
PS2, Line 38:  : type(type), rows_read(0),
            :                                                                rows_rejected(0), enabled(true) {
> micro-nit: would prefer putting these on a new line (at least I tend to fav
Done


http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/predicate_effectiveness.h@83
PS2, Line 83: CheckEffectiveness
> nit: it isn't really clear from this naming that this also disables the pre
Done


http://gerrit.cloudera.org:8080/#/c/16036/2/src/kudu/common/predicate_effectiveness.h@88
PS2, Line 88: co-ordinate
> nit: coordinate
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 11 Jun 2020 02:35:18 +0000
Gerrit-HasComments: Yes

[kudu-CR] [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16036

to look at the new patch set (#4).

Change subject: [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................

[perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Column predicate evaluation can be expensive and ineffective column predicates
can waste CPU. TPCH Q9 exhibits significant regression of 50-96% on enabling
Bloom filter predicates. See KUDU-3140 for details.

Excerpt from TPCH run exhibiting regression:
https://gist.github.com/bbhavsar/943cf8ebbab63f598353efef8f87db32
TPCH Q9 specific info:
https://gist.github.com/bbhavsar/811ccbe0cd144090f82bdabcd801f827

This change adds simple heuristic taken from HDFS scanner in Impala
that basically checks for every 16 blocks and if a predicate has rejected
less than 10% of the rows scanned then disables the predicate.

To match the equivalent number of rows in Kudu, the check is made
every 128 blocks by default.

The stats collection and enforcement is enabled only for disableable
predicate types, Bloom filter for now.

With Bloom filter predicate type, false positives are expected so
client is expected to do further filtering to remove false positives.
Kudu makes the decision to disable the predicate independently and doesn't
inform the client in this change which is okay for Bloom filter given
the rationale above. Client API docs have been updated accordingly.

Added a tablet level metric to track disabled column predicates.

Tests with PS1:
- TPCH no longer reports regression with Q9. With multiple runs,
  the delta is -25% to +9% with a high std dev of 22% to report it neither as
  improvement nor as regression.
  https://gist.github.com/bbhavsar/45defc689dce31b88eb646b946a65493
- Improvements with other queries reported before this change remain intact.

TODO: Run TPCH with latest patchset to ascertain no changes in results
before merging.

Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
---
M src/kudu/client/client.h
M src/kudu/client/predicate-test.cc
M src/kudu/common/CMakeLists.txt
M src/kudu/common/column_materialization_context.h
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/iterator_stats.cc
M src/kudu/common/iterator_stats.h
A src/kudu/common/predicate_effectiveness.cc
A src/kudu/common/predicate_effectiveness.h
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tserver/tablet_service.cc
14 files changed, 692 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/16036/4
-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 4
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16036 )

Change subject: WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16036/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16036/3//COMMIT_MSG@26
PS3, Line 26: TODO:
> Trying to understand the reason before I make the change.
Yeah, and it'd also be nice to make sure that the iterators we think are being used are actually being used. Something a bit more end-to-end (even if just a single non-replicated tablet test) would fit that bill.


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators-test.cc
File src/kudu/common/generic_iterators-test.cc:

http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators-test.cc@684
PS3, Line 684:  public:
> Moved the 2 public methods directly as part of the Test outside the class.
Sounds good. I agree parameterization of the test is a must. I could live without the second, but don't feel strongly about it either way.


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators.cc@1282
PS3, Line 1282:     bool disableable_ineffective_predicate =
              :         effectiveness_ctx != nullptr && !effectiveness_ctx->IsPredicateEnabled();
              :     // True: a disableable predicate that is determined to be effective.
              :     // False: a non-disableable predicate or a disableable predicate that is determined to be
              :     // ineffective.
              :     bool disableable_effective_predicate =
> Changed the names.
Sure, but these are quite verbose, and I think add more cognitive overhead than summarizing the rules. E.g.

 // Indicate whether the predicate has been disabled or not. If the predicate is not disableable
 // (only bloom filter predicates are disableable), both of these are false.
 bool disableable_predicate_disabled = ...;
 bool disableable_predicate_enabled = ...;



-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 3
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 12 Jun 2020 23:02:21 +0000
Gerrit-HasComments: Yes

[kudu-CR] [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16036 )

Change subject: [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................


Patch Set 7: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 7
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Jun 2020 17:48:24 +0000
Gerrit-HasComments: No

[kudu-CR] [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16036 )

Change subject: [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................


Patch Set 7: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/common/predicate_effectiveness.h
File src/kudu/common/predicate_effectiveness.h:

http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/common/predicate_effectiveness.h@36
PS6, Line 36: IsColumnPredicateDisableable
> The term "disableable" does sound weird but it make the purpose obvious, i.
OK, if you like 'disableable' more, that's fine with me.  I don't feel strong about this :)



-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 7
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Jun 2020 20:16:00 +0000
Gerrit-HasComments: Yes

[kudu-CR] [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16036 )

Change subject: [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................


Patch Set 5:

> Patch Set 5: Verified-1
> 
> Build Failed 
> 
> http://jenkins.kudu.apache.org/job/kudu-gerrit/21853/ : FAILURE

TSAN failure occurred while flushing the session even before running the scan request or flushing the tablet for the (true, true) combination.

Not sure what needs to be done.

Would reducing number of rows being inserted (from current 10k) help?

 W0622 23:54:15.658417  8665 meta_cache.cc:265] Tablet 887360f8665b479c8c97fdb0bec62d78: Replica fa6ae0706a484d558a473c865defee88 (127.5.172.65:37891) has failed: Timed out: Write RPC to 127.5.172.65:37891 timed out after 10.000s (SENT)
 W0622 23:54:15.659335  8665 batcher.cc:435] Timed out: Failed to write batch of 4566 ops to tablet 887360f8665b479c8c97fdb0bec62d78 after 1 attempt(s): Failed to write to server: fa6ae0706a484d558a473c865defee88 (127.5.172.65:37891): Write RPC to 127.5.172.65:37891 timed out after 10.000s (SENT)
 W0622 23:54:15.846995  8665 outbound_call.cc:285] RPC callback for RPC call kudu.tserver.TabletServerService.Write -> {remote=127.5.172.65:37891, user_credentials={real_user=slave}} blocked reactor thread for 189029us
 W0622 23:54:16.848306  8665 meta_cache.cc:265] Tablet 887360f8665b479c8c97fdb0bec62d78: Replica fa6ae0706a484d558a473c865defee88 (127.5.172.65:37891) has failed: Timed out: Write RPC to 127.5.172.65:37891 timed out after 10.000s (SENT)
 W0622 23:54:16.848824  8665 batcher.cc:435] Timed out: Failed to write batch of 4439 ops to tablet 887360f8665b479c8c97fdb0bec62d78 after 1 attempt(s): Failed to write to server: fa6ae0706a484d558a473c865defee88 (127.5.172.65:37891): Write RPC to 127.5.172.65:37891 timed out after 10.000s (SENT)


-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 5
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 23 Jun 2020 00:49:58 +0000
Gerrit-HasComments: No

[kudu-CR] [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16036

to look at the new patch set (#6).

Change subject: [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................

[perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Column predicate evaluation can be expensive and ineffective column predicates
can waste CPU. TPCH Q9 exhibits significant regression of 50-96% on enabling
Bloom filter predicates. See KUDU-3140 for details.

Excerpt from TPCH run exhibiting regression:
https://gist.github.com/bbhavsar/943cf8ebbab63f598353efef8f87db32
TPCH Q9 specific info:
https://gist.github.com/bbhavsar/811ccbe0cd144090f82bdabcd801f827

This change adds simple heuristic taken from HDFS scanner in Impala
that basically checks for every 16 blocks and if a predicate has rejected
less than 10% of the rows scanned then disables the predicate.

To match the equivalent number of rows in Kudu, the check is made
every 128 blocks by default.

The stats collection and enforcement is enabled only for disableable
predicate types, Bloom filter for now.

With Bloom filter predicate type, false positives are expected so
client is expected to do further filtering to remove false positives.
Kudu makes the decision to disable the predicate independently and doesn't
inform the client in this change which is okay for Bloom filter given
the rationale above. Client API docs have been updated accordingly.

Added a tablet level metric to track disabled column predicates.

Tests with PS1:
- TPCH no longer reports regression with Q9. With multiple runs,
  the delta is -25% to +9% with a high std dev of 22% to report it neither as
  improvement nor as regression.
  https://gist.github.com/bbhavsar/45defc689dce31b88eb646b946a65493
- Improvements with other queries reported before this change remain intact.

TODO: Run TPCH with latest patchset to ascertain no changes in results
before merging.

Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
M src/kudu/client/predicate-test.cc
M src/kudu/common/CMakeLists.txt
M src/kudu/common/column_materialization_context.h
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/iterator_stats.cc
M src/kudu/common/iterator_stats.h
A src/kudu/common/predicate_effectiveness.cc
A src/kudu/common/predicate_effectiveness.h
M src/kudu/mini-cluster/internal_mini_cluster.cc
M src/kudu/mini-cluster/internal_mini_cluster.h
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 756 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/16036/6
-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 6
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16036 )

Change subject: WIP [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................


Patch Set 3:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/16036/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16036/3//COMMIT_MSG@14
PS3, Line 14: for every 16 blocks 
Do you know how large these blocks are by default? Just want to make sure we are comparing apples to apples here.


http://gerrit.cloudera.org:8080/#/c/16036/3//COMMIT_MSG@26
PS3, Line 26: TODO:
Would also be nice to write a test that leverages a tablet and some string data.


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators-test.cc
File src/kudu/common/generic_iterators-test.cc:

http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators-test.cc@684
PS3, Line 684:  public:
nit: I don't really see why these all need to be a part of the same test class. Maybe consider inlining the tests, and sticking the helper methods in an anonymous namespace or leaving them as static (non-class) methods.


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators-test.cc@701
PS3, Line 701:     google::FlagSaver saver;
Tests have a member flagsaver already, so we don't need this. See test_util.h


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators-test.cc@739
PS3, Line 739: << "I
nit: spacing, here and elsewhere


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators-test.cc@807
PS3, Line 807: // Despite being a static constexpr integer, this static variable needs explicit declaration
             : // to avoid linker error. Not sure why.
Should probably either figure out why, or make it static const or somesuch before merging.


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators.cc@1282
PS3, Line 1282:     bool disableable_ineffective_predicate =
              :         effectiveness_ctx != nullptr && !effectiveness_ctx->IsPredicateEnabled();
              :     // True: a disableable predicate that is determined to be effective.
              :     // False: a non-disableable predicate or a disableable predicate that is determined to be
              :     // ineffective.
              :     bool disableable_effective_predicate =
nit: I think the names would be a bit easier to follow as "disableable_predicate_disabled" and "disableable_predicate_enabled" respectively. It seems easy enough to grasp that we'll disable ineffective predicates, so let's focus on the "disable" part.

I also think we can do without the explicit True/False comments too in that case, and just mention that we might disable ineffective predicates for the entirety of the iteration.


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/generic_iterators.cc@1288
PS3, Line 1288: effectiveness_ctx != nullptr
nit: can drop the != nullptr, same elsewhere


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/predicate_effectiveness.h
File src/kudu/common/predicate_effectiveness.h:

http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/predicate_effectiveness.h@37
PS3, Line 37:   ColumnPredicate* pred;
nit: could be const


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/predicate_effectiveness.h@58
PS3, Line 58: track where
nit: add a comma between these, otherwise it's easy to misread this


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/predicate_effectiveness.h@87
PS3, Line 87: 16 (default) blocks, ratio of rows rejected by a predicate.
            :   // If the rejection ratio is less than 10% 
nit: these values might change, so how about referring to these values by their flagnames?


http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/predicate_effectiveness.cc
File src/kudu/common/predicate_effectiveness.cc:

http://gerrit.cloudera.org:8080/#/c/16036/3/src/kudu/common/predicate_effectiveness.cc@59
PS3, Line 59: id
nit: idx

Also consider `idx_and_pred_ctx` or somesuch (I find it convenient enough to use it in most places)



-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 3
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 12 Jun 2020 05:22:43 +0000
Gerrit-HasComments: Yes

[kudu-CR] [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16036 )

Change subject: [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16036/4/src/kudu/client/predicate-test.cc
File src/kudu/client/predicate-test.cc:

http://gerrit.cloudera.org:8080/#/c/16036/4/src/kudu/client/predicate-test.cc@1513
PS4, Line 1513: TEST_F(PredicateTest, TestDisabledBloomFilterWithRepeatedStrings) {
> Just to make sure we have all our iterators covered, could also add a case 
Thanks for these suggestions!


http://gerrit.cloudera.org:8080/#/c/16036/4/src/kudu/common/generic_iterators-test.cc
File src/kudu/common/generic_iterators-test.cc:

http://gerrit.cloudera.org:8080/#/c/16036/4/src/kudu/common/generic_iterators-test.cc@819
PS4, Line 819:   ASSERT_EQ(0, spec.predicates().size())
             :     <
> nit: hmm typically we treat ASSERTs as function calls w.r.t spacing, which 
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 4
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 22 Jun 2020 23:40:40 +0000
Gerrit-HasComments: Yes

[kudu-CR] [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16036 )

Change subject: [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................


Patch Set 6: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16036/4/src/kudu/client/predicate-test.cc
File src/kudu/client/predicate-test.cc:

http://gerrit.cloudera.org:8080/#/c/16036/4/src/kudu/client/predicate-test.cc@1513
PS4, Line 1513:   auto* included_predicate_clone = included_predicate->Clone();
> Thanks for these suggestions!
Thanks for adding the test coverage!


http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/client/predicate-test.cc
File src/kudu/client/predicate-test.cc:

http://gerrit.cloudera.org:8080/#/c/16036/6/src/kudu/client/predicate-test.cc@1553
PS6, Line 1553:       // TSAN builds timeout and fail on flushing with large number of rows.
              :       if ((i % (kNumRows / 10))  == 0) {
              :         ASSERT_OK(session->Flush());
              :       }
Alternatively you can try just writing 100 rows or something tiny like that. Seems like the behavior should be the same regardless



-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 6
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 23 Jun 2020 19:43:15 +0000
Gerrit-HasComments: Yes

[kudu-CR] [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16036 )

Change subject: [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................

[perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Column predicate evaluation can be expensive and ineffective column predicates
can waste CPU. TPCH Q9 exhibits significant regression of 50-96% on enabling
Bloom filter predicates. See KUDU-3140 for details.

Excerpt from TPCH run exhibiting regression:
https://gist.github.com/bbhavsar/943cf8ebbab63f598353efef8f87db32
TPCH Q9 specific info:
https://gist.github.com/bbhavsar/811ccbe0cd144090f82bdabcd801f827

This change adds simple heuristic taken from HDFS scanner in Impala
that basically checks for every 16 blocks and if a predicate has rejected
less than 10% of the rows scanned then disables the predicate.

To match the equivalent number of rows in Kudu, the check is made
every 128 blocks by default.

The stats collection and enforcement is enabled only for disableable
predicate types, Bloom filter for now.

With Bloom filter predicate type, false positives are expected so
client is expected to do further filtering to remove false positives.
Kudu makes the decision to disable the predicate independently and doesn't
inform the client in this change which is okay for Bloom filter given
the rationale above. Client API docs have been updated accordingly.

Added a tablet level metric to track disabled column predicates.

Tests with PS6:
- TPCH no longer reports regression with Q9. With multiple runs,
  the delta are +1.95%, -24.67%, +2.67%, -17.09%, -14.59% with a std dev
  of 17% - 38% to report it neither as improvement nor as regression.
  https://gist.github.com/bbhavsar/0a773359b9225f014d353759a535c5be
- Improvements with other queries reported before this change remain intact.

Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Reviewed-on: http://gerrit.cloudera.org:8080/16036
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
M src/kudu/client/predicate-test.cc
M src/kudu/common/CMakeLists.txt
M src/kudu/common/column_materialization_context.h
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/iterator_stats.cc
M src/kudu/common/iterator_stats.h
A src/kudu/common/predicate_effectiveness.cc
A src/kudu/common/predicate_effectiveness.h
M src/kudu/mini-cluster/internal_mini_cluster.cc
M src/kudu/mini-cluster/internal_mini_cluster.h
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tserver/tablet_service.cc
17 files changed, 768 insertions(+), 75 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Andrew Wong: Looks good to me, approved
  Alexey Serbin: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/16036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 8
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)