Posted to reviews@kudu.apache.org by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org> on 2020/06/17 16:45:06 UTC

[kudu-CR] [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Hello Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16036

to look at the new patch set (#4).

Change subject: [perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter
......................................................................

[perf] KUDU-3140 Heuristics to disable predicate evaluation for Bloom filter

Column predicate evaluation can be expensive, and ineffective column predicates
waste CPU. TPCH Q9 exhibits a significant regression of 50-96% when Bloom filter
predicates are enabled. See KUDU-3140 for details.

Excerpt from TPCH run exhibiting regression:
https://gist.github.com/bbhavsar/943cf8ebbab63f598353efef8f87db32
TPCH Q9 specific info:
https://gist.github.com/bbhavsar/811ccbe0cd144090f82bdabcd801f827

This change adds a simple heuristic taken from the HDFS scanner in Impala.
There, a check runs every 16 blocks: if a predicate has rejected less than
10% of the rows scanned, the predicate is disabled.

To match the equivalent number of rows in Kudu, the check is made
every 128 blocks by default.
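
The heuristic described above can be sketched roughly as follows. This is a
minimal illustration, not the code added by this patch: the struct, its
members, and the constant names are all hypothetical. It also assumes that
the counters are cumulative over the scan and that a disabled predicate is
never re-enabled; the description above does not specify either point.

```cpp
#include <cstdint>

// Hypothetical sketch; names do not come from the Kudu source tree.
struct PredicateEffectivenessSketch {
  // Check interval in blocks (128 is the default mentioned above).
  static constexpr int kCheckIntervalBlocks = 128;
  // A predicate must reject at least this percentage of rows to stay enabled.
  static constexpr int64_t kMinRejectionPercent = 10;

  int64_t rows_read = 0;      // Rows the predicate has been evaluated on.
  int64_t rows_rejected = 0;  // Rows the predicate has filtered out.
  int blocks_seen = 0;        // Blocks processed since the scan started.
  bool enabled = true;        // Assumption: once false, it stays false.

  // Record the outcome of evaluating the predicate on one block.
  void RecordBlock(int64_t rows_in_block, int64_t rejected_in_block) {
    if (!enabled) return;
    rows_read += rows_in_block;
    rows_rejected += rejected_in_block;
    ++blocks_seen;
    // Only run the effectiveness check once per interval.
    if (blocks_seen % kCheckIntervalBlocks != 0) return;
    // Integer form of: rows_rejected / rows_read < 10%.
    if (rows_rejected * 100 < rows_read * kMinRejectionPercent) {
      enabled = false;
    }
  }
};
```

In this sketch, a predicate that rejects nothing stays enabled for the first
127 blocks and is disabled on the 128th, while one that rejects half of the
rows stays enabled.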

Stats collection and enforcement are enabled only for disableable
predicate types, which for now means Bloom filter predicates only.

Bloom filter predicates are expected to produce false positives, so the
client is already expected to do further filtering to remove them.
In this change, Kudu decides to disable the predicate independently and doesn't
inform the client, which is okay for Bloom filter predicates given
the rationale above. Client API docs have been updated accordingly.
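
To illustrate why silently disabling the predicate is safe, here is a rough
sketch of the exact filtering a client would do on the rows it gets back.
The function name and the use of plain integer keys are assumptions made for
illustration; this is not part of the Kudu client API.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical client-side helper: keeps only the keys that are really in
// the wanted set. It drops Bloom filter false positives, and it equally
// drops rows that were returned because the server disabled the predicate.
std::vector<int64_t> FilterExactSketch(
    const std::vector<int64_t>& scanned_keys,
    const std::unordered_set<int64_t>& wanted) {
  std::vector<int64_t> out;
  for (int64_t key : scanned_keys) {
    if (wanted.count(key) != 0) {
      out.push_back(key);
    }
  }
  return out;
}
```

Under these assumptions the final result is the same whether or not the
server applied the Bloom filter; only the number of rows transferred differs.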

Added a tablet-level metric to track disabled column predicates.

Tests with PS1:
- TPCH no longer reports a regression with Q9. Across multiple runs,
  the delta is -25% to +9% with a high std dev of 22%, so it is reported
  neither as an improvement nor as a regression.
  https://gist.github.com/bbhavsar/45defc689dce31b88eb646b946a65493
- Improvements with other queries reported before this change remain intact.

TODO: Run TPCH with the latest patchset to confirm the results are unchanged
before merging.

Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
---
M src/kudu/client/client.h
M src/kudu/client/predicate-test.cc
M src/kudu/common/CMakeLists.txt
M src/kudu/common/column_materialization_context.h
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/iterator_stats.cc
M src/kudu/common/iterator_stats.h
A src/kudu/common/predicate_effectiveness.cc
A src/kudu/common/predicate_effectiveness.h
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
M src/kudu/tserver/tablet_service.cc
14 files changed, 692 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/16036/4
-- 
To view, visit http://gerrit.cloudera.org:8080/16036

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10197800a01a1b34c7821ac879caf8d272cab8dd
Gerrit-Change-Number: 16036
Gerrit-PatchSet: 4
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)