You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Yuqi Du (Code Review)" <ge...@cloudera.org> on 2023/04/24 15:26:55 UTC

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19794

to look at the new patch set (#2).

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................

[cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

This patch's background is in https://gerrit.cloudera.org/c/19568/. As
that patch said, logic of pruning hash partitions for in-list predicate
in Kudu cpp client has also a high space complexity and slow. Old
algorithm must keep all intermedium objects because they are incomplete
until they are completed and can be compute hash.

This patch fixes the problems and provides a recursive algorithm the
same as java client in patch https://gerrit.cloudera.org/c/19568/.

Unlike java client, this problem in cpp client is not critical.
Old algorithm in cpp client does rare go out of memory because it
swap the intermedium objects. This optimization has good benifit too.
PartitionPrunerTest::TestMultiColumnInListHashPruningManyValues can show
some benifits. The benifits depend on the in-list length, for example,
in the test TestMultiColumnInListHashPruningManyValues, set kMaxInListLength
50, old algorithm memory cost can reach 300MB, while new algorithm's memory
cost can be ignored(it only need one object and a few stacks for context).
At the same time, new algorithm has a good speedup, below shows some effect:

combination_count: 5554006920000, old algorithm cost: 428238us, new algorithm cost: 713us, speedup: 600.61430575035058x
combination_count: 89083783664568, old algorithm cost: 2764924us, new algorithm cost: 1145us, speedup: 2414.780786026201x
combination_count: 27194091724800, old algorithm cost: 1610475us, new algorithm cost: 1151us, speedup: 1399.1963509991313x
combination_count: 7116622216704, old algorithm cost: 34544289us, new algorithm cost: 375us, speedup: 92118.104x
combination_count: 37570734489600, old algorithm cost: 1733205us, new algorithm cost: 901us, speedup: 1923.6459489456161x

Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 228 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/19794/2
-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 2
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)