You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Yuqi Du (Code Review)" <ge...@cloudera.org> on 2023/04/24 15:25:29 UTC

[kudu-CR] [client] Reduce space complexity and speed up hash partition pruning for in-list predicate

Yuqi Du has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19794


Change subject: [client] Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................

[client] Reduce space complexity and speed up hash partition pruning for in-list predicate

This patch's background is in https://gerrit.cloudera.org/c/19568/. As
that patch said, logic of pruning hash partitions for in-list predicate
in Kudu cpp client has also a high space complexity and slow. Old
algorithm must keep all intermedium objects because they are incomplete
until they are completed and can be compute hash.

This patch fixes the problems and provides a recursive algorithm the
same as java client in patch https://gerrit.cloudera.org/c/19568/.

Unlike java client, this problem in cpp client is not critical.
Old algorithm in cpp client does rare go out of memory because it
swap the intermedium objects. This optimization has good benifit too.
PartitionPrunerTest::TestMultiColumnInListHashPruningManyValues can show
some benifits. The benifits depend on the in-list length, for example,
in the test TestMultiColumnInListHashPruningManyValues, set kMaxInListLength
50, old algorithm memory cost can reach 300MB, while new algorithm's memory
cost can be ignored(it only need one object and a few stacks for context).
At the same time, new algorithm has a good speedup, below shows some effect:

combination_count: 5554006920000, old algorithm cost: 428238us, new algorithm cost: 713us, speedup: 600.61430575035058x
combination_count: 89083783664568, old algorithm cost: 2764924us, new algorithm cost: 1145us, speedup: 2414.780786026201x
combination_count: 27194091724800, old algorithm cost: 1610475us, new algorithm cost: 1151us, speedup: 1399.1963509991313x
combination_count: 7116622216704, old algorithm cost: 34544289us, new algorithm cost: 375us, speedup: 92118.104x
combination_count: 37570734489600, old algorithm cost: 1733205us, new algorithm cost: 901us, speedup: 1923.6459489456161x

Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 228 insertions(+), 18 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/19794/1
-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 1
Gerrit-Owner: Yuqi Du <sh...@gmail.com>

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 14: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 14
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Mon, 26 Jun 2023 08:10:07 +0000
Gerrit-HasComments: No

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Yingchun Lai, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19794

to look at the new patch set (#13).

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................

[cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

This patch comes from a committed patch, its committed id is 'b69dbeb'. As that
patch said, logic of pruning hash partitions for in-list predicate in Kudu cpp
client has also a high space complexity and slow. Old algorithm must keep
all intermedium objects because they are incomplete until they are
completed and can be computed hash.

This patch fixes the problems and provides a recursive algorithm the
same as the one used by java client.

Compared with java client, the cpp client is less likely to cause the
OOM condition because it does not keep too many intermediate results.
This optimization has good benefits too. The benefits are related to
the in-list length and the number of primary keys, the performance
would be better if in-list length is longer. For example,
PartitionPrunerTest::TestMultiColumnInListHashPruningManyValues,
using 10 key columns and kMaxInListLength=50, old algorithm memory cost
may reach 600MB, while new algorithm's memory cost can be ignored
(it only need one objects and a few stacks for contexts). At the same
time, new algorithm has a good speedup, some effect as below:

combination_count: 5554006920000, old cost: 428238us, new cost: 713us, speedup: 600.6x
combination_count: 89083783664568, old cost: 2764924us, new cost: 1145us, speedup: 2414.7x
combination_count: 27194091724800, old cost: 1610475us, new cost: 1151us, speedup: 1399.2x
combination_count: 7116622216704, old cost: 34544289us, new cost: 375us, speedup: 92118.1x
combination_count: 37570734489600, old cost: 1733205us, new cost: 901us, speedup: 1923.6x

Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 245 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/19794/13
-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 13
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 13: Code-Review+1

(5 comments)

Thanks for the contribution, LGTM.

http://gerrit.cloudera.org:8080/#/c/19794/13/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

http://gerrit.cloudera.org:8080/#/c/19794/13/src/kudu/common/partition_pruner-test.cc@1026
PS13, Line 1026: vector<KuduPartialRow>()
nit: can be replaced by '{}'.


http://gerrit.cloudera.org:8080/#/c/19794/13/src/kudu/common/partition_pruner-test.cc@1047
PS13, Line 1047: CHECK_EQ(v1.size(), v2.size());
               :       for (int i = 0; i < v1.size(); i++) {
               :         ASSERT_EQ(v1[i], v2[i]);
               :       }
nit: they can be replaced by ASSERT_EQ(v1, v2);


http://gerrit.cloudera.org:8080/#/c/19794/13/src/kudu/common/partition_pruner-test.cc@1069
PS13, Line 1069: kMaxSafeLength
kMaxInListLength?


http://gerrit.cloudera.org:8080/#/c/19794/13/src/kudu/common/partition_pruner-test.cc@1084
PS13, Line 1084: temp_values[temp_values.size() - 1]
nit: temp_values.back() ?


http://gerrit.cloudera.org:8080/#/c/19794/13/src/kudu/common/partition_pruner-test.cc@1085
PS13, Line 1085: j++;
nit: Move it to line 1081 which is more conventional.



-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 13
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Sun, 25 Jun 2023 16:47:49 +0000
Gerrit-HasComments: Yes

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 12: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 12
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Wed, 07 Jun 2023 07:33:39 +0000
Gerrit-HasComments: No

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 14: Verified+1

Remove the non-related failed test HybridClockTest.TestNtpDiagnostics.


-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 14
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Mon, 26 Jun 2023 08:10:42 +0000
Gerrit-HasComments: No

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 11:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19794/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19794/11//COMMIT_MSG@16
PS11, Line 16: one using java client
nit: the one used by java client


http://gerrit.cloudera.org:8080/#/c/19794/11/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

http://gerrit.cloudera.org:8080/#/c/19794/11/src/kudu/common/partition_pruner-test.cc@1081
PS11, Line 1081: temp_values.push_back(t_value);
               :         test_case.push_back(reinterpret_cast<const void*>(&temp_values[temp_values.size() - 1]));
Can this be simplified to 'test_case.push_back(reinterpret_cast<const void*>(&t_value))' ?


http://gerrit.cloudera.org:8080/#/c/19794/11/src/kudu/common/partition_pruner-test.cc@1091
PS11, Line 1091: for (int i = 0; i < kKeyColumnSize; i++) {
               :       predicates.emplace_back(ColumnPredicate::InList(schema.column(i), &test_cases[i]));
               :     }
Maybe this can be moved to the for loop at L1079? And then the 'test_cases' can be removed too.


http://gerrit.cloudera.org:8080/#/c/19794/11/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

http://gerrit.cloudera.org:8080/#/c/19794/11/src/kudu/common/partition_pruner.cc@224
PS11, Line 224:     return;
nit: Need to document this short circuit condition.



-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 11
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Tue, 30 May 2023 11:59:48 +0000
Gerrit-HasComments: Yes

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 13:

> Patch Set 13: Code-Review+2

Just FYI, Yuqi: it's not a good practice to add +2 for your own patches (unless you carrying over somebody's else +2 from prior PS just after re-base or minor updates).


-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 13
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Wed, 21 Jun 2023 16:08:39 +0000
Gerrit-HasComments: No

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Yingchun Lai, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19794

to look at the new patch set (#14).

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................

[cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

This patch comes from a committed patch, its committed id is 'b69dbeb'. As that
patch said, logic of pruning hash partitions for in-list predicate in Kudu cpp
client has also a high space complexity and slow. Old algorithm must keep
all intermedium objects because they are incomplete until they are
completed and can be computed hash.

This patch fixes the problems and provides a recursive algorithm the
same as the one used by java client.

Compared with java client, the cpp client is less likely to cause the
OOM condition because it does not keep too many intermediate results.
This optimization has good benefits too. The benefits are related to
the in-list length and the number of primary keys, the performance
would be better if in-list length is longer. For example,
PartitionPrunerTest::TestMultiColumnInListHashPruningManyValues,
using 10 key columns and kMaxInListLength=50, old algorithm memory cost
may reach 600MB, while new algorithm's memory cost can be ignored
(it only need one objects and a few stacks for contexts). At the same
time, new algorithm has a good speedup, some effect as below:

combination_count: 5554006920000, old cost: 428238us, new cost: 713us, speedup: 600.6x
combination_count: 89083783664568, old cost: 2764924us, new cost: 1145us, speedup: 2414.7x
combination_count: 27194091724800, old cost: 1610475us, new cost: 1151us, speedup: 1399.2x
combination_count: 7116622216704, old cost: 34544289us, new cost: 375us, speedup: 92118.1x
combination_count: 37570734489600, old cost: 1733205us, new cost: 901us, speedup: 1923.6x

Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 240 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/19794/14
-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 14
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Yingchun Lai, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19794

to look at the new patch set (#8).

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................

[cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

This patch comes from a committed patch, its committed id is 'b69dbeb'. As that
patch said, logic of pruning hash partitions for in-list predicate in Kudu cpp
client has also a high space complexity and slow. Old algorithm must keep
all intermedium objects because they are incomplete until they are
completed and can be computed hash.

This patch fixes the problems and provides a recursive algorithm the
same as one using java client.

Compared with java client, the cpp client is less likely to cause the
OOM condition because it does not keep too many intermediate results.
This optimization has good benefits too. The benefits are related to
the in-list length and the number of primary keys, the performance
would be better if in-list length is longer. For example,
PartitionPrunerTest::TestMultiColumnInListHashPruningManyValues,
using 10 key columns and kMaxInListLength=50, old algorithm memory cost
may reach 600MB, while new algorithm's memory cost can be ignored
(it only need one objects and a few stacks for contexts). At the same
time, new algorithm has a good speedup, some effect as below:

combination_count: 5554006920000, old cost: 428238us, new cost: 713us, speedup: 600.6x
combination_count: 89083783664568, old cost: 2764924us, new cost: 1145us, speedup: 2414.7x
combination_count: 27194091724800, old cost: 1610475us, new cost: 1151us, speedup: 1399.2x
combination_count: 7116622216704, old cost: 34544289us, new cost: 375us, speedup: 92118.1x
combination_count: 37570734489600, old cost: 1733205us, new cost: 901us, speedup: 1923.6x

Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 238 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/19794/8
-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 8
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 8:

(11 comments)

Thanks very much for your advices, all has fixed.

http://gerrit.cloudera.org:8080/#/c/19794/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19794/7//COMMIT_MSG@9
PS7, Line 9: a committed patch, its committed id 
> It would be better to use the related commit hash i.e. b69dbeb instead, the
Done


http://gerrit.cloudera.org:8080/#/c/19794/7//COMMIT_MSG@16
PS7, Line 16: 
> Same.
Done


http://gerrit.cloudera.org:8080/#/c/19794/7//COMMIT_MSG@21
PS7, Line 21:  pe
> nit: the
Done


http://gerrit.cloudera.org:8080/#/c/19794/7//COMMIT_MSG@21
PS7, Line 21: keys, t
> nit: keys
Done


http://gerrit.cloudera.org:8080/#/c/19794/7//COMMIT_MSG@22
PS7, Line 22: longer
> nit: longger?
longer. DONE


http://gerrit.cloudera.org:8080/#/c/19794/7//COMMIT_MSG@24
PS7, Line 24: using
> nit: using
Done


http://gerrit.cloudera.org:8080/#/c/19794/7/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

http://gerrit.cloudera.org:8080/#/c/19794/7/src/kudu/common/partition_pruner-test.cc@97
PS7, Line 97: vector<bool
> nit: vector
Done


http://gerrit.cloudera.org:8080/#/c/19794/7/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

http://gerrit.cloudera.org:8080/#/c/19794/7/src/kudu/common/partition_pruner.cc@186
PS7, Line 186:   vector<vector<const void*>> predicate_values_list;
> nit: Move this line to L207 where is closer to the place it's used ?
Done


http://gerrit.cloudera.org:8080/#/c/19794/7/src/kudu/common/partition_pruner.cc@217
PS7, Line 217:   DCHECK_NOTNULL(predicate_values_picked);
> nit: adds DCHECK to check predicate_values_picked and hash_bucket_bitset wi
Done


http://gerrit.cloudera.org:8080/#/c/19794/7/src/kudu/common/partition_pruner.cc@219
PS7, Line 219: ol immediately_stop = true;
             :   for (const auto b : *hash_bucket_bitset) {
             :     immediately_stop &= b;
             :   }
             :   i
> Why not return immediately once find a true value in the loop?
All bitset has set, that means all partitions for this 'hash_dimension' should reserve. 
Its not necessary to run left.

In fact, this is the main optimization for speeding up when in-list predicate values are very long.


http://gerrit.cloudera.org:8080/#/c/19794/7/src/kudu/common/partition_pruner.cc@225
PS7, Line 225:   }
> How about using DCHECK_EQ which is take effect only in DEBUG version? The r
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 8
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Fri, 19 May 2023 02:50:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Yingchun Lai, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19794

to look at the new patch set (#9).

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................

[cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

This patch comes from a committed patch, its committed id is 'b69dbeb'. As that
patch said, logic of pruning hash partitions for in-list predicate in Kudu cpp
client has also a high space complexity and slow. Old algorithm must keep
all intermedium objects because they are incomplete until they are
completed and can be computed hash.

This patch fixes the problems and provides a recursive algorithm the
same as one using java client.

Compared with java client, the cpp client is less likely to cause the
OOM condition because it does not keep too many intermediate results.
This optimization has good benefits too. The benefits are related to
the in-list length and the number of primary keys, the performance
would be better if in-list length is longer. For example,
PartitionPrunerTest::TestMultiColumnInListHashPruningManyValues,
using 10 key columns and kMaxInListLength=50, old algorithm memory cost
may reach 600MB, while new algorithm's memory cost can be ignored
(it only need one objects and a few stacks for contexts). At the same
time, new algorithm has a good speedup, some effect as below:

combination_count: 5554006920000, old cost: 428238us, new cost: 713us, speedup: 600.6x
combination_count: 89083783664568, old cost: 2764924us, new cost: 1145us, speedup: 2414.7x
combination_count: 27194091724800, old cost: 1610475us, new cost: 1151us, speedup: 1399.2x
combination_count: 7116622216704, old cost: 34544289us, new cost: 375us, speedup: 92118.1x
combination_count: 37570734489600, old cost: 1733205us, new cost: 901us, speedup: 1923.6x

Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 238 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/19794/9
-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 9
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 9: Code-Review+1

(2 comments)

Thanks for your advices.

http://gerrit.cloudera.org:8080/#/c/19794/9/src/kudu/common/partition_pruner.h
File src/kudu/common/partition_pruner.h:

http://gerrit.cloudera.org:8080/#/c/19794/9/src/kudu/common/partition_pruner.h@95
PS9, Line 95:   // hash_bucket_bitset saves the hash buckets result.
> nit: the result is stored in hash_bucket_bitset
Done


http://gerrit.cloudera.org:8080/#/c/19794/9/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

http://gerrit.cloudera.org:8080/#/c/19794/9/src/kudu/common/partition_pruner.cc@243
PS9, Line 243:   CHECK_LT(level, predicate_values_list.size());
> CHECK_LT is used in release mode. Are there assertion similar used in debug
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 9
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Mon, 22 May 2023 07:27:23 +0000
Gerrit-HasComments: Yes

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19794

to look at the new patch set (#3).

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................

[cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

This patch's background is in https://gerrit.cloudera.org/c/19568/. As
that patch said, logic of pruning hash partitions for in-list predicate
in Kudu cpp client has also a high space complexity and slow. Old
algorithm must keep all intermedium objects because they are incomplete
until they are completed and can be computed hash.

This patch fixes the problems and provides a recursive algorithm the
same as java client in patch https://gerrit.cloudera.org/c/19568/.

Unlike java client, this problem in cpp client is not critical.
Old algorithm in cpp client does rare go out of memory because it
swap the intermedium objects. This optimization has good benifit too.
The benifits depend on the in-list length and size of primary columns,
benifit are better if in-list length is bigger. For example,
PartitionPrunerTest::TestMultiColumnInListHashPruningManyValues,
Using 10 key columns and kMaxInListLength=50, old algorithm memory cost
may reach 600MB, while new algorithm's memory cost can be ignored
(it only need one objects and a few stacks for contexts). At the same
time, new algorithm has a good speedup, some effect as below:

combination_count: 5554006920000, old cost: 428238us, new cost: 713us, speedup: 600.6x
combination_count: 89083783664568, old cost: 2764924us, new cost: 1145us, speedup: 2414.7x
combination_count: 27194091724800, old cost: 1610475us, new cost: 1151us, speedup: 1399.2x
combination_count: 7116622216704, old cost: 34544289us, new cost: 375us, speedup: 92118.1x
combination_count: 37570734489600, old cost: 1733205us, new cost: 901us, speedup: 1923.6x

Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 228 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/19794/3
-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 3
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Yingchun Lai, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19794

to look at the new patch set (#6).

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................

[cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

This patch comes from https://gerrit.cloudera.org/c/19568/. As that patch
said, logic of pruning hash partitions for in-list predicate in Kudu cpp
client has also a high space complexity and slow. Old algorithm must keep
all intermedium objects because they are incomplete until they are
completed and can be computed hash.

This patch fixes the problems and provides a recursive algorithm the
same as java client in patch: https://gerrit.cloudera.org/c/19568/.

Compared with java client, the cpp client is less likely to cause the
OOM condition because it does not keep too many intermediate results.
This optimization has good benefits too. The benefits are related to
the in-list length and the number of primary columns, The performance
would be better if in-list length is bigger. For example,
PartitionPrunerTest::TestMultiColumnInListHashPruningManyValues,
Using 10 key columns and kMaxInListLength=50, old algorithm memory cost
may reach 600MB, while new algorithm's memory cost can be ignored
(it only need one objects and a few stacks for contexts). At the same
time, new algorithm has a good speedup, some effect as below:

combination_count: 5554006920000, old cost: 428238us, new cost: 713us, speedup: 600.6x
combination_count: 89083783664568, old cost: 2764924us, new cost: 1145us, speedup: 2414.7x
combination_count: 27194091724800, old cost: 1610475us, new cost: 1151us, speedup: 1399.2x
combination_count: 7116622216704, old cost: 34544289us, new cost: 375us, speedup: 92118.1x
combination_count: 37570734489600, old cost: 1733205us, new cost: 901us, speedup: 1923.6x

Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 236 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/19794/6
-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 6
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has removed a vote on this change.

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 14
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Yingchun Lai, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19794

to look at the new patch set (#10).

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................

[cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

This patch comes from a committed patch, its committed id is 'b69dbeb'. As that
patch said, logic of pruning hash partitions for in-list predicate in Kudu cpp
client has also a high space complexity and slow. Old algorithm must keep
all intermedium objects because they are incomplete until they are
completed and can be computed hash.

This patch fixes the problems and provides a recursive algorithm the
same as one using java client.

Compared with java client, the cpp client is less likely to cause the
OOM condition because it does not keep too many intermediate results.
This optimization has good benefits too. The benefits are related to
the in-list length and the number of primary keys, the performance
would be better if in-list length is longer. For example,
PartitionPrunerTest::TestMultiColumnInListHashPruningManyValues,
using 10 key columns and kMaxInListLength=50, old algorithm memory cost
may reach 600MB, while new algorithm's memory cost can be ignored
(it only need one objects and a few stacks for contexts). At the same
time, new algorithm has a good speedup, some effect as below:

combination_count: 5554006920000, old cost: 428238us, new cost: 713us, speedup: 600.6x
combination_count: 89083783664568, old cost: 2764924us, new cost: 1145us, speedup: 2414.7x
combination_count: 27194091724800, old cost: 1610475us, new cost: 1151us, speedup: 1399.2x
combination_count: 7116622216704, old cost: 34544289us, new cost: 375us, speedup: 92118.1x
combination_count: 37570734489600, old cost: 1733205us, new cost: 901us, speedup: 1923.6x

Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 238 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/19794/10
-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 10
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 12:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner-test.cc@96
PS12, Line 96: Return hash values bitset of these combinations.
How about:

 Return a bitset indicates which hash buckets are selected of these combinations.


http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner-test.cc@193
PS12, Line 193: vector<bool> PartitionPrunerTest::PruneHashComponent(
Is it copied from partition_pruner.cc?


http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner-test.cc@984
PS12, Line 984: Check the correctness of new algorithm by comparing it with the old one.
How about add some tests before this patch, the tests will pass by the 'old algorithm', then replace the old algorithm by the new one, make sure the tests added before will pass as well.


http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner.cc@204
PS12, Line 204:   vector<const void*> predicate_values_picked;
Could you please add some comments to describe what do 'predicate_values_picked' and 'hash_bucket_bitset' store?


http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner.cc@206
PS12, Line 206:   vector<bool> hash_bucket_bitset(hash_dimension.num_buckets, false);
nit: rename to 'hash_bucket_selected' ?


http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner.cc@219
PS12, Line 219: immediately_stop
How about renaming it closer to what it stands for? Like 'all_hash_bucket_needed' or something like that.


http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner.cc@224
PS12, Line 224: All bits in the bitset have been set, 
If it has a more meaningful name, I gusee these comments can be omitted?



-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 12
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Fri, 09 Jun 2023 09:40:34 +0000
Gerrit-HasComments: Yes

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19794

to look at the new patch set (#2).

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................

[cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

This patch's background is in https://gerrit.cloudera.org/c/19568/. As
that patch said, logic of pruning hash partitions for in-list predicate
in Kudu cpp client has also a high space complexity and slow. Old
algorithm must keep all intermedium objects because they are incomplete
until they are completed and can be compute hash.

This patch fixes the problems and provides a recursive algorithm the
same as java client in patch https://gerrit.cloudera.org/c/19568/.

Unlike java client, this problem in cpp client is not critical.
Old algorithm in cpp client does rare go out of memory because it
swap the intermedium objects. This optimization has good benifit too.
PartitionPrunerTest::TestMultiColumnInListHashPruningManyValues can show
some benifits. The benifits depend on the in-list length, for example,
in the test TestMultiColumnInListHashPruningManyValues, set kMaxInListLength
50, old algorithm memory cost can reach 300MB, while new algorithm's memory
cost can be ignored(it only need one object and a few stacks for context).
At the same time, new algorithm has a good speedup, below shows some effect:

combination_count: 5554006920000, old algorithm cost: 428238us, new algorithm cost: 713us, speedup: 600.61430575035058x
combination_count: 89083783664568, old algorithm cost: 2764924us, new algorithm cost: 1145us, speedup: 2414.780786026201x
combination_count: 27194091724800, old algorithm cost: 1610475us, new algorithm cost: 1151us, speedup: 1399.1963509991313x
combination_count: 7116622216704, old algorithm cost: 34544289us, new algorithm cost: 375us, speedup: 92118.104x
combination_count: 37570734489600, old algorithm cost: 1733205us, new algorithm cost: 901us, speedup: 1923.6459489456161x

Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 228 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/19794/2
-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 2
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 7:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/19794/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19794/7//COMMIT_MSG@9
PS7, Line 9: https://gerrit.cloudera.org/c/19568/
It would be better to use the related commit hash i.e. b69dbeb instead, then it would be more convenient to show the info by using "git log b69dbeb".


http://gerrit.cloudera.org:8080/#/c/19794/7//COMMIT_MSG@16
PS7, Line 16: https://gerrit.cloudera.org/c/19568/
Same.


http://gerrit.cloudera.org:8080/#/c/19794/7//COMMIT_MSG@21
PS7, Line 21: columns
nit: keys


http://gerrit.cloudera.org:8080/#/c/19794/7//COMMIT_MSG@21
PS7, Line 21: The
nit: the


http://gerrit.cloudera.org:8080/#/c/19794/7//COMMIT_MSG@22
PS7, Line 22: bigger
nit: longger?


http://gerrit.cloudera.org:8080/#/c/19794/7//COMMIT_MSG@24
PS7, Line 24: Using
nit: using


http://gerrit.cloudera.org:8080/#/c/19794/7/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

http://gerrit.cloudera.org:8080/#/c/19794/7/src/kudu/common/partition_pruner-test.cc@97
PS7, Line 97: std::vector
nit: vector


http://gerrit.cloudera.org:8080/#/c/19794/7/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

http://gerrit.cloudera.org:8080/#/c/19794/7/src/kudu/common/partition_pruner.cc@186
PS7, Line 186:   vector<bool> hash_bucket_bitset(hash_dimension.num_buckets, false);
nit: Move this line to L207 where is closer to the place it's used ?


http://gerrit.cloudera.org:8080/#/c/19794/7/src/kudu/common/partition_pruner.cc@217
PS7, Line 217:   bool immediately_stop = true;
nit: adds DCHECK to check predicate_values_picked and hash_bucket_bitset will not be nullptr?


http://gerrit.cloudera.org:8080/#/c/19794/7/src/kudu/common/partition_pruner.cc@219
PS7, Line 219: immediately_stop &= b;
             :   }
             :   if (immediately_stop) {
             :     return;
             :   }
Why not return immediately once find a true value in the loop?


http://gerrit.cloudera.org:8080/#/c/19794/7/src/kudu/common/partition_pruner.cc@225
PS7, Line 225:   CHECK_EQ(hash_dimension.column_ids.size(), predicate_values_list.size());
How about using DCHECK_EQ which is take effect only in DEBUG version? The release version which is usually used in product environment will not be effected.



-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 7
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Tue, 16 May 2023 15:46:39 +0000
Gerrit-HasComments: Yes

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 13: Verified+1

(7 comments)

Thanks for your advices, I fixed them.

http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner-test.cc@96
PS12, Line 96: Return a bitset indicates which hash buckets are
> How about:
Done


http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner-test.cc@193
PS12, Line 193: // The old algorithm. It moved from 'partition_pruner.cc'.
> Is it copied from partition_pruner.cc?
Yes.


http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner-test.cc@984
PS12, Line 984: generate some in-list predicates for these key columns. The old algorith
> How about add some tests before this patch, the tests will pass by the 'old
It seems more confuse. 
I add some comments using other words.


http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner.cc@204
PS12, Line 204:   // A combination of the predicate values is selected to compute the hash buckets.
> Could you please add some comments to describe what do 'predicate_values_pi
Done


http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner.cc@206
PS12, Line 206:   predicate_values_selected.reserve(hash_dimension.column_ids.size());
> nit: rename to 'hash_bucket_selected' ?
Done


http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner.cc@219
PS12, Line 219:                 
> How about renaming it closer to what it stands for? Like 'all_hash_bucket_n
Done


http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner.cc@224
PS12, Line 224: all_hash_bucket_needed = true;
> If it has a more meaningful name, I gusee these comments can be omitted?
Removed.



-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 13
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Tue, 13 Jun 2023 08:55:57 +0000
Gerrit-HasComments: Yes

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Yingchun Lai, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19794

to look at the new patch set (#12).

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................

[cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

This patch comes from a committed patch, its committed id is 'b69dbeb'. As that
patch said, logic of pruning hash partitions for in-list predicate in Kudu cpp
client has also a high space complexity and slow. Old algorithm must keep
all intermedium objects because they are incomplete until they are
completed and can be computed hash.

This patch fixes the problems and provides a recursive algorithm the
same as the one used by java client.

Compared with java client, the cpp client is less likely to cause the
OOM condition because it does not keep too many intermediate results.
This optimization has good benefits too. The benefits are related to
the in-list length and the number of primary keys, the performance
would be better if in-list length is longer. For example,
PartitionPrunerTest::TestMultiColumnInListHashPruningManyValues,
using 10 key columns and kMaxInListLength=50, old algorithm memory cost
may reach 600MB, while new algorithm's memory cost can be ignored
(it only need one objects and a few stacks for contexts). At the same
time, new algorithm has a good speedup, some effect as below:

combination_count: 5554006920000, old cost: 428238us, new cost: 713us, speedup: 600.6x
combination_count: 89083783664568, old cost: 2764924us, new cost: 1145us, speedup: 2414.7x
combination_count: 27194091724800, old cost: 1610475us, new cost: 1151us, speedup: 1399.2x
combination_count: 7116622216704, old cost: 34544289us, new cost: 375us, speedup: 92118.1x
combination_count: 37570734489600, old cost: 1733205us, new cost: 901us, speedup: 1923.6x

Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 240 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/19794/12
-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 12
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Wang Xixu (Code Review)" <ge...@cloudera.org>.
Wang Xixu has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 9: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19794/9/src/kudu/common/partition_pruner.h
File src/kudu/common/partition_pruner.h:

http://gerrit.cloudera.org:8080/#/c/19794/9/src/kudu/common/partition_pruner.h@95
PS9, Line 95:   // hash_bucket_bitset saves the hash buckets result.
nit: the result is stored in hash_bucket_bitset


http://gerrit.cloudera.org:8080/#/c/19794/9/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

http://gerrit.cloudera.org:8080/#/c/19794/9/src/kudu/common/partition_pruner.cc@243
PS9, Line 243:   CHECK_LT(level, predicate_values_list.size());
CHECK_LT is used in release mode. Are there assertion similar used in debug mode?



-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 9
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Mon, 22 May 2023 04:00:22 +0000
Gerrit-HasComments: Yes

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Yingchun Lai, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19794

to look at the new patch set (#5).

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................

[cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

This patch comes from https://gerrit.cloudera.org/c/19568/. As that patch
said, logic of pruning hash partitions for in-list predicate in Kudu cpp
client has also a high space complexity and slow. Old algorithm must keep
all intermedium objects because they are incomplete until they are
completed and can be computed hash.

This patch fixes the problems and provides a recursive algorithm the
same as java client in patch: https://gerrit.cloudera.org/c/19568/.

Compared with java client, the cpp client is less likely to cause the
OOM condition because it does not keep too many intermediate results.
This optimization has good benefits too. The benefits are related to
the in-list length and the number of primary columns, The performance
would be better if in-list length is bigger. For example,
PartitionPrunerTest::TestMultiColumnInListHashPruningManyValues,
Using 10 key columns and kMaxInListLength=50, old algorithm memory cost
may reach 600MB, while new algorithm's memory cost can be ignored
(it only need one objects and a few stacks for contexts). At the same
time, new algorithm has a good speedup, some effect as below:

combination_count: 5554006920000, old cost: 428238us, new cost: 713us, speedup: 600.6x
combination_count: 89083783664568, old cost: 2764924us, new cost: 1145us, speedup: 2414.7x
combination_count: 27194091724800, old cost: 1610475us, new cost: 1151us, speedup: 1399.2x
combination_count: 7116622216704, old cost: 34544289us, new cost: 375us, speedup: 92118.1x
combination_count: 37570734489600, old cost: 1733205us, new cost: 901us, speedup: 1923.6x

Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 236 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/19794/5
-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 5
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Ashwani Raina, Yingchun Lai, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19794

to look at the new patch set (#4).

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................

[cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

This patch comes from https://gerrit.cloudera.org/c/19568/. As that patch
said, logic of pruning hash partitions for in-list predicate in Kudu cpp
client has also a high space complexity and slow. Old algorithm must keep
all intermedium objects because they are incomplete until they are
completed and can be computed hash.

This patch fixes the problems and provides a recursive algorithm the
same as java client in patch: 19568.

Compared with java client, the cpp client is less likely to cause the
OOM condition because it does not keep too many intermediate results.
This optimization has good benefits too. The benefits are related to
the in-list length and the number of primary columns, The performance
would be better if in-list length is bigger. For example,
PartitionPrunerTest::TestMultiColumnInListHashPruningManyValues,
Using 10 key columns and kMaxInListLength=50, old algorithm memory cost
may reach 600MB, while new algorithm's memory cost can be ignored
(it only need one objects and a few stacks for contexts). At the same
time, new algorithm has a good speedup, some effect as below:

combination_count: 5554006920000, old cost: 428238us, new cost: 713us, speedup: 600.6x
combination_count: 89083783664568, old cost: 2764924us, new cost: 1145us, speedup: 2414.7x
combination_count: 27194091724800, old cost: 1610475us, new cost: 1151us, speedup: 1399.2x
combination_count: 7116622216704, old cost: 34544289us, new cost: 375us, speedup: 92118.1x
combination_count: 37570734489600, old cost: 1733205us, new cost: 901us, speedup: 1923.6x

Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 231 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/19794/4
-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 4
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 14: Code-Review+1

> Patch Set 13:
> 
> > Patch Set 13: Code-Review+2
> 
> Just FYI, Yuqi: it's not a good practice to add +2 for your own patches (unless you carrying over somebody's else +2 from prior PS just after re-base or minor updates).

OK. Thanks for your advice. I would follow this rule.


-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 14
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Mon, 26 Jun 2023 09:00:34 +0000
Gerrit-HasComments: No

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Wang Xixu (Code Review)" <ge...@cloudera.org>.
Wang Xixu has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 7: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 7
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Fri, 12 May 2023 03:21:04 +0000
Gerrit-HasComments: No

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 13: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 13
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Wed, 21 Jun 2023 11:09:27 +0000
Gerrit-HasComments: No

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................

[cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

This patch comes from a committed patch, its committed id is 'b69dbeb'. As that
patch said, logic of pruning hash partitions for in-list predicate in Kudu cpp
client has also a high space complexity and slow. Old algorithm must keep
all intermedium objects because they are incomplete until they are
completed and can be computed hash.

This patch fixes the problems and provides a recursive algorithm the
same as the one used by java client.

Compared with java client, the cpp client is less likely to cause the
OOM condition because it does not keep too many intermediate results.
This optimization has good benefits too. The benefits are related to
the in-list length and the number of primary keys, the performance
would be better if in-list length is longer. For example,
PartitionPrunerTest::TestMultiColumnInListHashPruningManyValues,
using 10 key columns and kMaxInListLength=50, old algorithm memory cost
may reach 600MB, while new algorithm's memory cost can be ignored
(it only need one objects and a few stacks for contexts). At the same
time, new algorithm has a good speedup, some effect as below:

combination_count: 5554006920000, old cost: 428238us, new cost: 713us, speedup: 600.6x
combination_count: 89083783664568, old cost: 2764924us, new cost: 1145us, speedup: 2414.7x
combination_count: 27194091724800, old cost: 1610475us, new cost: 1151us, speedup: 1399.2x
combination_count: 7116622216704, old cost: 34544289us, new cost: 375us, speedup: 92118.1x
combination_count: 37570734489600, old cost: 1733205us, new cost: 901us, speedup: 1923.6x

Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Reviewed-on: http://gerrit.cloudera.org:8080/19794
Reviewed-by: Yingchun Lai <la...@apache.org>
Tested-by: Yingchun Lai <la...@apache.org>
Reviewed-by: Yuqi Du <sh...@gmail.com>
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 240 insertions(+), 19 deletions(-)

Approvals:
  Yingchun Lai: Looks good to me, approved; Verified
  Yuqi Du: Looks good to me, but someone else must approve

-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 15
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 14: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 14
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Mon, 26 Jun 2023 07:17:05 +0000
Gerrit-HasComments: No

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 12: Verified+1

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19794/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19794/11//COMMIT_MSG@16
PS11, Line 16: the one used by java 
> nit: the one used by java client
Done


http://gerrit.cloudera.org:8080/#/c/19794/11/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

http://gerrit.cloudera.org:8080/#/c/19794/11/src/kudu/common/partition_pruner-test.cc@1081
PS11, Line 1081: temp_values.push_back(t_value);
               :         test_case.push_back(reinterpret_cast<const void*>(&temp_values[temp_values.size() - 1]));
> Can this be simplified to 'test_case.push_back(reinterpret_cast<const void*
No, the address of 't_value' is not the same as the last element of temp_values, and t_value is an auto variable in stack, its  contents of this address will be undefined after this for loop.

The variable 'temp_values' keep the values and its address in this test.


http://gerrit.cloudera.org:8080/#/c/19794/11/src/kudu/common/partition_pruner-test.cc@1091
PS11, Line 1091: for (int i = 0; i < kKeyColumnSize; i++) {
               :       predicates.emplace_back(ColumnPredicate::InList(schema.column(i), &test_cases[i]));
               :     }
> Maybe this can be moved to the for loop at L1079? And then the 'test_cases'
Can not move it.

The purpose of  lines L1075-L1087 is generating the test_cases,   and L1091-1093 use this cases to generator predicates.


http://gerrit.cloudera.org:8080/#/c/19794/11/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

http://gerrit.cloudera.org:8080/#/c/19794/11/src/kudu/common/partition_pruner.cc@224
PS11, Line 224:     // All bits in the bitset have been set, we should keep all partitions for
> nit: Need to document this short circuit condition.
DONE. Add a comment for this.



-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 12
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Wed, 07 Jun 2023 02:47:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 3:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/19794/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19794/3//COMMIT_MSG@9
PS3, Line 9: This patch's background is in https://gerrit.cloudera.org/c/19568/. As
This patch comes from https://gerrit.cloudera.org/c/19568/.


http://gerrit.cloudera.org:8080/#/c/19794/3//COMMIT_MSG@18
PS3, Line 18: Unlike java client, this problem in cpp client is not critical.
            : Old algorithm in cpp client does rare go out of memory because it
            : swap the intermedium objects.
Compared with the java client, the cpp client is less likely to cause the OOM condition because it does not keep too many intermediate results.


http://gerrit.cloudera.org:8080/#/c/19794/3//COMMIT_MSG@21
PS3, Line 21: depend on
are related to


http://gerrit.cloudera.org:8080/#/c/19794/3//COMMIT_MSG@21
PS3, Line 21: size of
the number of


http://gerrit.cloudera.org:8080/#/c/19794/3//COMMIT_MSG@22
PS3, Line 22: benifit are better
The performance are better


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner-test.cc@974
PS3, Line 974: // For test cases that should run with both kinds of tokens.
I don't quite understand this sentence, please reorganize it.


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner-test.cc@982
PS3, Line 982: with a few key columns (10)
nit: with 10 columns.


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner-test.cc@983
PS3, Line 983: // Check new algorithm is correct by comparing with the old one.
Check the correctness of the new algorithm by comparing it with the old one.


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner-test.cc@984
PS3, Line 984: // Compare the speed on the two algorithms and show the speedup.
Compare the efficiency of the two algorithms.


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner-test.cc@985
PS3, Line 985: TEST_P(PartitionPrunerTestWithMaxInListLength, TestMultiColumnInListHashPruningManyValues) {
nit: Add 'SKIP_IF_SLOW_NOT_ALLOWED()' for the test.


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner-test.cc@1023
PS3, Line 1023: comparator with the old one
comparing it with the old one.


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner-test.cc@1046
PS3, Line 1046: The following logs can compare v2 and v1.
The following logs are used to compare the efficiency of the two algorithms.


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner-test.cc@1063
PS3, Line 1063: / Reserve bigger enough capacity(kMaxSafeLength * kKeyColumnSize) to avoid renew memory
              :     // and copy objects which would cause memory pointer we record changed.
Increase the vector's capacity to kMaxSafeLength * kKeyColumnSize to avoid reallocation that invalidates all references to the elements.


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner.h
File src/kudu/common/partition_pruner.h:

http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner.h@94
PS3, Line 94:   static void ComputeHashBuckets(const Schema& schema,
nit: Please add some description about this newly added method.


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner.cc@182
PS3, Line 182: // newer
nit: Remove this line.


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner.cc@216
PS3, Line 216: indexes_picked
nit: What about renaming it to 'predicate_values_picked'?


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner.cc@242
PS3, Line 242: push_back
nit: What about using 'emplace_back'?



-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 3
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Fri, 05 May 2023 01:29:16 +0000
Gerrit-HasComments: Yes

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner-test.cc@984
PS12, Line 984: generate some in-list predicates for these key columns. The old algorith
> Sorry for the late reply. I will take a further more review of this patch a
The exists tests and the new added test can cover these situations.



-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 13
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Sun, 25 Jun 2023 16:49:08 +0000
Gerrit-HasComments: Yes

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 13:

(6 comments)

Thanks for your advices. You can review it again.

http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner-test.cc@984
PS12, Line 984: generate some in-list predicates for these key columns. The old algorith
> Sorry for the late reply. I will take a further more review of this patch a
The unit tests  which tables with hash bucket partitions has tested the old algorithm. At this, they are also tested the new algorithms, such as PartitionPrunerTest::TestHashPruning, PartitionPrunerTest::TestPruning, PartitionPrunerTest::TestHashSchemasPerRangeWithPartialPrimaryKeyRangePruning.  They can make sure  the new algorithm's results are the same as the old one, in other words, they has make sure the correctness of new algorithm.

So based on the old one algorithm is correct, I randomly construct a lots of in-list cases to cover more combinations to which is impossible to check all the combinations, make sure  the general sensors and at the same time compare with the performance.


http://gerrit.cloudera.org:8080/#/c/19794/13/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

http://gerrit.cloudera.org:8080/#/c/19794/13/src/kudu/common/partition_pruner-test.cc@1026
PS13, Line 1026: vector<KuduPartialRow>()
> nit: can be replaced by '{}'.
Done


http://gerrit.cloudera.org:8080/#/c/19794/13/src/kudu/common/partition_pruner-test.cc@1047
PS13, Line 1047: CHECK_EQ(v1.size(), v2.size());
               :       for (int i = 0; i < v1.size(); i++) {
               :         ASSERT_EQ(v1[i], v2[i]);
               :       }
> nit: they can be replaced by ASSERT_EQ(v1, v2);
Done


http://gerrit.cloudera.org:8080/#/c/19794/13/src/kudu/common/partition_pruner-test.cc@1069
PS13, Line 1069: kMaxSafeLength
> kMaxInListLength?
Done


http://gerrit.cloudera.org:8080/#/c/19794/13/src/kudu/common/partition_pruner-test.cc@1084
PS13, Line 1084: temp_values[temp_values.size() - 1]
> nit: temp_values.back() ?
Done


http://gerrit.cloudera.org:8080/#/c/19794/13/src/kudu/common/partition_pruner-test.cc@1085
PS13, Line 1085: j++;
> nit: Move it to line 1081 which is more conventional.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 13
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Mon, 26 Jun 2023 07:12:29 +0000
Gerrit-HasComments: Yes

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yingchun Lai (Code Review)" <ge...@cloudera.org>.
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

http://gerrit.cloudera.org:8080/#/c/19794/12/src/kudu/common/partition_pruner-test.cc@984
PS12, Line 984: generate some in-list predicates for these key columns. The old algorith
> It seems more confuse. 
Sorry for the late reply. I will take a further more review of this patch after the vacation.
To prove the correctness, it would be great to add more functionality tests, vs compare the result of old algorithm, how to ensure the old one's correctness? 
To prove the performance improvement, how about add some simple benchmarks? 
Tell me if I miss something :)



-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 13
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Thu, 22 Jun 2023 15:35:21 +0000
Gerrit-HasComments: Yes

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Ashwani Raina, Yingchun Lai, Yifan Zhang, Kudu Jenkins, Abhishek Chennaka, KeDeng, Wang Xixu, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19794

to look at the new patch set (#7).

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................

[cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

This patch comes from https://gerrit.cloudera.org/c/19568/. As that patch
said, logic of pruning hash partitions for in-list predicate in Kudu cpp
client has also a high space complexity and slow. Old algorithm must keep
all intermedium objects because they are incomplete until they are
completed and can be computed hash.

This patch fixes the problems and provides a recursive algorithm the
same as java client in patch: https://gerrit.cloudera.org/c/19568/.

Compared with java client, the cpp client is less likely to cause the
OOM condition because it does not keep too many intermediate results.
This optimization has good benefits too. The benefits are related to
the in-list length and the number of primary columns, The performance
would be better if in-list length is bigger. For example,
PartitionPrunerTest::TestMultiColumnInListHashPruningManyValues,
Using 10 key columns and kMaxInListLength=50, old algorithm memory cost
may reach 600MB, while new algorithm's memory cost can be ignored
(it only need one objects and a few stacks for contexts). At the same
time, new algorithm has a good speedup, some effect as below:

combination_count: 5554006920000, old cost: 428238us, new cost: 713us, speedup: 600.6x
combination_count: 89083783664568, old cost: 2764924us, new cost: 1145us, speedup: 2414.7x
combination_count: 27194091724800, old cost: 1610475us, new cost: 1151us, speedup: 1399.2x
combination_count: 7116622216704, old cost: 34544289us, new cost: 375us, speedup: 92118.1x
combination_count: 37570734489600, old cost: 1733205us, new cost: 901us, speedup: 1923.6x

Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
---
M src/kudu/common/partition_pruner-test.cc
M src/kudu/common/partition_pruner.cc
M src/kudu/common/partition_pruner.h
3 files changed, 235 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/19794/7
-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 7
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yifan Zhang (Code Review)" <ge...@cloudera.org>.
Yifan Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 7: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 7
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Mon, 08 May 2023 12:29:16 +0000
Gerrit-HasComments: No

[kudu-CR] [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate

Posted by "Yuqi Du (Code Review)" <ge...@cloudera.org>.
Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/19794 )

Change subject: [cpp-client] KUDU-3455 Reduce space complexity and speed up hash partition pruning for in-list predicate
......................................................................


Patch Set 6:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/19794/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19794/3//COMMIT_MSG@9
PS3, Line 9: This patch comes from https://gerrit.cloudera.org/c/19568/. As that patch
> This patch comes from https://gerrit.cloudera.org/c/19568/.
Done


http://gerrit.cloudera.org:8080/#/c/19794/3//COMMIT_MSG@18
PS3, Line 18: Compared with java client, the cpp client is less likely to cause the
            : OOM condition because it does not keep too many intermediate results.
            : This optimization has good be
> Compared with the java client, the cpp client is less likely to cause the O
Done


http://gerrit.cloudera.org:8080/#/c/19794/3//COMMIT_MSG@21
PS3, Line 21: olumns,
> the number of
Done


http://gerrit.cloudera.org:8080/#/c/19794/3//COMMIT_MSG@21
PS3, Line 21: ength and
> are related to
Done


http://gerrit.cloudera.org:8080/#/c/19794/3//COMMIT_MSG@22
PS3, Line 22: would be better if
> The performance are better
Done


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner-test.cc
File src/kudu/common/partition_pruner-test.cc:

http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner-test.cc@974
PS3, Line 974: // For test cases that will run with in-list predicates using variant length
> I don't quite understand this sentence, please reorganize it.
Done


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner-test.cc@982
PS3, Line 982: with 200 columns including 
> nit: with 10 columns.
changed to "with 200 columns including 10 key columns"


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner-test.cc@983
PS3, Line 983: // generate some in-list predicates for these key columns.
> Check the correctness of the new algorithm by comparing it with the old one
Done


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner-test.cc@984
PS3, Line 984: // Check the correctness of new algorithm by comparing it with the old one.
> Compare the efficiency of the two algorithms.
Done


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner-test.cc@985
PS3, Line 985: // Compare the efficiency of the two algorithms.
> nit: Add 'SKIP_IF_SLOW_NOT_ALLOWED()' for the test.
Done


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner-test.cc@1023
PS3, Line 1023: on> partitions;
> comparing it with the old one.
Done


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner-test.cc@1046
PS3, Line 1046: or (int i = 0; i < v1.size(); i++) {
> The following logs are used to compare the efficiency of the two algorithms
Done


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner-test.cc@1063
PS3, Line 1063: 
              :   constexpr const int kTotalCount = 10;
> Increase the vector's capacity to kMaxSafeLength * kKeyColumnSize to avoid 
Done


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner.h
File src/kudu/common/partition_pruner.h:

http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner.h@94
PS3, Line 94:   // Pick all combinations in in-list values and compute their hash buckets,
> nit: Please add some description about this newly added method.
Done


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner.cc@182
PS3, Line 182: 
> nit: Remove this line.
Done


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner.cc@216
PS3, Line 216: predicate_valu
> nit: What about renaming it to 'predicate_values_picked'?
Done


http://gerrit.cloudera.org:8080/#/c/19794/3/src/kudu/common/partition_pruner.cc@242
PS3, Line 242: edicate_v
> nit: What about using 'emplace_back'?
Done


http://gerrit.cloudera.org:8080/#/c/19794/4/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

http://gerrit.cloudera.org:8080/#/c/19794/4/src/kudu/common/partition_pruner.cc@213
PS4, Line 213: void PartitionPruner::ComputeHashBuckets(const Schema& schema, // NOLINT(misc-no-recursion)
> warning: function 'ComputeHashBuckets' is within a recursive call chain [mi
Ignore it simply.  Maybe solve it by convert this function to a non-recursive function



-- 
To view, visit http://gerrit.cloudera.org:8080/19794
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4bea5c10b4ac2c62b85625fe9d2a33ceb4fb2e9
Gerrit-Change-Number: 19794
Gerrit-PatchSet: 6
Gerrit-Owner: Yuqi Du <sh...@gmail.com>
Gerrit-Reviewer: Abhishek Chennaka <ac...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <al...@apache.org>
Gerrit-Reviewer: Ashwani Raina <ar...@cloudera.com>
Gerrit-Reviewer: KeDeng <kd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <14...@qq.com>
Gerrit-Reviewer: Yifan Zhang <ch...@163.com>
Gerrit-Reviewer: Yingchun Lai <la...@apache.org>
Gerrit-Reviewer: Yuqi Du <sh...@gmail.com>
Gerrit-Comment-Date: Sat, 06 May 2023 11:28:44 +0000
Gerrit-HasComments: Yes