You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "wangning (Code Review)" <ge...@cloudera.org> on 2019/11/14 09:11:02 UTC

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

wangning has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14706


Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................

[java] KUDU-1644 Simplify IN-list predicate values during token building

This commit intends to improve the performance of token-based scan in
case of only one hash parititon which contains only one key.
By reducing the number of values pushed to server, it can reduce
the complexity of token-based in-list logarithmically.
The commit implemented the idea by filtering the unnesscessary values
when building the scanner.

Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
---
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduPredicate.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
3 files changed, 89 insertions(+), 16 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/06/14706/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 1
Gerrit-Owner: wangning <19...@gmail.com>

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14706

to look at the new patch set (#5).

Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................

[java] KUDU-1644 Simplify IN-list predicate values during token building

This commit intends to improve the performance of token-based scan in
case of only one hash parititon which contains only one key.
By reducing the number of values pushed to server, it can reduce
the complexity of token-based in-list logarithmically.
The commit implemented the idea by filtering the unnesscessary values
when building the scanner.

Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
---
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduPredicate.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
3 files changed, 90 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/06/14706/5
-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 5
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14706

to look at the new patch set (#10).

Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................

[java] KUDU-1644 Simplify IN-list predicate values during token building

This commit intends to improve the performance of token-based in-list
scan in case of only one hash partition.
By reducing the number of values pushed to server, it can reduce
the complexity of token-based in-list logarithmically.
The commit implemented the idea by filtering the unnesscessary values
when building the scanner.

Here I attach an simple benchmark which implemented by java client and
calculate qps by a 10s-scheduled timer.
Some settings,
client 4core, 8g memory
server 4core, 8g memory, HDD
in-list size, 100000
table settings, 10M rows, 24 partitions

Before tuning:
Mon Nov 11 19:09:53 CST 2019, qps :2528
Mon Nov 11 19:10:03 CST 2019, qps :3715
Mon Nov 11 19:10:13 CST 2019, qps :2509
Mon Nov 11 19:10:23 CST 2019, qps :2900
Mon Nov 11 19:10:33 CST 2019, qps :2081
Mon Nov 11 19:10:43 CST 2019, qps :3635

After tuning
Mon Nov 11 19:15:01 CST 2019, qps :13735
Mon Nov 11 19:15:11 CST 2019, qps :16264
Mon Nov 11 19:15:21 CST 2019, qps :13702
Mon Nov 11 19:15:31 CST 2019, qps :15469
Mon Nov 11 19:15:41 CST 2019, qps :16269

Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduPredicate.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java
6 files changed, 189 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/06/14706/10
-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 10
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14706

to look at the new patch set (#2).

Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................

[java] KUDU-1644 Simplify IN-list predicate values during token building

This commit intends to improve the performance of token-based scan in
case of only one hash parititon which contains only one key.
By reducing the number of values pushed to server, it can reduce
the complexity of token-based in-list logarithmically.
The commit implemented the idea by filtering the unnesscessary values
when building the scanner.

Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
---
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduPredicate.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
3 files changed, 89 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/06/14706/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 2
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/14706 )

Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14706/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14706/5//COMMIT_MSG@14
PS5, Line 14: when building the scanner.
why not do this on the server side? then you get it for free in both C++ and Java clients. The server side of the tablet should already know its partitioning information.



-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 5
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 04 Dec 2019 00:46:44 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
wangning has posted comments on this change. ( http://gerrit.cloudera.org:8080/14706 )

Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................


Patch Set 8:

Patched with multi-partitions & multi-key support.


-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 8
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Mon, 09 Dec 2019 06:55:20 +0000
Gerrit-HasComments: No

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14706 )

Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................


Patch Set 5:

(5 comments)

Thanks for the contribution! I'm not very familiar with this pruning code yet, but I've left some initial style comments.

http://gerrit.cloudera.org:8080/#/c/14706/5//COMMIT_MSG
Commit Message:

PS5: 
Can you add the details you listed in Jira about your reasoning for this approach and your benchmarking numbers to the commit message?


http://gerrit.cloudera.org:8080/#/c/14706/5/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java:

http://gerrit.cloudera.org:8080/#/c/14706/5/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@436
PS5, Line 436: Map<Integer, List<byte[]>> predicateValueMap = new HashMap<>();
             :         PartitionPruner pruner = PartitionPruner.create(this, predicateValueMap);
nit: having PartitionPruner modify a non-member map is a little awkward. What do you think about instead doing something like:

 PartitionPruner pruner = PartitionPruner.create(this);
 ...
 // And then below, where you actually use the map...
 Map<Integer, List<byte[]>> predicateValueMap = pruner.getPredicateValuePerHashBucket();


http://gerrit.cloudera.org:8080/#/c/14706/5/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@462
PS5, Line 462: hashSchema.get(0).getColumnIds().size() == 1;
Why is it important that the list of columns is of size 1 too?


http://gerrit.cloudera.org:8080/#/c/14706/5/java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/14706/5/java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java@76
PS5, Line 76: Map<Integer, List<byte[]>> inListValuesMap
Please document what this does.


http://gerrit.cloudera.org:8080/#/c/14706/5/java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java@184
PS5, Line 184: // For minimize the modification the content of current system, current implementation works
             :       // only in case of exactly one hash partition and one partition key.
This is useful as a reviewer of this patch to know why this change looks the way it is and this should be part of the commit message, but it's not helpful reader of the codebase. Code comments should describe what exists now, and usually shouldn't care about what previously existed.



-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 5
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sun, 24 Nov 2019 22:16:07 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
wangning has posted comments on this change. ( http://gerrit.cloudera.org:8080/14706 )

Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14706/5//COMMIT_MSG
Commit Message:

PS5: 
> Also it would be great if you could add unit tests to cover this new behavi
Thx for suggestion, will well organize and patch.


http://gerrit.cloudera.org:8080/#/c/14706/5//COMMIT_MSG@14
PS5, Line 14: when building the scanner.
> why not do this on the server side? then you get it for free in both C++ an
Pruning on client means less network overhead  and less hash-bucket computation times. And less pain for current code. 

As I understand for a non token-based scan, each nextRows call actually send one rpc request to one tablet. Is that means every time the server should calculate all the hash bucket of all query rows?  e.g. 10000 rows in-list query over 60 partitions table would result in 60 x 10000 times calculation of each row, and 60 times 10000 partial rows transfer, is it necessary? 

If the calculation is well implemented in server with all hash bucket calculation cached, the calculation times can reduce to 3 x 10000 (for bootstrap it cannot be avoid) in a 3 tservers cluster. But are network transfer efficient? I don't think network transfer burden could be reduced if I do this in server-side.



-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 5
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Wed, 04 Dec 2019 06:25:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14706

to look at the new patch set (#9).

Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................

[java] KUDU-1644 Simplify IN-list predicate values during token building

This commit intends to improve the performance of token-based in-list
scan in case of only one hash partition.
By reducing the number of values pushed to server, it can reduce
the complexity of token-based in-list logarithmically.
The commit implemented the idea by filtering the unnesscessary values
when building the scanner.

Here I attach an simple benchmark which implemented by java client and
calculate qps by a 10s-scheduled timer.
Some settings,
client 4core, 8g memory
server 4core, 8g memory, HDD
in-list size, 100000
table settings, 10M rows, 24 partitions

Before tuning:
Mon Nov 11 19:09:53 CST 2019, qps :2528
Mon Nov 11 19:10:03 CST 2019, qps :3715
Mon Nov 11 19:10:13 CST 2019, qps :2509
Mon Nov 11 19:10:23 CST 2019, qps :2900
Mon Nov 11 19:10:33 CST 2019, qps :2081
Mon Nov 11 19:10:43 CST 2019, qps :3635

After tuning
Mon Nov 11 19:15:01 CST 2019, qps :13735
Mon Nov 11 19:15:11 CST 2019, qps :16264
Mon Nov 11 19:15:21 CST 2019, qps :13702
Mon Nov 11 19:15:31 CST 2019, qps :15469
Mon Nov 11 19:15:41 CST 2019, qps :16269

Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduPredicate.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java
6 files changed, 190 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/06/14706/9
-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 9
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
wangning has abandoned this change. ( http://gerrit.cloudera.org:8080/14706 )

Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................


Abandoned
-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 10
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14706

to look at the new patch set (#6).

Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................

[java] KUDU-1644 Simplify IN-list predicate values during token building

This commit intends to improve the performance of token-based scan in
case of only one hash parititon which contains only one key.
By reducing the number of values pushed to server, it can reduce
the complexity of token-based in-list logarithmically.
The commit implemented the idea by filtering the unnesscessary values
when building the scanner.

Here I attach an simple benchmark which implemented by java client and
calculate qps by a 10s-scheduled timer.

Before tuning:
Mon Nov 11 19:09:53 CST 2019, qps :2528
Mon Nov 11 19:10:03 CST 2019, qps :3715
Mon Nov 11 19:10:13 CST 2019, qps :2509
Mon Nov 11 19:10:23 CST 2019, qps :2900
Mon Nov 11 19:10:33 CST 2019, qps :2081
Mon Nov 11 19:10:43 CST 2019, qps :3635

After tuning
Mon Nov 11 19:15:01 CST 2019, qps :13735
Mon Nov 11 19:15:11 CST 2019, qps :16264
Mon Nov 11 19:15:21 CST 2019, qps :13702
Mon Nov 11 19:15:31 CST 2019, qps :15469
Mon Nov 11 19:15:41 CST 2019, qps :16269

Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
---
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduPredicate.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
3 files changed, 100 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/06/14706/6
-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 6
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14706 )

Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14706/5//COMMIT_MSG
Commit Message:

PS5: 
> Can you add the details you listed in Jira about your reasoning for this ap
Also it would be great if you could add unit tests to cover this new behavior.



-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 5
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sun, 24 Nov 2019 22:17:16 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14706

to look at the new patch set (#8).

Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................

[java] KUDU-1644 Simplify IN-list predicate values during token building

This commit intends to improve the performance of token-based in-list
scan.
By reducing the number of values pushed to server, it can reduce
the complexity of token-based in-list logarithmically.
The commit implemented the idea by filtering the unnesscessary values
when building the scanner.

Here I attach an simple benchmark which implemented by java client and
calculate qps by a 10s-scheduled timer.
Some settings,
client 4core, 8g memory
server 4core, 8g memory
in-list size, 100000
table settings, 10M rows, 24 partitions

Before tuning:
Mon Nov 11 19:09:53 CST 2019, qps :2528
Mon Nov 11 19:10:03 CST 2019, qps :3715
Mon Nov 11 19:10:13 CST 2019, qps :2509
Mon Nov 11 19:10:23 CST 2019, qps :2900
Mon Nov 11 19:10:33 CST 2019, qps :2081
Mon Nov 11 19:10:43 CST 2019, qps :3635

After tuning
Mon Nov 11 19:15:01 CST 2019, qps :13735
Mon Nov 11 19:15:11 CST 2019, qps :16264
Mon Nov 11 19:15:21 CST 2019, qps :13702
Mon Nov 11 19:15:31 CST 2019, qps :15469
Mon Nov 11 19:15:41 CST 2019, qps :16269

Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
---
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduPredicate.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
3 files changed, 96 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/06/14706/8
-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 8
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
wangning has posted comments on this change. ( http://gerrit.cloudera.org:8080/14706 )

Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14706/9/java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/14706/9/java/kudu-client/src/test/java/org/apache/kudu/client/TestPartitionPruner.java@667
PS9, Line 667:                       .of((byte) 0, (byte) 1, (byte) 2, (byte) 3, (byte) 4,
             :                                            (byte) 5, (byte) 6)));
The hashPrune filtered all in-list values to each token, so all the count of in-list predicates after split should equal to count of predicates before split.
And the predicates on other key shouldn't affect in-lists' count.



-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 9
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Thu, 19 Dec 2019 04:46:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14706

to look at the new patch set (#4).

Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................

[java] KUDU-1644 Simplify IN-list predicate values during token building

This commit intends to improve the performance of token-based scan in
case of only one hash parititon which contains only one key.
By reducing the number of values pushed to server, it can reduce
the complexity of token-based in-list logarithmically.
The commit implemented the idea by filtering the unnesscessary values
when building the scanner.

Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
---
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduPredicate.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
3 files changed, 90 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/06/14706/4
-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 4
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14706

to look at the new patch set (#7).

Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................

[java] KUDU-1644 Simplify IN-list predicate values during token building

This commit intends to improve the performance of token-based scan in
case of only one hash parititon which contains only one key.
By reducing the number of values pushed to server, it can reduce
the complexity of token-based in-list logarithmically.
The commit implemented the idea by filtering the unnesscessary values
when building the scanner.

Here I attach an simple benchmark which implemented by java client and
calculate qps by a 10s-scheduled timer.

Before tuning:
Mon Nov 11 19:09:53 CST 2019, qps :2528
Mon Nov 11 19:10:03 CST 2019, qps :3715
Mon Nov 11 19:10:13 CST 2019, qps :2509
Mon Nov 11 19:10:23 CST 2019, qps :2900
Mon Nov 11 19:10:33 CST 2019, qps :2081
Mon Nov 11 19:10:43 CST 2019, qps :3635

After tuning
Mon Nov 11 19:15:01 CST 2019, qps :13735
Mon Nov 11 19:15:11 CST 2019, qps :16264
Mon Nov 11 19:15:21 CST 2019, qps :13702
Mon Nov 11 19:15:31 CST 2019, qps :15469
Mon Nov 11 19:15:41 CST 2019, qps :16269

Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
---
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduPredicate.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
3 files changed, 100 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/06/14706/7
-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 7
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: wangning <19...@gmail.com>

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
wangning has posted comments on this change. ( http://gerrit.cloudera.org:8080/14706 )

Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14706/5/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java:

http://gerrit.cloudera.org:8080/#/c/14706/5/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@436
PS5, Line 436: PartitionPruner pruner = PartitionPruner.create(this);
             :         List<KeyRange> keyRanges = new ArrayList<>();
> nit: having PartitionPruner modify a non-member map is a little awkward. Wh
Done


http://gerrit.cloudera.org:8080/#/c/14706/5/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java@462
PS5, Line 462: = new ArrayList<>(keyRanges.size());
> Why is it important that the list of columns is of size 1 too?
Previously I thought the hash bucket cannot be calculated with partial info of row.
This not well-considered. Thx.


http://gerrit.cloudera.org:8080/#/c/14706/5/java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/14706/5/java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java@76
PS5, Line 76: ) {
> Please document what this does.
This parameter is removed.


http://gerrit.cloudera.org:8080/#/c/14706/5/java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java@184
PS5, Line 184:   rangeLowerBound = scanner.lowerBoundPrimaryKey;
             :       }
> This is useful as a reviewer of this patch to know why this change looks th
Comment shouldn't be stateful.
Thx.



-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 7
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Thu, 05 Dec 2019 08:28:13 +0000
Gerrit-HasComments: Yes

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
wangning has posted comments on this change. ( http://gerrit.cloudera.org:8080/14706 )

Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................


Patch Set 9:

Patched with test cover.
Solo hash partition with multi keys support. Unable to handle situations with multi hash partiitons.


-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 9
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: wangning <19...@gmail.com>
Gerrit-Comment-Date: Thu, 19 Dec 2019 04:34:48 +0000
Gerrit-HasComments: No

[kudu-CR] [java] KUDU-1644 Simplify IN-list predicate values during token building

Posted by "wangning (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14706

to look at the new patch set (#3).

Change subject: [java] KUDU-1644 Simplify IN-list predicate values during token building
......................................................................

[java] KUDU-1644 Simplify IN-list predicate values during token building

This commit intends to improve the performance of token-based scan in
case of only one hash parititon which contains only one key.
By reducing the number of values pushed to server, it can reduce
the complexity of token-based in-list logarithmically.
The commit implemented the idea by filtering the unnesscessary values
when building the scanner.

Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
---
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduPredicate.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanToken.java
M java/kudu-client/src/main/java/org/apache/kudu/client/PartitionPruner.java
3 files changed, 89 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/06/14706/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05affbd10804b4d07478f7abdde5380dd70ec116
Gerrit-Change-Number: 14706
Gerrit-PatchSet: 3
Gerrit-Owner: wangning <19...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins (120)