Posted to issues@kudu.apache.org by "wangningito (Jira)" <ji...@apache.org> on 2019/11/15 01:59:00 UTC
[jira] [Comment Edited] (KUDU-1644) Simplify IN-list predicate values based on tablet partition key or rowset PK bounds

    [ https://issues.apache.org/jira/browse/KUDU-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974290#comment-16974290 ] 

wangningito edited comment on KUDU-1644 at 11/15/19 1:58 AM:
-------------------------------------------------------------

I have submitted an implementation for token-based scans that covers the case where the table has a single level of hash partitioning on a single key column: [https://gerrit.cloudera.org/c/14706/|https://gerrit.cloudera.org/c/14706/]

The implementation lives in the client module. While the scan tokens are being built, it filters the IN-list values that are pushed down with each token. It needs only a small change to the existing code and has only a slight performance impact.

The existing pruneHashComponent method already computes the hash bucket of every IN-list value. I implemented the idea by collecting those bucket IDs and replacing each token's IN-list predicate values with only the values that hash to that token's bucket. Because the bucket computation was already being done, this adds almost no overhead in other cases. I placed the logic in the client rather than in the tablet because that way it improves performance in two respects: fewer values are sent over the network, and the later binary searches over the IN-list run against a shorter list, so each lookup needs fewer comparisons (binary search cost grows logarithmically with list size).
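As a rough illustration of the per-token filtering described above, here is a minimal, self-contained Java sketch. It is not the code in the Gerrit change: `InListBucketFilter`, `bucketOf`, and `splitByBucket` are hypothetical names, and `bucketOf` is a stand-in modulo hash, not Kudu's actual hash-partitioning function. It assumes a single hash dimension on one BIGINT key column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: split an IN-list so that each hash bucket's scan token
// carries only the values that can exist in that bucket's tablet.
class InListBucketFilter {

    // Stand-in for the bucket computation done during hash pruning.
    // NOT Kudu's real hash function (Kudu hashes the encoded key differently).
    static int bucketOf(long value, int numBuckets) {
        return Math.floorMod(Long.hashCode(value), numBuckets);
    }

    // Groups IN-list values by bucket, preserving input order within each bucket.
    // Buckets that receive no values are absent, i.e. they would need no token.
    static Map<Integer, List<Long>> splitByBucket(List<Long> inList, int numBuckets) {
        Map<Integer, List<Long>> byBucket = new TreeMap<>();
        for (long v : inList) {
            byBucket.computeIfAbsent(bucketOf(v, numBuckets), k -> new ArrayList<>()).add(v);
        }
        return byBucket;
    }

    public static void main(String[] args) {
        List<Long> inList = List.of(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L);
        Map<Integer, List<Long>> perToken = splitByBucket(inList, 3);
        perToken.forEach((bucket, values) ->
                System.out.println("bucket " + bucket + " -> " + values));
    }
}
```

Running `main` prints `bucket 0 -> [3, 6]`, `bucket 1 -> [1, 4, 7]`, and `bucket 2 -> [2, 5, 8]`. Because each value is hashed once and appended in input order, a sorted IN-list stays sorted within each bucket, so the per-token lists remain usable for binary search on the tablet side.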

Below are some benchmark results for this implementation.

Hardware:

Client: 4 cores, 8 GB memory

Server: 4 cores, 8 GB memory

IN-list size: 100,000 values; all queries are served from cache.

The table scanned by the IN-list query has 10M rows, 30 dense columns of BIGINT or STRING type filled with random values, and 24 partitions.
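A back-of-the-envelope estimate of the two gains, assuming (hypothetically; the setup above does not say so) that all 24 partitions are hash buckets on the IN-list column, that the 100,000 values spread evenly across them, and that without the change every token carries the full IN-list. Left of each arrow is without filtering, right is with it; these are illustrative numbers, not measured results.

```latex
\begin{aligned}
\text{values per token:}\quad & 100\,000 \;\to\; 100\,000 / 24 \approx 4\,167 \\
\text{values sent over all 24 tokens:}\quad & 24 \times 100\,000 = 2\,400\,000 \;\to\; 100\,000 \\
\text{binary-search comparisons per lookup:}\quad & \lceil \log_2 100\,000 \rceil = 17 \;\to\; \lceil \log_2 4\,167 \rceil = 13
\end{aligned}
```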

Before tuning:

!http://doc.sensorsdata.cn/download/attachments/29573518/image2019-11-11_19-11-21.png?version=1&modificationDate=1573470681000&api=v2!

After tuning:

!http://doc.sensorsdata.cn/download/attachments/29573518/image2019-11-12_15-5-57.png?version=1&modificationDate=1573542358000&api=v2!


> Simplify IN-list predicate values based on tablet partition key or rowset PK bounds
> -----------------------------------------------------------------------------------
>
>                 Key: KUDU-1644
>                 URL: https://issues.apache.org/jira/browse/KUDU-1644
>             Project: Kudu
>          Issue Type: Sub-task
>          Components: perf, tablet
>            Reporter: Dan Burkert
>            Priority: Major
>
> When new scans are optimized by the tablet, the tablet's partition key bounds aren't taken into account in order to remove predicates from the scan.  One of the most important such optimizations is that IN-list predicates could remove values based on the tablet's constraints.
