Posted to issues@kudu.apache.org by "DawnZhang (JIRA)" <ji...@apache.org> on 2017/12/01 12:31:00 UTC
[jira] [Commented] (KUDU-2231)
"materializing_iterator_do_pushdown=true" cause simple query slow
[ https://issues.apache.org/jira/browse/KUDU-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274327#comment-16274327 ]
DawnZhang commented on KUDU-2231:
---------------------------------
Here are the details:
h3. table under test:
{code:sql}
CREATE TABLE rawdata.test_table (
day INT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
user_id BIGINT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
time TIMESTAMP NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
event_id INT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
distinct_id STRING NULL ENCODING DICT_ENCODING COMPRESSION DEFAULT_COMPRESSION,
...
... other fields ...
...
PRIMARY KEY (day, user_id, time, _offset)
)
PARTITION BY HASH (user_id) PARTITIONS 9
STORED AS KUDU
TBLPROPERTIES ( ... );
{code}
table size (select count(1) from test_table) : 19510709
h3. CASE 1, materializing_iterator_do_pushdown = true
scans:
https://issues.apache.org/jira/secure/attachment/12900200/756ACA6F105F0905EBCB79B940FFCE86.jpg
!756ACA6F105F0905EBCB79B940FFCE86.jpg|thumbnail!
h3. CASE 2, materializing_iterator_do_pushdown = false (sql ran faster)
scans:
https://issues.apache.org/jira/secure/attachment/12900199/F8C604537B8E921DDCCA78995DC11BDA.jpg
!F8C604537B8E921DDCCA78995DC11BDA.jpg|thumbnail!
It looks like Kudu scans the table multiple times for this simple SQL, probably because of a bug.
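To put a number on the comparison between "Cells read from disk" and the table size, a small helper like the one below could be used. This is only a sketch: the function name `read_amplification` and the example reading are hypothetical, and it assumes the value on the scans page is reported per column (if it is a total across columns, pass the number of scanned columns as well). The only figure taken from this issue is the row count.

```python
def read_amplification(cells_read: int, row_count: int, columns: int = 1) -> float:
    """Return how many times more cells were read than one full pass needs.

    cells_read: the "Cells read from disk" value copied from the scans page.
    row_count:  the table size from `select count(1)`.
    columns:    number of columns the figure covers (assumed 1, i.e. per column).
    """
    if row_count <= 0 or columns <= 0:
        raise ValueError("row_count and columns must be positive")
    return cells_read / (row_count * columns)


TABLE_ROWS = 19510709  # from this issue: select count(1) from test_table

# Hypothetical reading for illustration only; it is NOT taken from the
# attached screenshots.
example_cells_read = 3 * TABLE_ROWS

print(read_amplification(example_cells_read, TABLE_ROWS))
```

A ratio close to 1.0 would match a single pass over the column, while a clearly larger ratio would be consistent with the column being read more than once.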
> "materializing_iterator_do_pushdown=true" cause simple query slow
> -----------------------------------------------------------------
>
> Key: KUDU-2231
> URL: https://issues.apache.org/jira/browse/KUDU-2231
> Project: Kudu
> Issue Type: Bug
> Components: master, tserver
> Affects Versions: 1.4.0, 1.5.0
> Environment: CentOS release 6.5 (2.6.32-431.11.9.el6.ucloud.x86_64)
> KUDU-1.4.0-1.cdh5.12.1.p0.10
> IMPALA 2.6.0
> x86-64
> Intel CPU
> Reporter: DawnZhang
> Attachments: 756ACA6F105F0905EBCB79B940FFCE86.jpg, F8C604537B8E921DDCCA78995DC11BDA.jpg
>
>
> I ran the following SQL repeatedly
> while refreshing the 8050/scans page at the same time.
> h3. sql:
> {code:sql}
> select count(xx_id),count(yy_id),count(time) from test_table where event_id =29983;
> {code}
> "Cells read from disk" is much greater than the table size when materializing_iterator_do_pushdown = true (default).
> After setting materializing_iterator_do_pushdown = false,
> "Cells read from disk" dropped to a reasonable value (close to the table size)
> and the SQL ran faster.
> Here are the details:
> h3. table under test:
> {code:sql}
> CREATE TABLE rawdata.test_table (
> day INT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
> user_id BIGINT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
> time TIMESTAMP NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
> event_id INT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
> distinct_id STRING NULL ENCODING DICT_ENCODING COMPRESSION DEFAULT_COMPRESSION,
> ...
> ... other fields ...
> ...
> PRIMARY KEY (day, user_id, time, _offset)
> )
> PARTITION BY HASH (user_id) PARTITIONS 9
> STORED AS KUDU
> TBLPROPERTIES ( ... );
> {code}
> table size (select count(1) from test_table) : 19510709
> h3. CASE 1, materializing_iterator_do_pushdown = true
> scans:
> https://issues.apache.org/jira/secure/attachment/12900200/756ACA6F105F0905EBCB79B940FFCE86.jpg
> !756ACA6F105F0905EBCB79B940FFCE86.jpg|thumbnail!
> h3. CASE 2, materializing_iterator_do_pushdown = false (sql ran faster)
> scans:
> https://issues.apache.org/jira/secure/attachment/12900199/F8C604537B8E921DDCCA78995DC11BDA.jpg
> !F8C604537B8E921DDCCA78995DC11BDA.jpg|thumbnail!
> It looks like Kudu scans the table multiple times for this simple SQL, probably because of a bug.