You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "tu nguyen khac (JIRA)" <ji...@apache.org> on 2016/08/02 07:46:20 UTC
[jira] [Comment Edited] (PHOENIX-3131) improve "order by "
performance with aggregated query
[ https://issues.apache.org/jira/browse/PHOENIX-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403336#comment-15403336 ]
tu nguyen khac edited comment on PHOENIX-3131 at 8/2/16 7:46 AM:
-----------------------------------------------------------------
this is query plan of second query ( slow query )
+----------------------------------------------------------------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY RANGE SCAN OVER pageview_site [0] |
| ROW TIMESTAMP FILTER [1464739200001, 1470112675610) |
| SERVER FILTER BY DT > DATE '2016-06-01 00:00:00.000' |
| SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [URL] |
| CLIENT MERGE SORT |
| CLIENT TOP 102 ROWS SORTED BY [SUM(PAGEVIEW) DESC] |
+----------------------------------------------------------------------+
this is belong to fast query :
+----------------------------------------------------------------------------+
| PLAN |
+----------------------------------------------------------------------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY RANGE SCAN OVER PAGEVIEW_SITE [0] |
| ROW TIMESTAMP FILTER [1464739200001, 1470112798683) |
| SERVER FILTER BY DT > DATE '2016-06-01 00:00:00.000' |
| SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [URL] LIMIT 102 GROUPS |
| CLIENT MERGE SORT |
| CLIENT OFFSET 2 |
| CLIENT 100 ROW LIMIT |
+----------------------------------------------------------------------------+
was (Author: tuyuri):
this is query plan of second query ( slow query )
+----------------------------------------------------------------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY RANGE SCAN OVER pageview_site [0] |
| ROW TIMESTAMP FILTER [1464739200001, 1470112675610) |
| SERVER FILTER BY DT > DATE '2016-06-01 00:00:00.000' |
| SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [URL] |
| CLIENT MERGE SORT |
| CLIENT TOP 102 ROWS SORTED BY [SUM(PAGEVIEW) DESC] |
+----------------------------------------------------------------------+
this is belong to fast query :
+----------------------------------------------------------------------------+
| PLAN |
+----------------------------------------------------------------------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY RANGE SCAN OVER PAGEVIEW_SITE_K14 [0] |
| ROW TIMESTAMP FILTER [1464739200001, 1470112798683) |
| SERVER FILTER BY DT > DATE '2016-06-01 00:00:00.000' |
| SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [URL] LIMIT 102 GROUPS |
| CLIENT MERGE SORT |
| CLIENT OFFSET 2 |
| CLIENT 100 ROW LIMIT |
+----------------------------------------------------------------------------+
> improve "order by " performance with aggregated query
> ------------------------------------------------------
>
> Key: PHOENIX-3131
> URL: https://issues.apache.org/jira/browse/PHOENIX-3131
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.8.0
> Reporter: tu nguyen khac
> Priority: Critical
>
> I created a table in phoenix with query : ( 4 node , ram 8gb, 4 cores / node )
> CREATE TABLE pageview_site (
> url varchar(255) not null,
> pageview bigint,
> dt date not null,
> CONSTRAINT PK PRIMARY KEY (url, dt ROW_TIMESTAMP)
> ) SALT_BUCKETS = 4;
> After that :
> 1. I tried to upsert about : 13 milions rows to this table .
> 2. Run 2 queries :
> a. select url,sum(pageview) as pv FROM pageview_site where dt > to_date ('2016-06-01') group by url limit 100 offset 2;
> the duration this query in about : 0.5 second
> b. select url,sum(pageview) as pv FROM pageview_site where dt > to_date ('2016-06-01') group by ur order by pv descl limit 100 offset 2;
> the duration this query in about : 9.5 seconds
> what happens with 2nd query ?? I think we should improve performance for "order by " command
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)