You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "tu nguyen khac (JIRA)" <ji...@apache.org> on 2016/08/02 07:46:20 UTC
[jira] [Comment Edited] (PHOENIX-3131) improve "order by " performance with aggregated query

    [ https://issues.apache.org/jira/browse/PHOENIX-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403336#comment-15403336 ] 

tu nguyen khac edited comment on PHOENIX-3131 at 8/2/16 7:46 AM:
-----------------------------------------------------------------

this is query plan of second query ( slow query ) 
+----------------------------------------------------------------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY RANGE SCAN OVER pageview_site [0]  |
|     ROW TIMESTAMP FILTER [1464739200001, 1470112675610)              |
|     SERVER FILTER BY DT > DATE '2016-06-01 00:00:00.000'             |
|     SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [URL]             |
| CLIENT MERGE SORT                                                    |
| CLIENT TOP 102 ROWS SORTED BY [SUM(PAGEVIEW) DESC]                   |
+----------------------------------------------------------------------+

this is belong to fast query : 

+----------------------------------------------------------------------------+
|                                    PLAN                                    |
+----------------------------------------------------------------------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY RANGE SCAN OVER PAGEVIEW_SITE [0]        |
|     ROW TIMESTAMP FILTER [1464739200001, 1470112798683)                    |
|     SERVER FILTER BY DT > DATE '2016-06-01 00:00:00.000'                   |
|     SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [URL] LIMIT 102 GROUPS  |
| CLIENT MERGE SORT                                                          |
| CLIENT OFFSET 2                                                            |
| CLIENT 100 ROW LIMIT                                                       |
+----------------------------------------------------------------------------+




was (Author: tuyuri):

this is query plan of second query ( slow query ) 
+----------------------------------------------------------------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY RANGE SCAN OVER pageview_site [0]  |
|     ROW TIMESTAMP FILTER [1464739200001, 1470112675610)              |
|     SERVER FILTER BY DT > DATE '2016-06-01 00:00:00.000'             |
|     SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [URL]             |
| CLIENT MERGE SORT                                                    |
| CLIENT TOP 102 ROWS SORTED BY [SUM(PAGEVIEW) DESC]                   |
+----------------------------------------------------------------------+

this is belong to fast query : 

+----------------------------------------------------------------------------+
|                                    PLAN                                    |
+----------------------------------------------------------------------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY RANGE SCAN OVER PAGEVIEW_SITE_K14 [0]        |
|     ROW TIMESTAMP FILTER [1464739200001, 1470112798683)                    |
|     SERVER FILTER BY DT > DATE '2016-06-01 00:00:00.000'                   |
|     SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [URL] LIMIT 102 GROUPS  |
| CLIENT MERGE SORT                                                          |
| CLIENT OFFSET 2                                                            |
| CLIENT 100 ROW LIMIT                                                       |
+----------------------------------------------------------------------------+



> improve "order by " performance with aggregated query 
> ------------------------------------------------------
>
>                 Key: PHOENIX-3131
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3131
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.8.0
>            Reporter: tu nguyen khac
>            Priority: Critical
>
> I created a table in phoenix with query : ( 4 node , ram 8gb, 4 cores / node ) 
> CREATE TABLE pageview_site (
>     url varchar(255) not null,
>     pageview bigint,
>     dt date not null,
>     CONSTRAINT PK PRIMARY KEY (url, dt ROW_TIMESTAMP)
> ) SALT_BUCKETS = 4;
> After that : 
> 1. I tried to upsert about : 13 milions rows to this table . 
> 2. Run 2 queries : 
>     a. select url,sum(pageview) as pv FROM pageview_site where dt > to_date ('2016-06-01') group by url limit 100 offset 2;
> the duration this query  in about : 0.5 second
>     b. select url,sum(pageview) as pv FROM pageview_site where dt > to_date ('2016-06-01') group by ur order by pv descl limit 100 offset 2;
> the duration this query  in about : 9.5 seconds
> what happens with 2nd query ?? I think we should improve performance for "order by " command 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)