Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2016/01/24 04:54:39 UTC

[jira] [Updated] (PHOENIX-1006) Do not sort group by rows without order by

     [ https://issues.apache.org/jira/browse/PHOENIX-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Taylor updated PHOENIX-1006:
----------------------------------
    Assignee:     (was: Samarth Jain)

> Do not sort group by rows without order by
> ------------------------------------------
>
>                 Key: PHOENIX-1006
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1006
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: jay wong
>              Labels: gsoc2015
>         Attachments: PHOENIX-1006.patch, PHOENIX-1006v2.patch
>
>
> Consider a SQL query like the one below, which generates 55,000 groups:
> {code}
> SELECT count(1) AS count, SUM(int_column) AS sum_column, MAX(int_column) AS max_column2, MIN(int_column) AS min_column, AVG(int_column) AS avg_column
> FROM table1
> WHERE int_column IS NOT NULL
> GROUP BY int_column2
> ORDER BY int_column2 DESC
> LIMIT 200;
> {code}
> In AggregatePlan, the *resultIterator* is set to a MergeSortRowKeyResultIterator for a group by, and MergeSortRowKeyResultIterator needs an OrderedResultIterator. As a result, every _group by_ query is ALWAYS sorted first, whether or not it has an _order by_, and that sort is unnecessary when there is no _order by_.
> To improve this, we could modify the code so that the order by iterator is not triggered for a group by without an order by, and instead sort the results within each group on the client side.
> On the other hand, in the group by plus order by case, the sort on the RegionServer side is currently triggered sequentially, which causes poor performance, especially with a large number of regions. We should improve this by fetching an element from each scanner early, which triggers every sort up front so that the sorts run in parallel.
> For more details, please refer to the attached patch. Any comments or suggestions would be highly appreciated.
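
Presumably, part of the reason an aggregate plan sorts at all is that the same group key can come back from several regions, and sorted inputs let the client combine those partial rows in one streaming pass. For a group by with no order by, a hash-based combine is one alternative that needs no ordering. The sketch below illustrates that swapped-in technique under stated assumptions; it is not necessarily what the attached patch does, it uses no Phoenix API, and all names (HashGroupCombine, Partial, Totals, combine) are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: combine per-region partial GROUP BY aggregates on the
 * client without sorting them first. Each region returns (groupKey, count, sum)
 * rows in no particular order; a hash map merges rows that share a key.
 */
public class HashGroupCombine {

    /** Hypothetical partial-aggregate row produced by one region scan. */
    record Partial(long groupKey, long count, long sum) {}

    /** Running totals for one group (AVG can be derived as sum / count). */
    static final class Totals {
        long count;
        long sum;
    }

    /** Merges partial rows from every region into one total per group key. */
    static Map<Long, Totals> combine(List<List<Partial>> regions) {
        Map<Long, Totals> groups = new HashMap<>();
        for (List<Partial> region : regions) {
            for (Partial p : region) {
                Totals t = groups.computeIfAbsent(p.groupKey(), k -> new Totals());
                t.count += p.count();
                t.sum += p.sum();
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<Partial>> regions = List.of(
            List.of(new Partial(7, 2, 10), new Partial(3, 1, 4)),
            List.of(new Partial(3, 5, 20), new Partial(9, 1, 1)));
        Map<Long, Totals> groups = combine(regions);
        // Group 3 appears in both regions and is merged without any sort.
        System.out.println(groups.get(3L).count + " " + groups.get(3L).sum); // prints "6 24"
    }
}
```

The trade-off is memory: the hash map holds every group (about 55,000 in the example query) on the client at once, whereas merging sorted streams only needs the current row from each region.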

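The parallel-sort idea for the group by plus order by case can be sketched in plain Java, assuming local threads stand in for region servers. Every per-region sort is submitted before any result is read, so the sorts overlap instead of starting one after another, and a PriorityQueue then performs the k-way merge that the client would do over the sorted scanners. This is an illustration only; it does not use Phoenix's or HBase's scanner API, and ParallelSortMerge and sortAndMerge are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: start every per-region sort at once, then k-way merge. */
public class ParallelSortMerge {

    /** Sorts each region's rows concurrently, then merges them in descending order. */
    static List<Integer> sortAndMerge(List<List<Integer>> regions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            // Submit all sorts before reading any result, so they run in parallel.
            List<Future<List<Integer>>> pending = new ArrayList<>();
            for (List<Integer> region : regions) {
                pending.add(pool.submit(() -> {
                    List<Integer> copy = new ArrayList<>(region);
                    copy.sort(Collections.reverseOrder()); // ORDER BY ... DESC
                    return copy;
                }));
            }
            // Collect the sorted runs (each get() blocks until that sort is done).
            List<List<Integer>> runs = new ArrayList<>();
            for (Future<List<Integer>> f : pending) {
                runs.add(f.get());
            }
            // K-way merge: heap entries are {runIndex, position}, largest value first.
            PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> Integer.compare(runs.get(b[0]).get(b[1]), runs.get(a[0]).get(a[1])));
            for (int r = 0; r < runs.size(); r++) {
                if (!runs.get(r).isEmpty()) heap.add(new int[] {r, 0});
            }
            List<Integer> out = new ArrayList<>();
            while (!heap.isEmpty()) {
                int[] top = heap.poll();
                List<Integer> run = runs.get(top[0]);
                out.add(run.get(top[1]));
                if (top[1] + 1 < run.size()) heap.add(new int[] {top[0], top[1] + 1});
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> regions = List.of(List.of(5, 1, 9), List.of(7, 3), List.of(8));
        System.out.println(sortAndMerge(regions)); // prints [9, 8, 7, 5, 3, 1]
    }
}
```

With a LIMIT such as 200, the merge loop could stop after emitting that many rows, but only once every region's sort has produced its first row, which is why starting all sorts early matters.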


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)