You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Teddy Choi (JIRA)" <ji...@apache.org> on 2018/03/06 01:40:00 UTC

[jira] [Commented] (HIVE-17896) TopNKey: Create a standalone vectorizable TopNKey operator

    [ https://issues.apache.org/jira/browse/HIVE-17896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387122#comment-16387122 ] 

Teddy Choi commented on HIVE-17896:
-----------------------------------

Updated the patch with the latest master branch.

> TopNKey: Create a standalone vectorizable TopNKey operator
> ----------------------------------------------------------
>
>                 Key: HIVE-17896
>                 URL: https://issues.apache.org/jira/browse/HIVE-17896
>             Project: Hive
>          Issue Type: New Feature
>          Components: Operators
>    Affects Versions: 3.0.0
>            Reporter: Gopal V
>            Assignee: Teddy Choi
>            Priority: Major
>         Attachments: HIVE-17896.1.patch, HIVE-17896.3.patch, HIVE-17896.4.patch, HIVE-17896.5.patch, HIVE-17896.6.patch, HIVE-17896.7.patch
>
>
> For TPC-DS Query27, the TopN operation is delayed by the group-by - the group-by operator buffers up all the rows before discarding the 99% of the rows in the TopN Hash within the ReduceSink Operator.
> The RS TopN operator is very restrictive as it only supports doing the filtering on the shuffle keys, but it is better to do this before breaking the vectors into rows and losing the isRepeating properties.
> Adding a TopN Key operator in the physical operator tree allows the following to happen.
> GBY->RS(Top=1)
> can become 
> TNK(1)->GBY->RS(Top=1)
> So that, the TopNKey can remove rows before they are buffered into the GBY and consume memory.
> Here's the equivalent implementation in Presto
> https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/TopNOperator.java#L35
> Adding this as a sub-feature of GroupBy prevents further optimizations if the GBY is on keys "a,b,c" and the TopNKey is on just "a".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)