Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/02/02 09:17:59 UTC

[jira] Created: (HIVE-267) Multi-GroupBy inserts with the same distinct expression should share the first map-reduce job

Multi-GroupBy inserts with the same distinct expression should share the first map-reduce job
---------------------------------------------------------------------------------------------

                 Key: HIVE-267
                 URL: https://issues.apache.org/jira/browse/HIVE-267
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
    Affects Versions: 0.2.0
            Reporter: Zheng Shao


Currently, each GroupBy in a multi-GroupBy insert is executed separately.

We should be able to optimize the plan so that GroupBys with the same distinct expression share the first map-reduce job.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-267) Multi-GroupBy inserts with the same distinct expression should share the first map-reduce job

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669525#action_12669525 ] 

Zheng Shao commented on HIVE-267:
---------------------------------

Some preliminary thinking:

When there is a single shared global distinct expression:
1. When generating the first reduce stage of each GroupBy plan, put the distinct expression into the reduce key.
2. Let the query optimizer merge the ReduceSinkOperators that have the same source tables and the same key and value expressions. Since each insert may have its own filter operators, the merged ReduceSinkOperator must forward the union of all rows that pass any of those filters.
3. Split the reducer output into several sets, one per GroupBy.
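The shared-distinct steps above can be illustrated with a small in-memory Python simulation. This is a sketch under stated assumptions, not Hive code: `GroupBySpec`, `shared_first_job` and `second_job` are hypothetical names, every branch computes `count(DISTINCT ...)` over one shared distinct expression, and the "reducers" are plain dicts. Because the shuffle key is the distinct value, all rows with that value meet on one reducer, so each branch can deduplicate locally and emit partial counts that a cheap per-branch second job sums.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Row = Tuple[Any, ...]


@dataclass
class GroupBySpec:
    # Hypothetical description of one branch:
    #   INSERT ... SELECT key, count(DISTINCT d) FROM src WHERE ... GROUP BY key
    name: str
    where: Callable[[Row], bool]      # this branch's filter
    group_key: Callable[[Row], Any]   # this branch's GROUP BY expression


def shared_first_job(rows: List[Row], distinct_expr: Callable[[Row], Any],
                     branches: List[GroupBySpec],
                     num_reducers: int = 2) -> Dict[str, List[Tuple[Any, int]]]:
    # Map side: one merged reduce sink for all branches. The shuffle key is the
    # shared DISTINCT expression, and a row is forwarded if it passes ANY
    # branch's filter (the union of the rows the branches need).
    reducers: List[Dict[Any, List[Row]]] = [defaultdict(list) for _ in range(num_reducers)]
    for row in rows:
        if any(b.where(row) for b in branches):
            d = distinct_expr(row)
            reducers[hash(d) % num_reducers][d].append(row)

    # Reduce side: all rows sharing a distinct value land on the same reducer,
    # so each branch can deduplicate its group keys locally. The output is
    # split into one set per branch: a partial count of 1 per (key, value).
    outputs: Dict[str, List[Tuple[Any, int]]] = {b.name: [] for b in branches}
    for reducer in reducers:
        for same_value_rows in reducer.values():
            for b in branches:
                keys = {b.group_key(r) for r in same_value_rows if b.where(r)}
                outputs[b.name].extend((k, 1) for k in keys)
    return outputs


def second_job(partials: List[Tuple[Any, int]]) -> Dict[Any, int]:
    # Per-branch follow-up job: sum the partial counts by group key.
    totals: Dict[Any, int] = defaultdict(int)
    for k, n in partials:
        totals[k] += n
    return dict(totals)
```

For example, with rows `(a, b, c)`, a branch grouping by `a` and another grouping by `b` can both count distinct `c` from one shuffle on `c`; a row that fails every branch's filter is dropped on the map side.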

When there is no distinct expression:
1. Run a map-only job that performs map-side aggregation for all GroupBys at once.
2. Split the results into several sets, one per GroupBy.
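For the no-distinct case, here is a similarly hypothetical Python sketch (not Hive code) of one map task that scans its input split once and keeps a separate hash table per GroupBy, so its output is already split per branch. It assumes each branch computes `count(1)` and `sum(v)`. Since a mapper only sees its own split, it also assumes a merge step, the hypothetical `merge_partials`, that combines partial aggregates across mappers; the comment above does not spell that step out.

```python
from typing import Any, Callable, Dict, List, NamedTuple, Tuple

Row = Tuple[Any, ...]
# branch name -> group key -> (count, sum)
Partials = Dict[str, Dict[Any, Tuple[int, int]]]


class Branch(NamedTuple):
    # Hypothetical description of one branch:
    #   INSERT ... SELECT key, count(1), sum(v) FROM src WHERE ... GROUP BY key
    name: str
    where: Callable[[Row], bool]
    group_key: Callable[[Row], Any]
    value: Callable[[Row], int]


def map_only_task(split: List[Row], branches: List[Branch]) -> Partials:
    # Scan this mapper's input split once; every branch keeps its own hash
    # table, so the input is read once rather than once per GROUP BY.
    tables: Partials = {b.name: {} for b in branches}
    for row in split:
        for b in branches:
            if b.where(row):
                k = b.group_key(row)
                count, total = tables[b.name].get(k, (0, 0))
                tables[b.name][k] = (count + 1, total + b.value(row))
    # The mapper output is already split into one set per branch.
    return tables


def merge_partials(per_mapper: List[Partials], name: str) -> Dict[Any, Tuple[int, int]]:
    # Assumed follow-up step: combine one branch's partials from all mappers.
    merged: Dict[Any, Tuple[int, int]] = {}
    for partials in per_mapper:
        for k, (c, s) in partials[name].items():
            mc, ms = merged.get(k, (0, 0))
            merged[k] = (mc + c, ms + s)
    return merged
```

Returning a dict keyed by branch name stands in for the ability to split a mapper's output into several sets.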


Both approaches require an infrastructure change: the ability to split the output of a mapper or reducer into several sets.

