Posted to dev@hive.apache.org by "He Yongqiang (JIRA)" <ji...@apache.org> on 2010/08/02 20:26:16 UTC

[jira] Created: (HIVE-1506) Optimize number of mr jobs produced by group by sort by

Optimize number of mr jobs produced by group by sort by
-------------------------------------------------------

                 Key: HIVE-1506
                 URL: https://issues.apache.org/jira/browse/HIVE-1506
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: He Yongqiang


Right now,

select key, INPUT__FILE__NAME, count(value) from src group by key, INPUT__FILE__NAME sort by key
requires 2 jobs,
and
select key, INPUT__FILE__NAME, count(value) from src group by key, INPUT__FILE__NAME sort by key limit 3
requires 3 jobs.

Both can be done with just one job.
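
One way to check the job count for a query of this shape is Hive's EXPLAIN statement, which prints the plan and its stage dependencies instead of running the query. The sketch below assumes the standard src test table with key and value columns. Reading each map-reduce stage in the plan as one launched job is the usual interpretation, not something stated in this issue.

```sql
-- Print the query plan rather than executing the query.
-- Each map-reduce stage listed under STAGE DEPENDENCIES normally
-- corresponds to one MR job (assumption: usual reading of the plan).
explain
select key, INPUT__FILE__NAME, count(value)
from src
group by key, INPUT__FILE__NAME
sort by key;
```

Running the same statement with "limit 3" appended should show how the limit changes the stage count.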

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1506) Optimize number of mr jobs produced by group by sort by

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894645#action_12894645 ] 

He Yongqiang commented on HIVE-1506:
------------------------------------

This is not a common case, because the sort by columns are a prefix of the group by columns. The sort by clause can simply be removed from the query, and the output will be the same.
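
For illustration, the first query from the description with the sort by clause dropped would look like the sketch below. It assumes, as the comment implies, that each reducer already emits groups in the order of the group by keys, so rows come out ordered by key within each reducer without a separate sort stage.

```sql
-- The group by keys are (key, INPUT__FILE__NAME); "sort by key" is a
-- prefix of them, so the explicit sort by is omitted here.
-- Assumption: group-by output within a reducer follows the shuffle key order.
select key, INPUT__FILE__NAME, count(value)
from src
group by key, INPUT__FILE__NAME;
```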

