Posted to common-dev@hadoop.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2008/09/11 03:42:46 UTC

[jira] Commented: (HADOOP-4157) [hive] redundant expression evaluation in group by stage 1

    [ https://issues.apache.org/jira/browse/HADOOP-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630063#action_12630063 ] 

Zheng Shao commented on HADOOP-4157:
------------------------------------

The two extremes are:

1. Send all columns that are used in at least one group by.
2. Send all group-by keys (eliminating duplicates).

I think we can first try eliminating duplicates (which puts us at extreme 2). That should give us the biggest improvement for common multi-group-by queries.
Later we can think about other optimizations, like the one in the description.
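
To make extreme 2 concrete, here is a rough sketch of the duplicate elimination in Python. It is only an illustration, not Hive code: the function name, the input shape (one list of key expression strings per group by clause), and the whitespace-based duplicate check are all assumptions made for the example.

```python
def distinct_group_by_keys(group_bys):
    """Collect every group-by key expression once, in first-seen order.

    group_bys: list of lists of key expression strings, one inner list
    per GROUP BY clause (hypothetical input shape for this sketch).
    """
    seen = set()
    distinct = []
    for keys in group_bys:
        for expr in keys:
            # Assumption: two expressions are duplicates when they match
            # after collapsing whitespace. A real planner would compare
            # expression trees instead of strings.
            canonical = " ".join(expr.split())
            if canonical not in seen:
                seen.add(canonical)
                distinct.append(canonical)
    return distinct


clauses = [
    ["country", "substr(url, 1, 10)"],
    ["country", "gender"],
    ["substr(url,  1, 10)"],
]
print(distinct_group_by_keys(clauses))
```

With three group by clauses that share keys, only three distinct expressions would need to be evaluated and sent from the first stage instead of five.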


> [hive] redundant expression evaluation in group by stage 1
> ----------------------------------------------------------
>
>                 Key: HADOOP-4157
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4157
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hive
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>
> We can be smarter about which parameters we evaluate in the first stage: we should only evaluate those that are common across the group by clauses.
> All expressions and aggregation function parameters are evaluated in the first stage, thereby increasing data flow. It might be better to evaluate only the common expressions.
