Posted to common-dev@hadoop.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2008/09/11 03:34:44 UTC

[jira] Created: (HADOOP-4157) [hive] redundant expression evaluation in group by stage 1

[hive] redundant expression evaluation in group by stage 1
----------------------------------------------------------

                 Key: HADOOP-4157
                 URL: https://issues.apache.org/jira/browse/HADOOP-4157
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/hive
            Reporter: Namit Jain


We can be smarter about which parameters we evaluate in the first stage: we should evaluate only those that are common across the group by clauses.

Currently, all expressions and aggregation function parameters are evaluated in the first stage, which increases data flow. It might be better to evaluate only the common expressions.
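
As a rough illustration of "common expressions", here is a minimal sketch in Python. It assumes each group by clause can be represented as a list of expression strings; the function name and the sample expressions are hypothetical and are not taken from the Hive code base.

```python
from typing import List


def common_expressions(group_by_clauses: List[List[str]]) -> List[str]:
    """Return the expressions that appear in every group-by clause.

    Order follows the first clause; repeats within a clause are ignored.
    """
    if not group_by_clauses:
        return []
    shared = set(group_by_clauses[0])
    for clause in group_by_clauses[1:]:
        shared &= set(clause)
    # dict.fromkeys keeps first-seen order while dropping repeats.
    return [e for e in dict.fromkeys(group_by_clauses[0]) if e in shared]


# Hypothetical example: two group-by clauses over the same input.
clauses = [
    ["country", "substr(ts, 1, 10)", "user_id"],
    ["substr(ts, 1, 10)", "country", "page"],
]
stage1_expressions = common_expressions(clauses)
```

Under this assumption, only the expressions in stage1_expressions would be evaluated in the first stage; the rest would be evaluated later, per clause.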

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4157) [hive] redundant expression evaluation in group by stage 1

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630063#action_12630063 ] 

Zheng Shao commented on HADOOP-4157:
------------------------------------

The two extremes are:

1. Send all columns that are used in at least one group by.
2. Send all group-by keys (eliminating duplicates).

I think we can first try eliminating duplicates (so we become 2). That will give us the most improvement for common multi group bys.
Later we can think about other optimizations like the one in the description.
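
The two extremes can be sketched roughly as follows. This is an illustration only: it assumes each group by is described by a dict with a "keys" list and a "columns" list, and all names and sample data are hypothetical.

```python
from typing import Dict, Iterable, List


def _dedupe(items: Iterable[str]) -> List[str]:
    """Drop repeats while keeping first-seen order."""
    return list(dict.fromkeys(items))


def all_used_columns(group_bys: List[Dict[str, List[str]]]) -> List[str]:
    """Extreme 1: every column used in at least one group by."""
    return _dedupe(col for gb in group_bys for col in gb["columns"])


def distinct_group_by_keys(group_bys: List[Dict[str, List[str]]]) -> List[str]:
    """Extreme 2: every group-by key, with duplicates eliminated."""
    return _dedupe(key for gb in group_bys for key in gb["keys"])


# Hypothetical multi group by over one input.
group_bys = [
    {"keys": ["country", "day"], "columns": ["country", "day", "revenue"]},
    {"keys": ["day", "page"], "columns": ["day", "page", "clicks"]},
]
extreme_1 = all_used_columns(group_bys)
extreme_2 = distinct_group_by_keys(group_bys)
```

In this example "day" is a key of both group bys, so under extreme 2 it would be sent once rather than twice.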





[jira] Assigned: (HADOOP-4157) [hive] redundant expression evaluation in group by stage 1

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain reassigned HADOOP-4157:
----------------------------------

    Assignee: Namit Jain

