You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@hive.apache.org by "Prasad Chakka (JIRA)" <ji...@apache.org> on 2009/02/07 02:10:59 UTC

[jira] Created: (HIVE-279) Implement predicate push down for hive queries

Implement predicate push down for hive queries
----------------------------------------------

                 Key: HIVE-279
                 URL: https://issues.apache.org/jira/browse/HIVE-279
             Project: Hadoop Hive
          Issue Type: New Feature
    Affects Versions: 0.2.0
            Reporter: Prasad Chakka
            Assignee: Prasad Chakka


Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
eg.

select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'

current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-279) Implement predicate push down for hive queries

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12690161#action_12690161 ] 

Prasad Chakka commented on HIVE-279:
------------------------------------

will add the tests & take care of minor comments

colExprMap & posToAliasMap are just like any other transient fields in the operator and not needed beyond predicate pushdown. so i am going to leave them there.

TableScan operator can't have more than one child or rather any children of TableScan can't have more than one parent. So changing the order doesn't matter and this is assumed in lot of places. But I will change it so that the order is unchanged.



> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>         Attachments: hive-279.2.patch, hive-279.3.patch, hive-279.4.patch, hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-279) Implement predicate push down for hive queries

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693975#action_12693975 ] 

Ashish Thusoo commented on HIVE-279:
------------------------------------

+1

Looks good to me.

A few things to watch out for before you upload the complete patch..

1. ppd_gby_join.q is missing.
2. Would be great to have a join test case where the left and right ourter joins are part of the same join operator.
3. Also would be great to have a join test case where there is a three way join and the join operators do not merge.
4. Also a test case with UDFs (not the rand one) that gets pushed down would also be awesome.


Otherwise looks quite cool to me. 

> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>         Attachments: hive-279.2.patch, hive-279.3.patch, hive-279.4.patch, hive-279.5.patch, hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-279) Implement predicate push down for hive queries

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Chakka updated HIVE-279:
-------------------------------

    Attachment: hive-279.3.patch

incorporated comments from Namit and added a bunch of new testcases.

due to changes to exprNodeDesc, the outputs of all parse unit tests have changed. I will upload them in couple of days. otherwise there are no pending code changes for this JIRA.

will open jiras for the following further optimizatons
1) mult-insert
2) intermediate filter operators for partial pushdown
3) remove pushed preds from original operator to eliminate repeat evaluation.
4) merge successive filter ops into single op

> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>         Attachments: hive-279.2.patch, hive-279.3.patch, hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-279) Implement predicate push down for hive queries

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Chakka updated HIVE-279:
-------------------------------

    Attachment: hive-279.2.patch

Here is an updated patch that simplifies some of the column reference conversions. Now every operator converts the column references in the predicate expressions so that they will be understood by the parent operator(s)


> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>         Attachments: hive-279.2.patch, hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-279) Implement predicate push down for hive queries

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693984#action_12693984 ] 

Prasad Chakka commented on HIVE-279:
------------------------------------

1) I missed that file in the patch
2) Will add it
3) This test case is already there (ppd_join2.q or ppd_join3.q)
4) Will add it, but aren't all operators UDFs? 

> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>         Attachments: hive-279.2.patch, hive-279.3.patch, hive-279.4.patch, hive-279.5.patch, hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-279) Implement predicate push down for hive queries

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Chakka updated HIVE-279:
-------------------------------

    Attachment: hive-279.5.patch

updated patch incorporating Namit's comments

> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>         Attachments: hive-279.2.patch, hive-279.3.patch, hive-279.4.patch, hive-279.5.patch, hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-279) Implement predicate push down for hive queries

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677727#action_12677727 ] 

Prasad Chakka commented on HIVE-279:
------------------------------------

1) will do
2) can't remove all op specific code here. gby still needs to check whether this column is part of keys or values and value cols can't be pushed.
3) will do when possible without doing too much class casting
4) once i use specific classes instead of Node, things should be more clear and I don't see any reason in moving to a separate directory. They are already different classes so not sure the benefit of moving to separate directories
5) will open a new JIRA once I am done with this

colInfoMap will go away in the next patch. 


> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>         Attachments: hive-279.2.patch, hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-279) Implement predicate push down for hive queries

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Chakka updated HIVE-279:
-------------------------------

    Attachment: hive-279.patch

this is a drop for initial review since i suspect there will be lot of comments :). it should work for all cases except for multi-insert queries.

i have not enabled this by default but added a new config param called hive.optimize.ppd to enable this feature. 

i have not modified existing testcases but added couple of new testcases. will add more while uploading final patch.


> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>         Attachments: hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-279) Implement predicate push down for hive queries

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12690123#action_12690123 ] 

Namit Jain commented on HIVE-279:
---------------------------------

Add comments in tests:

For eg:  ppd_gby.q : the src1.c1 > 'val_200' is pushed up, but the other is not etc.


More tests needed:


groupby followed by groupby
groupby followed by join 
3-way join not being merged
3-way join being merged
outer join various scenarios
etc.


add a test for multi-table insert also, where ppd is not happening.



rand() being undeterministic - I think that change has already been merged by Raghu

PredicatePushdown.java: line 45: pusehed -> pushed

Optimizer.java:

columnPruner should be done after ppd, since it regenerates the operator tree.
Can you add a test for that ? I think ppd will not happen - need to confirm via a test

It might be a debugging nightmare - can you add a LOG trace/info in OpProcFactory 
(minimally in TableScanPPD - ideally everywhere.

In SemanticAnalyzer: the colPosMap is not maintained in genReduceSinkPlan : 
although the RR does not change, it might be a good idea to add a test for the same.
ppd after cluster by




ExprWalkerProcFactory:72
        if(exp == null) {
          ctx.setIsCandidate(colref, false);
          return false;
        }

I am assuming exp can be null only because colExprMap is not maintained in some cases
(for eg: group by exprs.)

Is that true ? 
If yes, Can you add a comment for the same ?
If no, can you explain that ?


83:        ctx.setIsCandidate(colref, true);
redundant


112: cant u break out of the loop if isCandidate is false


OpProcFactory:

128: the order of parents of children of tablescan can be lost, change parent at that position

247:       if(aliases.size() == 1 && aliases.contains("")) {
        // Reduce sink of group by operator
        aliases = null; 
      }


Instead of this, do you want to add a parameter to mergeWithChildrenPred() -- allAliasesOk

null and empty aliases are differentiated in mergeWi..() in a bizarre way, it might be easier
to understand with a seperate parameter



Some cleanup:

JoinOperator: posToAliasMap --> cant it me moved to ParseContext instead ?
same for colExprMap -> or it can be moved to OpParseContext ?
They are all parse time structures.

> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>         Attachments: hive-279.2.patch, hive-279.3.patch, hive-279.4.patch, hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-279) Implement predicate push down for hive queries

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-279:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

committed. Thanks Prasad!!


> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>             Fix For: 0.4.0
>
>         Attachments: hive-279.2.patch, hive-279.3.patch, hive-279.4.patch, hive-279.5.patch, hive-279.6.patch, hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-279) Implement predicate push down for hive queries

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696071#action_12696071 ] 

Ashish Thusoo commented on HIVE-279:
------------------------------------

+1

Running tests now. Will commit once the tests pass.


> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>             Fix For: 0.4.0
>
>         Attachments: hive-279.2.patch, hive-279.3.patch, hive-279.4.patch, hive-279.5.patch, hive-279.6.patch, hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-279) Implement predicate push down for hive queries

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Chakka updated HIVE-279:
-------------------------------

    Attachment: hive-279.6.patch

added following tests
1) multi insert (which doesn't push predicates)
2) query containing left and right outer joins

This patch is intended to be committed.

> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>             Fix For: 0.4.0
>
>         Attachments: hive-279.2.patch, hive-279.3.patch, hive-279.4.patch, hive-279.5.patch, hive-279.6.patch, hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-279) Implement predicate push down for hive queries

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677548#action_12677548 ] 

Namit Jain commented on HIVE-279:
---------------------------------

Some high level comments:

1. Add more comments everywhere, specifically in joinPPD (OpProcFactory)
2. Remove operator specific code in ExprWalkerProcFactory: ColumnExprProcessor: process
3. Use specific data-structures where-ever possible instead of using more generic data-structures.

ExprWalkerInfo:

  private Map<String, List<Node>> pushdownPreds;
  private Map<Node, ExprInfo> exprInfoMap;

In both of them, Node means exprNodeDesc, why dont we use that instead ?

Simlarly, in OpWalkerInfo:

  private Map<Node, ExprWalkerInfo> opToPrunedPredsMap;
  private Map<Operator<? extends Serializable>, OpParseContext> opToParseCtxMap;

use Operator instead of Node in opToPrunedPredsMap

4. Can you move OpWalker and ExprWalker in different directories ?
5. Why are filters only pushed on top of TableScan - cant it be done anywhere. - If you want to do so in a follow-up, can you file a JIRA for that ?
6. No apache header in many files (ppd directory)


SemanticAnalyzer.java:

A comment explaining the reason for existence of colInfoMap will help. Give an example: group by 
where the table column order is different from the grouped column order.

Same for posAliasMap, nameToInputColumnInfoMap for JOIN

genJoinOperatorChildren:


      if(aliases == null) {
        aliases = new HashSet<String>();
        posToAliasMap.put(pos, aliases);
      }

isn't the IF redundant ?




> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>         Attachments: hive-279.2.patch, hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-279) Implement predicate push down for hive queries

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Chakka updated HIVE-279:
-------------------------------

    Attachment: hive-279.4.patch

updated the patch to the latest trunk

> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>         Attachments: hive-279.2.patch, hive-279.3.patch, hive-279.4.patch, hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-279) Implement predicate push down for hive queries

Posted by "Prasad Chakka (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasad Chakka updated HIVE-279:
-------------------------------

    Fix Version/s: 0.4.0
     Hadoop Flags: [Reviewed]
           Status: Patch Available  (was: Open)

> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>             Fix For: 0.4.0
>
>         Attachments: hive-279.2.patch, hive-279.3.patch, hive-279.4.patch, hive-279.5.patch, hive-279.6.patch, hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-279) Implement predicate push down for hive queries

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghotham Murthy updated HIVE-279:
----------------------------------

    Component/s: Query Processor

> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>         Attachments: hive-279.2.patch, hive-279.3.patch, hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.