Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2009/03/27 22:52:50 UTC

[jira] Commented: (HIVE-279) Implement predicate push down for hive queries

    [ https://issues.apache.org/jira/browse/HIVE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12690123#action_12690123 ] 

Namit Jain commented on HIVE-279:
---------------------------------

Add comments in tests:

For example, in ppd_gby.q: the predicate src1.c1 > 'val_200' is pushed up, but the other one is not, etc.


More tests needed:


groupby followed by groupby
groupby followed by join 
3-way join not being merged
3-way join being merged
outer join various scenarios
etc.


Also add a test for multi-table insert, where ppd does not happen.



rand() being non-deterministic: I think that change has already been merged by Raghu.

PredicatePushdown.java: line 45: pusehed -> pushed

Optimizer.java:

columnPruner should be done after ppd, since it regenerates the operator tree.
Can you add a test for that? I think ppd will not happen; this needs to be confirmed via a test.

It might be a debugging nightmare. Can you add a LOG trace/info in OpProcFactory
(minimally in TableScanPPD, ideally everywhere)?
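
A minimal sketch of the kind of trace line meant here. It uses java.util.logging only to stay self-contained; the class name PpdTrace, the method logPushdown and the message layout are all hypothetical, and the real OpProcFactory may use a different logging API.

```java
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper, not part of the patch.
class PpdTrace {
  private static final Logger LOG = Logger.getLogger(PpdTrace.class.getName());

  // Builds and logs one line per operator visited during predicate pushdown.
  // Returns the message so callers (and tests) can inspect it.
  static String logPushdown(String operatorName, String alias, List<String> predicates) {
    String msg = "PPD " + operatorName + " alias=" + alias + " pushed=" + predicates;
    LOG.info(msg);
    return msg;
  }
}
```

Calling this from each processor (TableScanPPD first) would leave a trail showing which predicates reached which operator.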

In SemanticAnalyzer, the colPosMap is not maintained in genReduceSinkPlan.
Although the RR does not change, it might be a good idea to add a test for this case:
ppd after cluster by.




ExprWalkerProcFactory:72
        if(exp == null) {
          ctx.setIsCandidate(colref, false);
          return false;
        }

I am assuming exp can be null only because colExprMap is not maintained in some cases
(for example, group by exprs).

Is that true?
If yes, can you add a comment saying so?
If no, can you explain why?


83:        ctx.setIsCandidate(colref, true);
redundant


112: Can't you break out of the loop if isCandidate is false?
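
A minimal sketch of the early exit, assuming the loop at line 112 only needs to know whether every child is a candidate. CandidateCheck, allCandidates and the inspected counter are hypothetical stand-ins; the real code reads the candidate flag from the walker context.

```java
import java.util.List;

// Hypothetical stand-in for the loop in ExprWalkerProcFactory.
class CandidateCheck {
  // Children inspected by the last call; present only to make the early exit visible.
  static int inspected = 0;

  // Returns true only if every child is a pushdown candidate.
  static boolean allCandidates(List<Boolean> childIsCandidate) {
    inspected = 0;
    boolean isCandidate = true;
    for (boolean child : childIsCandidate) {
      inspected++;
      if (!child) {
        isCandidate = false;
        break; // one non-candidate child decides the result, so stop here
      }
    }
    return isCandidate;
  }
}
```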


OpProcFactory:

128: the order of the parents of the tablescan's children can be lost; replace the parent at the same position instead.

247:       if(aliases.size() == 1 && aliases.contains("")) {
        // Reduce sink of group by operator
        aliases = null; 
      }


Instead of this, do you want to add a parameter to mergeWithChildrenPred(), say allAliasesOk?

null and empty aliases are differentiated in mergeWi..() in a bizarre way; it might be easier
to understand with a separate parameter.
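
A rough sketch of what the explicit parameter could look like. The signature, the map-of-predicate-strings representation and the class name MergeSketch are assumptions made for illustration; the actual mergeWithChildrenPred() in the patch works on operator and expression objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not the patch's code.
class MergeSketch {
  // Keeps the predicates of every alias when allAliasesOk is true;
  // otherwise keeps only those whose alias appears in `aliases`.
  // An empty `aliases` set then simply means "no alias qualifies".
  static Map<String, List<String>> mergeWithChildrenPred(
      Map<String, List<String>> childPreds, Set<String> aliases, boolean allAliasesOk) {
    Map<String, List<String>> merged = new HashMap<>();
    for (Map.Entry<String, List<String>> e : childPreds.entrySet()) {
      if (allAliasesOk || aliases.contains(e.getKey())) {
        merged.put(e.getKey(), new ArrayList<>(e.getValue()));
      }
    }
    return merged;
  }
}
```

With this shape the reduce sink of a group by would pass allAliasesOk = true instead of setting aliases to null.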



Some cleanup:

JoinOperator: posToAliasMap --> can't it be moved to ParseContext instead?
Same for colExprMap; or can it be moved to OpParseContext?
They are all parse-time structures.

> Implement predicate push down for hive queries
> ----------------------------------------------
>
>                 Key: HIVE-279
>                 URL: https://issues.apache.org/jira/browse/HIVE-279
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Prasad Chakka
>            Assignee: Prasad Chakka
>         Attachments: hive-279.2.patch, hive-279.3.patch, hive-279.4.patch, hive-279.patch
>
>
> Push predicates that are expressed in outer queries into inner queries where possible so that rows will get filtered out sooner.
> eg.
> select a.*, b.* from a join b on (a.uid = b.uid) where a.age = 20 and a.gender = 'm'
> current compiler generates the filter predicate in the reducer after the join so all the rows have to be passed from mapper to reducer. by pushing the filter predicate to the mapper, query performance should improve.
