Posted to dev@hive.apache.org by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org> on 2012/02/10 18:39:01 UTC

[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

    [ https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205570#comment-13205570 ] 

jiraposter@reviews.apache.org commented on HIVE-2206:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2001/#review4912
-----------------------------------------------------------


I've started reviewing this; here are my comments so far. I'll continue to look over it.


trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
<https://reviews.apache.org/r/2001/#comment11010>

    Does this have to default to false? Does anything break if it's true?
    
    Similarly, have you tried running the tests with this set to true?



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java
<https://reviews.apache.org/r/2001/#comment10818>

    It's not clear to me why we need both setRowNumber and processOp.



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java
<https://reviews.apache.org/r/2001/#comment10817>

    Putting this code in a helper method would be better than having it both here and in setRowNumber.



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java
<https://reviews.apache.org/r/2001/#comment10819>

    Does this commented-out code need to be kept?



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
<https://reviews.apache.org/r/2001/#comment10820>

    I couldn't find a CorrelationFakeReduceSinkOperator class.



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java
<https://reviews.apache.org/r/2001/#comment10821>

    Tabs are bad. Could you change them to spaces, at least in the new code you're introducing?



trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java
<https://reviews.apache.org/r/2001/#comment10850>

    I take it from this line that, for this correlation optimization to be attempted at all, every reduce sink has to be followed only by operators that have a single child.
    
    Could this be relaxed?  Could the optimization simply not be applied if there is an operator between two ReduceSinks that has more than one child?
    
    Also, if there is a ReduceSink which is not followed by another ReduceSink, but is followed by an operator with more than one child, this prevents the optimization from being used, even though it shouldn't have an effect.
    
    Also, regarding the check for size <= 1: if the size is less than 1 (that is, 0), the next line will throw an exception.
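    
    To illustrate the last two points, here is a minimal sketch of the kind of guard I have in mind. It is not code from the patch: OpNode, ChainWalker and nextReduceSinkOnChain are hypothetical names, and OpNode is only a stand-in for an operator with a list of children. The loop proceeds only when there is exactly one child, so get(0) is never called on an empty list, and a branching operator just ends the search for that chain instead of disabling the optimization for the whole plan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an operator node; this is NOT Hive's Operator class.
class OpNode {
    final String name;
    final boolean isReduceSink;
    final List<OpNode> children = new ArrayList<>();

    OpNode(String name, boolean isReduceSink) {
        this.name = name;
        this.isReduceSink = isReduceSink;
    }

    // Attaches a child and returns it, so chains can be built fluently.
    OpNode addChild(OpNode child) {
        children.add(child);
        return child;
    }
}

class ChainWalker {
    // Walks down from start through single-child operators and returns the
    // first reduce sink found below it. Returns null if the chain ends
    // (no children) or branches (more than one child) before one is found.
    static OpNode nextReduceSinkOnChain(OpNode start) {
        OpNode current = start;
        while (current.children.size() == 1) { // exactly one child, so get(0) is safe
            current = current.children.get(0);
            if (current.isReduceSink) {
                return current;
            }
        }
        return null; // leaf or branch reached: skip the optimization for this chain only
    }
}
```

    With a check like this, a size of 0 falls through to the null return rather than reaching get(0).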



trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java
<https://reviews.apache.org/r/2001/#comment10851>

    findNextChildReduceSinkOperator can return null. Do you need to check for this?
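    
    As a rough sketch of the caller-side check, assuming nothing about the patch's actual code: NullCheckSketch, findNextOrNull and describe are hypothetical names, and a plain Map lookup stands in for a finder that may return null. The point is only that the result is tested before it is used.

```java
import java.util.HashMap;
import java.util.Map;

class NullCheckSketch {
    // Hypothetical finder: returns the name of the next reduce sink below op,
    // or null when none is recorded (Map.get returns null for a missing key).
    static String findNextOrNull(Map<String, String> nextSink, String op) {
        return nextSink.get(op);
    }

    // Caller that checks the result for null before using it.
    static String describe(Map<String, String> nextSink, String op) {
        String next = findNextOrNull(nextSink, op);
        if (next == null) {
            return op + ": no downstream reduce sink, plan left unchanged";
        }
        return op + ": correlate with " + next;
    }
}
```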


- Kevin


On 2012-01-29 17:56:48, Yin Huai wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2001/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-01-29 17:56:48)
bq.  
bq.  
bq.  Review request for hive.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  This optimizer exploits intra-query correlations and merges multiple correlated MapReduce jobs into one job.
bq.  
bq.  
bq.  This addresses bug HIVE-2206.
bq.      https://issues.apache.org/jira/browse/HIVE-2206
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java 1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java PRE-CREATION 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 1237326 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 1237326 
bq.    trunk/ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 1237326 
bq.    trunk/ql/src/test/results/compiler/plan/groupby1.q.xml 1237326 
bq.    trunk/ql/src/test/results/compiler/plan/groupby2.q.xml 1237326 
bq.    trunk/ql/src/test/results/compiler/plan/groupby3.q.xml 1237326 
bq.    trunk/ql/src/test/results/compiler/plan/groupby5.q.xml 1237326 
bq.  
bq.  Diff: https://reviews.apache.org/r/2001/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Yin
bq.  
bq.


                
> add a new optimizer for query correlation discovery and optimization
> --------------------------------------------------------------------
>
>                 Key: HIVE-2206
>                 URL: https://issues.apache.org/jira/browse/HIVE-2206
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: Yin Huai
>         Attachments: HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8-r1237253.patch.txt, HIVE-2206.8.r1224646.patch.txt, YSmartPatchForHive.patch, testQueries.2.q
>
>
> reference:
> http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
