Posted to dev@hive.apache.org by pengcheng xiong <px...@hortonworks.com> on 2014/10/31 05:26:18 UTC
Review Request 27415: CBO: Column names are missing from join expression in Map join with CBO enabled
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27415/
-----------------------------------------------------------
Review request for hive and Ashutosh Chauhan.
Repository: hive-git
Description
-------
In Hive 0.14, when CBO is enabled, the column names are missing from the join expression: instead of external names such as "key" and "value", internal names such as "_col0" and "_col1" are shown. For a map join over more than two tables, this makes it very hard to figure out the actual join order. This patch addresses the issue not only for joins but for all other operators as well, in both the CLI and the Tez environment.
The basic idea for translating internal names into external names is:
(1) Use snapshotLogicalPlanForExplain() to take a snapshot of the logical plan after logical optimization.
(2) For each operator in the explain task, call prepareCBOExplain on its operatorDesc:
(2.1) each operator uses 'helpGetStartOp' to map itself to a logical operator (the start point) in the logical plan;
(2.2) from that start point, each operatorDesc uses 'findExternalName' to trace its external names.
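The tracing in step (2.2) can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: LogicalOp, Source, map and this findExternalName are hypothetical stand-ins, and the sketch assumes each logical operator records, for every output column, which input operator and column it was derived from. A column is followed upward until a table scan is reached, whose column names are treated as the external ones.

```java
import java.util.HashMap;
import java.util.Map;

public class ExternalNameSketch {

    /** Hypothetical link from an output column to the operator and column it comes from. */
    static final class Source {
        final LogicalOp op;
        final String column;

        Source(LogicalOp op, String column) {
            this.op = op;
            this.column = column;
        }
    }

    /** Hypothetical logical-plan operator; a scan (isScan) owns the external column names. */
    static final class LogicalOp {
        final String name;
        final boolean isScan;
        final Map<String, Source> colMap = new HashMap<>();

        LogicalOp(String name, boolean isScan) {
            this.name = name;
            this.isScan = isScan;
        }

        /** Records that output column 'out' is derived from column 'in' of operator 'from'. */
        LogicalOp map(String out, LogicalOp from, String in) {
            colMap.put(out, new Source(from, in));
            return this;
        }
    }

    /**
     * Hypothetical analogue of findExternalName: follows a column upward through
     * the column maps (joins may point at different parents per column) until a
     * scan is reached. Returns null if the chain breaks, so callers can fall back.
     */
    static String findExternalName(LogicalOp start, String column) {
        LogicalOp op = start;
        String col = column;
        while (!op.isScan) {
            Source src = op.colMap.get(col);
            if (src == null) {
                return null; // column cannot be traced any further
            }
            op = src.op;
            col = src.column;
        }
        return col; // name as it appears at the table scan
    }

    public static void main(String[] args) {
        LogicalOp a = new LogicalOp("TS[a]", true);
        LogicalOp b = new LogicalOp("TS[b]", true);
        LogicalOp selA = new LogicalOp("SEL", false).map("_col0", a, "key");
        LogicalOp selB = new LogicalOp("SEL", false)
            .map("_col0", b, "key")
            .map("_col1", b, "value");
        LogicalOp join = new LogicalOp("JOIN", false)
            .map("_col0", selA, "_col0")
            .map("_col1", selB, "_col0")
            .map("_col2", selB, "_col1");
        System.out.println(findExternalName(join, "_col0")); // key
        System.out.println(findExternalName(join, "_col2")); // value
        System.out.println(findExternalName(join, "_col9")); // null
    }
}
```

In this toy plan the join's "_col2" resolves through the second SEL to "value" on table b, while an untraceable column yields null so the caller can keep the internal name.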
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java e238ff1
ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractSMBJoinProc.java c9e8086
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java bedc3ac
ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 46dcfaf
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java 1a4fcbf
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java 9076d48
ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8215c26
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9c944b6
ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java 23fbbe1
ql/src/java/org/apache/hadoop/hive/ql/plan/AbstractOperatorDesc.java 8410664
ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java 8b25c2b
ql/src/java/org/apache/hadoop/hive/ql/plan/FilterDesc.java 5856743
ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 7a0b0da
ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableSinkDesc.java 1e0eb6b
ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java 0e2c6ee
ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java d43bd60
ql/src/java/org/apache/hadoop/hive/ql/plan/OperatorDesc.java c8c9570
ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 57beb69
ql/src/java/org/apache/hadoop/hive/ql/plan/SelectDesc.java fa6b548
ql/src/test/queries/clientpositive/explainColTest_1.q PRE-CREATION
ql/src/test/queries/clientpositive/explainColTest_2.q PRE-CREATION
ql/src/test/results/clientpositive/explainColTest_1.q.out PRE-CREATION
ql/src/test/results/clientpositive/explainColTest_2.q.out PRE-CREATION
ql/src/test/results/clientpositive/tez/explainColTest_1.q.out PRE-CREATION
ql/src/test/results/clientpositive/tez/explainColTest_2.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/27415/diff/
Testing
-------
Thanks,
pengcheng xiong
Re: Review Request 27415: CBO: Column names are missing from join expression in Map join with CBO enabled
Posted by pengcheng xiong <px...@hortonworks.com>.
(Updated Nov. 11, 2014, 10:34 p.m.)
Review request for hive, Ashutosh Chauhan and Vikram Dixit Kumaraswamy.
Repository: hive-git
Description
-------
In Hive 0.14, when CBO is enabled, the column names are missing from the join expression: instead of external names such as "key" and "value", internal names such as "_col0" and "_col1" are shown. For a map join over more than two tables, this makes it very hard to figure out the actual join order. This patch addresses the issue not only for joins but for all other operators as well, in both the CLI and the Tez environment.
The basic idea for translating internal names into external names is:
(1) Use snapshotLogicalPlanForExplain() to take a snapshot of the logical plan after logical optimization.
(2) For each operator in the explain task, call prepareCBOExplain on its operatorDesc:
(2.1) each operator uses 'helpGetStartOp' to map itself to a logical operator (the start point) in the logical plan;
(2.2) from that start point, each operatorDesc uses 'findExternalName' to trace its external names.
An important assumption is that every physical operator included in the explain output (SEL, RS, JOIN (including the hash table sink, join, map join, merge join and semijoin operators), FIL, GBY, FS, UNION) can find its corresponding logical operator. If it cannot, its internal column names are used.
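The fallback rule can be sketched as follows, assuming a lookup from a physical operator to its logical start point (playing the role of 'helpGetStartOp') and a resolver that returns null for untraceable columns (playing the role of 'findExternalName'). The method explainColumns, the operator ids MAPJOIN_12 and JOIN_5, and the map-based lookup are hypothetical illustrations, not Hive APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class ExplainFallbackSketch {

    /**
     * Renders the column names for one physical operator's explain output.
     * startOps: hypothetical physical-operator id -> logical start-operator id.
     * resolver: hypothetical (startOp, internalCol) -> external name, or null.
     */
    static List<String> explainColumns(String physicalOpId,
                                       List<String> internalCols,
                                       Map<String, String> startOps,
                                       BiFunction<String, String, String> resolver) {
        String start = startOps.get(physicalOpId); // null: no corresponding logical operator
        List<String> out = new ArrayList<>();
        for (String col : internalCols) {
            String external = (start == null) ? null : resolver.apply(start, col);
            // Fall back to the internal name when no start point or no mapping exists.
            out.add(external != null ? external : col);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> startOps = Map.of("MAPJOIN_12", "JOIN_5");
        Map<String, String> joinNames = Map.of("_col0", "key", "_col1", "value");
        BiFunction<String, String, String> resolver =
            (op, col) -> op.equals("JOIN_5") ? joinNames.get(col) : null;

        // [key, value, _col7]
        System.out.println(explainColumns("MAPJOIN_12", List.of("_col0", "_col1", "_col7"), startOps, resolver));
        // [_col0]  (RS_20 has no logical counterpart, so internal names are kept)
        System.out.println(explainColumns("RS_20", List.of("_col0"), startOps, resolver));
    }
}
```

This sketch applies the fallback per column, so a partially traceable operator still shows whatever external names can be found; the description above only states the per-operator case.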
Diffs (updated)
-----
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java cca57d2
ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java 8e1ba48
ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractSMBJoinProc.java c9e8086
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 04bafda
ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 2d82cd8
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java 3df1c26
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java 9076d48
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 2c02bd4
ql/src/java/org/apache/hadoop/hive/ql/parse/ExplainColumnContext.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8215c26
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 88fd0fc
ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java 23fbbe1
ql/src/java/org/apache/hadoop/hive/ql/plan/AbstractOperatorDesc.java 8410664
ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java 8b25c2b
ql/src/java/org/apache/hadoop/hive/ql/plan/FilterDesc.java 5856743
ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 7a0b0da
ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableSinkDesc.java 1e0eb6b
ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java 0e2c6ee
ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java d43bd60
ql/src/java/org/apache/hadoop/hive/ql/plan/OperatorDesc.java c8c9570
ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 57beb69
ql/src/java/org/apache/hadoop/hive/ql/plan/SelectDesc.java fa6b548
ql/src/test/queries/clientpositive/explainColTest_1.q PRE-CREATION
ql/src/test/queries/clientpositive/explainColTest_2.q PRE-CREATION
ql/src/test/results/clientpositive/explainColTest_1.q.out PRE-CREATION
ql/src/test/results/clientpositive/explainColTest_2.q.out PRE-CREATION
ql/src/test/results/clientpositive/tez/explainColTest_1.q.out PRE-CREATION
ql/src/test/results/clientpositive/tez/explainColTest_2.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/27415/diff/
Testing
-------
Thanks,
pengcheng xiong
Re: Review Request 27415: CBO: Column names are missing from join expression in Map join with CBO enabled
Posted by pengcheng xiong <px...@hortonworks.com>.
(Updated Oct. 31, 2014, 9:03 p.m.)
Review request for hive, Ashutosh Chauhan and Vikram Dixit Kumaraswamy.
Changes
-------
Include Vikram in the discussion.
Repository: hive-git
Description
-------
In Hive 0.14, when CBO is enabled, the column names are missing from the join expression: instead of external names such as "key" and "value", internal names such as "_col0" and "_col1" are shown. For a map join over more than two tables, this makes it very hard to figure out the actual join order. This patch addresses the issue not only for joins but for all other operators as well, in both the CLI and the Tez environment.
The basic idea for translating internal names into external names is:
(1) Use snapshotLogicalPlanForExplain() to take a snapshot of the logical plan after logical optimization.
(2) For each operator in the explain task, call prepareCBOExplain on its operatorDesc:
(2.1) each operator uses 'helpGetStartOp' to map itself to a logical operator (the start point) in the logical plan;
(2.2) from that start point, each operatorDesc uses 'findExternalName' to trace its external names.
An important assumption is that every physical operator included in the explain output (SEL, RS, JOIN (including the hash table sink, join, map join, merge join and semijoin operators), FIL, GBY, FS, UNION) can find its corresponding logical operator. If it cannot, its internal column names are used.
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java e238ff1
ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractSMBJoinProc.java c9e8086
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java bedc3ac
ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 46dcfaf
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java 1a4fcbf
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java 9076d48
ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8215c26
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9c944b6
ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java 23fbbe1
ql/src/java/org/apache/hadoop/hive/ql/plan/AbstractOperatorDesc.java 8410664
ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java 8b25c2b
ql/src/java/org/apache/hadoop/hive/ql/plan/FilterDesc.java 5856743
ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 7a0b0da
ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableSinkDesc.java 1e0eb6b
ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java 0e2c6ee
ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java d43bd60
ql/src/java/org/apache/hadoop/hive/ql/plan/OperatorDesc.java c8c9570
ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 57beb69
ql/src/java/org/apache/hadoop/hive/ql/plan/SelectDesc.java fa6b548
ql/src/test/queries/clientpositive/explainColTest_1.q PRE-CREATION
ql/src/test/queries/clientpositive/explainColTest_2.q PRE-CREATION
ql/src/test/results/clientpositive/explainColTest_1.q.out PRE-CREATION
ql/src/test/results/clientpositive/explainColTest_2.q.out PRE-CREATION
ql/src/test/results/clientpositive/tez/explainColTest_1.q.out PRE-CREATION
ql/src/test/results/clientpositive/tez/explainColTest_2.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/27415/diff/
Testing
-------
Thanks,
pengcheng xiong
Re: Review Request 27415: CBO: Column names are missing from join expression in Map join with CBO enabled
Posted by pengcheng xiong <px...@hortonworks.com>.
(Updated Oct. 31, 2014, 9:02 p.m.)
Review request for hive, Ashutosh Chauhan and Vikram Dixit Kumaraswamy.
Repository: hive-git
Description
-------
In Hive 0.14, when CBO is enabled, the column names are missing from the join expression: instead of external names such as "key" and "value", internal names such as "_col0" and "_col1" are shown. For a map join over more than two tables, this makes it very hard to figure out the actual join order. This patch addresses the issue not only for joins but for all other operators as well, in both the CLI and the Tez environment.
The basic idea for translating internal names into external names is:
(1) Use snapshotLogicalPlanForExplain() to take a snapshot of the logical plan after logical optimization.
(2) For each operator in the explain task, call prepareCBOExplain on its operatorDesc:
(2.1) each operator uses 'helpGetStartOp' to map itself to a logical operator (the start point) in the logical plan;
(2.2) from that start point, each operatorDesc uses 'findExternalName' to trace its external names.
An important assumption is that every physical operator included in the explain output (SEL, RS, JOIN (including the hash table sink, join, map join, merge join and semijoin operators), FIL, GBY, FS, UNION) can find its corresponding logical operator. If it cannot, its internal column names are used.
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java e238ff1
ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractSMBJoinProc.java c9e8086
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java bedc3ac
ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 46dcfaf
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java 1a4fcbf
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java 9076d48
ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8215c26
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9c944b6
ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java 23fbbe1
ql/src/java/org/apache/hadoop/hive/ql/plan/AbstractOperatorDesc.java 8410664
ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java 8b25c2b
ql/src/java/org/apache/hadoop/hive/ql/plan/FilterDesc.java 5856743
ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 7a0b0da
ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableSinkDesc.java 1e0eb6b
ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java 0e2c6ee
ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java d43bd60
ql/src/java/org/apache/hadoop/hive/ql/plan/OperatorDesc.java c8c9570
ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 57beb69
ql/src/java/org/apache/hadoop/hive/ql/plan/SelectDesc.java fa6b548
ql/src/test/queries/clientpositive/explainColTest_1.q PRE-CREATION
ql/src/test/queries/clientpositive/explainColTest_2.q PRE-CREATION
ql/src/test/results/clientpositive/explainColTest_1.q.out PRE-CREATION
ql/src/test/results/clientpositive/explainColTest_2.q.out PRE-CREATION
ql/src/test/results/clientpositive/tez/explainColTest_1.q.out PRE-CREATION
ql/src/test/results/clientpositive/tez/explainColTest_2.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/27415/diff/
Testing
-------
Thanks,
pengcheng xiong
Re: Review Request 27415: CBO: Column names are missing from join expression in Map join with CBO enabled
Posted by pengcheng xiong <px...@hortonworks.com>.
(Updated Oct. 31, 2014, 9:02 p.m.)
Review request for hive and Ashutosh Chauhan.
Repository: hive-git
Description (updated)
-------
In Hive 0.14, when CBO is enabled, the column names are missing from the join expression: instead of external names such as "key" and "value", internal names such as "_col0" and "_col1" are shown. For a map join over more than two tables, this makes it very hard to figure out the actual join order. This patch addresses the issue not only for joins but for all other operators as well, in both the CLI and the Tez environment.
The basic idea for translating internal names into external names is:
(1) Use snapshotLogicalPlanForExplain() to take a snapshot of the logical plan after logical optimization.
(2) For each operator in the explain task, call prepareCBOExplain on its operatorDesc:
(2.1) each operator uses 'helpGetStartOp' to map itself to a logical operator (the start point) in the logical plan;
(2.2) from that start point, each operatorDesc uses 'findExternalName' to trace its external names.
An important assumption is that every physical operator included in the explain output (SEL, RS, JOIN (including the hash table sink, join, map join, merge join and semijoin operators), FIL, GBY, FS, UNION) can find its corresponding logical operator. If it cannot, its internal column names are used.
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/ExplainTask.java e238ff1
ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractSMBJoinProc.java c9e8086
ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java bedc3ac
ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 46dcfaf
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java 1a4fcbf
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/LocalMapJoinProcFactory.java 9076d48
ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8215c26
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 9c944b6
ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java 23fbbe1
ql/src/java/org/apache/hadoop/hive/ql/plan/AbstractOperatorDesc.java 8410664
ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java 8b25c2b
ql/src/java/org/apache/hadoop/hive/ql/plan/FilterDesc.java 5856743
ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 7a0b0da
ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableSinkDesc.java 1e0eb6b
ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java 0e2c6ee
ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java d43bd60
ql/src/java/org/apache/hadoop/hive/ql/plan/OperatorDesc.java c8c9570
ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java 57beb69
ql/src/java/org/apache/hadoop/hive/ql/plan/SelectDesc.java fa6b548
ql/src/test/queries/clientpositive/explainColTest_1.q PRE-CREATION
ql/src/test/queries/clientpositive/explainColTest_2.q PRE-CREATION
ql/src/test/results/clientpositive/explainColTest_1.q.out PRE-CREATION
ql/src/test/results/clientpositive/explainColTest_2.q.out PRE-CREATION
ql/src/test/results/clientpositive/tez/explainColTest_1.q.out PRE-CREATION
ql/src/test/results/clientpositive/tez/explainColTest_2.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/27415/diff/
Testing
-------
Thanks,
pengcheng xiong