You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Charles Chen (JIRA)" <ji...@apache.org> on 2011/08/03 03:16:27 UTC
[jira] [Created] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Predicate pushdown erroneously conservative with outer joins
------------------------------------------------------------
Key: HIVE-2337
URL: https://issues.apache.org/jira/browse/HIVE-2337
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Charles Chen
The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
{noformat}
/**
* Figures out the aliases for whom it is safe to push predicates based on
* ANSI SQL semantics For inner join, all predicates for all aliases can be
* pushed For full outer join, none of the predicates can be pushed as that
* would limit the number of rows for join For left outer join, all the
* predicates on the left side aliases can be pushed up For right outer
* join, all the predicates on the right side aliases can be pushed up Joins
* chain containing both left and right outer joins are treated as full
* outer join. [...]
*
* @param op
* Join Operator
* @param rr
* Row resolver
* @return set of qualified aliases
*/
{noformat}
Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases a and d are eligible to be pushed up while the current criteria provide that none are eligible.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Charles Chen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Chen updated HIVE-2337:
-------------------------------
Attachment: HIVE-2337v3.patch
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079023#comment-13079023 ]
jiraposter@reviews.apache.org commented on HIVE-2337:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1275/
-----------------------------------------------------------
(Updated 2011-08-03 21:03:23.393902)
Review request for hive.
Summary (updated)
-------
https://issues.apache.org/jira/browse/HIVE-2337
This addresses bug HIVE-2337.
https://issues.apache.org/jira/browse/HIVE-2337
Diffs
-----
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1153598
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/1275/diff
Testing
-------
Thanks,
Charles
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Charles Chen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Chen updated HIVE-2337:
-------------------------------
Attachment: HIVE-2337v6.patch
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Fix For: 0.9.0
>
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch, HIVE-2337v6.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Charles Chen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Chen updated HIVE-2337:
-------------------------------
Fix Version/s: 0.9.0
Status: Patch Available (was: Open)
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Fix For: 0.9.0
>
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079024#comment-13079024 ]
jiraposter@reviews.apache.org commented on HIVE-2337:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1275/
-----------------------------------------------------------
(Updated 2011-08-03 21:02:11.625203)
Review request for hive.
Changes
-------
Added test cases, fixed comment
Summary
-------
Initial patch
This addresses bug HIVE-2337.
https://issues.apache.org/jira/browse/HIVE-2337
Diffs (updated)
-----
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1153598
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/1275/diff
Testing
-------
Thanks,
Charles
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095110#comment-13095110 ]
jiraposter@reviews.apache.org commented on HIVE-2337:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1275/
-----------------------------------------------------------
(Updated 2011-09-01 04:26:59.076177)
Review request for hive.
Changes
-------
Oops fixed dropped unit test, javadoc character
Summary
-------
https://issues.apache.org/jira/browse/HIVE-2337
This addresses bug HIVE-2337.
https://issues.apache.org/jira/browse/HIVE-2337
Diffs (updated)
-----
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out PRE-CREATION
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1163875
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/ppd_outer_join5.q PRE-CREATION
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join4.q.out 1163875
Diff: https://reviews.apache.org/r/1275/diff
Testing
-------
Unit tests passed
Thanks,
Charles
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Fix For: 0.9.0
>
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "John Sichi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095607#comment-13095607 ]
John Sichi commented on HIVE-2337:
----------------------------------
+1. Will commit when tests pass.
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Fix For: 0.9.0
>
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch, HIVE-2337v6.patch, HIVE-2337v7.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Charles Chen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Chen updated HIVE-2337:
-------------------------------
Status: Patch Available (was: Open)
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Charles Chen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Chen reassigned HIVE-2337:
----------------------------------
Assignee: Charles Chen
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases a and d are eligible to be pushed up while the current criteria provide that none are eligible.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Charles Chen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095015#comment-13095015 ]
Charles Chen commented on HIVE-2337:
------------------------------------
I've fixed the test output--it seems to be an improvement.
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095104#comment-13095104 ]
jiraposter@reviews.apache.org commented on HIVE-2337:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1275/#review1710
-----------------------------------------------------------
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
<https://reviews.apache.org/r/1275/#comment3884>
There is a weird non-ASCII character on this line.
- John
On 2011-09-01 00:19:17, Charles Chen wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/1275/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-09-01 00:19:17)
bq.
bq.
bq. Review request for hive.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. https://issues.apache.org/jira/browse/HIVE-2337
bq.
bq.
bq. This addresses bug HIVE-2337.
bq. https://issues.apache.org/jira/browse/HIVE-2337
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1163875
bq. http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join4.q.out 1163875
bq.
bq. Diff: https://reviews.apache.org/r/1275/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Unit tests passed
bq.
bq.
bq. Thanks,
bq.
bq. Charles
bq.
bq.
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Fix For: 0.9.0
>
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Charles Chen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Chen updated HIVE-2337:
-------------------------------
Attachment: HIVE-2337v4.patch
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "John Sichi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Sichi updated HIVE-2337:
-----------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
Committed to trunk. Thanks Charles!
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Fix For: 0.9.0
>
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch, HIVE-2337v6.patch, HIVE-2337v7.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095022#comment-13095022 ]
jiraposter@reviews.apache.org commented on HIVE-2337:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1275/
-----------------------------------------------------------
(Updated 2011-09-01 00:19:17.176704)
Review request for hive.
Changes
-------
Rebased to current trunk
Summary
-------
https://issues.apache.org/jira/browse/HIVE-2337
This addresses bug HIVE-2337.
https://issues.apache.org/jira/browse/HIVE-2337
Diffs (updated)
-----
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1163875
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join4.q.out 1163875
Diff: https://reviews.apache.org/r/1275/diff
Testing
-------
Unit tests passed
Thanks,
Charles
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Fix For: 0.9.0
>
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Charles Chen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Chen updated HIVE-2337:
-------------------------------
Status: Patch Available (was: Open)
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases a and d are eligible to be pushed up while the current criteria provide that none are eligible.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095013#comment-13095013 ]
jiraposter@reviews.apache.org commented on HIVE-2337:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1275/
-----------------------------------------------------------
(Updated 2011-09-01 00:08:37.474019)
Review request for hive.
Changes
-------
Fixed ppd_outer_join4.q.out
Summary
-------
https://issues.apache.org/jira/browse/HIVE-2337
This addresses bug HIVE-2337.
https://issues.apache.org/jira/browse/HIVE-2337
Diffs (updated)
-----
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1163856
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/ppd_outer_join5.q PRE-CREATION
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join4.q.out 1163856
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/1275/diff
Testing
-------
Unit tests passed
Thanks,
Charles
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-2337:
---------------------------------
Status: Open (was: Patch Available)
@Charles: There's a test failure in ppd_outer_join4. Can you please take a look? Thanks.
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078551#comment-13078551 ]
jiraposter@reviews.apache.org commented on HIVE-2337:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1275/
-----------------------------------------------------------
Review request for hive.
Summary
-------
Initial patch
This addresses bug HIVE-2337.
https://issues.apache.org/jira/browse/HIVE-2337
Diffs
-----
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1153293
Diff: https://reviews.apache.org/r/1275/diff
Testing
-------
Thanks,
Charles
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases a and d are eligible to be pushed up while the current criteria provide that none are eligible.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081788#comment-13081788 ]
jiraposter@reviews.apache.org commented on HIVE-2337:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1275/
-----------------------------------------------------------
(Updated 2011-08-09 17:41:41.427733)
Review request for hive.
Changes
-------
Unit tests passed
Summary
-------
https://issues.apache.org/jira/browse/HIVE-2337
This addresses bug HIVE-2337.
https://issues.apache.org/jira/browse/HIVE-2337
Diffs
-----
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1153598
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/ppd_outer_join5.q PRE-CREATION
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/1275/diff
Testing (updated)
-------
Unit tests passed
Thanks,
Charles
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Charles Chen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Chen updated HIVE-2337:
-------------------------------
Attachment: (was: HIVE-2337v2.patch)
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Charles Chen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Chen updated HIVE-2337:
-------------------------------
Description:
The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
{noformat}
/**
* Figures out the aliases for whom it is safe to push predicates based on
* ANSI SQL semantics For inner join, all predicates for all aliases can be
* pushed For full outer join, none of the predicates can be pushed as that
* would limit the number of rows for join For left outer join, all the
* predicates on the left side aliases can be pushed up For right outer
* join, all the predicates on the right side aliases can be pushed up Joins
* chain containing both left and right outer joins are treated as full
* outer join. [...]
*
* @param op
* Join Operator
* @param rr
* Row resolver
* @return set of qualified aliases
*/
{noformat}
Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
Using:
{noformat}
create table t1 (id int, key string, value string);
create table t2 (id int, key string, value string);
create table t3 (id int, key string, value string);
create table t4 (id int, key string, value string);
{noformat}
For example, the query
{noformat}
explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
{noformat}
currently gives
{noformat}
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
t1
TableScan
alias: t1
Reduce Output Operator
key expressions:
expr: id
type: int
sort order: +
Map-reduce partition columns:
expr: id
type: int
tag: 0
value expressions:
expr: id
type: int
expr: key
type: string
expr: value
type: string
t2
TableScan
alias: t2
Reduce Output Operator
key expressions:
expr: id
type: int
sort order: +
Map-reduce partition columns:
expr: id
type: int
tag: 1
value expressions:
expr: id
type: int
expr: key
type: string
expr: value
type: string
t3
TableScan
alias: t3
Reduce Output Operator
key expressions:
expr: id
type: int
sort order: +
Map-reduce partition columns:
expr: id
type: int
tag: 2
value expressions:
expr: id
type: int
expr: key
type: string
expr: value
type: string
Reduce Operator Tree:
Join Operator
condition map:
Outer Join 0 to 1
Inner Join 1 to 2
condition expressions:
0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
handleSkewJoin: false
outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
Filter Operator
predicate:
expr: (_col10 = 20)
type: boolean
Select Operator
expressions:
expr: _col0
type: int
expr: _col1
type: string
expr: _col2
type: string
expr: _col5
type: int
expr: _col6
type: string
expr: _col7
type: string
expr: _col10
type: int
expr: _col11
type: string
expr: _col12
type: string
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Stage: Stage-0
Fetch Operator
limit: -1
{noformat}
while the correct behavior is to push the filter "t3.id=20" down:
{noformat}
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
t1
TableScan
alias: t1
Reduce Output Operator
key expressions:
expr: id
type: int
sort order: +
Map-reduce partition columns:
expr: id
type: int
tag: 0
value expressions:
expr: id
type: int
expr: key
type: string
expr: value
type: string
t2
TableScan
alias: t2
Reduce Output Operator
key expressions:
expr: id
type: int
sort order: +
Map-reduce partition columns:
expr: id
type: int
tag: 1
value expressions:
expr: id
type: int
expr: key
type: string
expr: value
type: string
t3
TableScan
alias: t3
Filter Operator
predicate:
expr: (id = 20)
type: boolean
Reduce Output Operator
key expressions:
expr: id
type: int
sort order: +
Map-reduce partition columns:
expr: id
type: int
tag: 2
value expressions:
expr: id
type: int
expr: key
type: string
expr: value
type: string
Reduce Operator Tree:
Join Operator
condition map:
Outer Join 0 to 1
Inner Join 1 to 2
condition expressions:
0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
handleSkewJoin: false
outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
Select Operator
expressions:
expr: _col0
type: int
expr: _col1
type: string
expr: _col2
type: string
expr: _col5
type: int
expr: _col6
type: string
expr: _col7
type: string
expr: _col10
type: int
expr: _col11
type: string
expr: _col12
type: string
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Stage: Stage-0
Fetch Operator
limit: -1
{noformat}
The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
{noformat}
explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
{noformat}
while it isn't here:
{noformat}
explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
{noformat}
was:
The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
{noformat}
/**
* Figures out the aliases for whom it is safe to push predicates based on
* ANSI SQL semantics For inner join, all predicates for all aliases can be
* pushed For full outer join, none of the predicates can be pushed as that
* would limit the number of rows for join For left outer join, all the
* predicates on the left side aliases can be pushed up For right outer
* join, all the predicates on the right side aliases can be pushed up Joins
* chain containing both left and right outer joins are treated as full
* outer join. [...]
*
* @param op
* Join Operator
* @param rr
* Row resolver
* @return set of qualified aliases
*/
{noformat}
Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated HIVE-2337:
------------------------------------------
Status: Open (was: Patch Available)
Charles, Can you add a test case including explain plans?
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Charles Chen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Chen updated HIVE-2337:
-------------------------------
Attachment: HIVE-2337v2.patch
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Charles Chen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Chen updated HIVE-2337:
-------------------------------
Attachment: HIVE-2337v7.patch
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Fix For: 0.9.0
>
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch, HIVE-2337v6.patch, HIVE-2337v7.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Charles Chen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Chen updated HIVE-2337:
-------------------------------
Attachment: HIVE-2337v1.patch
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases a and d are eligible to be pushed up while the current criteria provide that none are eligible.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079030#comment-13079030 ]
jiraposter@reviews.apache.org commented on HIVE-2337:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1275/
-----------------------------------------------------------
(Updated 2011-08-03 21:13:25.076077)
Review request for hive.
Changes
-------
Fix testcase
Summary
-------
https://issues.apache.org/jira/browse/HIVE-2337
This addresses bug HIVE-2337.
https://issues.apache.org/jira/browse/HIVE-2337
Diffs (updated)
-----
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1153598
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/ppd_outer_join5.q PRE-CREATION
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/1275/diff
Testing
-------
Thanks,
Charles
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095116#comment-13095116 ]
jiraposter@reviews.apache.org commented on HIVE-2337:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1275/
-----------------------------------------------------------
(Updated 2011-09-01 04:42:25.815081)
Review request for hive.
Changes
-------
Added TestParse changes ??
Summary
-------
https://issues.apache.org/jira/browse/HIVE-2337
This addresses bug HIVE-2337.
https://issues.apache.org/jira/browse/HIVE-2337
Diffs (updated)
-----
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 1163875
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientpositive/ppd_outer_join5.q PRE-CREATION
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join4.q.out 1163875
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out PRE-CREATION
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/input4.q.xml 1163875
http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/compiler/plan/join8.q.xml 1163875
Diff: https://reviews.apache.org/r/1275/diff
Testing
-------
Unit tests passed
Thanks,
Charles
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Fix For: 0.9.0
>
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch, HIVE-2337v6.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Charles Chen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Chen updated HIVE-2337:
-------------------------------
Attachment: HIVE-2337v5.patch
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Fix For: 0.9.0
>
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Charles Chen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Chen updated HIVE-2337:
-------------------------------
Description:
The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
{noformat}
/**
* Figures out the aliases for whom it is safe to push predicates based on
* ANSI SQL semantics For inner join, all predicates for all aliases can be
* pushed For full outer join, none of the predicates can be pushed as that
* would limit the number of rows for join For left outer join, all the
* predicates on the left side aliases can be pushed up For right outer
* join, all the predicates on the right side aliases can be pushed up Joins
* chain containing both left and right outer joins are treated as full
* outer join. [...]
*
* @param op
* Join Operator
* @param rr
* Row resolver
* @return set of qualified aliases
*/
{noformat}
Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
was:
The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
{noformat}
/**
* Figures out the aliases for whom it is safe to push predicates based on
* ANSI SQL semantics For inner join, all predicates for all aliases can be
* pushed For full outer join, none of the predicates can be pushed as that
* would limit the number of rows for join For left outer join, all the
* predicates on the left side aliases can be pushed up For right outer
* join, all the predicates on the right side aliases can be pushed up Joins
* chain containing both left and right outer joins are treated as full
* outer join. [...]
*
* @param op
* Join Operator
* @param rr
* Row resolver
* @return set of qualified aliases
*/
{noformat}
Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases a and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "John Sichi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095105#comment-13095105 ]
John Sichi commented on HIVE-2337:
----------------------------------
Charles, did you intentionally omit the new ppd_outer_join5.q from the latest patch?
Also, there's a weird non-ASCII character in the Javadoc.
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Fix For: 0.9.0
>
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Charles Chen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078906#comment-13078906 ]
Charles Chen commented on HIVE-2337:
------------------------------------
Using:
{noformat}
create table t1 (id int, key string, value string);
create table t2 (id int, key string, value string);
create table t3 (id int, key string, value string);
create table t4 (id int, key string, value string);
{noformat}
For example, the query
{noformat}
explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
{noformat}
currently gives
{noformat}
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
t1
TableScan
alias: t1
Reduce Output Operator
key expressions:
expr: id
type: int
sort order: +
Map-reduce partition columns:
expr: id
type: int
tag: 0
value expressions:
expr: id
type: int
expr: key
type: string
expr: value
type: string
t2
TableScan
alias: t2
Reduce Output Operator
key expressions:
expr: id
type: int
sort order: +
Map-reduce partition columns:
expr: id
type: int
tag: 1
value expressions:
expr: id
type: int
expr: key
type: string
expr: value
type: string
t3
TableScan
alias: t3
Reduce Output Operator
key expressions:
expr: id
type: int
sort order: +
Map-reduce partition columns:
expr: id
type: int
tag: 2
value expressions:
expr: id
type: int
expr: key
type: string
expr: value
type: string
Reduce Operator Tree:
Join Operator
condition map:
Outer Join 0 to 1
Inner Join 1 to 2
condition expressions:
0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
handleSkewJoin: false
outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
Filter Operator
predicate:
expr: (_col10 = 20)
type: boolean
Select Operator
expressions:
expr: _col0
type: int
expr: _col1
type: string
expr: _col2
type: string
expr: _col5
type: int
expr: _col6
type: string
expr: _col7
type: string
expr: _col10
type: int
expr: _col11
type: string
expr: _col12
type: string
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Stage: Stage-0
Fetch Operator
limit: -1
{noformat}
while the correct behavior is to push the filter "t3.id=20" down:
{noformat}
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
t1
TableScan
alias: t1
Reduce Output Operator
key expressions:
expr: id
type: int
sort order: +
Map-reduce partition columns:
expr: id
type: int
tag: 0
value expressions:
expr: id
type: int
expr: key
type: string
expr: value
type: string
t2
TableScan
alias: t2
Reduce Output Operator
key expressions:
expr: id
type: int
sort order: +
Map-reduce partition columns:
expr: id
type: int
tag: 1
value expressions:
expr: id
type: int
expr: key
type: string
expr: value
type: string
t3
TableScan
alias: t3
Filter Operator
predicate:
expr: (id = 20)
type: boolean
Reduce Output Operator
key expressions:
expr: id
type: int
sort order: +
Map-reduce partition columns:
expr: id
type: int
tag: 2
value expressions:
expr: id
type: int
expr: key
type: string
expr: value
type: string
Reduce Operator Tree:
Join Operator
condition map:
Outer Join 0 to 1
Inner Join 1 to 2
condition expressions:
0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
handleSkewJoin: false
outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
Select Operator
expressions:
expr: _col0
type: int
expr: _col1
type: string
expr: _col2
type: string
expr: _col5
type: int
expr: _col6
type: string
expr: _col7
type: string
expr: _col10
type: int
expr: _col11
type: string
expr: _col12
type: string
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Stage: Stage-0
Fetch Operator
limit: -1
{noformat}
The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
{noformat}
explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
{noformat}
while it isn't here:
{noformat}
explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
{noformat}
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Charles Chen (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Charles Chen updated HIVE-2337:
-------------------------------
Attachment: HIVE-2337v2.patch
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2337) Predicate pushdown erroneously
conservative with outer joins
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095919#comment-13095919 ]
Hudson commented on HIVE-2337:
------------------------------
Integrated in Hive-trunk-h0.21 #928 (See [https://builds.apache.org/job/Hive-trunk-h0.21/928/])
HIVE-2337. Predicate pushdown erroneously conservative with outer joins
(Charles Chen via jvs)
jvs : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1164340
Files :
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
* /hive/trunk/ql/src/test/queries/clientpositive/ppd_outer_join5.q
* /hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ppd_outer_join5.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/input4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/join8.q.xml
> Predicate pushdown erroneously conservative with outer joins
> ------------------------------------------------------------
>
> Key: HIVE-2337
> URL: https://issues.apache.org/jira/browse/HIVE-2337
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: Charles Chen
> Assignee: Charles Chen
> Fix For: 0.9.0
>
> Attachments: HIVE-2337v1.patch, HIVE-2337v2.patch, HIVE-2337v3.patch, HIVE-2337v4.patch, HIVE-2337v5.patch, HIVE-2337v6.patch, HIVE-2337v7.patch
>
>
> The predicate pushdown filter is not applying left associativity of joins correctly in determining possible aliases for pushing predicates.
> In hive.ql.ppd.OpProcFactory.JoinPPD.getQualifiedAliases, the criteria for pushing aliases is specified as:
> {noformat}
> /**
> * Figures out the aliases for whom it is safe to push predicates based on
> * ANSI SQL semantics For inner join, all predicates for all aliases can be
> * pushed For full outer join, none of the predicates can be pushed as that
> * would limit the number of rows for join For left outer join, all the
> * predicates on the left side aliases can be pushed up For right outer
> * join, all the predicates on the right side aliases can be pushed up Joins
> * chain containing both left and right outer joins are treated as full
> * outer join. [...]
> *
> * @param op
> * Join Operator
> * @param rr
> * Row resolver
> * @return set of qualified aliases
> */
> {noformat}
> Since hive joins are left associative, something like "a RIGHT OUTER JOIN b LEFT OUTER JOIN c INNER JOIN d" should be interpreted as "((a RIGHT OUTER JOIN b) LEFT OUTER JOIN c) INNER JOIN d", so there would be cases where joins with both left and right outer joins can have aliases that can be pushed. Here, aliases b and d are eligible to be pushed up while the current criteria provide that none are eligible.
> Using:
> {noformat}
> create table t1 (id int, key string, value string);
> create table t2 (id int, key string, value string);
> create table t3 (id int, key string, value string);
> create table t4 (id int, key string, value string);
> {noformat}
> For example, the query
> {noformat}
> explain select * from t1 full outer join t2 on t1.id=t2.id join t3 on t2.id=t3.id where t3.id=20;
> {noformat}
> currently gives
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Filter Operator
> predicate:
> expr: (_col10 = 20)
> type: boolean
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> while the correct behavior is to push the filter "t3.id=20" down:
> {noformat}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 is a root stage
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> t1
> TableScan
> alias: t1
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 0
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t2
> TableScan
> alias: t2
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 1
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> t3
> TableScan
> alias: t3
> Filter Operator
> predicate:
> expr: (id = 20)
> type: boolean
> Reduce Output Operator
> key expressions:
> expr: id
> type: int
> sort order: +
> Map-reduce partition columns:
> expr: id
> type: int
> tag: 2
> value expressions:
> expr: id
> type: int
> expr: key
> type: string
> expr: value
> type: string
> Reduce Operator Tree:
> Join Operator
> condition map:
> Outer Join 0 to 1
> Inner Join 1 to 2
> condition expressions:
> 0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 1 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> 2 {VALUE._col0} {VALUE._col1} {VALUE._col2}
> handleSkewJoin: false
> outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7, _col10, _col11, _col12
> Select Operator
> expressions:
> expr: _col0
> type: int
> expr: _col1
> type: string
> expr: _col2
> type: string
> expr: _col5
> type: int
> expr: _col6
> type: string
> expr: _col7
> type: string
> expr: _col10
> type: int
> expr: _col11
> type: string
> expr: _col12
> type: string
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {noformat}
> The current behavior is actually stranger than this: for a left outer join (similarly for a right outer join), hive finds the leftmost alias referred to in the *predicates* of left outer joins and rejects any alias to the right of it for pushdown. So in this query the filter "t2.id=20" pushed down:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t2.id=t3.id) where t2.id=20;
> {noformat}
> while it isn't here:
> {noformat}
> explain select * from t1 join t2 on (t1.id=t2.id) left outer join t3 on (t1.id=t3.id) where t2.id=20;
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira