You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Alessandro Solimando (Jira)" <ji...@apache.org> on 2022/11/10 15:50:00 UTC
[jira] [Updated] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prune UNION ALL operands
[ https://issues.apache.org/jira/browse/HIVE-26722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alessandro Solimando updated HIVE-26722:
----------------------------------------
Description:
Consider the following query:
{code:java}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
CREATE EXTERNAL TABLE t (a string, b string);
INSERT INTO t VALUES ('1000', 'b1');
INSERT INTO t VALUES ('2000', 'b2');
SELECT * FROM (
SELECT
a,
b
FROM t
UNION ALL
SELECT
a,
CAST(NULL AS string)
FROM t) AS t2
WHERE a = 1000;EXPLAIN CBO
SELECT * FROM (
SELECT
a,
b
FROM t
UNION ALL
SELECT
a,
CAST(NULL AS string)
FROM t) AS t2
WHERE a = 1000; {code}
The expected result is:
{code:java}
1000 b1
1000 NULL{code}
An example of correct plan is as follows:
{noformat}
CBO PLAN:
HiveUnion(all=[true])
HiveProject(a=[$0], b=[$1])
HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
HiveTableScan(table=[[default, t]], table:alias=[t])
HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET "UTF-16LE"])
HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
HiveTableScan(table=[[default, t]], table:alias=[t]){noformat}
Consider now a scenario where expression reduction in projections is disabled by setting the following property{_}:{_}
{noformat}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
{noformat}
In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, and we get the following (invalid) result:
{code:java}
1000 b1{code}
produced by the following invalid plan:
CBO PLAN:
HiveProject(a=[$0], b=[$1])
HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
HiveTableScan(table=[[default, t]], table:alias=[t])
Problem analysis:
At [HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112] the _RelMetadataQuery::getPulledUpPredicates_ method infers the following predicate due to the CAST(NULL) in the projection:
{code:java}
(=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code}
When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is inferred.
In [HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122], the rule checks if the conjunction of the predicate coming from the filter (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or not, under the _UnknownAsFalse_ semantics.
To summarize, the following expression is simplified under the _UnknownAsFalse_ semantics:
{code:java}
AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), =(CAST($0):DOUBLE, 1000))
{code}
Under In such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, because no value is equal to NULL (even NULL itself), AND(FALSE, =(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and the UNION ALL operand is pruned.
Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the issue, due to the _IS_NULL($1)_ inferred predicate, see [HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156] for understanding how the NULL literal is treated differently during predicate inference.
The problem lies in the fact that, depending on the input _RelNode_ that we infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this case.
Solution: in order to correctly simplify a predicate and test if it's always false or not, we should build RexSimplify with _predicates_ as the list of predicates known to hold in the context. In this way, the different semantics are correctly taken into account.
The code at [HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121] should be replaced by the following:
{code:java}
final RexExecutor executor =
Util.first(filterRel.getCluster().getPlanner().getExecutor(), RexUtil.EXECUTOR);
final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor);
final RexNode x = simplify.simplifyUnknownAs(newCondition, RexUnknownAs.FALSE);{code}
was:
Consider the following query:
{code:java}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
CREATE EXTERNAL TABLE t (a string, b string);
INSERT INTO t VALUES ('1000', 'b1');
INSERT INTO t VALUES ('2000', 'b2');
SELECT * FROM (
SELECT
a,
b
FROM t
UNION ALL
SELECT
a,
CAST(NULL AS string)
FROM t) AS t2
WHERE a = 1000;EXPLAIN CBO
SELECT * FROM (
SELECT
a,
b
FROM t
UNION ALL
SELECT
a,
CAST(NULL AS string)
FROM t) AS t2
WHERE a = 1000; {code}
The expected result is:
{code:java}
1000 b1
1000 NULL{code}
An example of correct plan is as follows:
{noformat}
CBO PLAN:
HiveUnion(all=[true])
HiveProject(a=[$0], b=[$1])
HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
HiveTableScan(table=[[default, t]], table:alias=[t])
HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET "UTF-16LE"])
HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
HiveTableScan(table=[[default, t]], table:alias=[t]){noformat}
Consider now a scenario where expression reduction in projections is disabled by setting the following property{_}:{_}
{noformat}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
{noformat}
In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, and we get the following (invalid) result:
1000 b1
> HiveFilterSetOpTransposeRule incorrectly prune UNION ALL operands
> -----------------------------------------------------------------
>
> Key: HIVE-26722
> URL: https://issues.apache.org/jira/browse/HIVE-26722
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Affects Versions: 4.0.0-alpha-1
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
>
> Consider the following query:
> {code:java}
> set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
> CREATE EXTERNAL TABLE t (a string, b string);
> INSERT INTO t VALUES ('1000', 'b1');
> INSERT INTO t VALUES ('2000', 'b2');
> SELECT * FROM (
> SELECT
> a,
> b
> FROM t
> UNION ALL
> SELECT
> a,
> CAST(NULL AS string)
> FROM t) AS t2
> WHERE a = 1000;EXPLAIN CBO
> SELECT * FROM (
> SELECT
> a,
> b
> FROM t
> UNION ALL
> SELECT
> a,
> CAST(NULL AS string)
> FROM t) AS t2
> WHERE a = 1000; {code}
>
> The expected result is:
> {code:java}
> 1000 b1
> 1000 NULL{code}
> An example of correct plan is as follows:
> {noformat}
> CBO PLAN:
> HiveUnion(all=[true])
> HiveProject(a=[$0], b=[$1])
> HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
> HiveTableScan(table=[[default, t]], table:alias=[t])
> HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET "UTF-16LE"])
> HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
> HiveTableScan(table=[[default, t]], table:alias=[t]){noformat}
>
> Consider now a scenario where expression reduction in projections is disabled by setting the following property{_}:{_}
> {noformat}
> set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
> {noformat}
> In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, and we get the following (invalid) result:
> {code:java}
> 1000 b1{code}
>
> produced by the following invalid plan:
> CBO PLAN:
> HiveProject(a=[$0], b=[$1])
> HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
> HiveTableScan(table=[[default, t]], table:alias=[t])
>
> Problem analysis:
> At [HiveFilterSetOpTransposeRule.java#L112|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L112] the _RelMetadataQuery::getPulledUpPredicates_ method infers the following predicate due to the CAST(NULL) in the projection:
>
> {code:java}
> (=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")){code}
>
> When the CAST is simplified to the NULL literal, the IS_NULL($1) predicate is inferred.
> In [HiveFilterSetOpTransposeRule.java#L114-L122|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L122], the rule checks if the conjunction of the predicate coming from the filter (here =(CAST($0):DOUBLE, 1000)) and the inferred predicates is satisfiable or not, under the _UnknownAsFalse_ semantics.
> To summarize, the following expression is simplified under the _UnknownAsFalse_ semantics:
>
> {code:java}
> AND((=($1, CAST(null:NULL):VARCHAR(2147483647) CHARACTER SET "UTF-16LE")), =(CAST($0):DOUBLE, 1000))
> {code}
> Under In such semantics, (=($1, CAST(null:NULL):...) evaluates to {_}FALSE{_}, because no value is equal to NULL (even NULL itself), AND(FALSE, =(CAST($0):DOUBLE, 1000)) necessarily evaluates to _FALSE_ altogether, and the UNION ALL operand is pruned.
> Only by chance, when _CAST(NULL)_ is simplified to _NULL,_ we avoid the issue, due to the _IS_NULL($1)_ inferred predicate, see [HiveRelMdPredicates.java#L153-L156|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153-L156] for understanding how the NULL literal is treated differently during predicate inference.
> The problem lies in the fact that, depending on the input _RelNode_ that we infer predicates from, the semantics is not necessarily {_}UnknownAsFalse{_}, but it might be {_}UnknownAsUnknown{_}, like for {_}Project{_}, as in this case.
> Solution: in order to correctly simplify a predicate and test if it's always false or not, we should build RexSimplify with _predicates_ as the list of predicates known to hold in the context. In this way, the different semantics are correctly taken into account.
> The code at [HiveFilterSetOpTransposeRule.java#L114-L121|https://github.com/apache/hive/blob/297f510d3b581c9d4079e42caa28aa84f8486012/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterSetOpTransposeRule.java#L114-L121] should be replaced by the following:
>
> {code:java}
> final RexExecutor executor =
> Util.first(filterRel.getCluster().getPlanner().getExecutor(), RexUtil.EXECUTOR);
> final RexSimplify simplify = new RexSimplify(rexBuilder, predicates, executor);
> final RexNode x = simplify.simplifyUnknownAs(newCondition, RexUnknownAs.FALSE);{code}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)