You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2019/03/01 17:50:00 UTC

[jira] [Assigned] (IMPALA-7997) Rewrites omitted for ON clause, test case expect wrong plans

     [ https://issues.apache.org/jira/browse/IMPALA-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers reassigned IMPALA-7997:
-----------------------------------

    Assignee:     (was: Paul Rogers)

> Rewrites omitted for ON clause, test case expect wrong plans
> ------------------------------------------------------------
>
>                 Key: IMPALA-7997
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7997
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Bugs in the rewrite code prevented the {{ON}} clause from being subject to the full set of rewrite rules. In particular, it appears that the “simplify conditions” rule was not applied. An effort to fix rewrites modified the code to correctly apply rewrites for {{ON}}. This exposed a bug in the planner for certain unrealistic, pathological queries.
> {{PlannerTest.empy.test}} has the following test case:
> {code:sql}
> # Constant conjunct in the ON-clause of an outer join is
> # assigned to the join.
> select *
> from functional.alltypessmall a
> right outer join functional.alltypestiny b
> on (a.id = b.id and !true)
> {code}
> Note the {{ON}} clause meaning, which is realized after rewrites:
> {noformat}
> a.id = b.id and !true --> a.id = b.id AND FALSE --> FALSE
> {noformat}
> That is, no rows in the join match. Currently expected (incorrect) plan:
> {noformat}
> PLAN-ROOT SINK
> |
> 02:HASH JOIN [RIGHT OUTER JOIN]
> |  hash predicates: a.id = b.id
> |  other join predicates: FALSE
> |  runtime filters: RF000 <- b.id
> |
> |--01:SCAN HDFS [functional.alltypestiny b]
> |     partitions=4/4 files=4 size=460B
> |
> 00:SCAN HDFS [functional.alltypessmall a]
>    partitions=4/4 files=4 size=6.32KB
>    runtime filters: RF000 -> a.id
> {noformat}
> Note the attempt to push the runtime filter for {{b.id}} into the left HDFS scan, then the query throws away the rows because of the {{FALSE}} join condition.
> Once rewrites are applied, we get a new plan, which is actually worse (see IMPALA-7996):
> {noformat}
> PLAN-ROOT SINK
> |
> 02:NESTED LOOP JOIN [RIGHT OUTER JOIN]
> |  join predicates: FALSE
> |
> |--01:SCAN HDFS [functional.alltypestiny b]
> |     partitions=4/4 files=4 size=460B
> |
> 00:SCAN HDFS [functional.alltypessmall a]
>    partitions=4/4 files=4 size=6.32KB
> {noformat}
> Similar case:
> {code:sql}
> # Constant conjunct in the ON-clause of an outer join is
> # assigned to the join.
> select *
> from functional.alltypessmall a
> left outer join functional.alltypestiny b
> on (a.id = b.id and 1 + 1 > 10)
> {code}
> Reasoning:
> {noformat}
> a.id = b.id and 1 + 1 > 10 --> a.id = b.id AND 2 > 10
>   --> a.id = b.id AND FALSE --> FALSE
> {noformat}
> The current expected plan:
> {noformat}
> PLAN-ROOT SINK
> |
> 02:HASH JOIN [LEFT OUTER JOIN]
> |  hash predicates: a.id = b.id
> |  other join predicates: FALSE
> |
> |--01:SCAN HDFS [functional.alltypestiny b]
> |     partitions=4/4 files=4 size=460B
> |
> 00:SCAN HDFS [functional.alltypessmall a]
>    partitions=4/4 files=4 size=6.32KB
> {noformat}
> The revised plan, after rewrites folding:
> {noformat}
> PLAN-ROOT SINK
> |
> 02:NESTED LOOP JOIN [LEFT OUTER JOIN]
> |  join predicates: FALSE
> |
> |--01:SCAN HDFS [functional.alltypestiny b]
> |     partitions=4/4 files=4 size=460B
> |
> 00:SCAN HDFS [functional.alltypessmall a]
>    partitions=4/4 files=4 size=6.32KB
> {noformat}
> The lack of simplification also shows up in {{resource-requirements.test}}
> Previous:
> {noformat}
> Analyzed query: SELECT c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice,
> sum(l_quantity) FROM tpch.customer, tpch.orders, tpch.lineitem LEFT SEMI JOIN
> (SELECT l_orderkey FROM tpch.lineitem GROUP BY l_orderkey HAVING $ao$1 /*
> sum(l_quantity) */ > CAST(300 AS DECIMAL(5,0))) `$a$1` (`$c$1`) ON o_orderkey =
> `$a$1`.`$c$1` WHERE TRUE AND c_custkey = o_custkey AND o_orderkey = l_orderkey
> GROUP BY c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice ORDER BY
> o_totalprice DESC, o_orderdate ASC LIMIT CAST(100 AS TINYINT)
> {noformat}
> Revised:
> {noformat}
> Analyzed query: SELECT c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice,
> sum(l_quantity) FROM tpch.customer, tpch.orders, tpch.lineitem LEFT SEMI JOIN
> (SELECT l_orderkey FROM tpch.lineitem GROUP BY l_orderkey HAVING $ao$1 /*
> sum(l_quantity) */ > CAST(300 AS DECIMAL(5,0))) `$a$1` (`$c$1`) ON o_orderkey =
> `$a$1`.`$c$1` WHERE c_custkey = o_custkey AND o_orderkey = l_orderkey GROUP BY
> c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice ORDER BY o_totalprice
> DESC, o_orderdate ASC LIMIT CAST(100 AS TINYINT)
> {noformat}
> Notice the elimination of {{TRUE AND}} in the revised {{WHERE}} clause.
> Also in {{subquery-rewrite.test}}, previous:
> {noformat}
> |     predicates: FALSE OR functional.alltypestiny.int_col IS NULL OR functional.alltypestiny.int_col = 1
> {noformat}
> Revised:
> {noformat}
> |     predicates: functional.alltypestiny.int_col IS NULL OR functional.alltypestiny.int_col = 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org