Posted to issues-all@impala.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2019/03/01 17:51:00 UTC

[jira] [Assigned] (IMPALA-7952) Planner creates non-normalized binary predicates

     [ https://issues.apache.org/jira/browse/IMPALA-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers reassigned IMPALA-7952:
-----------------------------------

    Assignee:     (was: Paul Rogers)

> Planner creates non-normalized binary predicates
> ------------------------------------------------
>
>                 Key: IMPALA-7952
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7952
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> The FE has a "normalize binary predicates" rule that puts slots on the left-hand side:
> {noformat}
> 1 = id --> id = 1
> {noformat}
> Presumably this is useful. As the planner proceeds, it creates additional binary predicates, but tends to create them in the non-normalized form.
> Examples:
> * {{Expr.trySubstitute()}}
> * {{StmtRewriter.createJoinConjunct()}}
> * {{SingleNodePlanner.getNormalizedEqPred()}}
> * {{StmtRewriter.rewriteWhereClauseSubqueries()}}
> * {{HashJoinNode.init()}}
> Once rewrite rules are integrated into analysis, we end up with a conflict: should expressions created internally be exempt from some or all of the rewrite rules? Even from mandatory rules, such as this one?
> The solution is to allow such expressions to be rewritten to normalized form as part of the new integrated analyze-and-rewrite logic.
> Note that the {{trySubstitute()}} case needs more attention. Presumably the expressions put into the "smap" are analyzed, hence rewritten. If not, then there are probably other subtle bugs lurking in that code.
> Fixing this bug caused plans to change in {{PlannerTest.testJoins()}}. These changes suggest that one part of the analyzer works to create the "<slot> <op> <expr>" pattern, while other parts strive for the opposite, creating instability. Requires more research.
> {code:sql}
> # test that on-clause predicates referring to multiple tuple ids
> # get registered as eq join conjuncts
> select t1.*
> from (select * from functional.alltypestiny) t1
>   join (select * from functional.alltypestiny) t2 on (t1.id = t2.id)
>   join functional.alltypestiny t3 on (coalesce(t1.id, t2.id) = t3.id)
> {code}
> Plan before the fix:
> {noformat}
> PLAN-ROOT SINK
> |
> 04:HASH JOIN [INNER JOIN]
> |  hash predicates: coalesce(functional.alltypestiny.id, functional.alltypestiny.id) = t3.id
> |  runtime filters: RF000 <- t3.id
> |
> |--02:SCAN HDFS [functional.alltypestiny t3]
> |     partitions=4/4 files=4 size=460B
> |
> 03:HASH JOIN [INNER JOIN]
> |  hash predicates: functional.alltypestiny.id = functional.alltypestiny.id
> |  runtime filters: RF002 <- functional.alltypestiny.id
> |
> |--01:SCAN HDFS [functional.alltypestiny]
> |     partitions=4/4 files=4 size=460B
> |     runtime filters: RF000 -> coalesce(functional.alltypestiny.id, functional.alltypestiny.id)
> |
> 00:SCAN HDFS [functional.alltypestiny]
>    partitions=4/4 files=4 size=460B
>    runtime filters: RF000 -> coalesce(functional.alltypestiny.id, functional.alltypestiny.id), RF002 -> functional.alltypestiny.id
> {noformat}
> Plan after the fix, with the filter pushed further down the plan:
> {noformat}
> PLAN-ROOT SINK
> |
> 04:HASH JOIN [INNER JOIN]
> |  hash predicates: t3.id = coalesce(functional.alltypestiny.id, functional.alltypestiny.id)
> |
> |--02:SCAN HDFS [functional.alltypestiny t3]
> |     partitions=4/4 files=4 size=460B
> |
> 03:HASH JOIN [INNER JOIN]
> |  hash predicates: functional.alltypestiny.id = functional.alltypestiny.id
> |  runtime filters: RF002 <- functional.alltypestiny.id
> |
> |--01:SCAN HDFS [functional.alltypestiny]
> |     partitions=4/4 files=4 size=460B
> |
> 00:SCAN HDFS [functional.alltypestiny]
>    partitions=4/4 files=4 size=460B
>    runtime filters: RF002 -> functional.alltypestiny.id
> {noformat}
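
For readers unfamiliar with the rule the issue describes, the following is a minimal sketch of what "normalizing" a binary predicate could look like, so that a slot ends up on the left-hand side. It is a toy model and not Impala's code: the classes Expr, SlotRef, Literal, Op, BinaryPredicate, and NormalizeSketch are hypothetical stand-ins invented for illustration, and the handling of the non-symmetric operators (swapping < and >) is an assumption about what a correct flip requires rather than something stated in the issue.

```java
// Toy model only: these classes are hypothetical and are NOT Impala's frontend classes.
abstract class Expr {}

final class SlotRef extends Expr {
    final String name;
    SlotRef(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

final class Literal extends Expr {
    final long value;
    Literal(long value) { this.value = value; }
    @Override public String toString() { return Long.toString(value); }
}

enum Op {
    EQ("="), NE("!="), LT("<"), LE("<="), GT(">"), GE(">=");

    final String symbol;
    Op(String symbol) { this.symbol = symbol; }

    // Operator to use once the two operands have swapped sides.
    Op converse() {
        switch (this) {
            case LT: return GT;
            case LE: return GE;
            case GT: return LT;
            case GE: return LE;
            default: return this; // "=" and "!=" are symmetric
        }
    }
}

final class BinaryPredicate extends Expr {
    final Op op;
    final Expr left;
    final Expr right;

    BinaryPredicate(Op op, Expr left, Expr right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }

    @Override public String toString() {
        return left + " " + op.symbol + " " + right;
    }
}

final class NormalizeSketch {
    // If only the right operand is a slot, swap the operands and flip the operator.
    // Predicates that are already normalized are returned unchanged.
    static BinaryPredicate normalize(BinaryPredicate p) {
        boolean slotOnLeft = p.left instanceof SlotRef;
        boolean slotOnRight = p.right instanceof SlotRef;
        if (slotOnRight && !slotOnLeft) {
            return new BinaryPredicate(p.op.converse(), p.right, p.left);
        }
        return p;
    }

    public static void main(String[] args) {
        // Prints "id = 1"
        System.out.println(normalize(
            new BinaryPredicate(Op.EQ, new Literal(1), new SlotRef("id"))));
        // Prints "id > 5"
        System.out.println(normalize(
            new BinaryPredicate(Op.LT, new Literal(5), new SlotRef("id"))));
    }
}
```

In this sketch, a predicate built internally in the non-normalized form (for example by a hypothetical join-conjunct helper) would simply be passed through normalize() again, which is the spirit of the fix proposed in the issue: internally created expressions go through the same rewrite as user-written ones.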


