You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Aman Sinha (Jira)" <ji...@apache.org> on 2021/02/20 23:50:00 UTC

[jira] [Commented] (IMPALA-9356) The predicates that the tuple ids involved are empty migrate to outer-joined inline view or real table

    [ https://issues.apache.org/jira/browse/IMPALA-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287829#comment-17287829 ] 

Aman Sinha commented on IMPALA-9356:
------------------------------------

I didn't get a chance to review the patch sooner... sorry about that.  I think for non-deterministic functions such as rand(), it would be wrong to push to both the scans in case of the Full OJ.  I think it should only be evaluated by the Hash Join above.  The reason is the user query is expecting rand() to be evaluated **on a per row** basis for each row output by the full outer join. i.e it is not a one time computation at the beginning of the query. 
In the test query rand() = 2 is always false but if we were to replace it with rand() = 0.5 the plan could produce wrong results.
If we push to both scans, the left side rand() may evaluate to say 0.3 for a particular row and the right side scan may evaluate to 0.5 for a particular row.  This changes the semantics of the query.  Instead, if we keep it only as 'other predicates' at the Full Outer Hash Join, it will be evaluated correctly for each row.

For Right Outer Join, we should only push to the right child (the non-null producing side). 

> The predicates that the tuple ids involved are empty migrate to outer-joined inline view or real table
> ------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-9356
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9356
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.3.0
>            Reporter: Xianqing He
>            Assignee: Xianqing He
>            Priority: Minor
>              Labels: correctness
>
> {code}
> SELECT COUNT(*)
> FROM (
>     SELECT id, upper(string_col) AS upper_val
>     FROM functional.alltypestiny
> ) a
>     FULL JOIN (
>         SELECT id, upper(string_col) AS upper_val
>         FROM functional.alltypestiny
>     ) b
>     ON a.id = b.id
> WHERE rand() = 12
> {code}
> The Plan
> {noformat}
> +------------------------------------------------------------+
> | Explain String                                             |
> +------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=1.95MB Threads=6 |
> | Per-Host Resource Estimates: Memory=86MB                   |
> | Codegen disabled by planner                                |
> |                                                            |
> | PLAN-ROOT SINK                                             |
> | |                                                          |
> | 07:AGGREGATE [FINALIZE]                                    |
> | |  output: count:merge(*)                                  |
> | |  row-size=8B cardinality=1                               |
> | |                                                          |
> | 06:EXCHANGE [UNPARTITIONED]                                |
> | |                                                          |
> | 03:AGGREGATE                                               |
> | |  output: count(*)                                        |
> | |  row-size=8B cardinality=1                               |
> | |                                                          |
> | 02:HASH JOIN [FULL OUTER JOIN, PARTITIONED]                |
> | |  hash predicates: id = id                                |
> | |  row-size=8B cardinality=9                               |
> | |                                                          |
> | |--05:EXCHANGE [HASH(id)]                                  |
> | |  |                                                       |
> | |  00:SCAN HDFS [functional.alltypestiny]                  |
> | |     HDFS partitions=4/4 files=4 size=460B                |
> | |     predicates: rand() = 12                              |
> | |     row-size=4B cardinality=1                            |
> | |                                                          |
> | 04:EXCHANGE [HASH(id)]                                     |
> | |                                                          |
> | 01:SCAN HDFS [functional.alltypestiny]                     |
> |    HDFS partitions=4/4 files=4 size=460B                   |
> |    row-size=4B cardinality=8                               |
> +------------------------------------------------------------+
> {noformat}
> The rand() returns a random value between 0 and 1 so "rand() = 12" will always be false. All rows should be rejected by the WHERE clause. If "rand() = 12" is evaluated in only one side, the other side can still produce rows. So the outer join will still have results.
> We can't migrate the predicate that the tuple ids involved are empty to outer-joined inline view, Also for real tables have this question.
> {code} 
> explain select 1 from functional.alltypestiny t1 full join functional.alltypestiny t2 on t1.id = t2.id where rand() = 2
> {code}
> {noformat}
> +------------------------------------------------------------+
> | Explain String                                             |
> +------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=1.95MB Threads=6 |
> | Per-Host Resource Estimates: Memory=66MB                   |
> | Codegen disabled by planner                                |
> |                                                            |
> | PLAN-ROOT SINK                                             |
> | |                                                          |
> | 05:EXCHANGE [UNPARTITIONED]                                |
> | |                                                          |
> | 02:HASH JOIN [FULL OUTER JOIN, PARTITIONED]                |
> | |  hash predicates: t2.id = t1.id                          |
> | |  row-size=8B cardinality=9                               |
> | |                                                          |
> | |--04:EXCHANGE [HASH(t1.id)]                               |
> | |  |                                                       |
> | |  00:SCAN HDFS [functional.alltypestiny t1]               |
> | |     HDFS partitions=4/4 files=4 size=460B                |
> | |     predicates: rand() = 2                               |
> | |     row-size=4B cardinality=1                            |
> | |                                                          |
> | 03:EXCHANGE [HASH(t2.id)]                                  |
> | |                                                          |
> | 01:SCAN HDFS [functional.alltypestiny t2]                  |
> |    HDFS partitions=4/4 files=4 size=460B                   |
> |    row-size=4B cardinality=8                               |
> +------------------------------------------------------------+
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org