Posted to issues-all@impala.apache.org by "Yida Wu (Jira)" <ji...@apache.org> on 2020/11/13 21:16:00 UTC
[jira] [Commented] (IMPALA-9356) The predicates that the tuple ids
involved are empty migrate to outer-joined inline view or real table
[ https://issues.apache.org/jira/browse/IMPALA-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231834#comment-17231834 ]
Yida Wu commented on IMPALA-9356:
---------------------------------
Note: the issue does not occur with a left join:
*DDL*
CREATE EXTERNAL TABLE default.cus as select * from tpcds_parquet.customer;
*Query and Plan*
{code:java}
Analyzed query: SELECT count(*) FROM `default`.cus a LEFT OUTER JOIN
`default`.cus b ON a.c_customer_id = b.c_customer_id WHERE rand() = CAST(2 AS
DOUBLE)
F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Host Resources: mem-estimate=10.02MB mem-reservation=0B thread-reservation=1
PLAN-ROOT SINK
| output exprs: count(*)
| mem-estimate=0B mem-reservation=0B thread-reservation=0
|
07:AGGREGATE [FINALIZE]
| output: count:merge(*)
| mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
| tuple-ids=2 row-size=8B cardinality=1
| in pipelines: 07(GETNEXT), 03(OPEN)
|
06:EXCHANGE [UNPARTITIONED]
| mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
| tuple-ids=2 row-size=8B cardinality=1
| in pipelines: 03(GETNEXT)
|
F02:PLAN FRAGMENT [HASH(a.c_customer_id)] hosts=1 instances=1
Per-Host Resources: mem-estimate=14.02MB mem-reservation=2.94MB thread-reservation=1 runtime-filters-memory=1.00MB
03:AGGREGATE
| output: count(*)
| mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
| tuple-ids=2 row-size=8B cardinality=1
| in pipelines: 03(GETNEXT), 01(OPEN)
|
02:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]
| hash predicates: b.c_customer_id = a.c_customer_id
| fk/pk conjuncts: assumed fk/pk
| runtime filters: RF000[bloom] <- a.c_customer_id
| mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
| tuple-ids=1N,0 row-size=24B cardinality=9.31K
| in pipelines: 01(GETNEXT), 00(OPEN)
|
|--05:EXCHANGE [HASH(a.c_customer_id)]
| | mem-estimate=125.10KB mem-reservation=0B thread-reservation=0
| | tuple-ids=0 row-size=12B cardinality=9.31K
| | in pipelines: 00(GETNEXT)
| |
| F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
| Per-Host Resources: mem-estimate=48.00MB mem-reservation=8.00MB thread-reservation=2
| 00:SCAN HDFS [default.cus a, RANDOM]
| HDFS partitions=1/1 files=1 size=12.79MB
| predicates: rand() = CAST(2 AS DOUBLE)
| stored statistics:
| table: rows=unavailable size=unavailable
| columns: unavailable
| extrapolated-rows=disabled max-scan-range-rows=unavailable
| mem-estimate=48.00MB mem-reservation=8.00MB thread-reservation=1
| tuple-ids=0 row-size=12B cardinality=9.31K
| in pipelines: 00(GETNEXT)
|
04:EXCHANGE [HASH(b.c_customer_id)]
| mem-estimate=1.08MB mem-reservation=0B thread-reservation=0
| tuple-ids=1 row-size=12B cardinality=93.10K
| in pipelines: 01(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
Per-Host Resources: mem-estimate=49.00MB mem-reservation=9.00MB thread-reservation=2 runtime-filters-memory=1.00MB
01:SCAN HDFS [default.cus b, RANDOM]
HDFS partitions=1/1 files=1 size=12.79MB
runtime filters: RF000[bloom] -> b.c_customer_id
stored statistics:
table: rows=unavailable size=unavailable
columns: unavailable
extrapolated-rows=disabled max-scan-range-rows=unavailable
mem-estimate=48.00MB mem-reservation=8.00MB thread-reservation=1
tuple-ids=1 row-size=12B cardinality=93.10K
in pipelines: 01(GETNEXT)
{code}
But it does occur with a right join and a full join:
{code:java}
Analyzed query: SELECT count(*) FROM `default`.cus a FULL OUTER JOIN
`default`.cus b ON a.c_customer_id = b.c_customer_id WHERE rand() = CAST(2 AS
DOUBLE)
F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Host Resources: mem-estimate=10.02MB mem-reservation=0B thread-reservation=1
PLAN-ROOT SINK
| output exprs: count(*)
| mem-estimate=0B mem-reservation=0B thread-reservation=0
|
07:AGGREGATE [FINALIZE]
| output: count:merge(*)
| mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
| tuple-ids=2 row-size=8B cardinality=1
| in pipelines: 07(GETNEXT), 03(OPEN)
|
06:EXCHANGE [UNPARTITIONED]
| mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
| tuple-ids=2 row-size=8B cardinality=1
| in pipelines: 03(GETNEXT)
|
F02:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
Per-Host Resources: mem-estimate=13.02MB mem-reservation=1.94MB thread-reservation=1
03:AGGREGATE
| output: count(*)
| mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
| tuple-ids=2 row-size=8B cardinality=1
| in pipelines: 03(GETNEXT), 01(OPEN)
|
02:HASH JOIN [FULL OUTER JOIN, PARTITIONED]
| hash predicates: b.c_customer_id = a.c_customer_id
| fk/pk conjuncts: assumed fk/pk
| mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
| tuple-ids=1N,0N row-size=24B cardinality=102.41K
| in pipelines: 01(GETNEXT), 00(OPEN)
|
|--05:EXCHANGE [HASH(a.c_customer_id)]
| | mem-estimate=125.10KB mem-reservation=0B thread-reservation=0
| | tuple-ids=0 row-size=12B cardinality=9.31K
| | in pipelines: 00(GETNEXT)
| |
| F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
| Per-Host Resources: mem-estimate=48.00MB mem-reservation=8.00MB thread-reservation=2
| 00:SCAN HDFS [default.cus a, RANDOM]
| HDFS partitions=1/1 files=1 size=12.79MB
| predicates: rand() = CAST(2 AS DOUBLE)
| stored statistics:
| table: rows=unavailable size=unavailable
| columns: unavailable
| extrapolated-rows=disabled max-scan-range-rows=unavailable
| mem-estimate=48.00MB mem-reservation=8.00MB thread-reservation=1
| tuple-ids=0 row-size=12B cardinality=9.31K
| in pipelines: 00(GETNEXT)
|
04:EXCHANGE [HASH(b.c_customer_id)]
| mem-estimate=1.08MB mem-reservation=0B thread-reservation=0
| tuple-ids=1 row-size=12B cardinality=93.10K
| in pipelines: 01(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
Per-Host Resources: mem-estimate=48.00MB mem-reservation=8.00MB thread-reservation=2
01:SCAN HDFS [default.cus b, RANDOM]
HDFS partitions=1/1 files=1 size=12.79MB
stored statistics:
table: rows=unavailable size=unavailable
columns: unavailable
extrapolated-rows=disabled max-scan-range-rows=unavailable
mem-estimate=48.00MB mem-reservation=8.00MB thread-reservation=1
tuple-ids=1 row-size=12B cardinality=93.10K
in pipelines: 01(GETNEXT)
{code}
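The difference between the two plans can be illustrated with a small in-memory model. This is only a sketch, not Impala code: outer_join, where_pred and the sample keys are hypothetical names and values made up for illustration, and where_pred stands in for rand() = 2. It assumes the planner evaluates the predicate only in the scan of a, as both plans above show.

```python
# Hypothetical in-memory model of the plans above; this is not Impala code.

def outer_join(left, right, keep_left, keep_right):
    """Equi-join two lists of keys. Unmatched rows from a preserved side
    are emitted with None standing in for the NULL-extended other side."""
    rows = [(l, r) for l in left for r in right if l == r]
    if keep_left:
        rows += [(l, None) for l in left if l not in right]
    if keep_right:
        rows += [(None, r) for r in right if r not in left]
    return rows

def where_pred():
    # Stands in for rand() = 2, which can never be true.
    return False

a = [1, 2, 3]
b = [1, 2, 3]

# Expected semantics: the WHERE clause is evaluated on the join output.
correct_full = [row for row in outer_join(a, b, True, True) if where_pred()]

# Plan shape from the explain output: predicate evaluated only in the scan of a.
a_filtered = [k for k in a if where_pred()]

# a LEFT JOIN b: a is the only preserved side, so an empty a gives no rows.
pushed_left = outer_join(a_filtered, b, keep_left=True, keep_right=False)

# a RIGHT JOIN b and a FULL JOIN b: b is preserved, so its rows survive.
pushed_right = outer_join(a_filtered, b, keep_left=False, keep_right=True)
pushed_full = outer_join(a_filtered, b, keep_left=True, keep_right=True)

print(len(correct_full), len(pushed_left), len(pushed_right), len(pushed_full))
```

In this model the left join gives the right answer only because the predicate happens to land on the single preserved side. As soon as b is preserved too, its rows come back NULL-extended even though the WHERE clause should have rejected them.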
> The predicates that the tuple ids involved are empty migrate to outer-joined inline view or real table
> ------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-9356
> URL: https://issues.apache.org/jira/browse/IMPALA-9356
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.3.0
> Reporter: Xianqing He
> Assignee: Yida Wu
> Priority: Minor
> Labels: correctness
>
> {code}
> SELECT COUNT(*)
> FROM (
> SELECT id, upper(string_col) AS upper_val
> FROM functional.alltypestiny
> ) a
> FULL JOIN (
> SELECT id, upper(string_col) AS upper_val
> FROM functional.alltypestiny
> ) b
> ON a.id = b.id
> WHERE rand() = 12
> {code}
> The Plan
> {noformat}
> +------------------------------------------------------------+
> | Explain String |
> +------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=1.95MB Threads=6 |
> | Per-Host Resource Estimates: Memory=86MB |
> | Codegen disabled by planner |
> | |
> | PLAN-ROOT SINK |
> | | |
> | 07:AGGREGATE [FINALIZE] |
> | | output: count:merge(*) |
> | | row-size=8B cardinality=1 |
> | | |
> | 06:EXCHANGE [UNPARTITIONED] |
> | | |
> | 03:AGGREGATE |
> | | output: count(*) |
> | | row-size=8B cardinality=1 |
> | | |
> | 02:HASH JOIN [FULL OUTER JOIN, PARTITIONED] |
> | | hash predicates: id = id |
> | | row-size=8B cardinality=9 |
> | | |
> | |--05:EXCHANGE [HASH(id)] |
> | | | |
> | | 00:SCAN HDFS [functional.alltypestiny] |
> | | HDFS partitions=4/4 files=4 size=460B |
> | | predicates: rand() = 12 |
> | | row-size=4B cardinality=1 |
> | | |
> | 04:EXCHANGE [HASH(id)] |
> | | |
> | 01:SCAN HDFS [functional.alltypestiny] |
> | HDFS partitions=4/4 files=4 size=460B |
> | row-size=4B cardinality=8 |
> +------------------------------------------------------------+
> {noformat}
> rand() returns a random value between 0 and 1, so "rand() = 12" is always false and the WHERE clause should reject every row. If "rand() = 12" is evaluated on only one side of the join, the other side can still produce rows, so the outer join still returns results.
> A predicate whose set of involved tuple ids is empty must not be migrated into an outer-joined inline view. The same problem occurs with real tables:
> {code}
> explain select 1 from functional.alltypestiny t1 full join functional.alltypestiny t2 on t1.id = t2.id where rand() = 2
> {code}
> {noformat}
> +------------------------------------------------------------+
> | Explain String |
> +------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=1.95MB Threads=6 |
> | Per-Host Resource Estimates: Memory=66MB |
> | Codegen disabled by planner |
> | |
> | PLAN-ROOT SINK |
> | | |
> | 05:EXCHANGE [UNPARTITIONED] |
> | | |
> | 02:HASH JOIN [FULL OUTER JOIN, PARTITIONED] |
> | | hash predicates: t2.id = t1.id |
> | | row-size=8B cardinality=9 |
> | | |
> | |--04:EXCHANGE [HASH(t1.id)] |
> | | | |
> | | 00:SCAN HDFS [functional.alltypestiny t1] |
> | | HDFS partitions=4/4 files=4 size=460B |
> | | predicates: rand() = 2 |
> | | row-size=4B cardinality=1 |
> | | |
> | 03:EXCHANGE [HASH(t2.id)] |
> | | |
> | 01:SCAN HDFS [functional.alltypestiny t2] |
> | HDFS partitions=4/4 files=4 size=460B |
> | row-size=4B cardinality=8 |
> +------------------------------------------------------------+
> {noformat}