You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "liuyao (Jira)" <ji...@apache.org> on 2021/08/27 10:06:00 UTC

[jira] [Created] (IMPALA-10891) Avoid hash exchanges in more situations

liuyao created IMPALA-10891:
-------------------------------

             Summary: Avoid hash exchanges in more situations
                 Key: IMPALA-10891
                 URL: https://issues.apache.org/jira/browse/IMPALA-10891
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 4.0.0
            Reporter: liuyao
            Assignee: liuyao


if the partition keys of parent fragment and child fragment have an intersecion,  and the intersection  keys have a high cardinality,  We can think parent fragment and child fragment to have compatible partitions

 

The cardinality of user_id is 10000000.

 

This case has an unnecessary hash exchange:

[localhost.localdomain:21000] rawdata> Explain select count(*) from event_ros_p7 a join /* +shuffle */ event_ros_p1 b on a.user_id = b.user_id and a.month_id = b.month_id group by a.user_id, b.week_id;
Query: Explain select count(*) from event_ros_p7 a join /* +shuffle */ event_ros_p1 b on a.user_id = b.user_id and a.month_id = b.month_id group by a.user_id, b.week_id
+------------------------------------------------------------------------------------+
| Explain String |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=81.94MB Threads=6 |
| Per-Host Resource Estimates: Memory=324MB |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| rawdata.event_ros_p1 |
| |
| PLAN-ROOT SINK |
| | |
| 08:EXCHANGE [UNPARTITIONED] |
| | |
| 07:AGGREGATE [FINALIZE] |
| | output: count:merge(*) |
| | group by: a.user_id, b.week_id |
| | row-size=20B cardinality=60.18M |
| | |
| 06:EXCHANGE [HASH(a.user_id,b.week_id)] |
| | |
| 03:AGGREGATE [STREAMING] |
| | output: count(*) |
| | group by: a.user_id, b.week_id |
| | row-size=20B cardinality=60.18M |
| | |
| 02:HASH JOIN [INNER JOIN, PARTITIONED] |
| | hash predicates: a.month_id = b.month_id, a.user_id = b.user_id |
| | runtime filters: RF000 <- b.month_id, RF001 <- b.user_id |
| | row-size=28B cardinality=60.18M |
| | |
| |--05:EXCHANGE [HASH(b.month_id,b.user_id)] |
| | | |
| | 01:SCAN HDFS [rawdata.event_ros_p1 b] |
| | partitions=0/0 files=0 size=0B |
| | row-size=16B cardinality=0 |
| | |
| 04:EXCHANGE [HASH(a.month_id,a.user_id)] |
| | |
| 00:SCAN HDFS [rawdata.event_ros_p7 a] |
| partitions=20/22 files=1346 size=10.48GB |
| runtime filters: RF000 -> a.month_id, RF001 -> a.user_id |
| row-size=12B cardinality=60.18M |
+------------------------------------------------------------------------------------+

 

This execution plan works as expected, It avoids redundant shuffle and Pre agg:
+------------------------------------------------------------------------------------+
| Explain String |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=47.94MB Threads=5 |
| Per-Host Resource Estimates: Memory=186MB |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| rawdata.event_ros_p1 |
| |
| PLAN-ROOT SINK |
| | |
| 06:EXCHANGE [UNPARTITIONED] |
| | |
| 03:AGGREGATE [FINALIZE] |
| | output: count(*) |
| | group by: a.user_id, b.week_id |
| | row-size=20B cardinality=60.18M |
| | |
| 02:HASH JOIN [INNER JOIN, PARTITIONED] |
| | hash predicates: a.month_id = b.month_id, a.user_id = b.user_id |
| | runtime filters: RF000 <- b.month_id, RF001 <- b.user_id |
| | row-size=28B cardinality=60.18M |
| | |
| |--05:EXCHANGE [HASH(b.user_id)] |
| | | |
| | 01:SCAN HDFS [rawdata.event_ros_p1 b] |
| | partitions=0/0 files=0 size=0B |
| | row-size=16B cardinality=0 |
| | |
| 04:EXCHANGE [HASH(a.user_id)] |
| | |
| 00:SCAN HDFS [rawdata.event_ros_p7 a] |
| partitions=20/22 files=1346 size=10.48GB |
| runtime filters: RF000 -> a.month_id, RF001 -> a.user_id |
| row-size=12B cardinality=60.18M |
+------------------------------------------------------------------------------------+

 

 

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org