You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "liuyao (Jira)" <ji...@apache.org> on 2021/08/27 10:07:00 UTC

[jira] [Updated] (IMPALA-10891) Avoid hash exchanges in more situations

     [ https://issues.apache.org/jira/browse/IMPALA-10891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuyao updated IMPALA-10891:
----------------------------
    Description: 
if the partition keys of parent fragment and child fragment have an intersecion,  and the intersection  keys have a high cardinality,  We can think parent fragment and child fragment to have compatible partitions

 

The cardinality of user_id is 10000000.

 

This case has an unnecessary hash exchange:

 

This execution plan works as expected, It avoids redundant shuffle and Pre agg:
 +------------------------------------------------------------------------------------+
|Explain String|

+------------------------------------------------------------------------------------+
|Max Per-Host Resource Reservation: Memory=47.94MB Threads=5|
|Per-Host Resource Estimates: Memory=186MB|
|WARNING: The following tables are missing relevant table and/or column statistics.|
|rawdata.event_ros_p1|
| |
|PLAN-ROOT SINK|
| | |
|06:EXCHANGE [UNPARTITIONED]|
| | |
|03:AGGREGATE [FINALIZE]|
| |output: count(*)|
| |group by: a.user_id, b.week_id|
| |row-size=20B cardinality=60.18M|
| | |
|02:HASH JOIN [INNER JOIN, PARTITIONED]|
| |hash predicates: a.month_id = b.month_id, a.user_id = b.user_id|
| |runtime filters: RF000 <- b.month_id, RF001 <- b.user_id|
| |row-size=28B cardinality=60.18M|
| | |
| |--05:EXCHANGE [HASH(b.user_id)]|
| | | |
| |01:SCAN HDFS [rawdata.event_ros_p1 b]|
| |partitions=0/0 files=0 size=0B|
| |row-size=16B cardinality=0|
| | |
|04:EXCHANGE [HASH(a.user_id)]|
| | |
|00:SCAN HDFS [rawdata.event_ros_p7 a]|
|partitions=20/22 files=1346 size=10.48GB|
|runtime filters: RF000 -> a.month_id, RF001 -> a.user_id|
|row-size=12B cardinality=60.18M|

+------------------------------------------------------------------------------------+

 

 

 

 

 

  was:
if the partition keys of parent fragment and child fragment have an intersecion,  and the intersection  keys have a high cardinality,  We can think parent fragment and child fragment to have compatible partitions

 

The cardinality of user_id is 10000000.

 

This case has an unnecessary hash exchange:

[localhost.localdomain:21000] rawdata> Explain select count(*) from event_ros_p7 a join /* +shuffle */ event_ros_p1 b on a.user_id = b.user_id and a.month_id = b.month_id group by a.user_id, b.week_id;
Query: Explain select count(*) from event_ros_p7 a join /* +shuffle */ event_ros_p1 b on a.user_id = b.user_id and a.month_id = b.month_id group by a.user_id, b.week_id
+------------------------------------------------------------------------------------+
| Explain String |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=81.94MB Threads=6 |
| Per-Host Resource Estimates: Memory=324MB |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| rawdata.event_ros_p1 |
| |
| PLAN-ROOT SINK |
| | |
| 08:EXCHANGE [UNPARTITIONED] |
| | |
| 07:AGGREGATE [FINALIZE] |
| | output: count:merge(*) |
| | group by: a.user_id, b.week_id |
| | row-size=20B cardinality=60.18M |
| | |
| 06:EXCHANGE [HASH(a.user_id,b.week_id)] |
| | |
| 03:AGGREGATE [STREAMING] |
| | output: count(*) |
| | group by: a.user_id, b.week_id |
| | row-size=20B cardinality=60.18M |
| | |
| 02:HASH JOIN [INNER JOIN, PARTITIONED] |
| | hash predicates: a.month_id = b.month_id, a.user_id = b.user_id |
| | runtime filters: RF000 <- b.month_id, RF001 <- b.user_id |
| | row-size=28B cardinality=60.18M |
| | |
| |--05:EXCHANGE [HASH(b.month_id,b.user_id)] |
| | | |
| | 01:SCAN HDFS [rawdata.event_ros_p1 b] |
| | partitions=0/0 files=0 size=0B |
| | row-size=16B cardinality=0 |
| | |
| 04:EXCHANGE [HASH(a.month_id,a.user_id)] |
| | |
| 00:SCAN HDFS [rawdata.event_ros_p7 a] |
| partitions=20/22 files=1346 size=10.48GB |
| runtime filters: RF000 -> a.month_id, RF001 -> a.user_id |
| row-size=12B cardinality=60.18M |
+------------------------------------------------------------------------------------+

 

This execution plan works as expected, It avoids redundant shuffle and Pre agg:
+------------------------------------------------------------------------------------+
| Explain String |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=47.94MB Threads=5 |
| Per-Host Resource Estimates: Memory=186MB |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| rawdata.event_ros_p1 |
| |
| PLAN-ROOT SINK |
| | |
| 06:EXCHANGE [UNPARTITIONED] |
| | |
| 03:AGGREGATE [FINALIZE] |
| | output: count(*) |
| | group by: a.user_id, b.week_id |
| | row-size=20B cardinality=60.18M |
| | |
| 02:HASH JOIN [INNER JOIN, PARTITIONED] |
| | hash predicates: a.month_id = b.month_id, a.user_id = b.user_id |
| | runtime filters: RF000 <- b.month_id, RF001 <- b.user_id |
| | row-size=28B cardinality=60.18M |
| | |
| |--05:EXCHANGE [HASH(b.user_id)] |
| | | |
| | 01:SCAN HDFS [rawdata.event_ros_p1 b] |
| | partitions=0/0 files=0 size=0B |
| | row-size=16B cardinality=0 |
| | |
| 04:EXCHANGE [HASH(a.user_id)] |
| | |
| 00:SCAN HDFS [rawdata.event_ros_p7 a] |
| partitions=20/22 files=1346 size=10.48GB |
| runtime filters: RF000 -> a.month_id, RF001 -> a.user_id |
| row-size=12B cardinality=60.18M |
+------------------------------------------------------------------------------------+

 

 

 

 

 


> Avoid hash exchanges in more situations
> ---------------------------------------
>
>                 Key: IMPALA-10891
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10891
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 4.0.0
>            Reporter: liuyao
>            Assignee: liuyao
>            Priority: Major
>
> if the partition keys of parent fragment and child fragment have an intersecion,  and the intersection  keys have a high cardinality,  We can think parent fragment and child fragment to have compatible partitions
>  
> The cardinality of user_id is 10000000.
>  
> This case has an unnecessary hash exchange:
>  
> This execution plan works as expected, It avoids redundant shuffle and Pre agg:
>  +------------------------------------------------------------------------------------+
> |Explain String|
> +------------------------------------------------------------------------------------+
> |Max Per-Host Resource Reservation: Memory=47.94MB Threads=5|
> |Per-Host Resource Estimates: Memory=186MB|
> |WARNING: The following tables are missing relevant table and/or column statistics.|
> |rawdata.event_ros_p1|
> | |
> |PLAN-ROOT SINK|
> | | |
> |06:EXCHANGE [UNPARTITIONED]|
> | | |
> |03:AGGREGATE [FINALIZE]|
> | |output: count(*)|
> | |group by: a.user_id, b.week_id|
> | |row-size=20B cardinality=60.18M|
> | | |
> |02:HASH JOIN [INNER JOIN, PARTITIONED]|
> | |hash predicates: a.month_id = b.month_id, a.user_id = b.user_id|
> | |runtime filters: RF000 <- b.month_id, RF001 <- b.user_id|
> | |row-size=28B cardinality=60.18M|
> | | |
> | |--05:EXCHANGE [HASH(b.user_id)]|
> | | | |
> | |01:SCAN HDFS [rawdata.event_ros_p1 b]|
> | |partitions=0/0 files=0 size=0B|
> | |row-size=16B cardinality=0|
> | | |
> |04:EXCHANGE [HASH(a.user_id)]|
> | | |
> |00:SCAN HDFS [rawdata.event_ros_p7 a]|
> |partitions=20/22 files=1346 size=10.48GB|
> |runtime filters: RF000 -> a.month_id, RF001 -> a.user_id|
> |row-size=12B cardinality=60.18M|
> +------------------------------------------------------------------------------------+
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org