Posted to issues-all@impala.apache.org by "Quanlong Huang (JIRA)" <ji...@apache.org> on 2019/03/26 01:51:00 UTC
[jira] [Updated] (IMPALA-6031) Distributed plan describes coordinator-only nodes as scanning
[ https://issues.apache.org/jira/browse/IMPALA-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang updated IMPALA-6031:
-----------------------------------
Fix Version/s: Impala 3.1.0
Impala 2.13.0
> Distributed plan describes coordinator-only nodes as scanning
> -------------------------------------------------------------
>
> Key: IMPALA-6031
> URL: https://issues.apache.org/jira/browse/IMPALA-6031
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 2.11.0
> Reporter: Jim Apple
> Assignee: Pooja Nilangekar
> Priority: Major
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> In a cluster with one coordinator-only node and three executor-only nodes:
> {noformat}
> Query: explain select count(*) from web_sales a, web_sales b where a.ws_order_number = b.ws_order_number and a.ws_item_sk = b.ws_item_sk
> +------------------------------------------------------------------------------------------+
> | Explain String |
> +------------------------------------------------------------------------------------------+
> | Per-Host Resource Reservation: Memory=136.00MB |
> | Per-Host Resource Estimates: Memory=3.04GB |
> | |
> | F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
> | PLAN-ROOT SINK |
> | | mem-estimate=0B mem-reservation=0B |
> | | |
> | 07:AGGREGATE [FINALIZE] |
> | | output: count:merge(*) |
> | | mem-estimate=10.00MB mem-reservation=0B |
> | | tuple-ids=2 row-size=8B cardinality=1 |
> | | |
> | 06:EXCHANGE [UNPARTITIONED] |
> | mem-estimate=0B mem-reservation=0B |
> | tuple-ids=2 row-size=8B cardinality=1 |
> | |
> | F02:PLAN FRAGMENT [HASH(a.ws_item_sk,a.ws_order_number)] hosts=4 instances=4 |
> | DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=06, UNPARTITIONED] |
> | | mem-estimate=0B mem-reservation=0B |
> | 03:AGGREGATE |
> | | output: count(*) |
> | | mem-estimate=10.00MB mem-reservation=0B |
> | | tuple-ids=2 row-size=8B cardinality=1 |
> | | |
> | 02:HASH JOIN [INNER JOIN, PARTITIONED] |
> | | hash predicates: a.ws_item_sk = b.ws_item_sk, a.ws_order_number = b.ws_order_number |
> | | runtime filters: RF000 <- b.ws_item_sk, RF001 <- b.ws_order_number |
> | | mem-estimate=2.95GB mem-reservation=136.00MB |
> | | tuple-ids=0,1 row-size=32B cardinality=720000376 |
> | | |
> | |--05:EXCHANGE [HASH(b.ws_item_sk,b.ws_order_number)] |
> | | mem-estimate=0B mem-reservation=0B |
> | | tuple-ids=1 row-size=16B cardinality=720000376 |
> | | |
> | 04:EXCHANGE [HASH(a.ws_item_sk,a.ws_order_number)] |
> | mem-estimate=0B mem-reservation=0B |
> | tuple-ids=0 row-size=16B cardinality=720000376 |
> | |
> | F00:PLAN FRAGMENT [RANDOM] hosts=4 instances=4 |
> | DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=04, HASH(a.ws_item_sk,a.ws_order_number)] |
> | | mem-estimate=0B mem-reservation=0B |
> | 00:SCAN HDFS [tpcds_1000_parquet.web_sales a, RANDOM] |
> | partitions=1824/1824 files=1824 size=47.08GB |
> | runtime filters: RF000 -> a.ws_item_sk, RF001 -> a.ws_order_number |
> | table stats: 720000376 rows total |
> | column stats: all |
> | mem-estimate=80.00MB mem-reservation=0B |
> | tuple-ids=0 row-size=16B cardinality=720000376 |
> | |
> | F01:PLAN FRAGMENT [RANDOM] hosts=4 instances=4 |
> | DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=05, HASH(b.ws_item_sk,b.ws_order_number)] |
> | | mem-estimate=0B mem-reservation=0B |
> | 01:SCAN HDFS [tpcds_1000_parquet.web_sales b, RANDOM] |
> | partitions=1824/1824 files=1824 size=47.08GB |
> | table stats: 720000376 rows total |
> | column stats: all |
> | mem-estimate=80.00MB mem-reservation=0B |
> | tuple-ids=1 row-size=16B cardinality=720000376 |
> +------------------------------------------------------------------------------------------+
> {noformat}
> The plan suggests the scans will run on 4 hosts, but after actually running the query the summary shows otherwise:
> {noformat}
> summary;
> +-----------------+--------+----------+----------+---------+------------+-----------+---------------+--------------------------------------+
> | Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
> +-----------------+--------+----------+----------+---------+------------+-----------+---------------+--------------------------------------+
> | 07:AGGREGATE | 1 | 0ns | 0ns | 1 | 1 | 28.00 KB | 10.00 MB | FINALIZE |
> | 06:EXCHANGE | 1 | 0ns | 0ns | 3 | 1 | 0 B | 0 B | UNPARTITIONED |
> | 03:AGGREGATE | 3 | 345.33ms | 378.00ms | 3 | 1 | 139.91 KB | 10.00 MB | |
> | 02:HASH JOIN | 3 | 90.39s | 97.03s | 720.00M | 720.00M | 2.57 GB | 2.95 GB | INNER JOIN, PARTITIONED |
> | |--05:EXCHANGE | 3 | 4.48s | 4.65s | 720.00M | 720.00M | 0 B | 0 B | HASH(b.ws_item_sk,b.ws_order_number) |
> | | 01:SCAN HDFS | 3 | 59.31s | 67.16s | 720.00M | 720.00M | 22.88 MB | 80.00 MB | tpcds_1000_parquet.web_sales b |
> | 04:EXCHANGE | 3 | 4.57s | 4.87s | 720.00M | 720.00M | 0 B | 0 B | HASH(a.ws_item_sk,a.ws_order_number) |
> | 00:SCAN HDFS | 3 | 21.09s | 22.68s | 720.00M | 720.00M | 23.45 MB | 80.00 MB | tpcds_1000_parquet.web_sales a |
> +-----------------+--------+----------+----------+---------+------------+-----------+---------------+--------------------------------------+
> {noformat}
> The distributed plan counts the coordinator-only node as a scan host (hosts=4 on the scan fragments), but at execution time the coordinator does not scan; only the 3 executor nodes do (#Hosts=3 in the summary).
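The discrepancy can be illustrated with a minimal sketch. This is not Impala's actual planner or scheduler code; it only assumes (per the report) a cluster of one coordinator-only node and three executor-only nodes, where the plan derives its host count from total cluster membership while scan ranges are assigned only to executors. All names below are hypothetical.

```python
# Hypothetical sketch of the host-count discrepancy reported above.
# Cluster shape matches the report: 1 coordinator-only + 3 executor-only nodes.
nodes = {
    "node0": {"coordinator": True,  "executor": False},
    "node1": {"coordinator": False, "executor": True},
    "node2": {"coordinator": False, "executor": True},
    "node3": {"coordinator": False, "executor": True},
}

def planner_scan_hosts(cluster):
    # Buggy behavior described in the issue: the distributed plan
    # reports every cluster node as a scan host, including
    # coordinator-only nodes.
    return len(cluster)

def scheduler_scan_hosts(cluster):
    # Actual execution: scan ranges are assigned only to executors,
    # so the coordinator-only node never scans.
    return sum(1 for n in cluster.values() if n["executor"])

print(planner_scan_hosts(nodes))    # 4, matching "hosts=4" in the plan
print(scheduler_scan_hosts(nodes))  # 3, matching "#Hosts" in the summary
```

The fix implied by the report is that the planner's host estimate for scan fragments should follow the executor-membership logic, not total cluster size.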
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)