You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Quanlong Huang (JIRA)" <ji...@apache.org> on 2019/03/26 01:51:00 UTC

[jira] [Updated] (IMPALA-6031) Distributed plan describes coordinator-only nodes as scanning

     [ https://issues.apache.org/jira/browse/IMPALA-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-6031:
-----------------------------------
    Fix Version/s: Impala 3.1.0
                   Impala 2.13.0

> Distributed plan describes coordinator-only nodes as scanning
> -------------------------------------------------------------
>
>                 Key: IMPALA-6031
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6031
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.11.0
>            Reporter: Jim Apple
>            Assignee: Pooja Nilangekar
>            Priority: Major
>             Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> In a cluster with one coordinator-only node and three executor-only nodes:
> {noformat}
> Query: explain select count(*) from web_sales a, web_sales b where a.ws_order_number = b.ws_order_number and a.ws_item_sk = b.ws_item_sk
> +------------------------------------------------------------------------------------------+
> | Explain String                                                                           |
> +------------------------------------------------------------------------------------------+
> | Per-Host Resource Reservation: Memory=136.00MB                                           |
> | Per-Host Resource Estimates: Memory=3.04GB                                               |
> |                                                                                          |
> | F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                    |
> |   PLAN-ROOT SINK                                                                         |
> |   |  mem-estimate=0B mem-reservation=0B                                                  |
> |   |                                                                                      |
> |   07:AGGREGATE [FINALIZE]                                                                |
> |   |  output: count:merge(*)                                                              |
> |   |  mem-estimate=10.00MB mem-reservation=0B                                             |
> |   |  tuple-ids=2 row-size=8B cardinality=1                                               |
> |   |                                                                                      |
> |   06:EXCHANGE [UNPARTITIONED]                                                            |
> |      mem-estimate=0B mem-reservation=0B                                                  |
> |      tuple-ids=2 row-size=8B cardinality=1                                               |
> |                                                                                          |
> | F02:PLAN FRAGMENT [HASH(a.ws_item_sk,a.ws_order_number)] hosts=4 instances=4             |
> |   DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=06, UNPARTITIONED]                             |
> |   |  mem-estimate=0B mem-reservation=0B                                                  |
> |   03:AGGREGATE                                                                           |
> |   |  output: count(*)                                                                    |
> |   |  mem-estimate=10.00MB mem-reservation=0B                                             |
> |   |  tuple-ids=2 row-size=8B cardinality=1                                               |
> |   |                                                                                      |
> |   02:HASH JOIN [INNER JOIN, PARTITIONED]                                                 |
> |   |  hash predicates: a.ws_item_sk = b.ws_item_sk, a.ws_order_number = b.ws_order_number |
> |   |  runtime filters: RF000 <- b.ws_item_sk, RF001 <- b.ws_order_number                  |
> |   |  mem-estimate=2.95GB mem-reservation=136.00MB                                        |
> |   |  tuple-ids=0,1 row-size=32B cardinality=720000376                                    |
> |   |                                                                                      |
> |   |--05:EXCHANGE [HASH(b.ws_item_sk,b.ws_order_number)]                                  |
> |   |     mem-estimate=0B mem-reservation=0B                                               |
> |   |     tuple-ids=1 row-size=16B cardinality=720000376                                   |
> |   |                                                                                      |
> |   04:EXCHANGE [HASH(a.ws_item_sk,a.ws_order_number)]                                     |
> |      mem-estimate=0B mem-reservation=0B                                                  |
> |      tuple-ids=0 row-size=16B cardinality=720000376                                      |
> |                                                                                          |
> | F00:PLAN FRAGMENT [RANDOM] hosts=4 instances=4                                           |
> |   DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=04, HASH(a.ws_item_sk,a.ws_order_number)]      |
> |   |  mem-estimate=0B mem-reservation=0B                                                  |
> |   00:SCAN HDFS [tpcds_1000_parquet.web_sales a, RANDOM]                                  |
> |      partitions=1824/1824 files=1824 size=47.08GB                                        |
> |      runtime filters: RF000 -> a.ws_item_sk, RF001 -> a.ws_order_number                  |
> |      table stats: 720000376 rows total                                                   |
> |      column stats: all                                                                   |
> |      mem-estimate=80.00MB mem-reservation=0B                                             |
> |      tuple-ids=0 row-size=16B cardinality=720000376                                      |
> |                                                                                          |
> | F01:PLAN FRAGMENT [RANDOM] hosts=4 instances=4                                           |
> |   DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=05, HASH(b.ws_item_sk,b.ws_order_number)]      |
> |   |  mem-estimate=0B mem-reservation=0B                                                  |
> |   01:SCAN HDFS [tpcds_1000_parquet.web_sales b, RANDOM]                                  |
> |      partitions=1824/1824 files=1824 size=47.08GB                                        |
> |      table stats: 720000376 rows total                                                   |
> |      column stats: all                                                                   |
> |      mem-estimate=80.00MB mem-reservation=0B                                             |
> |      tuple-ids=1 row-size=16B cardinality=720000376                                      |
> +------------------------------------------------------------------------------------------+
> {noformat}
> It looks like the scans are going to be on 4 hosts, but actually, after running the query:
> {noformat}
> summary;
> +-----------------+--------+----------+----------+---------+------------+-----------+---------------+--------------------------------------+
> | Operator        | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                               |
> +-----------------+--------+----------+----------+---------+------------+-----------+---------------+--------------------------------------+
> | 07:AGGREGATE    | 1      | 0ns      | 0ns      | 1       | 1          | 28.00 KB  | 10.00 MB      | FINALIZE                             |
> | 06:EXCHANGE     | 1      | 0ns      | 0ns      | 3       | 1          | 0 B       | 0 B           | UNPARTITIONED                        |
> | 03:AGGREGATE    | 3      | 345.33ms | 378.00ms | 3       | 1          | 139.91 KB | 10.00 MB      |                                      |
> | 02:HASH JOIN    | 3      | 90.39s   | 97.03s   | 720.00M | 720.00M    | 2.57 GB   | 2.95 GB       | INNER JOIN, PARTITIONED              |
> | |--05:EXCHANGE  | 3      | 4.48s    | 4.65s    | 720.00M | 720.00M    | 0 B       | 0 B           | HASH(b.ws_item_sk,b.ws_order_number) |
> | |  01:SCAN HDFS | 3      | 59.31s   | 67.16s   | 720.00M | 720.00M    | 22.88 MB  | 80.00 MB      | tpcds_1000_parquet.web_sales b       |
> | 04:EXCHANGE     | 3      | 4.57s    | 4.87s    | 720.00M | 720.00M    | 0 B       | 0 B           | HASH(a.ws_item_sk,a.ws_order_number) |
> | 00:SCAN HDFS    | 3      | 21.09s   | 22.68s   | 720.00M | 720.00M    | 23.45 MB  | 80.00 MB      | tpcds_1000_parquet.web_sales a       |
> +-----------------+--------+----------+----------+---------+------------+-----------+---------------+--------------------------------------+
> {noformat}
> It looks to me like the distributed plan thinks the coordinator will scan, but the coordinator does not scan.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org