Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/10/31 00:51:00 UTC

[jira] [Assigned] (IMPALA-8081) Avoid over-parallelizing queries when there are small input splits

     [ https://issues.apache.org/jira/browse/IMPALA-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong reassigned IMPALA-8081:
-------------------------------------

    Assignee: Yida Wu

> Avoid over-parallelizing queries when there are small input splits
> ------------------------------------------------------------------
>
>                 Key: IMPALA-8081
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8081
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Janaki Lahorani
>            Assignee: Yida Wu
>            Priority: Major
>              Labels: multithreading
>
> Currently we maximise parallelism given the number of input splits available. This is often a good decision, unless there are very many small input splits, particularly small files. We could avoid this pathological behaviour by setting a minimum threshold of input bytes per instance (this is still fairly crude, since the number of input bytes correlates only loosely with the amount of work required).
> An example:
> {noformat}
> [localhost.EXAMPLE.COM:21050] default> show files in functional.alltypes;
> Query: show files in functional.alltypes
> +-------------------------------------------------------------------------------+---------+--------------------+
> | Path                                                                          | Size    | Partition          |
> +-------------------------------------------------------------------------------+---------+--------------------+
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=1/090101.txt  | 19.95KB | year=2009/month=1  |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=2/090201.txt  | 18.12KB | year=2009/month=2  |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=3/090301.txt  | 20.06KB | year=2009/month=3  |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=4/090401.txt  | 19.61KB | year=2009/month=4  |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=5/090501.txt  | 20.36KB | year=2009/month=5  |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=6/090601.txt  | 19.71KB | year=2009/month=6  |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=7/090701.txt  | 20.36KB | year=2009/month=7  |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=8/090801.txt  | 20.36KB | year=2009/month=8  |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=9/090901.txt  | 19.71KB | year=2009/month=9  |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=10/091001.txt | 20.36KB | year=2009/month=10 |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=11/091101.txt | 19.71KB | year=2009/month=11 |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2009/month=12/091201.txt | 20.36KB | year=2009/month=12 |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=1/100101.txt  | 20.36KB | year=2010/month=1  |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=2/100201.txt  | 18.39KB | year=2010/month=2  |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=3/100301.txt  | 20.36KB | year=2010/month=3  |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=4/100401.txt  | 19.71KB | year=2010/month=4  |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=5/100501.txt  | 20.36KB | year=2010/month=5  |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=6/100601.txt  | 19.71KB | year=2010/month=6  |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=7/100701.txt  | 20.36KB | year=2010/month=7  |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=8/100801.txt  | 20.36KB | year=2010/month=8  |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=9/100901.txt  | 19.71KB | year=2010/month=9  |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=10/101001.txt | 20.36KB | year=2010/month=10 |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=11/101101.txt | 19.71KB | year=2010/month=11 |
> | hdfs://172.19.0.1:20500/test-warehouse/alltypes/year=2010/month=12/101201.txt | 20.36KB | year=2010/month=12 |
> +-------------------------------------------------------------------------------+---------+--------------------+
> [localhost:21000] default> set mt_dop=8; select count(*) from functional.alltypes; summary;
> MT_DOP set to 8
> Query: select count(*) from functional.alltypes
> Query submitted at: 2020-10-30 17:47:26 (Coordinator: http://tarmstrong-box:25000)
> Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=d242927a09d39968:3ae0ecec00000000
> +----------+
> | count(*) |
> +----------+
> | 7300     |
> +----------+
> Fetched 1 row(s) in 0.19s
> +---------------------+--------+----------+----------+-------+------------+-----------+---------------+---------------------+
> | Operator            | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail              |
> +---------------------+--------+----------+----------+-------+------------+-----------+---------------+---------------------+
> | F01:ROOT            | 1      | 242.85us | 242.85us |       |            | 0 B       | 0 B           |                     |
> | 03:AGGREGATE        | 1      | 2.51ms   | 2.51ms   | 1     | 1          | 16.00 KB  | 10.00 MB      | FINALIZE            |
> | 02:EXCHANGE         | 1      | 616.31us | 616.31us | 24    | 1          | 240.00 KB | 16.00 KB      | UNPARTITIONED       |
> | F00:EXCHANGE SENDER | 24     | 1.34ms   | 2.66ms   |       |            | 16.00 KB  | 0 B           |                     |
> | 01:AGGREGATE        | 24     | 1.52ms   | 2.13ms   | 24    | 1          | 16.00 KB  | 10.00 MB      |                     |
> | 00:SCAN HDFS        | 24     | 32.76ms  | 35.75ms  | 7.30K | 7.30K      | 32.00 KB  | 16.00 MB      | functional.alltypes |
> +---------------------+--------+----------+----------+-------+------------+-----------+---------------+---------------------+
> {noformat}
> In this example, we create 8 instances per Impala daemon (24 in total), and each instance scans only a tiny amount of data: the whole table is 24 files of roughly 20KB each. We would typically be better off creating fewer instances to avoid the per-instance overhead.
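
The proposed minimum-bytes-per-instance threshold can be illustrated with a small sketch. It is hypothetical and not Impala's scheduler code: the function name `instances_per_host` and the `min_bytes_per_instance` parameter are illustrative, the input is assumed to be spread evenly across hosts, and the 3-host count is inferred from 24 instances at MT_DOP=8. The file sizes are copied from the `show files` output above. As the issue notes, input bytes are only a crude proxy for work, and a real planner would also cap instances by the number of scan ranges.

```python
# Hypothetical sketch of a "minimum input bytes per instance" heuristic.
# Not Impala's actual scheduler code; names and thresholds are illustrative.

# File sizes in KB, copied from "show files in functional.alltypes" above.
ALLTYPES_FILE_SIZES_KB = [
    19.95, 18.12, 20.06, 19.61, 20.36, 19.71,  # 2009, months 1-6
    20.36, 20.36, 19.71, 20.36, 19.71, 20.36,  # 2009, months 7-12
    20.36, 18.39, 20.36, 19.71, 20.36, 19.71,  # 2010, months 1-6
    20.36, 20.36, 19.71, 20.36, 19.71, 20.36,  # 2010, months 7-12
]


def instances_per_host(total_input_bytes: int, num_hosts: int, mt_dop: int,
                       min_bytes_per_instance: int) -> int:
    """Return the number of fragment instances to start on each host.

    Assuming input is spread evenly across hosts, caps parallelism so that
    each instance reads at least min_bytes_per_instance, never exceeds
    mt_dop, and never drops below one instance per host.
    """
    if num_hosts < 1 or mt_dop < 1 or min_bytes_per_instance < 1:
        raise ValueError("num_hosts, mt_dop and min_bytes_per_instance must be positive")
    # Largest total instance count that still gives every instance the minimum input.
    max_total_instances = total_input_bytes // min_bytes_per_instance
    # Spread those instances over the hosts, then clamp to [1, mt_dop].
    return max(1, min(mt_dop, max_total_instances // num_hosts))


if __name__ == "__main__":
    total_bytes = int(sum(ALLTYPES_FILE_SIZES_KB) * 1024)  # roughly 478 KB in total
    hosts, dop = 3, 8  # 24 instances at MT_DOP=8 implies 3 impalads
    print("current behaviour:", hosts * dop, "instances")
    for threshold_kb in (64, 1024):  # assumed example thresholds
        per_host = instances_per_host(total_bytes, hosts, dop, threshold_kb * 1024)
        print(f"min {threshold_kb} KB/instance: {per_host * hosts} instances")
```

With these assumed thresholds the sketch picks 6 instances (64 KB minimum) or 3 instances (1 MB minimum) instead of 24, while a large scan (for example 10 GB with a 64 MB minimum) still gets the full 8 instances per host.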



--
This message was sent by Atlassian Jira
(v8.3.4#803005)