You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "YifanZhang (Jira)" <ji...@apache.org> on 2022/07/15 15:30:00 UTC
[jira] [Created] (IMPALA-11437) Set different parallelism for scan and non-scan fragments
YifanZhang created IMPALA-11437:
-----------------------------------
Summary: Set different parallelism for scan and non-scan fragments
Key: IMPALA-11437
URL: https://issues.apache.org/jira/browse/IMPALA-11437
Project: IMPALA
Issue Type: Improvement
Reporter: YifanZhang
Currently we can only set the same parallelism for all fragments in a query by setting 'mt_dop'. But sometimes we can get faster scanning at mt_dop=0 than at mt_dop>0, because the scanners are also multiple threaded when mt_dop=0.
In a one-node impala cluster we can get following results:
{code:java}
# set mt_dop=0;
+---------------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------+
| Operator | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
+---------------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------+
| F02:ROOT | 1 | 1 | 0ns | 0ns | | | 4.01 MB | 4.00 MB | |
| 05:MERGING-EXCHANGE | 1 | 1 | 0ns | 0ns | 4 | 6 | 16.00 KB | 16.00 KB | UNPARTITIONED |
| F01:EXCHANGE SENDER | 1 | 1 | 0ns | 0ns | | | 1.05 KB | 0 B | |
| 02:SORT | 1 | 1 | 0ns | 0ns | 4 | 6 | 12.02 MB | 12.00 MB | |
| 04:AGGREGATE | 1 | 1 | 0ns | 0ns | 4 | 6 | 2.10 MB | 10.00 MB | FINALIZE |
| 03:EXCHANGE | 1 | 1 | 0ns | 0ns | 4 | 6 | 64.00 KB | 16.00 KB | HASH(l_returnflag,l_linestatus) |
| F00:EXCHANGE SENDER | 1 | 1 | 0ns | 0ns | | | 768 B | 0 B | |
| 01:AGGREGATE | 1 | 1 | 688.00ms | 688.00ms | 4 | 6 | 2.21 MB | 10.00 MB | STREAMING |
| 00:SCAN HDFS | 1 | 1 | 52.00ms | 52.00ms | 6.00M | 600.12K | 88.31 MB | 560.00 MB | tpch_parquet.lineitem |
+---------------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------+
# set mt_dop=3;
+---------------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------+
| Operator | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
+---------------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------+
| F02:ROOT | 1 | 1 | 0ns | 0ns | | | 4.01 MB | 4.00 MB | |
| 05:MERGING-EXCHANGE | 1 | 1 | 0ns | 0ns | 4 | 6 | 32.00 KB | 16.00 KB | UNPARTITIONED |
| F01:EXCHANGE SENDER | 1 | 3 | 0ns | 0ns | | | 1.05 KB | 0 B | |
| 02:SORT | 1 | 3 | 0ns | 0ns | 4 | 6 | 12.02 MB | 12.00 MB | |
| 04:AGGREGATE | 1 | 3 | 4.00ms | 4.00ms | 4 | 6 | 2.10 MB | 10.00 MB | FINALIZE |
| 03:EXCHANGE | 1 | 3 | 0ns | 0ns | 12 | 6 | 32.00 KB | 16.00 KB | HASH(l_returnflag,l_linestatus) |
| F00:EXCHANGE SENDER | 1 | 3 | 0ns | 0ns | | | 10.25 KB | 0 B | |
| 01:AGGREGATE | 1 | 3 | 237.33ms | 272.00ms | 12 | 6 | 2.21 MB | 10.00 MB | STREAMING |
| 00:SCAN HDFS | 1 | 3 | 224.00ms | 268.00ms | 6.00M | 600.12K | 28.12 MB | 80.00 MB | tpch_parquet.lineitem |
+---------------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------+{code}
It would be good if we can set the degree of parallelism for all fragments except for scan fragments, or we can set different parallelism for scan fragments and other fragments.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)