You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "YifanZhang (Jira)" <ji...@apache.org> on 2022/07/15 15:30:00 UTC

[jira] [Created] (IMPALA-11437) Set different parallelism for scan and non-scan fragments

YifanZhang created IMPALA-11437:
-----------------------------------

             Summary: Set different parallelism for scan and non-scan fragments
                 Key: IMPALA-11437
                 URL: https://issues.apache.org/jira/browse/IMPALA-11437
             Project: IMPALA
          Issue Type: Improvement
            Reporter: YifanZhang


Currently we can only set the same parallelism for all fragments in a query by setting 'mt_dop'. But sometimes we can get faster scanning at mt_dop=0 than at mt_dop>0, because the scanners are also multiple threaded when mt_dop=0.

 

In a one-node impala cluster we can get following results:
{code:java}
# set mt_dop=0;
+---------------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------+
| Operator            | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                          |
+---------------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------+
| F02:ROOT            | 1      | 1     | 0ns      | 0ns      |       |            | 4.01 MB  | 4.00 MB       |                                 |
| 05:MERGING-EXCHANGE | 1      | 1     | 0ns      | 0ns      | 4     | 6          | 16.00 KB | 16.00 KB      | UNPARTITIONED                   |
| F01:EXCHANGE SENDER | 1      | 1     | 0ns      | 0ns      |       |            | 1.05 KB  | 0 B           |                                 |
| 02:SORT             | 1      | 1     | 0ns      | 0ns      | 4     | 6          | 12.02 MB | 12.00 MB      |                                 |
| 04:AGGREGATE        | 1      | 1     | 0ns      | 0ns      | 4     | 6          | 2.10 MB  | 10.00 MB      | FINALIZE                        |
| 03:EXCHANGE         | 1      | 1     | 0ns      | 0ns      | 4     | 6          | 64.00 KB | 16.00 KB      | HASH(l_returnflag,l_linestatus) |
| F00:EXCHANGE SENDER | 1      | 1     | 0ns      | 0ns      |       |            | 768 B    | 0 B           |                                 |
| 01:AGGREGATE        | 1      | 1     | 688.00ms | 688.00ms | 4     | 6          | 2.21 MB  | 10.00 MB      | STREAMING                       |
| 00:SCAN HDFS        | 1      | 1     | 52.00ms  | 52.00ms  | 6.00M | 600.12K    | 88.31 MB | 560.00 MB     | tpch_parquet.lineitem           |
+---------------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------+


# set mt_dop=3;
+---------------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------+
| Operator            | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                          |
+---------------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------+
| F02:ROOT            | 1      | 1     | 0ns      | 0ns      |       |            | 4.01 MB  | 4.00 MB       |                                 |
| 05:MERGING-EXCHANGE | 1      | 1     | 0ns      | 0ns      | 4     | 6          | 32.00 KB | 16.00 KB      | UNPARTITIONED                   |
| F01:EXCHANGE SENDER | 1      | 3     | 0ns      | 0ns      |       |            | 1.05 KB  | 0 B           |                                 |
| 02:SORT             | 1      | 3     | 0ns      | 0ns      | 4     | 6          | 12.02 MB | 12.00 MB      |                                 |
| 04:AGGREGATE        | 1      | 3     | 4.00ms   | 4.00ms   | 4     | 6          | 2.10 MB  | 10.00 MB      | FINALIZE                        |
| 03:EXCHANGE         | 1      | 3     | 0ns      | 0ns      | 12    | 6          | 32.00 KB | 16.00 KB      | HASH(l_returnflag,l_linestatus) |
| F00:EXCHANGE SENDER | 1      | 3     | 0ns      | 0ns      |       |            | 10.25 KB | 0 B           |                                 |
| 01:AGGREGATE        | 1      | 3     | 237.33ms | 272.00ms | 12    | 6          | 2.21 MB  | 10.00 MB      | STREAMING                       |
| 00:SCAN HDFS        | 1      | 3     | 224.00ms | 268.00ms | 6.00M | 600.12K    | 28.12 MB | 80.00 MB      | tpch_parquet.lineitem           |
+---------------------+--------+-------+----------+----------+-------+------------+----------+---------------+---------------------------------+{code}
It would be good if we can set the degree of parallelism for all fragments except for scan fragments, or we can set different parallelism for scan fragments and other fragments.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)