You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "YifanZhang (Jira)" <ji...@apache.org> on 2022/07/20 03:50:00 UTC
[jira] [Comment Edited] (IMPALA-11437) Set different parallelism for scan and non-scan fragments
[ https://issues.apache.org/jira/browse/IMPALA-11437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568808#comment-17568808 ]
YifanZhang edited comment on IMPALA-11437 at 7/20/22 3:49 AM:
--------------------------------------------------------------
[~stigahuang] [~rizaon] Thanks for your replies! I've attached profile of these queries.
The AverageScannerThreadConcurrency is (mean=12.43 min=9.80 max=16.00) when using MT_DOP=0, and the result of setting MT_DOP=16 is:
{code:java}
+---------------------+--------+-------+----------+----------+---------+------------+-----------+---------------+---------------------------------+
| Operator | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
+---------------------+--------+-------+----------+----------+---------+------------+-----------+---------------+---------------------------------+
| F02:ROOT | 1 | 1 | 55.99us | 55.99us | | | 0 B | 0 B | |
| 05:MERGING-EXCHANGE | 1 | 1 | 177.51us | 177.51us | 4 | 72.25M | 64.00 KB | 617.73 MB | UNPARTITIONED |
| F01:EXCHANGE SENDER | 14 | 216 | 54.97us | 258.11us | | | 1.05 KB | 0 B | |
| 02:SORT | 14 | 216 | 277.66us | 596.34us | 4 | 72.25M | 12.02 MB | 36.91 MB | |
| 04:AGGREGATE | 14 | 216 | 3.20ms | 4.91ms | 4 | 72.25M | 34.04 MB | 128.00 MB | FINALIZE |
| 03:EXCHANGE | 14 | 216 | 24.02us | 774.71us | 864 | 72.25M | 1.70 MB | 37.12 MB | HASH(l_returnflag,l_linestatus) |
| F00:EXCHANGE SENDER | 14 | 216 | 3.31ms | 4.39ms | | | 178.00 KB | 0 B | |
| 01:AGGREGATE | 14 | 216 | 135.76ms | 301.63ms | 864 | 72.25M | 34.28 MB | 128.00 MB | STREAMING |
| 00:SCAN HDFS | 14 | 216 | 291.84ms | 621.23ms | 600.04M | 72.25M | 43.22 MB | 88.00 MB | tpch_parquet_100g.lineitem |
+---------------------+--------+-------+----------+----------+---------+------------+-----------+---------------+---------------------------------+ {code}
was (Author: zhangyifan27):
[~stigahuang] [~rizaon] Thanks for you replies! I've attached profile of these queries.
The AverageScannerThreadConcurrency is (mean=12.43 min=9.80 max=16.00) when using MT_DOP=0, and the result of setting MT_DOP=16 is:
{code:java}
+---------------------+--------+-------+----------+----------+---------+------------+-----------+---------------+---------------------------------+
| Operator | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
+---------------------+--------+-------+----------+----------+---------+------------+-----------+---------------+---------------------------------+
| F02:ROOT | 1 | 1 | 55.99us | 55.99us | | | 0 B | 0 B | |
| 05:MERGING-EXCHANGE | 1 | 1 | 177.51us | 177.51us | 4 | 72.25M | 64.00 KB | 617.73 MB | UNPARTITIONED |
| F01:EXCHANGE SENDER | 14 | 216 | 54.97us | 258.11us | | | 1.05 KB | 0 B | |
| 02:SORT | 14 | 216 | 277.66us | 596.34us | 4 | 72.25M | 12.02 MB | 36.91 MB | |
| 04:AGGREGATE | 14 | 216 | 3.20ms | 4.91ms | 4 | 72.25M | 34.04 MB | 128.00 MB | FINALIZE |
| 03:EXCHANGE | 14 | 216 | 24.02us | 774.71us | 864 | 72.25M | 1.70 MB | 37.12 MB | HASH(l_returnflag,l_linestatus) |
| F00:EXCHANGE SENDER | 14 | 216 | 3.31ms | 4.39ms | | | 178.00 KB | 0 B | |
| 01:AGGREGATE | 14 | 216 | 135.76ms | 301.63ms | 864 | 72.25M | 34.28 MB | 128.00 MB | STREAMING |
| 00:SCAN HDFS | 14 | 216 | 291.84ms | 621.23ms | 600.04M | 72.25M | 43.22 MB | 88.00 MB | tpch_parquet_100g.lineitem |
+---------------------+--------+-------+----------+----------+---------+------------+-----------+---------------+---------------------------------+ {code}
> Set different parallelism for scan and non-scan fragments
> ---------------------------------------------------------
>
> Key: IMPALA-11437
> URL: https://issues.apache.org/jira/browse/IMPALA-11437
> Project: IMPALA
> Issue Type: Improvement
> Reporter: YifanZhang
> Priority: Major
> Attachments: mt_0_profile_95489998f125f0cb_a597643500000000.txt, mt_16_profile_814795acc2559628_34229cde00000000.txt, mt_2_profile_764cf3010a20e642_72f1316000000000.txt
>
>
> Currently we can only set the same parallelism for all fragments in a query by setting 'mt_dop'. But sometimes we can get faster scanning at mt_dop=0 than at mt_dop>0, because the scanners are also multiple threaded when mt_dop=0.
>
> In a 15 node impala cluster we can get following results:
> {code:java}
> # set mt_dop=0;
> +---------------------+--------+-------+----------+----------+---------+------------+-----------+---------------+---------------------------------+
> | Operator | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
> +---------------------+--------+-------+----------+----------+---------+------------+-----------+---------------+---------------------------------+
> | F02:ROOT | 1 | 1 | 43.64us | 43.64us | | | 0 B | 0 B | |
> | 05:MERGING-EXCHANGE | 1 | 1 | 38.90us | 38.90us | 4 | 72.25M | 64.00 KB | 141.70 MB | UNPARTITIONED |
> | F01:EXCHANGE SENDER | 14 | 14 | 43.51us | 64.53us | | | 1.05 KB | 0 B | |
> | 02:SORT | 14 | 14 | 147.32us | 210.49us | 4 | 72.25M | 12.02 MB | 590.60 MB | |
> | 04:AGGREGATE | 14 | 14 | 1.39ms | 1.65ms | 4 | 72.25M | 34.04 MB | 128.00 MB | FINALIZE |
> | 03:EXCHANGE | 14 | 14 | 33.03us | 126.70us | 56 | 72.25M | 120.00 KB | 11.70 MB | HASH(l_returnflag,l_linestatus) |
> | F00:EXCHANGE SENDER | 14 | 14 | 187.56us | 214.18us | | | 26.50 KB | 0 B | |
> | 01:AGGREGATE | 14 | 14 | 2.20s | 2.44s | 56 | 72.25M | 34.28 MB | 128.00 MB | STREAMING |
> | 00:SCAN HDFS | 14 | 14 | 163.41ms | 190.82ms | 600.04M | 72.25M | 683.66 MB | 616.00 MB | tpch_parquet_100g.lineitem |
> +---------------------+--------+-------+----------+----------+---------+------------+-----------+---------------+---------------------------------+
>
> # set mt_dop=2;
> +---------------------+--------+-------+----------+----------+---------+------------+-----------+---------------+---------------------------------+
> | Operator | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
> +---------------------+--------+-------+----------+----------+---------+------------+-----------+---------------+---------------------------------+
> | F02:ROOT | 1 | 1 | 32.11us | 32.11us | | | 0 B | 0 B | |
> | 05:MERGING-EXCHANGE | 1 | 1 | 45.53us | 45.53us | 4 | 72.25M | 64.00 KB | 283.39 MB | UNPARTITIONED |
> | F01:EXCHANGE SENDER | 14 | 28 | 42.33us | 86.27us | | | 1.05 KB | 0 B | |
> | 02:SORT | 14 | 28 | 168.09us | 257.61us | 4 | 72.25M | 12.02 MB | 295.30 MB | |
> | 04:AGGREGATE | 14 | 28 | 1.49ms | 1.77ms | 4 | 72.25M | 34.04 MB | 128.00 MB | FINALIZE |
> | 03:EXCHANGE | 14 | 28 | 26.13us | 134.21us | 112 | 72.25M | 232.00 KB | 13.39 MB | HASH(l_returnflag,l_linestatus) |
> | F00:EXCHANGE SENDER | 14 | 28 | 287.99us | 476.30us | | | 37.00 KB | 0 B | |
> | 01:AGGREGATE | 14 | 28 | 1.03s | 1.18s | 112 | 72.25M | 34.28 MB | 128.00 MB | STREAMING |
> | 00:SCAN HDFS | 14 | 28 | 2.09s | 2.40s | 600.04M | 72.25M | 43.22 MB | 88.00 MB | tpch_parquet_100g.lineitem |
> +---------------------+--------+-------+----------+----------+---------+------------+-----------+---------------+---------------------------------+ {code}
>
> It would be good if we can set the degree of parallelism for all fragments except for scan fragments, or we can set different parallelism for scan fragments and other fragments.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org