Posted to commits@doris.apache.org by "morningman (via GitHub)" <gi...@apache.org> on 2023/11/10 16:03:44 UTC
[PR] [opt](scanner) increase the connection num of s3 client [doris]
morningman opened a new pull request, #26795:
URL: https://github.com/apache/doris/pull/26795
## Proposed changes
The S3 client on the BE side may be shared by many threads, so its connection count
needs to be large enough to serve all of them; otherwise, parallelism will be low when scanning files on OSS.
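
To illustrate the bottleneck described above, here is a minimal, self-contained sketch. `BoundedClient` and `run_scan` are hypothetical names for illustration, not Doris or AWS SDK code, and the sleep stands in for a remote read. It only shows that the number of requests in flight cannot exceed the shared client's connection limit, however many scanner threads use it.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a client with a bounded connection pool.
class BoundedClient {
public:
    explicit BoundedClient(int max_connections) : _free(max_connections) {
        assert(max_connections > 0);
    }

    // Blocks until a connection is free, then simulates one remote request.
    void request() {
        {
            std::unique_lock<std::mutex> lock(_mu);
            _cv.wait(lock, [this] { return _free > 0; });
            --_free;
            ++_in_flight;
            _peak = std::max(_peak, _in_flight);
        }
        // Simulated network latency (assumption: 5 ms per request).
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
        {
            std::lock_guard<std::mutex> lock(_mu);
            --_in_flight;
            ++_free;
        }
        _cv.notify_one();
    }

    // Highest number of requests that were in flight at the same time.
    int peak_in_flight() const {
        std::lock_guard<std::mutex> lock(_mu);
        return _peak;
    }

private:
    mutable std::mutex _mu;
    std::condition_variable _cv;
    int _free;
    int _in_flight = 0;
    int _peak = 0;
};

// Runs `threads` workers against one shared client and returns the peak
// number of concurrent requests that was observed.
int run_scan(int threads, int max_connections) {
    BoundedClient client(max_connections);
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i) {
        workers.emplace_back([&client] {
            for (int j = 0; j < 4; ++j) {
                client.request();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return client.peak_in_flight();
}
```

Calling `run_scan(8, 2)` never reports more than 2 concurrent requests even though 8 threads are running, which is the low parallelism the description refers to.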
## Further comments
If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org
Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]
Posted by "doris-robot (via GitHub)" <gi...@apache.org>.
doris-robot commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806661157
(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.17 seconds
stream load tsv: 559 seconds loaded 74807831229 Bytes, about 127 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.1 seconds inserted 10000000 Rows, about 343K ops/s
storage size: 17162504916 Bytes
Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]
Posted by "xiaokang (via GitHub)" <gi...@apache.org>.
xiaokang merged PR #26795:
URL: https://github.com/apache/doris/pull/26795
Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806718933
PR approved by at least one committer and no changes requested.
Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]
Posted by "morningman (via GitHub)" <gi...@apache.org>.
morningman commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806011786
run buildall
Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]
Posted by "AshinGau (via GitHub)" <gi...@apache.org>.
AshinGau commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806718508
LGTM
Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]
Posted by "doris-robot (via GitHub)" <gi...@apache.org>.
doris-robot commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806988938
TeamCity be ut coverage result:
Function Coverage: 36.77% (8406/22860)
Line Coverage: 29.29% (68161/232710)
Region Coverage: 27.92% (35241/126223)
Branch Coverage: 24.74% (18025/72850)
Coverage Report: http://coverage.selectdb-in.cc/coverage/6ea97c595eabd02c363374a390ed96dc9c02938a_6ea97c595eabd02c363374a390ed96dc9c02938a/report/index.html
Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]
Posted by "morningman (via GitHub)" <gi...@apache.org>.
morningman commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806982867
run buildall
Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]
Posted by "doris-robot (via GitHub)" <gi...@apache.org>.
doris-robot commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806656794
<details>
<summary>TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'</summary>
```
Tpch sf100 test result on commit db30ff405294de8ab2a7e1e58d44cae39432368d, data reload: false
run tpch-sf100 query with default conf and session variables
q1 5375 5143 5140 5140
q2 377 216 198 198
q3 2070 2026 2053 2026
q4 1481 1433 1427 1427
q5 4132 4181 4144 4144
q6 259 132 137 132
q7 2101 1598 1616 1598
q8 2750 2747 2746 2746
q9 10324 10426 10309 10309
q10 3495 3577 3565 3565
q11 373 262 255 255
q12 456 318 302 302
q13 4574 4189 4130 4130
q14 329 295 294 294
q15 645 585 571 571
q16 700 621 597 597
q17 1154 1075 1079 1075
q18 7732 7288 7396 7288
q19 1678 1698 1701 1698
q20 582 374 355 355
q21 4927 4604 4615 4604
q22 523 429 431 429
Total cold run time: 56037 ms
Total hot run time: 52883 ms
run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1 5111 5052 5075 5052
q2 388 270 239 239
q3 4056 4029 4035 4029
q4 2853 2810 2835 2810
q5 6494 6389 6462 6389
q6 244 130 129 129
q7 3141 2712 2714 2712
q8 4775 4753 4758 4753
q9 17777 17636 17635 17635
q10 4083 4149 4165 4149
q11 741 675 636 636
q12 1028 806 787 787
q13 4317 3927 3963 3927
q14 391 377 348 348
q15 606 564 576 564
q16 767 701 680 680
q17 3879 3870 3862 3862
q18 9542 9319 9381 9319
q19 1891 1763 1786 1763
q20 2430 2090 2045 2045
q21 8860 8831 8725 8725
q22 925 850 885 850
Total cold run time: 84299 ms
Total hot run time: 81403 ms
```
</details>
Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on code in PR #26795:
URL: https://github.com/apache/doris/pull/26795#discussion_r1389606220
##########
be/src/vec/exec/scan/scanner_scheduler.cpp:
##########
@@ -112,13 +112,13 @@ Status ScannerScheduler::init(ExecEnv* env) {
config::doris_scanner_thread_pool_queue_size, "local_scan"));
// 3. remote scan thread pool
+ _remote_thread_pool_max_size = config::doris_max_remote_scanner_thread_pool_thread_num != -1
+ ? config::doris_max_remote_scanner_thread_pool_thread_num
+ : std::max(512, CpuInfo::num_cores() * 10);
Review Comment:
warning: 512 is a magic number; consider replacing it with a named constant [readability-magic-numbers]
```cpp
: std::max(512, CpuInfo::num_cores() * 10);
           ^
```
##########
be/src/vec/exec/scan/scanner_scheduler.cpp:
##########
@@ -112,13 +112,13 @@
config::doris_scanner_thread_pool_queue_size, "local_scan"));
// 3. remote scan thread pool
+ _remote_thread_pool_max_size = config::doris_max_remote_scanner_thread_pool_thread_num != -1
+ ? config::doris_max_remote_scanner_thread_pool_thread_num
+ : std::max(512, CpuInfo::num_cores() * 10);
Review Comment:
warning: 10 is a magic number; consider replacing it with a named constant [readability-magic-numbers]
```cpp
: std::max(512, CpuInfo::num_cores() * 10);
                                       ^
```
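
One way to address both `readability-magic-numbers` warnings is sketched below. This is not the code merged in the PR: the constant names and the free function are hypothetical, and the core count is passed in as a parameter instead of calling `CpuInfo::num_cores()` so that the snippet compiles on its own.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical names; the diff under review uses the literals 512 and 10.
constexpr int kMinRemoteScanThreads = 512;
constexpr int kRemoteScanThreadsPerCore = 10;

// Mirrors the ternary in the diff: a configured value of -1 means "not set",
// in which case the size falls back to max(512, num_cores * 10).
int remote_thread_pool_max_size(int configured, int num_cores) {
    assert(num_cores > 0);
    return configured != -1
                   ? configured
                   : std::max(kMinRemoteScanThreads, num_cores * kRemoteScanThreadsPerCore);
}
```

Under these assumptions, a 32-core machine with the config left at -1 gets 512 threads (since 32 * 10 = 320 is below the floor), a 64-core machine gets 640, and any explicit config value is used unchanged.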
Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]
Posted by "morningman (via GitHub)" <gi...@apache.org>.
morningman commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806639618
run buildall
Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1807018794
PR approved by at least one committer and no changes requested.
Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806718943
PR approved by anyone and no changes requested.
Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]
Posted by "doris-robot (via GitHub)" <gi...@apache.org>.
doris-robot commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806992487
<details>
<summary>TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'</summary>
```
Tpch sf100 test result on commit 6ea97c595eabd02c363374a390ed96dc9c02938a, data reload: false
run tpch-sf100 query with default conf and session variables
q1 5264 5079 5057 5057
q2 376 178 201 178
q3 2080 2097 2034 2034
q4 1506 1455 1429 1429
q5 4159 4111 4122 4111
q6 255 138 134 134
q7 2070 1626 1605 1605
q8 2788 2755 2754 2754
q9 10328 10301 10253 10253
q10 3462 3569 3565 3565
q11 374 255 258 255
q12 462 299 296 296
q13 4517 4120 4080 4080
q14 323 292 289 289
q15 616 575 567 567
q16 704 632 593 593
q17 1145 1114 1092 1092
q18 7851 7354 7384 7354
q19 1704 1713 1708 1708
q20 584 380 374 374
q21 4951 4609 4547 4547
q22 543 433 424 424
Total cold run time: 56062 ms
Total hot run time: 52699 ms
run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1 5011 5064 5083 5064
q2 335 250 240 240
q3 3945 4011 3977 3977
q4 2806 2758 2737 2737
q5 6479 6484 6445 6445
q6 248 128 130 128
q7 3139 2731 2744 2731
q8 4827 4747 4766 4747
q9 17912 17623 17667 17623
q10 4077 4137 4145 4137
q11 771 692 643 643
q12 988 790 819 790
q13 4302 3868 3887 3868
q14 385 342 356 342
q15 618 584 590 584
q16 784 703 698 698
q17 3824 3926 3869 3869
q18 9273 9260 9277 9260
q19 1861 1777 1785 1777
q20 2390 2056 2033 2033
q21 8951 8703 8693 8693
q22 938 832 856 832
Total cold run time: 83864 ms
Total hot run time: 81218 ms
```
</details>
Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]
Posted by "doris-robot (via GitHub)" <gi...@apache.org>.
doris-robot commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806994048
(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.79 seconds
stream load tsv: 556 seconds loaded 74807831229 Bytes, about 128 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17162227208 Bytes