Posted to commits@doris.apache.org by "morningman (via GitHub)" <gi...@apache.org> on 2023/11/10 16:03:44 UTC

[PR] [opt](scanner) increase the connection num of s3 client [doris]

morningman opened a new pull request, #26795:
URL: https://github.com/apache/doris/pull/26795

   ## Proposed changes
   
   The S3 client on the BE side may be shared by many threads, so its connection limit
   needs to be large enough to serve all of them; otherwise, parallelism will be low when scanning files on OSS.
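
A rough illustration of the reasoning above, not Doris code: if each in-flight remote read holds one connection from the shared client, the number of reads that can progress at once is bounded by the smaller of the thread count and the connection limit. The helper name `effective_parallelism` and the numbers in the usage note are assumptions chosen for this sketch.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not part of Doris): upper bound on concurrent remote
// reads when every in-flight read holds one connection of a shared client.
inline int effective_parallelism(int scanner_threads, int max_connections) {
    assert(scanner_threads >= 0 && max_connections >= 0);
    return std::min(scanner_threads, max_connections);
}
```

Under this simplified model, `effective_parallelism(512, 25)` is 25, so most of the 512 threads would wait for a connection, while `effective_parallelism(512, 512)` is 512.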
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]

Posted by "doris-robot (via GitHub)" <gi...@apache.org>.
doris-robot commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806661157

   (From new machine) TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 45.17 seconds
    stream load tsv:          559 seconds loaded 74807831229 Bytes, about 127 MB/s
    stream load json:         21 seconds loaded 2358488459 Bytes, about 107 MB/s
    stream load orc:          65 seconds loaded 1101869774 Bytes, about 16 MB/s
    stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
    insert into select:       29.1 seconds inserted 10000000 Rows, about 343K ops/s
    storage size: 17162504916 Bytes




Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]

Posted by "xiaokang (via GitHub)" <gi...@apache.org>.
xiaokang merged PR #26795:
URL: https://github.com/apache/doris/pull/26795




Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806718933

   PR approved by at least one committer and no changes requested.




Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]

Posted by "morningman (via GitHub)" <gi...@apache.org>.
morningman commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806011786

   run buildall




Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]

Posted by "AshinGau (via GitHub)" <gi...@apache.org>.
AshinGau commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806718508

   LGTM




Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]

Posted by "doris-robot (via GitHub)" <gi...@apache.org>.
doris-robot commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806988938

   TeamCity be ut coverage result:
    Function Coverage: 36.77% (8406/22860) 
    Line Coverage: 29.29% (68161/232710)
    Region Coverage: 27.92% (35241/126223)
    Branch Coverage: 24.74% (18025/72850)
    Coverage Report: http://coverage.selectdb-in.cc/coverage/6ea97c595eabd02c363374a390ed96dc9c02938a_6ea97c595eabd02c363374a390ed96dc9c02938a/report/index.html




Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]

Posted by "morningman (via GitHub)" <gi...@apache.org>.
morningman commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806982867

   run buildall




Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]

Posted by "doris-robot (via GitHub)" <gi...@apache.org>.
doris-robot commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806656794

   
   <details>
   <summary>TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'</summary>
   
   ```
   Tpch sf100 test result on commit db30ff405294de8ab2a7e1e58d44cae39432368d, data reload: false
   
   run tpch-sf100 query with default conf and session variables
   q1	5375	5143	5140	5140
   q2	377	216	198	198
   q3	2070	2026	2053	2026
   q4	1481	1433	1427	1427
   q5	4132	4181	4144	4144
   q6	259	132	137	132
   q7	2101	1598	1616	1598
   q8	2750	2747	2746	2746
   q9	10324	10426	10309	10309
   q10	3495	3577	3565	3565
   q11	373	262	255	255
   q12	456	318	302	302
   q13	4574	4189	4130	4130
   q14	329	295	294	294
   q15	645	585	571	571
   q16	700	621	597	597
   q17	1154	1075	1079	1075
   q18	7732	7288	7396	7288
   q19	1678	1698	1701	1698
   q20	582	374	355	355
   q21	4927	4604	4615	4604
   q22	523	429	431	429
   Total cold run time: 56037 ms
   Total hot run time: 52883 ms
   
   run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
   q1	5111	5052	5075	5052
   q2	388	270	239	239
   q3	4056	4029	4035	4029
   q4	2853	2810	2835	2810
   q5	6494	6389	6462	6389
   q6	244	130	129	129
   q7	3141	2712	2714	2712
   q8	4775	4753	4758	4753
   q9	17777	17636	17635	17635
   q10	4083	4149	4165	4149
   q11	741	675	636	636
   q12	1028	806	787	787
   q13	4317	3927	3963	3927
   q14	391	377	348	348
   q15	606	564	576	564
   q16	767	701	680	680
   q17	3879	3870	3862	3862
   q18	9542	9319	9381	9319
   q19	1891	1763	1786	1763
   q20	2430	2090	2045	2045
   q21	8860	8831	8725	8725
   q22	925	850	885	850
   Total cold run time: 84299 ms
   Total hot run time: 81403 ms
   ```
   </details>
   




Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on code in PR #26795:
URL: https://github.com/apache/doris/pull/26795#discussion_r1389606220


##########
be/src/vec/exec/scan/scanner_scheduler.cpp:
##########
@@ -112,13 +112,13 @@ Status ScannerScheduler::init(ExecEnv* env) {
                                    config::doris_scanner_thread_pool_queue_size, "local_scan"));
 
     // 3. remote scan thread pool
+    _remote_thread_pool_max_size = config::doris_max_remote_scanner_thread_pool_thread_num != -1
+                                           ? config::doris_max_remote_scanner_thread_pool_thread_num
+                                           : std::max(512, CpuInfo::num_cores() * 10);

Review Comment:
   warning: 512 is a magic number; consider replacing it with a named constant [readability-magic-numbers]
   ```cpp
                                              : std::max(512, CpuInfo::num_cores() * 10);
                                                         ^
   ```
   



##########
be/src/vec/exec/scan/scanner_scheduler.cpp:
##########
@@ -112,13 +112,13 @@
                                    config::doris_scanner_thread_pool_queue_size, "local_scan"));
 
     // 3. remote scan thread pool
+    _remote_thread_pool_max_size = config::doris_max_remote_scanner_thread_pool_thread_num != -1
+                                           ? config::doris_max_remote_scanner_thread_pool_thread_num
+                                           : std::max(512, CpuInfo::num_cores() * 10);

Review Comment:
   warning: 10 is a magic number; consider replacing it with a named constant [readability-magic-numbers]
   ```cpp
                                              : std::max(512, CpuInfo::num_cores() * 10);
                                                                                     ^
   ```
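
One way to address both warnings is to give the two literals names. The sketch below is only an illustration: the values 512 and 10 and the `-1` check come from the diff, while the constant names, the free function `remote_thread_pool_max_size`, and its parameters are hypothetical stand-ins for `config::doris_max_remote_scanner_thread_pool_thread_num` and `CpuInfo::num_cores()`.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical names; only the values 512, 10 and -1 appear in the diff.
constexpr int kMinRemoteScanThreads = 512;     // lower bound on the default pool size
constexpr int kRemoteScanThreadsPerCore = 10;  // default threads per CPU core
constexpr int kNotConfigured = -1;             // sentinel: no explicit value set

// Returns the configured size when one is set, otherwise the computed default.
inline int remote_thread_pool_max_size(int configured, int num_cores) {
    assert(num_cores > 0);
    if (configured != kNotConfigured) {
        return configured;
    }
    return std::max(kMinRemoteScanThreads, num_cores * kRemoteScanThreadsPerCore);
}
```

With these assumptions, a 32-core machine with no explicit setting gets 512 threads (320 is below the floor), a 64-core machine gets 640, and an explicit setting such as 128 is returned unchanged.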
   





Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]

Posted by "morningman (via GitHub)" <gi...@apache.org>.
morningman commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806639618

   run buildall




Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1807018794

   PR approved by at least one committer and no changes requested.




Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806718943

   PR approved by anyone and no changes requested.




Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]

Posted by "doris-robot (via GitHub)" <gi...@apache.org>.
doris-robot commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806992487

   
   <details>
   <summary>TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'</summary>
   
   ```
   Tpch sf100 test result on commit 6ea97c595eabd02c363374a390ed96dc9c02938a, data reload: false
   
   run tpch-sf100 query with default conf and session variables
   q1	5264	5079	5057	5057
   q2	376	178	201	178
   q3	2080	2097	2034	2034
   q4	1506	1455	1429	1429
   q5	4159	4111	4122	4111
   q6	255	138	134	134
   q7	2070	1626	1605	1605
   q8	2788	2755	2754	2754
   q9	10328	10301	10253	10253
   q10	3462	3569	3565	3565
   q11	374	255	258	255
   q12	462	299	296	296
   q13	4517	4120	4080	4080
   q14	323	292	289	289
   q15	616	575	567	567
   q16	704	632	593	593
   q17	1145	1114	1092	1092
   q18	7851	7354	7384	7354
   q19	1704	1713	1708	1708
   q20	584	380	374	374
   q21	4951	4609	4547	4547
   q22	543	433	424	424
   Total cold run time: 56062 ms
   Total hot run time: 52699 ms
   
   run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
   q1	5011	5064	5083	5064
   q2	335	250	240	240
   q3	3945	4011	3977	3977
   q4	2806	2758	2737	2737
   q5	6479	6484	6445	6445
   q6	248	128	130	128
   q7	3139	2731	2744	2731
   q8	4827	4747	4766	4747
   q9	17912	17623	17667	17623
   q10	4077	4137	4145	4137
   q11	771	692	643	643
   q12	988	790	819	790
   q13	4302	3868	3887	3868
   q14	385	342	356	342
   q15	618	584	590	584
   q16	784	703	698	698
   q17	3824	3926	3869	3869
   q18	9273	9260	9277	9260
   q19	1861	1777	1785	1777
   q20	2390	2056	2033	2033
   q21	8951	8703	8693	8693
   q22	938	832	856	832
   Total cold run time: 83864 ms
   Total hot run time: 81218 ms
   ```
   </details>
   




Re: [PR] [opt](scanner) increase the connection num of s3 client [doris]

Posted by "doris-robot (via GitHub)" <gi...@apache.org>.
doris-robot commented on PR #26795:
URL: https://github.com/apache/doris/pull/26795#issuecomment-1806994048

   (From new machine) TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 44.79 seconds
    stream load tsv:          556 seconds loaded 74807831229 Bytes, about 128 MB/s
    stream load json:         21 seconds loaded 2358488459 Bytes, about 107 MB/s
    stream load orc:          65 seconds loaded 1101869774 Bytes, about 16 MB/s
    stream load parquet:      33 seconds loaded 861443392 Bytes, about 24 MB/s
    insert into select:       28.8 seconds inserted 10000000 Rows, about 347K ops/s
    storage size: 17162227208 Bytes

