Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/28 03:47:36 UTC
[GitHub] [arrow] westonpace opened a new pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck
westonpace opened a new pull request #10421:
URL: https://github.com/apache/arrow/pull/10421
See JIRA for the rationale (I also added a comment to the benchmark itself).
Even without any changes to the thread pools you should be able to see some speedup from ideal scheduling. With the changes being introduced in the work-stealing PRs the speedup is even more stark.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace commented on pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck
Posted by GitBox <gi...@apache.org>.
westonpace commented on pull request #10421:
URL: https://github.com/apache/arrow/pull/10421#issuecomment-881178238
Closing for now to keep PR queue clean. Will reopen when we revisit work scheduling.
[GitHub] [arrow] github-actions[bot] commented on pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10421:
URL: https://github.com/apache/arrow/pull/10421#issuecomment-850090015
https://issues.apache.org/jira/browse/ARROW-12903
[GitHub] [arrow] westonpace commented on pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck
Posted by GitBox <gi...@apache.org>.
westonpace commented on pull request #10421:
URL: https://github.com/apache/arrow/pull/10421#issuecomment-852682551
Just adding the benchmark...
```
ThreadPoolSpawn/threads:1/task_cost:1000/real_time 104576026 ns 39527736 ns 7 items_per_second=1.91249M/s
ThreadPoolSpawn/threads:2/task_cost:1000/real_time 81736943 ns 69631881 ns 8 items_per_second=2.44689M/s
ThreadPoolSpawn/threads:4/task_cost:1000/real_time 395577537 ns 337000146 ns 2 items_per_second=505.592k/s
ThreadPoolSpawn/threads:8/task_cost:1000/real_time 326650393 ns 290524204 ns 2 items_per_second=612.278k/s
ThreadPoolSpawn/threads:1/task_cost:10000/real_time 81345849 ns 2355243 ns 8 items_per_second=245.876k/s
ThreadPoolSpawn/threads:2/task_cost:10000/real_time 43399109 ns 2694481 ns 16 items_per_second=460.862k/s
ThreadPoolSpawn/threads:4/task_cost:10000/real_time 22975768 ns 3457033 ns 31 items_per_second=870.526k/s
ThreadPoolSpawn/threads:8/task_cost:10000/real_time 21717680 ns 14331911 ns 37 items_per_second=920.955k/s
ThreadPoolSpawn/threads:1/task_cost:100000/real_time 81517332 ns 270745 ns 8 items_per_second=24.5469k/s
ThreadPoolSpawn/threads:2/task_cost:100000/real_time 41615534 ns 281452 ns 17 items_per_second=48.083k/s
ThreadPoolSpawn/threads:4/task_cost:100000/real_time 21324149 ns 316989 ns 33 items_per_second=93.8373k/s
ThreadPoolSpawn/threads:8/task_cost:100000/real_time 12702954 ns 443910 ns 55 items_per_second=157.522k/s
ThreadPoolIdealSpawn/threads:1/task_cost:1000/real_time 107253606 ns 91704 ns 6 items_per_second=1.86475M/s
ThreadPoolIdealSpawn/threads:2/task_cost:1000/real_time 88176114 ns 117639 ns 8 items_per_second=2.2682M/s
ThreadPoolIdealSpawn/threads:4/task_cost:1000/real_time 86717904 ns 107266 ns 8 items_per_second=2.30634M/s
ThreadPoolIdealSpawn/threads:8/task_cost:1000/real_time 98762117 ns 209733 ns 7 items_per_second=2.02508M/s
ThreadPoolIdealSpawn/threads:1/task_cost:10000/real_time 84566727 ns 64819 ns 8 items_per_second=236.511k/s
ThreadPoolIdealSpawn/threads:2/task_cost:10000/real_time 46885981 ns 70276 ns 15 items_per_second=426.588k/s
ThreadPoolIdealSpawn/threads:4/task_cost:10000/real_time 27807860 ns 94742 ns 26 items_per_second=719.257k/s
ThreadPoolIdealSpawn/threads:8/task_cost:10000/real_time 16994645 ns 148328 ns 41 items_per_second=1.1769M/s
ThreadPoolIdealSpawn/threads:1/task_cost:100000/real_time 81094164 ns 52531 ns 8 items_per_second=24.675k/s
ThreadPoolIdealSpawn/threads:2/task_cost:100000/real_time 42196762 ns 85439 ns 16 items_per_second=47.4207k/s
ThreadPoolIdealSpawn/threads:4/task_cost:100000/real_time 22380385 ns 120278 ns 32 items_per_second=89.4086k/s
ThreadPoolIdealSpawn/threads:8/task_cost:100000/real_time 12873517 ns 180938 ns 56 items_per_second=155.435k/s
```
Early results from work-stealing (note: impl:1 is the single-queue implementation; it benefits quite a bit from the generalized refactor (#10401), which shrinks the critical section considerably)...
```
ThreadPoolSpawn/impl:1/threads:1/task_cost:1000/real_time 109095290 ns 45507459 ns 6 items_per_second=1.83327M/s
ThreadPoolSpawn/impl:1/threads:2/task_cost:1000/real_time 84445897 ns 73408467 ns 8 items_per_second=2.36839M/s
ThreadPoolSpawn/impl:1/threads:4/task_cost:1000/real_time 384508473 ns 331111388 ns 2 items_per_second=520.147k/s
ThreadPoolSpawn/impl:1/threads:8/task_cost:1000/real_time 340298964 ns 310431590 ns 2 items_per_second=587.721k/s
ThreadPoolSpawn/impl:1/threads:1/task_cost:10000/real_time 84889601 ns 2927850 ns 8 items_per_second=235.612k/s
ThreadPoolSpawn/impl:1/threads:2/task_cost:10000/real_time 46962168 ns 4429182 ns 16 items_per_second=425.896k/s
ThreadPoolSpawn/impl:1/threads:4/task_cost:10000/real_time 27891032 ns 5498450 ns 24 items_per_second=717.112k/s
ThreadPoolSpawn/impl:1/threads:8/task_cost:10000/real_time 23484115 ns 15697174 ns 29 items_per_second=851.682k/s
ThreadPoolSpawn/impl:1/threads:1/task_cost:100000/real_time 86121178 ns 466594 ns 8 items_per_second=23.2347k/s
ThreadPoolSpawn/impl:1/threads:2/task_cost:100000/real_time 47425209 ns 563522 ns 14 items_per_second=42.1928k/s
ThreadPoolSpawn/impl:1/threads:4/task_cost:100000/real_time 26281335 ns 621087 ns 29 items_per_second=76.1377k/s
ThreadPoolSpawn/impl:1/threads:8/task_cost:100000/real_time 17440052 ns 774646 ns 48 items_per_second=114.736k/s
ThreadPoolSpawn/impl:2/threads:1/task_cost:1000/real_time 103240478 ns 38378988 ns 7 items_per_second=1.93723M/s
ThreadPoolSpawn/impl:2/threads:2/task_cost:1000/real_time 75653969 ns 52775243 ns 9 items_per_second=2.64363M/s
ThreadPoolSpawn/impl:2/threads:4/task_cost:1000/real_time 377306923 ns 319204178 ns 2 items_per_second=530.075k/s
ThreadPoolSpawn/impl:2/threads:8/task_cost:1000/real_time 299943726 ns 268612631 ns 2 items_per_second=666.795k/s
ThreadPoolSpawn/impl:2/threads:1/task_cost:10000/real_time 90443729 ns 4378503 ns 8 items_per_second=221.143k/s
ThreadPoolSpawn/impl:2/threads:2/task_cost:10000/real_time 44763896 ns 2994083 ns 15 items_per_second=446.811k/s
ThreadPoolSpawn/impl:2/threads:4/task_cost:10000/real_time 23488252 ns 3392981 ns 28 items_per_second=851.532k/s
ThreadPoolSpawn/impl:2/threads:8/task_cost:10000/real_time 17048732 ns 5469744 ns 45 items_per_second=1.17317M/s
ThreadPoolSpawn/impl:2/threads:1/task_cost:100000/real_time 90217957 ns 547220 ns 8 items_per_second=22.1796k/s
ThreadPoolSpawn/impl:2/threads:2/task_cost:100000/real_time 46026938 ns 456690 ns 15 items_per_second=43.4745k/s
ThreadPoolSpawn/impl:2/threads:4/task_cost:100000/real_time 26673468 ns 570498 ns 28 items_per_second=75.0184k/s
ThreadPoolSpawn/impl:2/threads:8/task_cost:100000/real_time 13344305 ns 605292 ns 52 items_per_second=149.952k/s
ThreadPoolIdealSpawn/impl:1/threads:1/task_cost:1000/real_time 79428262 ns 101330 ns 9 items_per_second=2.51801M/s
ThreadPoolIdealSpawn/impl:1/threads:2/task_cost:1000/real_time 41236138 ns 139716 ns 17 items_per_second=4.85011M/s
ThreadPoolIdealSpawn/impl:1/threads:4/task_cost:1000/real_time 25092115 ns 204669 ns 28 items_per_second=7.97063M/s
ThreadPoolIdealSpawn/impl:1/threads:8/task_cost:1000/real_time 25280289 ns 348096 ns 32 items_per_second=7.9113M/s
ThreadPoolIdealSpawn/impl:1/threads:1/task_cost:10000/real_time 12125072 ns 94012 ns 49 items_per_second=1.64956M/s
ThreadPoolIdealSpawn/impl:1/threads:2/task_cost:10000/real_time 6073941 ns 114892 ns 89 items_per_second=3.29276M/s
ThreadPoolIdealSpawn/impl:1/threads:4/task_cost:10000/real_time 5748868 ns 173750 ns 125 items_per_second=3.47895M/s
ThreadPoolIdealSpawn/impl:1/threads:8/task_cost:10000/real_time 9328825 ns 595124 ns 108 items_per_second=2.14389M/s
ThreadPoolIdealSpawn/impl:1/threads:1/task_cost:100000/real_time 2958410 ns 85393 ns 229 items_per_second=676.377k/s
ThreadPoolIdealSpawn/impl:1/threads:2/task_cost:100000/real_time 3074567 ns 191386 ns 235 items_per_second=650.498k/s
ThreadPoolIdealSpawn/impl:1/threads:4/task_cost:100000/real_time 2770940 ns 260583 ns 235 items_per_second=721.777k/s
ThreadPoolIdealSpawn/impl:1/threads:8/task_cost:100000/real_time 1502805 ns 173022 ns 446 items_per_second=1.33085M/s
ThreadPoolIdealSpawn/impl:2/threads:1/task_cost:1000/real_time 85477504 ns 110078 ns 9 items_per_second=2.33981M/s
ThreadPoolIdealSpawn/impl:2/threads:2/task_cost:1000/real_time 44590546 ns 134688 ns 15 items_per_second=4.48526M/s
ThreadPoolIdealSpawn/impl:2/threads:4/task_cost:1000/real_time 26524082 ns 176142 ns 24 items_per_second=7.54032M/s
ThreadPoolIdealSpawn/impl:2/threads:8/task_cost:1000/real_time 28468596 ns 355124 ns 43 items_per_second=7.02528M/s
ThreadPoolIdealSpawn/impl:2/threads:1/task_cost:10000/real_time 10248307 ns 85681 ns 53 items_per_second=1.95164M/s
ThreadPoolIdealSpawn/impl:2/threads:2/task_cost:10000/real_time 5843059 ns 104370 ns 133 items_per_second=3.42286M/s
ThreadPoolIdealSpawn/impl:2/threads:4/task_cost:10000/real_time 6790626 ns 193317 ns 100 items_per_second=2.94524M/s
ThreadPoolIdealSpawn/impl:2/threads:8/task_cost:10000/real_time 9542556 ns 623649 ns 118 items_per_second=2.09587M/s
ThreadPoolIdealSpawn/impl:2/threads:1/task_cost:100000/real_time 3225593 ns 101590 ns 209 items_per_second=620.351k/s
ThreadPoolIdealSpawn/impl:2/threads:2/task_cost:100000/real_time 3129998 ns 186755 ns 219 items_per_second=638.978k/s
ThreadPoolIdealSpawn/impl:2/threads:4/task_cost:100000/real_time 5807119 ns 254434 ns 100 items_per_second=344.405k/s
ThreadPoolIdealSpawn/impl:2/threads:8/task_cost:100000/real_time 7648492 ns 453787 ns 91 items_per_second=261.489k/s
```
[GitHub] [arrow] westonpace commented on a change in pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck
Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #10421:
URL: https://github.com/apache/arrow/pull/10421#discussion_r643623285
##########
File path: cpp/src/arrow/util/thread_pool.h
##########
@@ -288,6 +288,10 @@ class ARROW_EXPORT ThreadPool : public Executor {
// tasks are finished.
Status Shutdown(bool wait = true);
+ // Waits for the thread pool to reach a quiet state where all workers are
Review comment:
Fixed.
[GitHub] [arrow] westonpace closed pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck
Posted by GitBox <gi...@apache.org>.
westonpace closed pull request #10421:
URL: https://github.com/apache/arrow/pull/10421
[GitHub] [arrow] pitrou commented on a change in pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10421:
URL: https://github.com/apache/arrow/pull/10421#discussion_r642886129
##########
File path: cpp/src/arrow/util/thread_pool.h
##########
@@ -288,6 +288,10 @@ class ARROW_EXPORT ThreadPool : public Executor {
// tasks are finished.
Status Shutdown(bool wait = true);
+ // Waits for the thread pool to reach a quiet state where all workers are
Review comment:
"Wait"
##########
File path: cpp/src/arrow/util/thread_pool_benchmark.cc
##########
@@ -103,6 +103,52 @@ static void ThreadPoolSpawn(benchmark::State& state) { // NOLINT non-const refe
state.SetItemsProcessed(state.iterations() * nspawns);
}
+// The ThreadPoolSpawn benchmark submits all tasks from a single outside thread. This
+// ends up causing a worst-case scenario for the current simple thread pool. All threads
+// compete over the task queue mutex trying to grab the next task off the queue and the
+// result is a large amount of contention.
+//
+// By spreading out the scheduling across multiple threads we can help reduce that
+// contention. This benchmark demonstrates the ideal case where we are able to perfectly
+// partition the scheduling across the available threads.
+//
+// Both situations could be encountered (the thread pool can't choose how it is used) but
+// by having both benchmarks we can demonstrate the importance of distributed scheduling.
+static void ThreadPoolIdealSpawn(benchmark::State& state) { // NOLINT non-const reference
+ const auto nthreads = static_cast<int>(state.range(0));
+ const auto workload_size = static_cast<int32_t>(state.range(1));
+
+ Workload workload(workload_size);
+
+ // Spawn enough tasks to make the pool start up overhead negligible
+ const int32_t nspawns = 200000000 / workload_size + 1;
+ const int32_t nspawns_per_thread = nspawns / nthreads;
+
+ for (auto _ : state) {
+ state.PauseTiming();
+ std::shared_ptr<ThreadPool> pool;
+ pool = *ThreadPool::Make(nthreads);
+ state.ResumeTiming();
+
+ for (int32_t i = 0; i < nthreads; ++i) {
+ // Pass the task by reference to avoid copying it around
+ ABORT_NOT_OK(pool->Spawn([&pool, &workload, nspawns_per_thread] {
+ for (int32_t j = 0; j < nspawns_per_thread; j++) {
+ ABORT_NOT_OK(pool->Spawn(std::ref(workload)));
+ }
+ }));
+ }
+
+ // Wait for all tasks to finish
+ pool->WaitForIdle();
Review comment:
What's the point, since you're calling `Shutdown(wait=true)` just below?
[GitHub] [arrow] westonpace commented on a change in pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck
Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #10421:
URL: https://github.com/apache/arrow/pull/10421#discussion_r643622911
##########
File path: cpp/src/arrow/util/thread_pool_benchmark.cc
##########
@@ -103,6 +103,52 @@ static void ThreadPoolSpawn(benchmark::State& state) { // NOLINT non-const refe
state.SetItemsProcessed(state.iterations() * nspawns);
}
+// The ThreadPoolSpawn benchmark submits all tasks from a single outside thread. This
+// ends up causing a worst-case scenario for the current simple thread pool. All threads
+// compete over the task queue mutex trying to grab the next task off the queue and the
+// result is a large amount of contention.
+//
+// By spreading out the scheduling across multiple threads we can help reduce that
+// contention. This benchmark demonstrates the ideal case where we are able to perfectly
+// partition the scheduling across the available threads.
+//
+// Both situations could be encountered (the thread pool can't choose how it is used) but
+// by having both benchmarks we can demonstrate the importance of distributed scheduling.
+static void ThreadPoolIdealSpawn(benchmark::State& state) { // NOLINT non-const reference
+ const auto nthreads = static_cast<int>(state.range(0));
+ const auto workload_size = static_cast<int32_t>(state.range(1));
+
+ Workload workload(workload_size);
+
+ // Spawn enough tasks to make the pool start up overhead negligible
+ const int32_t nspawns = 200000000 / workload_size + 1;
+ const int32_t nspawns_per_thread = nspawns / nthreads;
+
+ for (auto _ : state) {
+ state.PauseTiming();
+ std::shared_ptr<ThreadPool> pool;
+ pool = *ThreadPool::Make(nthreads);
+ state.ResumeTiming();
+
+ for (int32_t i = 0; i < nthreads; ++i) {
+ // Pass the task by reference to avoid copying it around
+ ABORT_NOT_OK(pool->Spawn([&pool, &workload, nspawns_per_thread] {
+ for (int32_t j = 0; j < nspawns_per_thread; j++) {
+ ABORT_NOT_OK(pool->Spawn(std::ref(workload)));
+ }
+ }));
+ }
+
+ // Wait for all tasks to finish
+ pool->WaitForIdle();
Review comment:
At this point we cannot know that all the tasks have been spawned. If I call `Shutdown(wait=true)` then a slow spawner will fail because `SpawnReal` returns `Status::Invalid` if `please_shutdown_` is `true`.