Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/28 03:47:36 UTC
[GitHub] [arrow] westonpace opened a new pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck
westonpace opened a new pull request #10421:
URL: https://github.com/apache/arrow/pull/10421
See JIRA for the rationale (I also added a comment to the benchmark itself).
Even without any changes to the thread pools you should be able to see some speedup from ideal scheduling. With the changes being introduced in the work-stealing PRs the speedup is even more stark.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace commented on pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck
Posted by GitBox <gi...@apache.org>.
westonpace commented on pull request #10421:
URL: https://github.com/apache/arrow/pull/10421#issuecomment-881178238
Closing for now to keep PR queue clean. Will reopen when we revisit work scheduling.
[GitHub] [arrow] github-actions[bot] commented on pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10421:
URL: https://github.com/apache/arrow/pull/10421#issuecomment-850090015
https://issues.apache.org/jira/browse/ARROW-12903
[GitHub] [arrow] westonpace commented on pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck
Posted by GitBox <gi...@apache.org>.
westonpace commented on pull request #10421:
URL: https://github.com/apache/arrow/pull/10421#issuecomment-852682551
Just adding the benchmark...
```
ThreadPoolSpawn/threads:1/task_cost:1000/real_time 104576026 ns 39527736 ns 7 items_per_second=1.91249M/s
ThreadPoolSpawn/threads:2/task_cost:1000/real_time 81736943 ns 69631881 ns 8 items_per_second=2.44689M/s
ThreadPoolSpawn/threads:4/task_cost:1000/real_time 395577537 ns 337000146 ns 2 items_per_second=505.592k/s
ThreadPoolSpawn/threads:8/task_cost:1000/real_time 326650393 ns 290524204 ns 2 items_per_second=612.278k/s
ThreadPoolSpawn/threads:1/task_cost:10000/real_time 81345849 ns 2355243 ns 8 items_per_second=245.876k/s
ThreadPoolSpawn/threads:2/task_cost:10000/real_time 43399109 ns 2694481 ns 16 items_per_second=460.862k/s
ThreadPoolSpawn/threads:4/task_cost:10000/real_time 22975768 ns 3457033 ns 31 items_per_second=870.526k/s
ThreadPoolSpawn/threads:8/task_cost:10000/real_time 21717680 ns 14331911 ns 37 items_per_second=920.955k/s
ThreadPoolSpawn/threads:1/task_cost:100000/real_time 81517332 ns 270745 ns 8 items_per_second=24.5469k/s
ThreadPoolSpawn/threads:2/task_cost:100000/real_time 41615534 ns 281452 ns 17 items_per_second=48.083k/s
ThreadPoolSpawn/threads:4/task_cost:100000/real_time 21324149 ns 316989 ns 33 items_per_second=93.8373k/s
ThreadPoolSpawn/threads:8/task_cost:100000/real_time 12702954 ns 443910 ns 55 items_per_second=157.522k/s
ThreadPoolIdealSpawn/threads:1/task_cost:1000/real_time 107253606 ns 91704 ns 6 items_per_second=1.86475M/s
ThreadPoolIdealSpawn/threads:2/task_cost:1000/real_time 88176114 ns 117639 ns 8 items_per_second=2.2682M/s
ThreadPoolIdealSpawn/threads:4/task_cost:1000/real_time 86717904 ns 107266 ns 8 items_per_second=2.30634M/s
ThreadPoolIdealSpawn/threads:8/task_cost:1000/real_time 98762117 ns 209733 ns 7 items_per_second=2.02508M/s
ThreadPoolIdealSpawn/threads:1/task_cost:10000/real_time 84566727 ns 64819 ns 8 items_per_second=236.511k/s
ThreadPoolIdealSpawn/threads:2/task_cost:10000/real_time 46885981 ns 70276 ns 15 items_per_second=426.588k/s
ThreadPoolIdealSpawn/threads:4/task_cost:10000/real_time 27807860 ns 94742 ns 26 items_per_second=719.257k/s
ThreadPoolIdealSpawn/threads:8/task_cost:10000/real_time 16994645 ns 148328 ns 41 items_per_second=1.1769M/s
ThreadPoolIdealSpawn/threads:1/task_cost:100000/real_time 81094164 ns 52531 ns 8 items_per_second=24.675k/s
ThreadPoolIdealSpawn/threads:2/task_cost:100000/real_time 42196762 ns 85439 ns 16 items_per_second=47.4207k/s
ThreadPoolIdealSpawn/threads:4/task_cost:100000/real_time 22380385 ns 120278 ns 32 items_per_second=89.4086k/s
ThreadPoolIdealSpawn/threads:8/task_cost:100000/real_time 12873517 ns 180938 ns 56 items_per_second=155.435k/s
```
Early results from work-stealing (note: impl:1 is the single-queue implementation; it benefits quite a bit from the generalized refactor (#10401), which shrinks the critical section considerably)...
```
ThreadPoolSpawn/impl:1/threads:1/task_cost:1000/real_time 109095290 ns 45507459 ns 6 items_per_second=1.83327M/s
ThreadPoolSpawn/impl:1/threads:2/task_cost:1000/real_time 84445897 ns 73408467 ns 8 items_per_second=2.36839M/s
ThreadPoolSpawn/impl:1/threads:4/task_cost:1000/real_time 384508473 ns 331111388 ns 2 items_per_second=520.147k/s
ThreadPoolSpawn/impl:1/threads:8/task_cost:1000/real_time 340298964 ns 310431590 ns 2 items_per_second=587.721k/s
ThreadPoolSpawn/impl:1/threads:1/task_cost:10000/real_time 84889601 ns 2927850 ns 8 items_per_second=235.612k/s
ThreadPoolSpawn/impl:1/threads:2/task_cost:10000/real_time 46962168 ns 4429182 ns 16 items_per_second=425.896k/s
ThreadPoolSpawn/impl:1/threads:4/task_cost:10000/real_time 27891032 ns 5498450 ns 24 items_per_second=717.112k/s
ThreadPoolSpawn/impl:1/threads:8/task_cost:10000/real_time 23484115 ns 15697174 ns 29 items_per_second=851.682k/s
ThreadPoolSpawn/impl:1/threads:1/task_cost:100000/real_time 86121178 ns 466594 ns 8 items_per_second=23.2347k/s
ThreadPoolSpawn/impl:1/threads:2/task_cost:100000/real_time 47425209 ns 563522 ns 14 items_per_second=42.1928k/s
ThreadPoolSpawn/impl:1/threads:4/task_cost:100000/real_time 26281335 ns 621087 ns 29 items_per_second=76.1377k/s
ThreadPoolSpawn/impl:1/threads:8/task_cost:100000/real_time 17440052 ns 774646 ns 48 items_per_second=114.736k/s
ThreadPoolSpawn/impl:2/threads:1/task_cost:1000/real_time 103240478 ns 38378988 ns 7 items_per_second=1.93723M/s
ThreadPoolSpawn/impl:2/threads:2/task_cost:1000/real_time 75653969 ns 52775243 ns 9 items_per_second=2.64363M/s
ThreadPoolSpawn/impl:2/threads:4/task_cost:1000/real_time 377306923 ns 319204178 ns 2 items_per_second=530.075k/s
ThreadPoolSpawn/impl:2/threads:8/task_cost:1000/real_time 299943726 ns 268612631 ns 2 items_per_second=666.795k/s
ThreadPoolSpawn/impl:2/threads:1/task_cost:10000/real_time 90443729 ns 4378503 ns 8 items_per_second=221.143k/s
ThreadPoolSpawn/impl:2/threads:2/task_cost:10000/real_time 44763896 ns 2994083 ns 15 items_per_second=446.811k/s
ThreadPoolSpawn/impl:2/threads:4/task_cost:10000/real_time 23488252 ns 3392981 ns 28 items_per_second=851.532k/s
ThreadPoolSpawn/impl:2/threads:8/task_cost:10000/real_time 17048732 ns 5469744 ns 45 items_per_second=1.17317M/s
ThreadPoolSpawn/impl:2/threads:1/task_cost:100000/real_time 90217957 ns 547220 ns 8 items_per_second=22.1796k/s
ThreadPoolSpawn/impl:2/threads:2/task_cost:100000/real_time 46026938 ns 456690 ns 15 items_per_second=43.4745k/s
ThreadPoolSpawn/impl:2/threads:4/task_cost:100000/real_time 26673468 ns 570498 ns 28 items_per_second=75.0184k/s
ThreadPoolSpawn/impl:2/threads:8/task_cost:100000/real_time 13344305 ns 605292 ns 52 items_per_second=149.952k/s
ThreadPoolIdealSpawn/impl:1/threads:1/task_cost:1000/real_time 79428262 ns 101330 ns 9 items_per_second=2.51801M/s
ThreadPoolIdealSpawn/impl:1/threads:2/task_cost:1000/real_time 41236138 ns 139716 ns 17 items_per_second=4.85011M/s
ThreadPoolIdealSpawn/impl:1/threads:4/task_cost:1000/real_time 25092115 ns 204669 ns 28 items_per_second=7.97063M/s
ThreadPoolIdealSpawn/impl:1/threads:8/task_cost:1000/real_time 25280289 ns 348096 ns 32 items_per_second=7.9113M/s
ThreadPoolIdealSpawn/impl:1/threads:1/task_cost:10000/real_time 12125072 ns 94012 ns 49 items_per_second=1.64956M/s
ThreadPoolIdealSpawn/impl:1/threads:2/task_cost:10000/real_time 6073941 ns 114892 ns 89 items_per_second=3.29276M/s
ThreadPoolIdealSpawn/impl:1/threads:4/task_cost:10000/real_time 5748868 ns 173750 ns 125 items_per_second=3.47895M/s
ThreadPoolIdealSpawn/impl:1/threads:8/task_cost:10000/real_time 9328825 ns 595124 ns 108 items_per_second=2.14389M/s
ThreadPoolIdealSpawn/impl:1/threads:1/task_cost:100000/real_time 2958410 ns 85393 ns 229 items_per_second=676.377k/s
ThreadPoolIdealSpawn/impl:1/threads:2/task_cost:100000/real_time 3074567 ns 191386 ns 235 items_per_second=650.498k/s
ThreadPoolIdealSpawn/impl:1/threads:4/task_cost:100000/real_time 2770940 ns 260583 ns 235 items_per_second=721.777k/s
ThreadPoolIdealSpawn/impl:1/threads:8/task_cost:100000/real_time 1502805 ns 173022 ns 446 items_per_second=1.33085M/s
ThreadPoolIdealSpawn/impl:2/threads:1/task_cost:1000/real_time 85477504 ns 110078 ns 9 items_per_second=2.33981M/s
ThreadPoolIdealSpawn/impl:2/threads:2/task_cost:1000/real_time 44590546 ns 134688 ns 15 items_per_second=4.48526M/s
ThreadPoolIdealSpawn/impl:2/threads:4/task_cost:1000/real_time 26524082 ns 176142 ns 24 items_per_second=7.54032M/s
ThreadPoolIdealSpawn/impl:2/threads:8/task_cost:1000/real_time 28468596 ns 355124 ns 43 items_per_second=7.02528M/s
ThreadPoolIdealSpawn/impl:2/threads:1/task_cost:10000/real_time 10248307 ns 85681 ns 53 items_per_second=1.95164M/s
ThreadPoolIdealSpawn/impl:2/threads:2/task_cost:10000/real_time 5843059 ns 104370 ns 133 items_per_second=3.42286M/s
ThreadPoolIdealSpawn/impl:2/threads:4/task_cost:10000/real_time 6790626 ns 193317 ns 100 items_per_second=2.94524M/s
ThreadPoolIdealSpawn/impl:2/threads:8/task_cost:10000/real_time 9542556 ns 623649 ns 118 items_per_second=2.09587M/s
ThreadPoolIdealSpawn/impl:2/threads:1/task_cost:100000/real_time 3225593 ns 101590 ns 209 items_per_second=620.351k/s
ThreadPoolIdealSpawn/impl:2/threads:2/task_cost:100000/real_time 3129998 ns 186755 ns 219 items_per_second=638.978k/s
ThreadPoolIdealSpawn/impl:2/threads:4/task_cost:100000/real_time 5807119 ns 254434 ns 100 items_per_second=344.405k/s
ThreadPoolIdealSpawn/impl:2/threads:8/task_cost:100000/real_time 7648492 ns 453787 ns 91 items_per_second=261.489k/s
```
[GitHub] [arrow] westonpace commented on a change in pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck
Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #10421:
URL: https://github.com/apache/arrow/pull/10421#discussion_r643623285
##########
File path: cpp/src/arrow/util/thread_pool.h
##########
@@ -288,6 +288,10 @@ class ARROW_EXPORT ThreadPool : public Executor {
// tasks are finished.
Status Shutdown(bool wait = true);
+ // Waits for the thread pool to reach a quiet state where all workers are
Review comment:
Fixed.
[GitHub] [arrow] westonpace closed pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck
Posted by GitBox <gi...@apache.org>.
westonpace closed pull request #10421:
URL: https://github.com/apache/arrow/pull/10421
[GitHub] [arrow] pitrou commented on a change in pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10421:
URL: https://github.com/apache/arrow/pull/10421#discussion_r642886129
##########
File path: cpp/src/arrow/util/thread_pool.h
##########
@@ -288,6 +288,10 @@ class ARROW_EXPORT ThreadPool : public Executor {
// tasks are finished.
Status Shutdown(bool wait = true);
+ // Waits for the thread pool to reach a quiet state where all workers are
Review comment:
"Wait"
##########
File path: cpp/src/arrow/util/thread_pool_benchmark.cc
##########
@@ -103,6 +103,52 @@ static void ThreadPoolSpawn(benchmark::State& state) { // NOLINT non-const refe
state.SetItemsProcessed(state.iterations() * nspawns);
}
+// The ThreadPoolSpawn benchmark submits all tasks from a single outside thread. This
+// ends up causing a worst-case scenario for the current simple thread pool. All threads
+// compete over the task queue mutex trying to grab the next task off the queue and the
+// result is a large amount of contention.
+//
+// By spreading out the scheduling across multiple threads we can help reduce that
+// contention. This benchmark demonstrates the ideal case where we are able to perfectly
+// partition the scheduling across the available threads.
+//
+// Both situations could be encountered (the thread pool can't choose how it is used) but
+// by having both benchmarks we can demonstrate the importance of distributed scheduling.
+static void ThreadPoolIdealSpawn(benchmark::State& state) { // NOLINT non-const reference
+ const auto nthreads = static_cast<int>(state.range(0));
+ const auto workload_size = static_cast<int32_t>(state.range(1));
+
+ Workload workload(workload_size);
+
+ // Spawn enough tasks to make the pool start up overhead negligible
+ const int32_t nspawns = 200000000 / workload_size + 1;
+ const int32_t nspawns_per_thread = nspawns / nthreads;
+
+ for (auto _ : state) {
+ state.PauseTiming();
+ std::shared_ptr<ThreadPool> pool;
+ pool = *ThreadPool::Make(nthreads);
+ state.ResumeTiming();
+
+ for (int32_t i = 0; i < nthreads; ++i) {
+ // Pass the task by reference to avoid copying it around
+ ABORT_NOT_OK(pool->Spawn([&pool, &workload, nspawns_per_thread] {
+ for (int32_t j = 0; j < nspawns_per_thread; j++) {
+ ABORT_NOT_OK(pool->Spawn(std::ref(workload)));
+ }
+ }));
+ }
+
+ // Wait for all tasks to finish
+ pool->WaitForIdle();
Review comment:
What's the point, since you're calling `Shutdown(wait=true)` just below?
[GitHub] [arrow] westonpace commented on a change in pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck
Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #10421:
URL: https://github.com/apache/arrow/pull/10421#discussion_r643622911
##########
File path: cpp/src/arrow/util/thread_pool_benchmark.cc
##########
@@ -103,6 +103,52 @@ static void ThreadPoolSpawn(benchmark::State& state) { // NOLINT non-const refe
state.SetItemsProcessed(state.iterations() * nspawns);
}
+// The ThreadPoolSpawn benchmark submits all tasks from a single outside thread. This
+// ends up causing a worst-case scenario for the current simple thread pool. All threads
+// compete over the task queue mutex trying to grab the next task off the queue and the
+// result is a large amount of contention.
+//
+// By spreading out the scheduling across multiple threads we can help reduce that
+// contention. This benchmark demonstrates the ideal case where we are able to perfectly
+// partition the scheduling across the available threads.
+//
+// Both situations could be encountered (the thread pool can't choose how it is used) but
+// by having both benchmarks we can demonstrate the importance of distributed scheduling.
+static void ThreadPoolIdealSpawn(benchmark::State& state) { // NOLINT non-const reference
+ const auto nthreads = static_cast<int>(state.range(0));
+ const auto workload_size = static_cast<int32_t>(state.range(1));
+
+ Workload workload(workload_size);
+
+ // Spawn enough tasks to make the pool start up overhead negligible
+ const int32_t nspawns = 200000000 / workload_size + 1;
+ const int32_t nspawns_per_thread = nspawns / nthreads;
+
+ for (auto _ : state) {
+ state.PauseTiming();
+ std::shared_ptr<ThreadPool> pool;
+ pool = *ThreadPool::Make(nthreads);
+ state.ResumeTiming();
+
+ for (int32_t i = 0; i < nthreads; ++i) {
+ // Pass the task by reference to avoid copying it around
+ ABORT_NOT_OK(pool->Spawn([&pool, &workload, nspawns_per_thread] {
+ for (int32_t j = 0; j < nspawns_per_thread; j++) {
+ ABORT_NOT_OK(pool->Spawn(std::ref(workload)));
+ }
+ }));
+ }
+
+ // Wait for all tasks to finish
+ pool->WaitForIdle();
Review comment:
At this point we cannot know that all the tasks have been spawned. If I call `Shutdown(wait=true)` then a slow spawner will fail because `SpawnReal` returns `Status::Invalid` if `please_shutdown_` is `true`.