Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/01 08:20:38 UTC

[GitHub] [arrow] pitrou commented on a change in pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck

pitrou commented on a change in pull request #10421:
URL: https://github.com/apache/arrow/pull/10421#discussion_r642886129



##########
File path: cpp/src/arrow/util/thread_pool.h
##########
@@ -288,6 +288,10 @@ class ARROW_EXPORT ThreadPool : public Executor {
   // tasks are finished.
   Status Shutdown(bool wait = true);
 
+  // Waits for the thread pool to reach a quiet state where all workers are

Review comment:
       "Wait"

##########
File path: cpp/src/arrow/util/thread_pool_benchmark.cc
##########
@@ -103,6 +103,52 @@ static void ThreadPoolSpawn(benchmark::State& state) {  // NOLINT non-const refe
   state.SetItemsProcessed(state.iterations() * nspawns);
 }
 
+// The ThreadPoolSpawn benchmark submits all tasks from a single outside thread.  This
+// turns out to be a worst-case scenario for the current simple thread pool: all worker
+// threads compete over the task queue mutex trying to grab the next task off the queue,
+// and the result is a large amount of contention.
+//
+// By spreading out the scheduling across multiple threads we can help reduce that
+// contention.  This benchmark demonstrates the ideal case where we are able to perfectly
+// partition the scheduling across the available threads.
+//
+// Either situation can arise in practice (the thread pool cannot control how it is
+// used), but having both benchmarks lets us quantify the impact of distributing the
+// scheduling work.
+static void ThreadPoolIdealSpawn(benchmark::State& state) {  // NOLINT non-const reference
+  const auto nthreads = static_cast<int>(state.range(0));
+  const auto workload_size = static_cast<int32_t>(state.range(1));
+
+  Workload workload(workload_size);
+
+  // Spawn enough tasks to make the pool startup overhead negligible
+  const int32_t nspawns = 200000000 / workload_size + 1;
+  const int32_t nspawns_per_thread = nspawns / nthreads;
+
+  for (auto _ : state) {
+    state.PauseTiming();
+    std::shared_ptr<ThreadPool> pool = *ThreadPool::Make(nthreads);
+    state.ResumeTiming();
+
+    for (int32_t i = 0; i < nthreads; ++i) {
+      // Submit each worker's share of tasks from within the pool itself so
+      // that the scheduling work is spread across all threads
+      ABORT_NOT_OK(pool->Spawn([&pool, &workload, nspawns_per_thread] {
+        for (int32_t j = 0; j < nspawns_per_thread; j++) {
+          // Pass the task by reference to avoid copying it around
+          ABORT_NOT_OK(pool->Spawn(std::ref(workload)));
+        }
+      }));
+    }
+
+    // Wait for all tasks to finish
+    pool->WaitForIdle();

Review comment:
       What's the point, since you're calling `Shutdown(wait=true)` just below?
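
       For reference, `Shutdown()` is declared above with a `wait` parameter
       defaulting to true, which already blocks until queued tasks have
       finished.  A minimal sketch of the teardown without the extra call
       (assuming the benchmark ends the way the existing ThreadPoolSpawn
       benchmark does):

           // Shutdown(wait=true) blocks until all queued tasks have finished,
           // so a separate WaitForIdle() beforehand may be redundant
           ABORT_NOT_OK(pool->Shutdown(true /* wait */));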




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org