You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/02 17:53:21 UTC

[GitHub] [arrow] westonpace commented on a change in pull request #9615: ARROW-3316: [R] Multi-threaded conversion from R data.frame to Arrow table / record batch

westonpace commented on a change in pull request #9615:
URL: https://github.com/apache/arrow/pull/9615#discussion_r644193980



##########
File path: r/src/r_to_arrow.cpp
##########
@@ -46,6 +50,90 @@ using internal::MakeConverter;
 
 namespace r {
 
+class RTasks {
+ public:
+  using Task = internal::FnOnce<Status()>;
+
+  explicit RTasks(bool use_threads)
+      : use_threads_(use_threads),
+        stop_source_(),
+        parallel_tasks_(
+            use_threads ? arrow::internal::TaskGroup::MakeThreaded(
+                              arrow::internal::GetCpuThreadPool(), stop_source_.token())
+                        : nullptr) {}
+
+  // This Finish() method must never be called from a thread pool thread
+  // as this would deadlock.
+  //
+  // Usage is to :
+  // - create an RTasks instance on the main thread
+  // - add some tasks with .Append()
+  // - and then call .Finish() so that the parallel tasks are finished
+  Status Finish() {
+    Status status = Status::OK();
+
+    // run the delayed tasks now
+    for (auto& task : delayed_serial_tasks_) {
+      status &= std::move(task)();

Review comment:
       Rather than wrapping all of your tasks in `StoppingTask` you really only need the `StopSource` as a way to send a signal to `parallel_tasks_`.  Everywhere else you could handle stopping logic on your own.  So I think you could change this loop to...
   
   ```
   for (auto& task : delayed_serial_tasks_) {
     status &= std::move(task)();
     if (!status.ok()) {
       stop_source_.RequestStop();
       break;
     }
   }
   ```
   
   ...then you can get rid of `StoppingTask`.  If an error happens in a parallel task the `ThreadedTaskGroup` will already take care of stopping everything.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org