Posted to dev@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2020/05/01 18:28:00 UTC

[jira] [Created] (ARROW-8667) [C++] Add multi-consumer Scheduler API to sit one layer above ThreadPool

Wes McKinney created ARROW-8667:
-----------------------------------

             Summary: [C++] Add multi-consumer Scheduler API to sit one layer above ThreadPool
                 Key: ARROW-8667
                 URL: https://issues.apache.org/jira/browse/ARROW-8667
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++
            Reporter: Wes McKinney
             Fix For: 1.0.0


I believe we should define an abstraction to allow for custom resource allocation strategies (round robin, even time, etc.) to be devised for situations where there are different thread pool consumers that are working independently of each other.

Consider the classic nested parallelism scenario:

* Task A in thread 1 may issue N subtasks that run in parallel
* Task B in thread 2 may issue K subtasks

With our current ThreadPool abstraction, it is easy to conceive of scenarios where Task A and Task B trample each other.

One approach to remedy this problem is to have an API like so:

{code}
// Inform the scheduler that you want to submit tasks that are "your tasks"
int consumer_id = scheduler->NewConsumer();

for (...) {
  Future<T> fut = scheduler->Submit(consumer_id, DoWork, ...);
}

scheduler->FinishConsumer(consumer_id);
{code}

The idea is that the scheduler would maintain separate task queues for each consumer and e.g. track consumer-specific metrics of interest to determine how tasks are allocated.
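To make the per-consumer queue idea concrete, here is a minimal sketch. It is not Arrow code: the class name RoundRobinScheduler, the RunOne() method, and the use of std::function instead of Future<T> are all assumptions made for illustration. It is deliberately single-threaded so the dispatch order is easy to follow; a real version would hand tasks to ThreadPool workers and need locking.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <utility>

// Hypothetical sketch, not Arrow's API: one task queue per consumer,
// served in round-robin order by whoever calls RunOne().
class RoundRobinScheduler {
 public:
  using Task = std::function<void()>;

  // Register a new consumer and return its id.
  int NewConsumer() {
    int id = next_id_++;
    consumers_[id];  // default-construct an empty queue
    return id;
  }

  // Enqueue a task on the consumer's own queue.
  void Submit(int consumer_id, Task task) {
    auto it = consumers_.find(consumer_id);
    assert(it != consumers_.end() && !it->second.finished);
    it->second.queue.push_back(std::move(task));
  }

  // No more submissions from this consumer; tasks already queued still run.
  void FinishConsumer(int consumer_id) {
    auto it = consumers_.find(consumer_id);
    if (it == consumers_.end()) return;
    if (it->second.queue.empty()) {
      consumers_.erase(it);
    } else {
      it->second.finished = true;
    }
  }

  // Run one task from the next consumer (in id order, wrapping around)
  // that has pending work. Returns false if nothing is pending.
  bool RunOne() {
    if (consumers_.empty()) return false;
    auto start = consumers_.upper_bound(last_served_);
    if (start == consumers_.end()) start = consumers_.begin();
    auto it = start;
    do {
      if (!it->second.queue.empty()) {
        Task task = std::move(it->second.queue.front());
        it->second.queue.pop_front();
        last_served_ = it->first;
        // Drop a finished consumer once its queue has drained.
        if (it->second.finished && it->second.queue.empty()) {
          consumers_.erase(it);
        }
        task();
        return true;
      }
      ++it;
      if (it == consumers_.end()) it = consumers_.begin();
    } while (it != start);
    return false;
  }

 private:
  struct Consumer {
    std::deque<Task> queue;
    bool finished = false;
  };
  std::map<int, Consumer> consumers_;
  int next_id_ = 0;
  int last_served_ = -1;
};
```

With consumer A holding three queued tasks and consumer B holding one, repeated RunOne() calls run them in the order A, B, A, A, so B's single task is not stuck behind all of A's.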

The scheduler could have different logic to control tasks being assigned to worker threads:

* Round-robin
* Even-time allocation (run fewer tasks from consumers with "slow" tasks and more tasks from consumers with "fast" tasks, though there are some nuances here, such as avoiding starving a consumer that has been running a lot of "slow" tasks when a "fast" consumer shows up)
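
One possible way to approach even-time allocation, sketched under assumptions: track the run time charged to each consumer and always serve the consumer with the least time used. The starvation nuance is handled here by starting a newly registered consumer at the current minimum rather than at zero, so it cannot monopolize the workers while it "catches up". The class name EvenTimePicker and the methods PickNext(), Charge() and Used() are hypothetical, elapsed time is passed in by the caller rather than measured, and the sketch ignores whether a consumer actually has pending tasks.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical sketch, not Arrow's API: choose which consumer to serve next
// so that the time charged to each consumer stays roughly even.
class EvenTimePicker {
 public:
  // Register a consumer. It starts at the smallest time used by any
  // existing consumer (0 if there are none) instead of starting at zero.
  int NewConsumer() {
    int id = next_id_++;
    int64_t start = MinUsed();
    used_[id] = start;
    return id;
  }

  void FinishConsumer(int id) { used_.erase(id); }

  // Id of the consumer with the least time used, or -1 if there are none.
  // Ties go to the lowest id.
  int PickNext() const {
    int best = -1;
    int64_t best_used = 0;
    for (const auto& kv : used_) {
      if (best == -1 || kv.second < best_used) {
        best = kv.first;
        best_used = kv.second;
      }
    }
    return best;
  }

  // Record that a task from this consumer ran for `elapsed` time units.
  void Charge(int id, int64_t elapsed) {
    auto it = used_.find(id);
    assert(it != used_.end());
    it->second += elapsed;
  }

  // Time charged to this consumer so far.
  int64_t Used(int id) const { return used_.at(id); }

 private:
  int64_t MinUsed() const {
    int best = PickNext();
    return best == -1 ? 0 : used_.at(best);
  }

  std::map<int, int64_t> used_;
  int next_id_ = 0;
};
```

For example, if one consumer's tasks each cost 30 units and another's cost 10, eight consecutive picks go to the slow consumer twice and to the fast consumer six times, leaving both at 60 units used.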


