Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/04/27 19:40:00 UTC

[jira] [Updated] (ARROW-12560) [C++] Investigate utilizing aggressive thread creation when adding callback to finished future.

     [ https://issues.apache.org/jira/browse/ARROW-12560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weston Pace updated ARROW-12560:
--------------------------------
    Summary: [C++] Investigate utilizing aggressive thread creation when adding callback to finished future.  (was: [C++] Investigate excessive thread creation when adding callback to finished future.)

> [C++] Investigate utilizing aggressive thread creation when adding callback to finished future.
> -----------------------------------------------------------------------------------------------
>
>                 Key: ARROW-12560
>                 URL: https://issues.apache.org/jira/browse/ARROW-12560
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Major
>              Labels: async-util
>
> Imagine a slow map function (one that could run in parallel) and a vector generator backed by a long vector of tasks.  If we apply the map to the generator and then add readahead, we won't actually get any parallelism: the vector generator returns everything synchronously, so no thread task is ever submitted.
> This is not just hypothetical; it happens in the scanner.  For example, when scanning CSV files, if the CPU threads fall behind the I/O threads then all callbacks run synchronously (the futures have already been completed by the I/O threads by the time the callbacks are added).
> In such a situation we might benefit from creating a new thread task even though we wouldn't normally create one, for example when there is an idle core.  You can think of this as an analogue of work stealing.
> On the other hand, creating a new thread task at any arbitrary callback might not be efficient.  We could mitigate this by letting the caller mark a callback as "potentially long" when adding it, as a hint that it is eligible for eager thread creation.
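
The idea in the description can be sketched with a toy future in standard C++. This is not Arrow's Future API; ToyFuture, CallbackHint, the has_idle_core predicate and RunOnFinishedFuture are all hypothetical names invented for illustration, and spawning a raw std::thread stands in for submitting a task to a thread pool. The sketch only shows the decision point: a callback added to an already-finished future normally runs inline on the caller's thread, but one marked "potentially long" may be moved to a new thread when an idle core is reported.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical hint given when a callback is added (not part of Arrow).
enum class CallbackHint { kDefault, kPotentiallyLong };

// Toy stand-in for a future; illustrative only.
class ToyFuture {
 public:
  // has_idle_core is an assumed predicate; a real implementation would
  // presumably ask the CPU thread pool about spare capacity.
  explicit ToyFuture(std::function<bool()> has_idle_core)
      : has_idle_core_(std::move(has_idle_core)) {}

  ~ToyFuture() { JoinSpawned(); }

  void MarkFinished() {
    std::vector<std::function<void()>> pending;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      finished_ = true;
      pending.swap(callbacks_);
    }
    for (auto& cb : pending) cb();  // run on the finishing thread
  }

  void AddCallback(std::function<void()> cb,
                   CallbackHint hint = CallbackHint::kDefault) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (!finished_) {
      // Not finished yet: queue it; MarkFinished() will run it later.
      callbacks_.push_back(std::move(cb));
      return;
    }
    if (hint == CallbackHint::kPotentiallyLong && has_idle_core_()) {
      // Already finished, but the callback is hinted as long and a core
      // is idle: run it on a new thread instead of inline.
      spawned_.emplace_back(std::move(cb));
      return;
    }
    lock.unlock();
    cb();  // usual behaviour: run synchronously on the caller's thread
  }

  // Wait for every callback that was moved to its own thread.
  void JoinSpawned() {
    std::vector<std::thread> threads;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      threads.swap(spawned_);
    }
    for (auto& t : threads) t.join();
  }

 private:
  std::mutex mutex_;
  bool finished_ = false;
  std::function<bool()> has_idle_core_;
  std::vector<std::function<void()>> callbacks_;
  std::vector<std::thread> spawned_;
};

// Adds one callback to an already-finished future and reports the id of
// the thread the callback ran on.
std::thread::id RunOnFinishedFuture(CallbackHint hint, bool idle_core) {
  ToyFuture fut([idle_core] { return idle_core; });
  fut.MarkFinished();
  std::thread::id ran_on;
  fut.AddCallback([&ran_on] { ran_on = std::this_thread::get_id(); }, hint);
  fut.JoinSpawned();  // join makes the write to ran_on visible here
  assert(ran_on != std::thread::id());
  return ran_on;
}
```

Calling RunOnFinishedFuture(CallbackHint::kDefault, true) reports the caller's own thread id, which is the "no parallelism" case from the description. With CallbackHint::kPotentiallyLong and an idle core, the callback runs on a different thread. Keeping the hint opt-in means short callbacks never pay the cost of a thread hop.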



--
This message was sent by Atlassian Jira
(v8.3.4#803005)