Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/04/27 19:57:00 UTC

[jira] [Commented] (ARROW-12560) [C++] Investigate utilizing aggressive thread task creation when adding callback to finished future

    [ https://issues.apache.org/jira/browse/ARROW-12560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17333518#comment-17333518 ] 

Weston Pace commented on ARROW-12560:
-------------------------------------

Yes, a `TransferAlways` would work.  I tried a few iterations but they didn't work as intended.  The thread task has to be spawned by the consumer in this case instead of the producer.  One way it could work is by having `Transfer` "mark" the future in some way so that callbacks added to the future are always spawned as new thread tasks.

The utility could be used more generally outside of transfer (e.g. with an expensive map function to get a partitioned fan-out), but the synchronous utilities we have (e.g. `TaskGroup`) could achieve the same thing in those cases.
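
As a rough illustration of the "mark the future" idea, here is a minimal sketch. It does not use Arrow's actual `Future` or `Transfer`: `ToyFuture`, `MarkAlwaysSpawn`, `JoinSpawned` and `CallbackRanInline` are hypothetical names made up for this example, and a plain `std::thread` stands in for submitting a task to a thread pool.

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical toy future holding an int. This is NOT arrow::Future.
class ToyFuture {
 public:
  using Callback = std::function<void(int)>;

  ~ToyFuture() { JoinSpawned(); }

  // "Mark" the future: callbacks added after completion are spawned as
  // new tasks instead of running inline on the consumer's thread.
  void MarkAlwaysSpawn() {
    std::lock_guard<std::mutex> lock(mutex_);
    always_spawn_ = true;
  }

  // Producer side: store the value and run callbacks that were already waiting.
  void MarkFinished(int value) {
    std::vector<Callback> pending;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      finished_ = true;
      value_ = value;
      pending.swap(callbacks_);
    }
    for (auto& cb : pending) cb(value);
  }

  // Consumer side: if the future is already finished, the consumer decides
  // whether to run the callback inline or spawn it as a new task.
  void AddCallback(Callback cb) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (!finished_) {
      callbacks_.push_back(std::move(cb));
      return;
    }
    const int value = value_;
    if (always_spawn_) {
      // Stand-in for submitting a task to a thread pool (assumption).
      workers_.emplace_back(std::move(cb), value);
      return;
    }
    lock.unlock();
    cb(value);  // default behaviour: synchronous callback
  }

  // Wait for every spawned callback to complete.
  void JoinSpawned() {
    std::vector<std::thread> workers;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      workers.swap(workers_);
    }
    for (auto& t : workers) t.join();
  }

 private:
  std::mutex mutex_;
  bool finished_ = false;
  bool always_spawn_ = false;
  int value_ = 0;
  std::vector<Callback> callbacks_;
  std::vector<std::thread> workers_;
};

// Returns true if a callback added to an already-finished future ran on the
// calling (consumer) thread.
bool CallbackRanInline(bool mark_always_spawn) {
  ToyFuture fut;
  if (mark_always_spawn) fut.MarkAlwaysSpawn();
  fut.MarkFinished(42);  // the producer finishes before the consumer attaches
  std::thread::id callback_thread;
  fut.AddCallback([&](int) { callback_thread = std::this_thread::get_id(); });
  fut.JoinSpawned();
  return callback_thread == std::this_thread::get_id();
}
```

In this sketch `CallbackRanInline(false)` returns true (the callback runs synchronously on the consumer) while `CallbackRanInline(true)` returns false (the callback runs as a separate task).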

> [C++] Investigate utilizing aggressive thread task creation when adding callback to finished future
> ---------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-12560
>                 URL: https://issues.apache.org/jira/browse/ARROW-12560
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Major
>              Labels: async-util
>
> Imagine there is a slow map function (that could run in parallel) and a vector generator given a long vector of tasks.  If we apply the map to the generator and then add readahead, we won't actually get any parallelism, because the vector generator returns everything synchronously, so no thread task will ever be submitted.
> This situation is not just hypothetical; it does occur in the scanner.  For example, if we are scanning CSV files and the CPU threads fall behind the I/O threads, then all callbacks will be synchronous (since the futures will already have been completed by the I/O threads).
> In such a situation we might benefit from creating a new thread task even though we wouldn't normally create one, for example if we have an idle core.  You can think of this as an analogue of work stealing.
> On the other hand, creating a new thread task at every callback might not be the most efficient approach.  We could mitigate this by letting the caller mark a callback as "potentially long" when adding it, as a hint that it is eligible for eager thread task creation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)