You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@arrow.apache.org by "Todd Farmer (Jira)" <ji...@apache.org> on 2022/07/12 14:05:02 UTC

[jira] [Assigned] (ARROW-13004) [C++] Allow the creation of future "chains" to better control parallelism

     [ https://issues.apache.org/jira/browse/ARROW-13004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Farmer reassigned ARROW-13004:
-----------------------------------

    Assignee:     (was: Weston Pace)

This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.

> [C++] Allow the creation of future "chains" to better control parallelism
> -------------------------------------------------------------------------
>
>                 Key: ARROW-13004
>                 URL: https://issues.apache.org/jira/browse/ARROW-13004
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>
> This is a bit tricky to explain.  ShouldSchedule::Always works well for AddCallback but falls short for Transfer and Then.  An example may explain best.
> Consider three operators, Source, Transform, and Sink.  They are setup as...
> {code:java}
> source_fut = source(); // 1
> transform_fut = source_fut.Then(Transform(), ScheduleAlways); // 2
> sink_fut = transform_fut.Then(Consume()); // 3
> {code}
> The intent is to run Transform + Consume as a single thread task on each item generated by source().  This is what happens if source() is slow.  If source() is fast (let's pretend it's always finished) then this is not what happens.
> Line 2 causes a new thread task to be launched (since source_fut is finished).  It is possible that new thread task can mark transform_fut finished before line 3 is executed by the original thread.  This causes Consume() and Transform() to run on separate threads.
> The solution (at least as best I can come up with) is unfortunately a little complex (though the complexity can be hidden in future/async_generator internals).  Basically, it is worth waiting to schedule until the future chain has had a chance to finish connecting the pressure.  This means a future created with ScheduleAlways is created in an "unconsumed" mode.  Any callbacks that would normally be launched will not be launched until the future switches to "consumed".  Future.Wait(), VisitAsyncGenerator, CollectAsyncGenerator, and some of the async_generator operators would cause the future to be "consumed".  The "consume" signal will need to propagate backwards up the chain so futures will need to keep a reference to their antecedent future.
> This work meshes well with some other improvements I have been considering, in particular, splitting future/promise and restricting futures to a single callback.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)