You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/07/12 21:57:00 UTC

[jira] [Closed] (ARROW-13004) [C++] Allow the creation of future "chains" to better control parallelism

     [ https://issues.apache.org/jira/browse/ARROW-13004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weston Pace closed ARROW-13004.
-------------------------------
    Resolution: Won't Fix

This is handled better by the exec plan and we are not likely to pursue this in futures

> [C++] Allow the creation of future "chains" to better control parallelism
> -------------------------------------------------------------------------
>
>                 Key: ARROW-13004
>                 URL: https://issues.apache.org/jira/browse/ARROW-13004
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>
> This is a bit tricky to explain.  ShouldSchedule::Always works well for AddCallback but falls short for Transfer and Then.  An example may explain best.
> Consider three operators, Source, Transform, and Sink.  They are setup as...
> {code:java}
> source_fut = source(); // 1
> transform_fut = source_fut.Then(Transform(), ScheduleAlways); // 2
> sink_fut = transform_fut.Then(Consume()); // 3
> {code}
> The intent is to run Transform + Consume as a single thread task on each item generated by source().  This is what happens if source() is slow.  If source() is fast (let's pretend it's always finished) then this is not what happens.
> Line 2 causes a new thread task to be launched (since source_fut is finished).  It is possible that new thread task can mark transform_fut finished before line 3 is executed by the original thread.  This causes Consume() and Transform() to run on separate threads.
> The solution (at least as best I can come up with) is unfortunately a little complex (though the complexity can be hidden in future/async_generator internals).  Basically, it is worth waiting to schedule until the future chain has had a chance to finish connecting the pressure.  This means a future created with ScheduleAlways is created in an "unconsumed" mode.  Any callbacks that would normally be launched will not be launched until the future switches to "consumed".  Future.Wait(), VisitAsyncGenerator, CollectAsyncGenerator, and some of the async_generator operators would cause the future to be "consumed".  The "consume" signal will need to propagate backwards up the chain so futures will need to keep a reference to their antecedent future.
> This work meshes well with some other improvements I have been considering, in particular, splitting future/promise and restricting futures to a single callback.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)