Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/09/02 00:34:00 UTC

[jira] [Resolved] (ARROW-17350) [C++] Create a scheduler for asynchronous work

     [ https://issues.apache.org/jira/browse/ARROW-17350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weston Pace resolved ARROW-17350.
---------------------------------
    Fix Version/s: 10.0.0
       Resolution: Fixed

Issue resolved by pull request 13912
[https://github.com/apache/arrow/pull/13912]

> [C++] Create a scheduler for asynchronous work
> ----------------------------------------------
>
>                 Key: ARROW-17350
>                 URL: https://issues.apache.org/jira/browse/ARROW-17350
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 10.0.0
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Note, in the interest of keeping things simple, this ideally replaces the AsyncTaskGroup.  This is needed to simplify the logic in ARROW-17287.
> The format and implementation will likely be inspired by the synchronous schedulers, TaskScheduler and TaskGroup but it will remain a separate implementation.  In the future, when we dedicate time to improving our synchronous scheduler, we can decide if it makes sense to merge these two types.
> {noformat}
> /// A utility which keeps track of, and schedules, asynchronous tasks
> ///
> /// An asynchronous task has a synchronous component and an asynchronous component.
> /// The synchronous component typically schedules some kind of work on an external
> /// resource (e.g. the I/O thread pool or some kind of kernel-based asynchronous
> /// resource like io_uring).  The asynchronous part represents the work
> /// done on that external resource.  Executing the synchronous part will be referred
> /// to as "submitting the task" since this usually includes submitting the asynchronous
> /// portion to the external thread pool.
> ///
> /// By default the scheduler will submit the task (execute the synchronous part) as
> /// soon as it is added, assuming the underlying thread pool hasn't terminated or the
> /// scheduler hasn't aborted.  In this mode the scheduler is simply acting as
> /// a task group, keeping track of the ongoing work.
> ///
> /// This can be used to provide structured concurrency for asynchronous development.
> /// A task group created at a high level can be distributed amongst low level components
> /// which register work to be completed.  The high level job can then wait for all work
> /// to be completed before cleaning up.
> ///
> /// A task scheduler must eventually be ended when all tasks have been added.  Once the
> /// scheduler has been ended it is an error to add further tasks.  Note, it is not an
> /// error to add additional tasks after a scheduler has aborted (though these tasks
> /// will be ignored and never submitted).  The scheduler has a future which will complete
> /// once the scheduler has been ended AND all remaining tasks have finished executing.
> /// Ending a scheduler will NOT cause the scheduler to flush existing tasks.
> ///
> /// Task failure (either the synchronous portion or the asynchronous portion) will cause
> /// the scheduler to enter an aborted state.  The first such failure will be reported in
> /// the final task future.
> ///
> /// The scheduler can also be manually aborted.  A cancellation status will be reported in
> /// the final task future.
> ///
> /// It is also possible to limit the number of concurrent tasks the scheduler will
> /// execute. This is done by setting a task limit.  The task limit initially assumes all
> /// tasks are equal but a custom cost can be supplied when scheduling a task (e.g. based
> /// on the total I/O cost of the task, or the expected RAM utilization of the task).
> ///
> /// When the total number of running tasks is limited then scheduler priority may also
> /// become a consideration.  By default the scheduler runs with a FIFO queue but a custom
> /// task queue can be provided.  One could, for example, use a priority queue to control
> /// the order in which tasks are executed.
> ///
> /// It is common to have multiple stages of execution.  For example, when scanning, we
> /// first inspect each fragment (the inspect stage) to figure out the row groups and then
> /// we scan row groups (the scan stage) to read in the data.  This sort of multi-stage
> /// execution should be represented as two separate task groups.  The first task group can
> /// then have a custom finish callback which ends the second task group.
> {noformat}
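
The behavior described above (submit on add, a cost-based task limit with a FIFO queue, abort on first failure, and one stage ending the next) can be illustrated with a toy, single-threaded sketch. This is not the implementation from the pull request and not Arrow's API: ToyScheduler, Done, Submit, Finished and RunTwoStageDemo are hypothetical names, a plain callback stands in for the scheduler's future, and an empty string stands in for a successful status. The sketch also assumes that an oversized task may run on its own and that queued tasks are dropped on abort.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy, single-threaded sketch.  All names are hypothetical, not Arrow's API.
class ToyScheduler {
 public:
  // A task calls Done when its asynchronous part finishes ("" means success).
  using Done = std::function<void(std::string)>;
  // The synchronous part: starts the external work and hands it `done`.
  using Submit = std::function<void(Done)>;
  // Stand-in for the scheduler's future; receives the first error, if any.
  using Finished = std::function<void(const std::string&)>;

  ToyScheduler(int capacity, Finished on_finished)
      : capacity_(capacity), on_finished_(std::move(on_finished)) {}

  // Returns false if called after End().  Tasks added after an abort are
  // accepted but never submitted.
  bool AddTask(Submit submit, int cost = 1) {
    if (ended_) return false;
    if (aborted_) return true;
    queue_.push_back({std::move(submit), cost});
    Pump();
    return true;
  }

  void End() {
    ended_ = true;
    MaybeFinish();
  }

  // Only the first failure (or manual abort reason) is kept.
  void Abort(const std::string& reason) {
    if (!aborted_) {
      aborted_ = true;
      error_ = reason;
    }
    queue_.clear();  // assumption: queued tasks are dropped on abort
    MaybeFinish();
  }

 private:
  struct Item {
    Submit submit;
    int cost;
  };

  // Submit queued tasks in FIFO order while they fit under the task limit.
  // A task larger than the limit may still run when nothing else is running.
  void Pump() {
    while (!queue_.empty() &&
           (running_ == 0 || in_use_ + queue_.front().cost <= capacity_)) {
      Item item = std::move(queue_.front());
      queue_.pop_front();
      in_use_ += item.cost;
      ++running_;
      const int cost = item.cost;
      item.submit([this, cost](std::string error) {
        in_use_ -= cost;
        --running_;
        if (!error.empty()) Abort(error);
        Pump();
        MaybeFinish();
      });
    }
  }

  void MaybeFinish() {
    if (ended_ && running_ == 0 && queue_.empty() && !finished_) {
      finished_ = true;
      on_finished_(error_);
    }
  }

  int capacity_;
  Finished on_finished_;
  std::deque<Item> queue_;
  int in_use_ = 0;
  int running_ = 0;
  bool ended_ = false, aborted_ = false, finished_ = false;
  std::string error_;
};

// Two stages: the inspect stage's finish callback ends the scan stage.
std::vector<std::string> RunTwoStageDemo() {
  std::vector<std::string> log;
  std::vector<ToyScheduler::Done> pending;  // stands in for an external pool
  ToyScheduler scan(/*capacity=*/1, [&](const std::string& err) {
    log.push_back(err.empty() ? std::string("scan finished") : "scan failed: " + err);
  });
  ToyScheduler inspect(/*capacity=*/1, [&](const std::string&) {
    log.push_back("inspect finished");
    scan.End();
  });
  // Inspecting a fragment discovers one row group and adds a scan task.
  inspect.AddTask([&](ToyScheduler::Done done) {
    log.push_back("inspect submitted");
    scan.AddTask([&](ToyScheduler::Done scan_done) {
      log.push_back("scan submitted");
      pending.push_back(std::move(scan_done));
    });
    done("");
  });
  inspect.End();
  // The "external resource" now completes the outstanding scan work.
  std::vector<ToyScheduler::Done> ready;
  ready.swap(pending);
  for (auto& done : ready) done("");
  return log;
}
```

Calling RunTwoStageDemo() from a main function returns the order of events. The scan stage cannot finish until the inspect stage has ended it and the outstanding scan task has completed, which is the ordering the description asks for. A real implementation would also need locking, since completions typically arrive on other threads.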



--
This message was sent by Atlassian Jira
(v8.20.10#820010)