You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/05/11 02:05:00 UTC

[jira] [Created] (ARROW-16522) [C++] Evolution of exec plan

Weston Pace created ARROW-16522:
-----------------------------------

             Summary: [C++] Evolution of exec plan
                 Key: ARROW-16522
                 URL: https://issues.apache.org/jira/browse/ARROW-16522
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


This is a high level JIRA that will be broken into a number of subtasks.  There are a number of things that each node has to reinvent (or, more likely, copy/paste) and it we probably know enough at this point that we can enhance the ExecNode/ExecPlan API a bit to move some of the burden out of the nodes and into the plan.  At a high level:

 * All nodes that schedule tasks use an async task group so they can make sure that all scheduled tasks are finished before the node is marked finish.  Task scheduling could happen at the plan level and there would be one spot keeping track of running tasks.
 * If we have centralized scheduling then that opens up the door for a centralized handling of errors and backpressure.  A node's default error behavior is then "stop scheduling tasks" and a node's default backpressure behavior is "stop running tasks".
 * There are no nodes that emit to multiple outputs today.  Sending data to multiple outputs is not simple.  This is a pipeline breaking activity and new tasks will need to be scheduled and the data will need to be held onto until each downstream task has run, and backpressure may need to be applied if one output is slower than the other.  Node's should not be given the choice of multiple outputs unless they are solving specifically that problem (e.g. temporary table).




--
This message was sent by Atlassian Jira
(v8.20.7#820007)