Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/07/08 20:26:00 UTC

[jira] [Created] (ARROW-17023) [C++] Add initial Acero design documents

Weston Pace created ARROW-17023:
-----------------------------------

             Summary: [C++] Add initial Acero design documents
                 Key: ARROW-17023
                 URL: https://issues.apache.org/jira/browse/ARROW-17023
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace
            Assignee: Weston Pace


As Acero grows in complexity, it will become difficult for new developers to contribute meaningfully.  In addition, Acero should be open for extension by third-party developers who wish to add new exec nodes.  These third-party developers will need to know the details of how Acero schedules work and how it operates, and will appreciate advice on efficient development.  At a minimum, this first pass should explain:

 * Threading / scheduling model for Acero (note: there are proposals to enhance the current model)
 * Discussion of batch sizes, cache sizes, and the morsel / batch model
 * General discussion / advice for writing operators in a column-major way
 * Design of the current nodes, in particular more detail on how expression evaluation happens and how the hash-join node operates



--
This message was sent by Atlassian Jira
(v8.20.10#820010)