Posted to jira@arrow.apache.org by "Todd Farmer (Jira)" <ji...@apache.org> on 2022/09/26 16:51:00 UTC

[jira] [Assigned] (ARROW-16915) [C++] Unify approaches to attach schemas on record batches exiting Acero

     [ https://issues.apache.org/jira/browse/ARROW-16915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Farmer reassigned ARROW-16915:
-----------------------------------

    Assignee:     (was: Vibhatha Lakmal Abeykoon)

> [C++] Unify approaches to attach schemas on record batches exiting Acero
> ------------------------------------------------------------------------
>
>                 Key: ARROW-16915
>                 URL: https://issues.apache.org/jira/browse/ARROW-16915
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>
> Internally, Acero uses ExecBatch everywhere, without schemas.  Originally, the various exit nodes would simply attach a basic schema built from the output data types and inferred field names.
> As part of Substrait integration and other improvements, the various sink nodes are being amended to support:
>  * Custom field names
>  * Custom metadata
> However, the current implementation is inconsistent across sink nodes.
> SinkNode:
>  - Does not support custom field names or metadata
> ConsumingSinkNode:
>  - Supports custom names but not custom metadata
> WriteNode:
>  - Supports custom metadata but not custom names
> We should create a {{SinkNodeOptions}} base class that supports both custom names and custom metadata.  We should also have a single place with utility methods for attaching a schema to an outgoing exec batch.  All of our sink nodes should then use this single tool for modifying outgoing batches.
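
The proposal above could be sketched roughly as follows. This is only an illustration of the shape of the idea, not Acero code: `Field`, `Schema`, `ExecBatchSketch` and `MakeOutputSchema` are hypothetical, simplified stand-ins written for this example, and the `f0`, `f1`, ... naming used when no names are supplied is an assumption rather than the inference Acero actually performs. Only the `SinkNodeOptions` name comes from the issue, and its members here are guesses.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for illustration only; these are NOT the Arrow types.
struct Field {
  std::string name;
  std::string type;  // e.g. "int32"; a real schema would carry a DataType
};

using Metadata = std::map<std::string, std::string>;

struct Schema {
  std::vector<Field> fields;
  Metadata metadata;
};

// Stand-in for an exec batch: it knows its column types but has no schema.
struct ExecBatchSketch {
  std::vector<std::string> column_types;
};

// Hypothetical base options that every sink node's options would extend.
struct SinkNodeOptions {
  std::optional<std::vector<std::string>> field_names;  // unset: infer names
  Metadata metadata;                                    // empty: no metadata
  virtual ~SinkNodeOptions() = default;
};

// The single (hypothetical) utility each sink node would call to build the
// schema attached to an outgoing batch.
inline Schema MakeOutputSchema(const ExecBatchSketch& batch,
                               const SinkNodeOptions& options) {
  const std::size_t num_columns = batch.column_types.size();
  if (options.field_names && options.field_names->size() != num_columns) {
    throw std::invalid_argument(
        "field_names must have one entry per output column");
  }
  Schema schema;
  schema.fields.reserve(num_columns);
  for (std::size_t i = 0; i < num_columns; ++i) {
    // Assumed fallback naming (f0, f1, ...) when no custom names are given.
    std::string name = options.field_names ? (*options.field_names)[i]
                                           : "f" + std::to_string(i);
    schema.fields.push_back(Field{std::move(name), batch.column_types[i]});
  }
  schema.metadata = options.metadata;
  return schema;
}
```

With a shape like this, each sink node's options type would derive from the shared base and call the one helper, so support for custom names and custom metadata would no longer differ from node to node.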



--
This message was sent by Atlassian Jira
(v8.20.10#820010)