Posted to jira@arrow.apache.org by "Todd Farmer (Jira)" <ji...@apache.org> on 2022/09/26 16:51:00 UTC
[jira] [Assigned] (ARROW-16915) [C++] Unify approaches to attach schemas on record batches exiting Acero
[ https://issues.apache.org/jira/browse/ARROW-16915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Farmer reassigned ARROW-16915:
-----------------------------------
Assignee: (was: Vibhatha Lakmal Abeykoon)
> [C++] Unify approaches to attach schemas on record batches exiting Acero
> ------------------------------------------------------------------------
>
> Key: ARROW-16915
> URL: https://issues.apache.org/jira/browse/ARROW-16915
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Priority: Major
>
> Internally, Acero uses ExecBatch everywhere, and an ExecBatch carries no schema. Originally, the various exit nodes simply attached a default schema built from the output data types and inferred field names.
> As part of Substrait integration and other improvements, the various sink nodes are being amended to support:
> * Custom field names
> * Custom metadata
> However, the current implementation is inconsistent across the sink nodes:
> SinkNode:
> - Does not support custom field names or metadata
> ConsumingSinkNode:
> - Supports custom names but not custom metadata
> WriteNode:
> - Supports custom metadata but not custom names
> We should create a {{SinkNodeOptions}} base class that supports both custom names and custom metadata, and we should have a single place with utility methods for attaching a schema to an outgoing exec batch. All of our sink nodes should then use this single tool for modifying outgoing batches.
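
The proposal above could be sketched roughly as follows. This is only an illustration under loud assumptions: Field, Schema, and Metadata here are simplified stand-in types, not the real Arrow classes; the layout of SinkNodeOptions is a guess at what the proposed base class might hold; and MakeOutputSchema is a hypothetical name for the single shared utility. None of this is existing Acero API.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-ins for illustration only; these are NOT arrow::Field,
// arrow::Schema, or arrow::KeyValueMetadata.
struct Field {
  std::string name;
  std::string type;
};
using Metadata = std::map<std::string, std::string>;
struct Schema {
  std::vector<Field> fields;
  Metadata metadata;
};

// Hypothetical base options that every sink node's options could derive from.
struct SinkNodeOptions {
  std::vector<std::string> names;  // empty: keep the inferred field names
  Metadata metadata;               // empty: attach no custom metadata
};

// Hypothetical shared utility: the one place that turns the inferred schema
// into the schema attached to outgoing batches.
Schema MakeOutputSchema(const Schema& inferred, const SinkNodeOptions& opts) {
  if (!opts.names.empty() && opts.names.size() != inferred.fields.size()) {
    throw std::invalid_argument("expected one custom name per output column");
  }
  Schema out = inferred;  // data types always come from the inferred schema
  for (std::size_t i = 0; i < opts.names.size(); ++i) {
    out.fields[i].name = opts.names[i];
  }
  out.metadata = opts.metadata;
  return out;
}
```

With something of this shape, SinkNode, ConsumingSinkNode, and WriteNode would each call the same helper once when they start producing output, so custom names and custom metadata would behave identically across all three.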
--
This message was sent by Atlassian Jira
(v8.20.10#820010)