Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2022/10/13 15:19:00 UTC
[jira] [Created] (ARROW-18037) [C++] Acero/dataset relies on ExecBatch::ToRecordBatch truncating excess columns
Antoine Pitrou created ARROW-18037:
--------------------------------------
Summary: [C++] Acero/dataset relies on ExecBatch::ToRecordBatch truncating excess columns
Key: ARROW-18037
URL: https://issues.apache.org/jira/browse/ARROW-18037
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Antoine Pitrou
As found while working on ARROW-18004: the dataset scanner and the Acero engine rely on {{ExecBatch::ToRecordBatch}} returning successfully when the given schema has fewer fields than the ExecBatch has columns.
This apparently allows the dataset-added columns ({{kAugmentedFields}} in {{arrow/dataset/scanner.cc}}) to be dropped implicitly from a scan's final result.
However, it seems wrong and brittle to do this implicitly at the {{ExecBatch::ToRecordBatch}} level (hiding potential errors). Instead, it should probably be done explicitly inside Acero/dataset.
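To illustrate the direction suggested above, here is a minimal sketch using simplified stand-in types, not the real Arrow classes. All names in it ({{SketchBatch}}, {{SketchSchema}}, {{SketchRecordBatch}}, {{ToRecordBatchStrict}}, {{DropTrailingColumns}}) are hypothetical and exist only for this example. It also assumes the dataset-added columns sit at the end of the batch, which is what the current truncation behaviour implies. The conversion rejects any column count mismatch, and the caller drops the extra columns explicitly beforehand.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-ins for illustration only; these are NOT the Arrow types.
struct SketchBatch {
  std::vector<std::vector<int>> columns;  // one inner vector per column
};

struct SketchSchema {
  std::vector<std::string> field_names;
};

struct SketchRecordBatch {
  SketchSchema schema;
  std::vector<std::vector<int>> columns;
};

// Strict conversion: a schema/column count mismatch is an error,
// instead of being hidden by silently truncating excess columns.
SketchRecordBatch ToRecordBatchStrict(const SketchBatch& batch,
                                      const SketchSchema& schema) {
  if (batch.columns.size() != schema.field_names.size()) {
    throw std::invalid_argument(
        "schema has " + std::to_string(schema.field_names.size()) +
        " fields but batch has " + std::to_string(batch.columns.size()) +
        " columns");
  }
  return SketchRecordBatch{schema, batch.columns};
}

// Explicit step owned by the caller (the scanner/engine side in this sketch):
// drop the trailing columns that were added for internal bookkeeping.
SketchBatch DropTrailingColumns(const SketchBatch& batch,
                                std::size_t num_augmented) {
  if (num_augmented > batch.columns.size()) {
    throw std::invalid_argument("more augmented columns than columns");
  }
  const std::size_t keep = batch.columns.size() - num_augmented;
  SketchBatch out;
  out.columns.assign(batch.columns.begin(),
                     batch.columns.begin() + static_cast<std::ptrdiff_t>(keep));
  return out;
}
```

With this split, a caller would write something like {{ToRecordBatchStrict(DropTrailingColumns(batch, n), schema)}}. A schema that is genuinely too short then surfaces as an error rather than as silently missing data.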