Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/09/29 04:02:00 UTC
[jira] [Created] (ARROW-14164) [C++][Dataset] Enhance dataset writer to allow file-per-batch
Weston Pace created ARROW-14164:
-----------------------------------
Summary: [C++][Dataset] Enhance dataset writer to allow file-per-batch
Key: ARROW-14164
URL: https://issues.apache.org/jira/browse/ARROW-14164
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Weston Pace
The dataset writer currently groups incoming batches into large files whose size is capped by max_rows_per_file. In the PR for this work, [~jorisvandenbossche] recommended an option where each incoming batch creates a new file.
This would give the user fine-grained control over how many files are created (provided they are doing a very basic scan/filter/project and not using any more sophisticated nodes that may resize batches).
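
To make the difference between the two behaviours concrete, here is a rough sketch of how the file layout might differ. It is a simplified stand-alone model and not the Arrow dataset writer itself: PlanFiles and the file_per_batch flag are hypothetical names invented for illustration, and the grouping branch assumes that batches are sliced so that every file except possibly the last holds exactly max_rows_per_file rows (with max_rows_per_file > 0).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model (not an Arrow API): given the row count of each incoming
// batch, return the row count of each output file.
std::vector<int64_t> PlanFiles(const std::vector<int64_t>& batch_rows,
                               int64_t max_rows_per_file,
                               bool file_per_batch) {
  if (file_per_batch) {
    // Proposed mode: every incoming batch becomes its own file, so the
    // number of files equals the number of batches.
    return batch_rows;
  }

  // Assumed current mode: pack rows into files of at most max_rows_per_file,
  // slicing a batch when it does not fit in the open file.
  std::vector<int64_t> files;
  int64_t current = 0;  // rows already in the currently open file
  for (int64_t rows : batch_rows) {
    while (rows > 0) {
      const int64_t room = max_rows_per_file - current;
      const int64_t take = rows < room ? rows : room;
      current += take;
      rows -= take;
      if (current == max_rows_per_file) {
        files.push_back(current);  // file is full, close it
        current = 0;
      }
    }
  }
  if (current > 0) files.push_back(current);  // close the last partial file
  return files;
}
```

Under this model, three batches of 100, 250 and 50 rows with max_rows_per_file = 200 are packed into two files, whereas the file-per-batch option yields three files whose sizes match the incoming batches. That correspondence only holds as long as no upstream node has resized the batches.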
--
This message was sent by Atlassian Jira
(v8.3.4#803005)