Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/01/21 23:18:00 UTC
[jira] [Created] (ARROW-15409) [C++] The C++ API for writing datasets could be improved
Weston Pace created ARROW-15409:
-----------------------------------
Summary: [C++] The C++ API for writing datasets could be improved
Key: ARROW-15409
URL: https://issues.apache.org/jira/browse/ARROW-15409
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Weston Pace
While testing dataset writes through the C++ API today, I ran into a number of things that were not very intuitive. All of these are abstracted away or hidden by the Python and R interfaces, so this really only applies to anyone using the C++ API directly.
* If no partitioning is specified the write will segfault. Instead it should use a default (no-op) partitioning.
* The min_rows_per_group option should probably default to something higher than 0
* It's not clear how to specify the format (you do it by creating a format, then setting the file write options, which sets the format privately)
* There is no default for basename_template
* There is no default for filesystem (should be local filesystem)
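To make the suggestions above concrete, here is a minimal sketch of what "options with defaults" could look like. It uses a hypothetical stand-in struct (WriteOptionsSketch) and helper (Validate); it is not Arrow's actual write options class. The field names follow the items listed above, and the default values ("local", "none", "part-{i}", 1024) are illustrative assumptions only.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for dataset write options. This is NOT Arrow's real
// API; every default value below is an illustrative assumption.
struct WriteOptionsSketch {
  std::string format;                           // set directly, e.g. "parquet"
  std::string base_dir;                         // required, no sensible default
  std::string filesystem = "local";             // default to the local filesystem
  std::string partitioning = "none";            // default no-op partitioning
  std::string basename_template = "part-{i}";   // default file name template
  std::uint64_t min_rows_per_group = 1024;      // assumed value, higher than 0
};

// Returns an empty string when the options are usable, otherwise a message
// describing the problem, so a missing field is reported instead of crashing.
std::string Validate(const WriteOptionsSketch& opts) {
  if (opts.format.empty()) return "format must be set";
  if (opts.base_dir.empty()) return "base_dir must be set";
  if (opts.basename_template.find("{i}") == std::string::npos) {
    return "basename_template must contain \"{i}\"";
  }
  return "";
}
```

With defaults like these, a caller would only need to set format and base_dir, and a missing required field would surface as an error message rather than a segfault.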