Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/01/21 23:18:00 UTC

[jira] [Created] (ARROW-15409) [C++] The C++ API for writing datasets could be improved

Weston Pace created ARROW-15409:
-----------------------------------

             Summary: [C++] The C++ API for writing datasets could be improved
                 Key: ARROW-15409
                 URL: https://issues.apache.org/jira/browse/ARROW-15409
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


I was working on dataset write testing in the C++ API today and ran into several things that were not very intuitive.  All of these are abstracted away or hidden by the Python and R interfaces, so this really only applies to anyone using the C++ API directly.

 * If no partitioning is specified, the write will segfault.  Instead it should use a default (no-op) partitioning.
 * The min_rows_per_group option should probably default to something higher than 0
 * It's not clear how to specify the format (you create a format, then set the file write options from it, which sets the format privately)
 * There is no default for basename_template
 * There is no default for filesystem (should be local filesystem)
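
The defaulting behavior suggested in the list above could be sketched roughly as follows. This is only an illustration with stand-in types, not the Arrow API: SketchPartitioning, SketchFileSystem, SketchWriteOptions, WithDefaults and the format_extension field are all hypothetical names, and the "part-{i}.<extension>" fallback template is an assumption rather than a decided default. min_rows_per_group is left out because the issue does not propose a concrete value.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration only; none of these are Arrow types.
struct SketchPartitioning {
  std::string name;
};

struct SketchFileSystem {
  std::string kind;
};

struct SketchWriteOptions {
  std::shared_ptr<SketchPartitioning> partitioning;  // caller may leave this null
  std::shared_ptr<SketchFileSystem> filesystem;      // caller may leave this null
  std::string basename_template;                     // caller may leave this empty
  std::string format_extension = "parquet";          // assumed to come from the format
};

// Returns a copy of `opts` in which every unset field has a fallback value,
// so later code never dereferences a null pointer.
SketchWriteOptions WithDefaults(SketchWriteOptions opts) {
  if (!opts.partitioning) {
    // No-op partitioning: all rows go to the base directory.
    opts.partitioning =
        std::make_shared<SketchPartitioning>(SketchPartitioning{"no-op"});
  }
  if (!opts.filesystem) {
    opts.filesystem =
        std::make_shared<SketchFileSystem>(SketchFileSystem{"local"});
  }
  if (opts.basename_template.empty()) {
    // Assumed fallback; "{i}" stands for the file counter placeholder.
    opts.basename_template = "part-{i}." + opts.format_extension;
  }
  // Postcondition: both pointers are safe to dereference.
  assert(opts.partitioning && opts.filesystem);
  return opts;
}
```

With something like this, a default-constructed SketchWriteOptions passed through WithDefaults ends up with a no-op partitioning, a local filesystem and a usable basename template, while any value the caller did set is kept unchanged.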



--
This message was sent by Atlassian Jira
(v8.20.1#820001)