Posted to jira@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2021/07/14 11:56:00 UTC

[jira] [Commented] (ARROW-13333) [C++] [Dataset] Support max file size option in write dataset

    [ https://issues.apache.org/jira/browse/ARROW-13333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380531#comment-17380531 ] 

David Li commented on ARROW-13333:
----------------------------------

Same as ARROW-10439 perhaps?

> [C++] [Dataset] Support max file size option in write dataset
> -------------------------------------------------------------
>
>                 Key: ARROW-13333
>                 URL: https://issues.apache.org/jira/browse/ARROW-13333
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>
> The existence of FileSystemDatasetWriteOptions::basename_template would seem to imply that the dataset writer may write multiple files for a given partition.  However, the current implementation always creates exactly one file per directory.
>  
> I'm not sure what the desired behavior is here but the two obvious choices are:
>  * Get rid of FileSystemDatasetWriteOptions::basename_template (or at least the {i} parameter)
>  * Add an option to limit how many rows/bytes are put in a single file
>  
> ARROW-12358 is probably worth mentioning, since whatever strategy is chosen here should probably be compatible with supporting append mode in the future.
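
For illustration only, the second option in the quoted description (a per-file row limit combined with the {i} placeholder) could be sketched roughly as below. This is plain Python, not Arrow code: the function name plan_files, the max_rows_per_file parameter, and the default template "part-{i}.arrow" are all hypothetical and are not taken from the Arrow API or from this issue.

```python
from typing import Iterable, List, Tuple


def plan_files(
    batch_row_counts: Iterable[int],
    max_rows_per_file: int,
    basename_template: str = "part-{i}.arrow",  # hypothetical default
) -> List[Tuple[str, int]]:
    """Plan how the rows of one partition would be spread across files.

    Returns (filename, row_count) pairs. A new file is started whenever
    the current one reaches max_rows_per_file; a batch that does not fit
    is split across consecutive files.
    """
    if max_rows_per_file <= 0:
        raise ValueError("max_rows_per_file must be positive")

    rows_per_file: List[int] = []
    current = 0  # rows already assigned to the file being filled
    for remaining in batch_row_counts:
        while remaining > 0:
            take = min(remaining, max_rows_per_file - current)
            current += take
            remaining -= take
            if current == max_rows_per_file:
                rows_per_file.append(current)
                current = 0
    if current:
        rows_per_file.append(current)  # final, partially filled file

    # Substitute the file counter for the {i} placeholder.
    return [
        (basename_template.replace("{i}", str(i)), rows)
        for i, rows in enumerate(rows_per_file)
    ]
```

With three incoming batches of 4 rows each and a limit of 5 rows per file, this plan yields three files in the partition directory, which is the case where the {i} placeholder becomes meaningful. A byte-based limit would follow the same shape with sizes in place of row counts.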



--
This message was sent by Atlassian Jira
(v8.3.4#803005)