Posted to jira@arrow.apache.org by "Martin Thøgersen (Jira)" <ji...@apache.org> on 2022/01/28 13:05:00 UTC

[jira] [Commented] (ARROW-15494) [Docs] Clarify existing_data_behavior docstring

    [ https://issues.apache.org/jira/browse/ARROW-15494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17483763#comment-17483763 ] 

Martin Thøgersen commented on ARROW-15494:
------------------------------------------

As a side note, it's a bit odd that {{error}} allows {{base_dir/empty_dir/}} to exist, but not {{base_dir/empty_dir/empty_dir/}}.
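
A minimal sketch for checking this observation, assuming a local filesystem and an installed pyarrow. The helper names ({{make_layout}}, {{try_write}}, {{check}}) are hypothetical, and the outcome may differ between pyarrow versions.

```python
import tempfile
from pathlib import Path


def make_layout(root, nested_levels):
    """Create root/base_dir containing a chain of empty 'empty_dir' directories."""
    base_dir = Path(root) / "base_dir"
    leaf = base_dir
    for _ in range(nested_levels):
        leaf = leaf / "empty_dir"
    leaf.mkdir(parents=True)
    return base_dir


def try_write(base_dir):
    """Return True if write_dataset succeeds with existing_data_behavior='error'."""
    # Imported lazily so make_layout can be used without pyarrow installed.
    import pyarrow as pa
    import pyarrow.dataset as ds

    table = pa.table({"x": [1, 2, 3]})
    try:
        ds.write_dataset(table, str(base_dir), format="parquet",
                         existing_data_behavior="error")
        return True
    except Exception as exc:  # report whatever pyarrow raises
        print(f"{base_dir}: {exc}")
        return False


def check(nested_levels):
    """Build the layout in a temporary directory and attempt the write."""
    with tempfile.TemporaryDirectory() as root:
        return try_write(make_layout(root, nested_levels))
```

Comparing {{check(1)}} with {{check(2)}} should show whether the two layouts are treated differently on a given version.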

> [Docs] Clarify existing_data_behavior docstring
> -----------------------------------------------
>
>                 Key: ARROW-15494
>                 URL: https://issues.apache.org/jira/browse/ARROW-15494
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 7.0.1
>            Reporter: Martin Thøgersen
>            Priority: Major
>
> Slightly clarify the wording of the {{pyarrow.dataset.write_dataset()}} parameter {{existing_data_behavior}}:
> [https://github.com/apache/arrow/blob/a27c55660e575a3987283d5d9e443642db48f215/python/pyarrow/dataset.py#L812-L827]
> Proposed wording:
> {noformat}
>     existing_data_behavior : 'error' | 'overwrite_or_ignore' | \
> 'delete_matching'
>         Controls how the dataset will handle data that already exists in
>         the destination.  The default behavior ('error') is to raise an error
>         if any data exists in the `base_dir` destination.
>         'overwrite_or_ignore' will ignore any existing data and will
>         overwrite files with the same name as an output file.  Other
>         existing files will be ignored.  This behavior, in combination
>         with a unique basename_template for each write, will allow for
>         an append workflow.
>         'delete_matching' is useful when you are writing a partitioned
>         dataset.  The first time each partition leaf-level directory is 
>         encountered the entire leaf-level directory will be deleted.  This
>         allows you to overwrite old partitions completely.
> {noformat}
> I.e. clarify that:
> - {{error}} applies to the base_dir level.
> - {{delete_matching}} applies to the leaf-level directory.
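
The append workflow mentioned in the docstring ({{overwrite_or_ignore}} combined with a unique {{basename_template}} per write) could look roughly like the sketch below. The helper names ({{unique_basename_template}}, {{append_to_dataset}}) and the {{part-<uuid>-{i}}} naming scheme are assumptions made for illustration, not part of pyarrow.

```python
import uuid


def unique_basename_template(extension="parquet"):
    """Build a basename_template that differs on every call.

    write_dataset replaces the literal token "{i}" with a file counter,
    so the token is escaped here and survives the f-string formatting.
    """
    return f"part-{uuid.uuid4().hex}-{{i}}.{extension}"


def append_to_dataset(table, base_dir):
    """Write new files into base_dir and leave existing files in place."""
    # Imported lazily so the template helper works without pyarrow installed.
    import pyarrow.dataset as ds

    ds.write_dataset(
        table,
        base_dir,
        format="parquet",
        basename_template=unique_basename_template(),
        existing_data_behavior="overwrite_or_ignore",
    )
```

Because each call uses a fresh template, new file names should not collide with files from earlier writes, so nothing gets overwritten.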



--
This message was sent by Atlassian Jira
(v8.20.1#820001)