
Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/07/01 13:30:00 UTC

[jira] [Commented] (ARROW-13224) [Python][Doc] Documentation missing for pyarrow.dataset.write_dataset

    [ https://issues.apache.org/jira/browse/ARROW-13224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17372791#comment-17372791 ] 

Joris Van den Bossche commented on ARROW-13224:
-----------------------------------------------

Indeed, we should add documentation for writing datasets (python/dataset.rst currently covers only reading).

> [Python][Doc] Documentation missing for pyarrow.dataset.write_dataset
> ---------------------------------------------------------------------
>
>                 Key: ARROW-13224
>                 URL: https://issues.apache.org/jira/browse/ARROW-13224
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Documentation, Python
>            Reporter: Weston Pace
>            Assignee: Weston Pace
>            Priority: Major
>
> I don't believe pyarrow.dataset.write_dataset is meant to be internal.  pyarrow.parquet.write_to_dataset uses it (if use_legacy_dataset=False), but the parquet API doesn't expose the same features.  A new example should probably also be added to the Tabular Datasets section of the docs explaining why write_dataset can take a scanner (e.g. it keeps memory usage low, and it can write a dataset from Flight or any other record batch source).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)