Posted to issues@arrow.apache.org by "Mackenzie (JIRA)" <ji...@apache.org> on 2018/12/02 21:51:00 UTC

[jira] [Closed] (ARROW-3915) [Python] Support partition columns when incrementally writing

     [ https://issues.apache.org/jira/browse/ARROW-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mackenzie closed ARROW-3915.
----------------------------
    Resolution: Fixed

Scratch this. I just realized that you can already do this with repeated calls to `pq.write_to_dataset`. Since the files written with the `partition_cols` option don't overlap in any way, there's no need to keep a file handle open via `ParquetWriter`. Apologies for the noise here – I'm just getting familiar with the Parquet format at a lower level.

> [Python] Support partition columns when incrementally writing
> -------------------------------------------------------------
>
>                 Key: ARROW-3915
>                 URL: https://issues.apache.org/jira/browse/ARROW-3915
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>    Affects Versions: 0.11.1
>            Reporter: Mackenzie
>            Priority: Major
>              Labels: parquet
>             Fix For: 0.13.0
>
>
> Currently `partition_cols` support in pyarrow is implemented in [https://github.com/apache/arrow/blob/69d207ff446c76f78fe27b960e7ebe89a607d992/python/pyarrow/parquet.py#L1205-L1235].
> However, there is no easy way to do column partitioning when writing datasets incrementally via `ParquetWriter`. It would be very helpful if the column partitioning logic were made more modular and re-used in `ParquetWriter`.
> One option would be to support the `partition_cols` keyword argument in `ParquetWriter.write_table`. However, this would introduce the potential for inconsistent partition columns in subsequent files. Perhaps a better approach would be to pass it as a kwarg when constructing `ParquetWriter` and manage it as a property whose setter raises an error if set while the writer is open.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)