
Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/07/06 22:26:00 UTC

[jira] [Commented] (ARROW-13269) [C++] [Dataset] pyarrow.parquet.write_to_dataset does not send full schema to metadata_collector

    [ https://issues.apache.org/jira/browse/ARROW-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376073#comment-17376073 ] 

Weston Pace commented on ARROW-13269:
-------------------------------------

Another potential fix could be to modify pyarrow.parquet.write_metadata. The function currently takes the table schema (which includes the partition columns) and the collected metadata (which do not). It could therefore add the missing partition columns from the table schema to the collected metadata.
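The merge step this fix would need can be sketched in plain Python. This is a hypothetical illustration, not pyarrow code: schemas are modeled as ordered lists of (name, type) pairs instead of pyarrow.Schema / FileMetaData objects, and the helper name add_missing_fields is invented here. Any field in the table schema whose name is absent from a file's schema (for example a partition column) is appended to it.

```python
from typing import List, Tuple

# Hypothetical model: a field is a (name, type) pair, a schema is an ordered list of fields.
Field = Tuple[str, str]


def add_missing_fields(file_fields: List[Field], table_fields: List[Field]) -> List[Field]:
    """Return file_fields plus any table_fields whose names it lacks (e.g. partition columns)."""
    present = {name for name, _ in file_fields}
    merged = list(file_fields)  # keep the file's own column order unchanged
    for name, typ in table_fields:
        if name not in present:
            # Missing columns are appended at the end (a choice made for this sketch).
            merged.append((name, typ))
    return merged


table_schema = [("x", "int64"), ("year", "int32")]
file_schema = [("x", "int64")]  # the writer dropped the partition column "year"
print(add_missing_fields(file_schema, table_schema))
```

Appending at the end leaves the columns actually stored in each file untouched. A real fix would also have to decide where the partition fields belong in the combined schema.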

> [C++] [Dataset] pyarrow.parquet.write_to_dataset does not send full schema to metadata_collector
> ------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-13269
>                 URL: https://issues.apache.org/jira/browse/ARROW-13269
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 4.0.0
>            Reporter: Weston Pace
>            Priority: Major
>
> If partition columns are specified, the writers write only the non-partition columns, so the written files (and the metadata sent to metadata_collector) do not contain the fields used for partitioning.
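The toy model below shows why the partition fields go missing under hive-style partitioning. It uses plain Python and is not pyarrow's writer. Rows are grouped by their partition column values, those values are encoded in the directory path, and only the remaining columns are written to each file. The function name hive_partition and the list-of-dicts row representation are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def hive_partition(rows: List[Dict[str, Any]], partition_cols: List[str]) -> Dict[str, List[Dict[str, Any]]]:
    """Group rows into 'col=value' paths and drop the partition columns from each written row."""
    files: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        # The partition values live only in the directory path...
        path = "/".join(f"{col}={row[col]}" for col in partition_cols)
        # ...so the row written into the file keeps just the non-partition columns.
        files[path].append({k: v for k, v in row.items() if k not in partition_cols})
    return dict(files)


rows = [{"year": 2020, "x": 1}, {"year": 2021, "x": 2}, {"year": 2020, "x": 3}]
print(hive_partition(rows, ["year"]))
```

Every per-file schema derived from these rows lacks `year`. That is the information the comment above proposes restoring from the table schema.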



--
This message was sent by Atlassian Jira
(v8.3.4#803005)