
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/12 13:14:15 UTC

[GitHub] [arrow] lidavidm commented on a change in pull request #12106: ARROW-13269: Improve metadata docs for partitioned datasets

lidavidm commented on a change in pull request #12106:
URL: https://github.com/apache/arrow/pull/12106#discussion_r783062792



##########
File path: python/pyarrow/parquet.py
##########
@@ -2305,6 +2305,9 @@ def write_metadata(schema, where, metadata_collector=None, **kwargs):
     ...     table.schema, root_path / '_common_metadata', **writer_kwargs)
 
     Write the `_metadata` parquet file with row groups statistics.
+    
+    Note: Partition columns should be removed from the table schema before
+    writing `_metadata` for partitioned datasets.

Review comment:
   Thanks for this.

   Just a couple of things:

   1) This shouldn't go between the code example and the explanation of that code, since it interrupts the flow.
   2) Maybe this can go in its own Notes section below? https://numpydoc.readthedocs.io/en/latest/format.html#notes
      Alternatively, there could be another example that demonstrates (a) writing a dataset with partition columns and (b) removing those columns from the schema before writing `_metadata`.




-- 
This is an automated message from the Apache Git Service.