Posted to jira@arrow.apache.org by "Nischith (Jira)" <ji...@apache.org> on 2022/10/27 09:14:00 UTC

[jira] [Created] (ARROW-18171) Feature to append row groups to existing parquet file

Nischith created ARROW-18171:
--------------------------------

             Summary: Feature to append row groups to existing parquet file
                 Key: ARROW-18171
                 URL: https://issues.apache.org/jira/browse/ARROW-18171
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Parquet, Python
            Reporter: Nischith


This is related to pyarrow.

Right now, it's possible to append row groups to a parquet file only as long as the writer is open. Once the writer is closed, it's not possible to append a new row group to that parquet file.

The only option in such a situation is to either recreate the file or write multiple files to the dataset.

This is possible with fastparquet using the _append=True_ parameter: [API — fastparquet 0.7.1 documentation |https://fastparquet.readthedocs.io/en/latest/api.html#fastparquet.write]

A feature to append row groups to an existing file would be beneficial in pyarrow as well.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)