Posted to jira@arrow.apache.org by "Nischith (Jira)" <ji...@apache.org> on 2022/10/27 09:14:00 UTC
[jira] [Created] (ARROW-18171) Feature to append row groups to existing parquet file
Nischith created ARROW-18171:
--------------------------------
Summary: Feature to append row groups to existing parquet file
Key: ARROW-18171
URL: https://issues.apache.org/jira/browse/ARROW-18171
Project: Apache Arrow
Issue Type: New Feature
Components: Parquet, Python
Reporter: Nischith
This is related to pyarrow.
Right now, it's possible to append row groups to a parquet file as long as the writer is open. Once the writer is closed, it's not possible to append a new row group to the file.
The only option in such a situation is to either recreate the file or write multiple files to the dataset.
This is possible with fastparquet using the _append=True_ parameter: [API — fastparquet 0.7.1 documentation|https://fastparquet.readthedocs.io/en/latest/api.html#fastparquet.write]
A feature to append row groups to an existing file would be beneficial in pyarrow as well.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)