Posted to issues@arrow.apache.org by "Alex Mendelson (JIRA)" <ji...@apache.org> on 2018/08/08 07:36:00 UTC
[jira] [Created] (ARROW-3020) Addition of option to allow empty row groups in pyarrow
Alex Mendelson created ARROW-3020:
-------------------------------------
Summary: Addition of option to allow empty row groups in pyarrow
Key: ARROW-3020
URL: https://issues.apache.org/jira/browse/ARROW-3020
Project: Apache Arrow
Issue Type: New Feature
Components: C++, Python
Reporter: Alex Mendelson
While our use case is not common, I was able to find one related request from roughly a year ago. Could this be added as a feature?
https://issues.apache.org/jira/browse/PARQUET-1047
*Motivation*
We have an application where each row is associated with one of N contexts, though a minority of contexts may have no associated rows. When we encounter the nth context, we want to retrieve all of its associated rows. Row groups would provide a natural way to index the data, since the nth context would correspond directly to the nth row group.
Unfortunately, this is not currently possible, as pyarrow does not support writing empty row groups. If one writes a pyarrow.Table containing zero rows using pyarrow.parquet.ParquetWriter, the empty table is omitted from the final file, so every later context ends up in a row group whose index no longer matches its context number.
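
To make the index shift concrete, here is a minimal plain-Python sketch. It does not call pyarrow; it only models the behaviour described above, under the assumption that a zero-row table produces no row group at all. The function name and the sample row counts are hypothetical and chosen purely for illustration.

```python
from typing import List, Optional


def row_group_index_for_contexts(rows_per_context: List[int]) -> List[Optional[int]]:
    """Map each context index to the row group that would hold its rows.

    Assumption: a table with zero rows yields no row group, so every
    later context shifts down by one position in the file.
    """
    mapping: List[Optional[int]] = []
    next_row_group = 0
    for n_rows in rows_per_context:
        if n_rows == 0:
            mapping.append(None)  # no row group is written for this context
        else:
            mapping.append(next_row_group)
            next_row_group += 1
    return mapping


# Hypothetical example: five contexts, where contexts 1 and 3 have no rows.
rows_per_context = [4, 0, 2, 0, 7]
mapping = row_group_index_for_contexts(rows_per_context)
print(mapping)  # [0, None, 1, None, 2]
```

In this example context 2 lands in row group 1 and context 4 in row group 2, so "read the nth row group" no longer returns the nth context. A side table like this mapping could be kept as a workaround, but that is exactly the extra bookkeeping that support for empty row groups would make unnecessary.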
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)