Posted to issues@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/02/23 05:22:00 UTC

[jira] [Created] (ARROW-15759) [C++] Investigate scanning parquet files at sub-row-group resolution

Weston Pace created ARROW-15759:
-----------------------------------

             Summary: [C++] Investigate scanning parquet files at sub-row-group resolution
                 Key: ARROW-15759
                 URL: https://issues.apache.org/jira/browse/ARROW-15759
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


Most of the Arrow APIs read a Parquet file one entire row group at a time. The Parquet reader should allow us to read a single page at a time. When scanning a dataset we often want to read relatively small batches (e.g. 1M rows) to increase parallelism, decrease memory usage, and decrease latency.
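
To make the idea concrete, here is a minimal sketch of how a scanner might plan sub-row-group work. It is not Arrow or Parquet API: ScanSlice and PlanSlices are hypothetical names introduced only for illustration, and the sketch assumes the reader could fetch an arbitrary row range within a row group. It also ignores page boundaries, which in Parquet need not line up across columns, so a real implementation would probably have to round slices to pages or decode and discard some rows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration; not part of the Arrow or Parquet C++ API.
struct ScanSlice {
  int row_group;            // index of the row group within the file
  std::int64_t row_offset;  // first row of the slice, relative to the row group
  std::int64_t num_rows;    // number of rows in the slice
};

// Split every row group into slices of at most `batch_rows` rows.
// A slice never spans two row groups; empty row groups produce no slices.
std::vector<ScanSlice> PlanSlices(const std::vector<std::int64_t>& row_group_rows,
                                  std::int64_t batch_rows) {
  assert(batch_rows > 0);
  std::vector<ScanSlice> slices;
  for (std::size_t rg = 0; rg < row_group_rows.size(); ++rg) {
    const std::int64_t total = row_group_rows[rg];
    for (std::int64_t offset = 0; offset < total; offset += batch_rows) {
      // The last slice of a row group may be shorter than batch_rows.
      const std::int64_t n = std::min(batch_rows, total - offset);
      slices.push_back({static_cast<int>(rg), offset, n});
    }
  }
  return slices;
}
```

With row groups of 2.5M and 1M rows and a 1M-row batch size, this plan yields four slices instead of two row-group-sized reads, each of which could be scheduled as an independent task.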



--
This message was sent by Atlassian Jira
(v8.20.1#820001)