Posted to issues@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/02/23 05:22:00 UTC
[jira] [Created] (ARROW-15759) [C++] Investigate scanning parquet files at sub-row-group resolution
Weston Pace created ARROW-15759:
-----------------------------------
Summary: [C++] Investigate scanning parquet files at sub-row-group resolution
Key: ARROW-15759
URL: https://issues.apache.org/jira/browse/ARROW-15759
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Weston Pace
Most of the Arrow APIs read a Parquet file one entire row group at a time, even though the Parquet reader should allow us to read a single page at a time. When scanning a dataset we often want to read in relatively small batches (e.g. 1M rows) to increase parallelism, decrease memory usage, and decrease latency.
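To make the goal concrete, here is a minimal sketch of what scanning at sub-row-group resolution could look like at the planning level: each row group is split into slices of at most a fixed number of rows, and every slice can then be scheduled as an independent read. The names BatchSlice and plan_batches are hypothetical and are not part of the Arrow or Parquet APIs. The sketch also assumes a slice can start at any row offset; an actual implementation would have to map these offsets onto page boundaries, which need not line up across columns.

```python
from typing import Iterator, List, NamedTuple


class BatchSlice(NamedTuple):
    """A contiguous range of rows inside one row group (hypothetical type)."""
    row_group: int  # index of the row group the slice belongs to
    offset: int     # first row of the slice, relative to the row group start
    length: int     # number of rows in the slice


def plan_batches(row_group_sizes: List[int], batch_rows: int) -> Iterator[BatchSlice]:
    """Split each row group into slices of at most `batch_rows` rows.

    Slices never span two row groups, so the last slice of a row group
    may be shorter than `batch_rows`.
    """
    if batch_rows <= 0:
        raise ValueError("batch_rows must be positive")
    for rg_index, num_rows in enumerate(row_group_sizes):
        for offset in range(0, num_rows, batch_rows):
            yield BatchSlice(rg_index, offset, min(batch_rows, num_rows - offset))


if __name__ == "__main__":
    # Two row groups of 2.5M and 1M rows, scanned in batches of up to 1M rows.
    for batch in plan_batches([2_500_000, 1_000_000], 1_000_000):
        print(batch)
```

With a row-group-at-a-time reader the example file yields two work items, one of them 2.5M rows; with the slicing above it yields four work items of at most 1M rows each, which is the finer granularity this issue asks about.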
--
This message was sent by Atlassian Jira
(v8.20.1#820001)