Posted to dev@parquet.apache.org by "Jinpeng Zhou (Jira)" <ji...@apache.org> on 2023/06/20 21:51:00 UTC

[jira] [Updated] (PARQUET-2316) Allow partial prebuffer in parquet FileReader

     [ https://issues.apache.org/jira/browse/PARQUET-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinpeng Zhou updated PARQUET-2316:
----------------------------------
    Description: 
The current FileReader can only work in one of two modes, coalescing (when Prebuffer is called) or non-coalescing (when Prebuffer is not called), due to an if statement in file_reader.cc (https://github.com/apache/arrow/blob/main/cpp/src/parquet/file_reader.cc#L203).

Since Prebuffer caches all of the specified column chunks, it raises memory-usage concerns for systems with a tight memory budget. In such scenarios, one may want to Prebuffer some small chunks while still reading the remaining chunks through a BufferedInputStream.

  was:
The current FileReader can only work in one of two modes, coalescing (when Prebuffer is called) or non-coalescing (when Prebuffer is not called), due to the if statement here: [[https://github.com/apache/arrow/blob/main/cpp/src/parquet/file_reader.cc#L203|http://example.com]|https://github.com/apache/arrow/blob/main/cpp/src/parquet/file_reader.cc#L203].

Since Prebuffer caches all of the specified column chunks, it raises memory-usage concerns for systems with a tight memory budget. In such scenarios, one may want to Prebuffer some small chunks while still reading the remaining chunks through a BufferedInputStream.


> Allow partial prebuffer in parquet FileReader
> ---------------------------------------------
>
>                 Key: PARQUET-2316
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2316
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Jinpeng Zhou
>            Assignee: Jinpeng Zhou
>            Priority: Minor
>             Fix For: cpp-12.0.0
>
>
> The current FileReader can only work in one of two modes, coalescing (when Prebuffer is called) or non-coalescing (when Prebuffer is not called), due to an if statement in file_reader.cc (https://github.com/apache/arrow/blob/main/cpp/src/parquet/file_reader.cc#L203).
> Since Prebuffer caches all of the specified column chunks, it raises memory-usage concerns for systems with a tight memory budget. In such scenarios, one may want to Prebuffer some small chunks while still reading the remaining chunks through a BufferedInputStream.
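
To make the request above concrete, here is a rough sketch of how a caller might decide which column chunks to Prebuffer and which to leave to the buffered stream once both modes can coexist. It does not use the Parquet API: ChunkInfo, ReadPlan, PlanPartialPrebuffer and the greedy smallest-first policy are all hypothetical and only illustrate the idea of prebuffering small chunks under a memory budget.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical description of one column chunk: its column index and its
// compressed size in bytes (in practice both would come from file metadata).
struct ChunkInfo {
  int column_index;
  int64_t compressed_size;
};

// Hypothetical result: columns to cache up front versus columns to read
// lazily through a buffered stream.
struct ReadPlan {
  std::vector<int> prebuffer_columns;
  std::vector<int> streamed_columns;
};

// Greedy policy (an assumption, not part of the issue): visit chunks from
// smallest to largest and prebuffer each one that still fits in the
// remaining budget; everything else is streamed.
ReadPlan PlanPartialPrebuffer(const std::vector<ChunkInfo>& chunks,
                              int64_t budget_bytes) {
  assert(budget_bytes >= 0);

  // Sort indices rather than the chunks so the input stays untouched.
  std::vector<std::size_t> order(chunks.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return chunks[a].compressed_size <
                            chunks[b].compressed_size;
                   });

  ReadPlan plan;
  int64_t remaining = budget_bytes;
  for (std::size_t i : order) {
    const ChunkInfo& chunk = chunks[i];
    if (chunk.compressed_size <= remaining) {
      plan.prebuffer_columns.push_back(chunk.column_index);
      remaining -= chunk.compressed_size;
    } else {
      plan.streamed_columns.push_back(chunk.column_index);
    }
  }

  // Return both lists in ascending column order for predictable output.
  std::sort(plan.prebuffer_columns.begin(), plan.prebuffer_columns.end());
  std::sort(plan.streamed_columns.begin(), plan.streamed_columns.end());
  return plan;
}
```

With chunks of 4096, 64 MiB, 1024 and 8192 bytes in columns 0 to 3 and a 16384-byte budget, this plan prebuffers columns 0, 2 and 3 and streams column 1. Whether the reader can honor such a split is exactly what this issue asks for.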



--
This message was sent by Atlassian Jira
(v8.20.10#820010)