Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/09 16:59:56 UTC

[GitHub] [arrow] emkornfield commented on a diff in pull request #14603: PARQUET-2210: [C++][Parquet] Skip pages based on header metadata using a callback

emkornfield commented on code in PR #14603:
URL: https://github.com/apache/arrow/pull/14603#discussion_r1018196450


##########
cpp/src/parquet/column_reader.cc:
##########
@@ -386,6 +397,16 @@ std::shared_ptr<Page> SerializedPageReader::NextPage() {
       throw ParquetException("Invalid page header");
     }
 
+   // Once we have the header, we will call the skip_page_call_back_ to
+   // determine if we should be skipping this page. If yes, we will advance the
+   // stream to the next page.
+   if(has_skip_page_callback_) {

Review Comment:
   > What we have done is to let the PageReader be aware of Offset Index belong to the pages of the RowGroup
   
   Do you have example code here for what you mean?
   
   > I can pick up [ARROW-10158](https://issues.apache.org/jira/browse/ARROW-10158) to contribute our implementation.
   
   Is the implementation compatible with the callback approach? If you are willing to contribute it, it seems like it would be valuable.
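
For readers following the thread, the callback approach in the hunk above can be sketched roughly as follows. This is a toy illustration only, not the code from the PR: `PageHeaderInfo`, `SkipPageCallback`, `ToyPageReader`, and `set_skip_page_callback` are hypothetical names, the headers live in memory instead of being read from a byte stream, and C++17 is assumed for `std::optional`.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the metadata carried by a page header.
struct PageHeaderInfo {
  int32_t num_values;
  int64_t min_value;
  int64_t max_value;
};

// Returns true when the page can be skipped based on its header alone.
using SkipPageCallback = std::function<bool(const PageHeaderInfo&)>;

// Toy reader over in-memory headers. A real reader would instead advance
// its input stream past the skipped page's payload.
class ToyPageReader {
 public:
  explicit ToyPageReader(std::vector<PageHeaderInfo> pages)
      : pages_(std::move(pages)) {}

  void set_skip_page_callback(SkipPageCallback callback) {
    skip_page_callback_ = std::move(callback);
  }

  // Returns the next page the callback does not reject, or std::nullopt
  // once all pages have been consumed.
  std::optional<PageHeaderInfo> NextPage() {
    while (pos_ < pages_.size()) {
      const PageHeaderInfo& header = pages_[pos_++];
      // An empty std::function converts to false, so no separate
      // "has callback" flag is needed in this sketch.
      if (skip_page_callback_ && skip_page_callback_(header)) {
        continue;  // Skip this page without decoding it.
      }
      return header;
    }
    return std::nullopt;
  }

 private:
  std::vector<PageHeaderInfo> pages_;
  std::size_t pos_ = 0;
  SkipPageCallback skip_page_callback_;
};
```

A caller would install a predicate such as `[](const PageHeaderInfo& h) { return h.max_value < 10; }` and then call `NextPage()` in a loop; pages whose header satisfies the predicate are never returned.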
   
   


