Posted to github@arrow.apache.org by "Ted-Jiang (via GitHub)" <gi...@apache.org> on 2023/02/01 04:04:04 UTC

[GitHub] [arrow-datafusion] Ted-Jiang commented on pull request #5057: Parquet parallel scan

Ted-Jiang commented on PR #5057:
URL: https://github.com/apache/arrow-datafusion/pull/5057#issuecomment-1411432296

   
   Thanks for all the kind replies! ❤️
    > here is the [exact place](https://github.com/apache/arrow-datafusion/blob/125a8580c19c78c99fbbe3a6afe373de2538b205/datafusion/core/src/physical_plan/file_format/parquet/row_groups.rs#L57) where DF decides to read/not to read RowGroup depending on range. So it actually isn't required to split files on ranges with boundaries same as RowGroups boundaries.
   
   So the `PartitionedFile` range is not used to fetch bytes from the object store, and in the end we use each row group's start offset to fetch the bytes? Am I right?
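
   To make the question concrete, here is a minimal sketch of how I read the quoted explanation. It is not the actual DataFusion code: `RowGroupInfo`, `row_groups_for_range` and `fetch_ranges` are hypothetical names, and both the "start offset falls inside the range" rule and the "fetch exactly the row group's bytes" step are my assumptions about the behavior.

   ```rust
   use std::ops::Range;

   /// Hypothetical stand-in for a row group's metadata: where the row group
   /// starts in the file and how many bytes it occupies.
   #[derive(Debug, Clone, PartialEq)]
   struct RowGroupInfo {
       index: usize,
       start_offset: u64,
       byte_len: u64,
   }

   /// Assumed selection rule: a partition reads a row group only if the row
   /// group's start offset falls inside the partition's byte range.
   fn row_groups_for_range(row_groups: &[RowGroupInfo], range: &Range<u64>) -> Vec<usize> {
       row_groups
           .iter()
           .filter(|rg| range.contains(&rg.start_offset))
           .map(|rg| rg.index)
           .collect()
   }

   /// Assumed fetch step: the bytes requested from the object store come from
   /// the selected row groups' own offsets, not from the partition's range.
   fn fetch_ranges(row_groups: &[RowGroupInfo], selected: &[usize]) -> Vec<Range<u64>> {
       row_groups
           .iter()
           .filter(|rg| selected.contains(&rg.index))
           .map(|rg| rg.start_offset..rg.start_offset + rg.byte_len)
           .collect()
   }
   ```

   Under these assumptions, a file with row groups starting at offsets 4, 104 and 204 (100 bytes each) that is split into the arbitrary ranges `0..150` and `150..304` would give the first partition row groups 0 and 1 and the second partition row group 2. Every row group is read exactly once even though the split points do not line up with row group boundaries.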

