Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/29 20:21:26 UTC

[GitHub] [arrow] westonpace commented on issue #13030: [JAVA] Is any way reading partial parquet file into arrow

westonpace commented on issue #13030:
URL: https://github.com/apache/arrow/issues/13030#issuecomment-1113692280

   A Parquet file is made up of row groups, column chunks, and pages.  A page is indivisible because it is stored as a single compressed buffer.  There is no way to read part of a page, so a page cannot be sliced.
   
   However, it is still a popular idea to partition file access by byte range.  One way to handle this is to return every row group whose first byte falls within the requested range, so that each row group is read by exactly one range as long as the ranges do not overlap.
   
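   For example, if a Parquet file has 10 row groups of 900,000 bytes each, the row groups start at bytes 0, 900,000, 1,800,000, 2,700,000, and so on (ignoring the small file header).  If you ask for the range [2000000,3000000] you would get the 4th row group (index 3 when counting from zero), because it is the only one that starts inside the range, at byte 2,700,000.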
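
   As a rough sketch of the selection rule described above (not Arrow's or parquet-java's actual implementation), the following plain Python picks row groups by their starting offset.  The function name `row_groups_in_range` and the evenly spaced offsets are hypothetical and only mirror the example; in a real file the offsets would come from the footer metadata.  The range is treated as half-open, [start, end), which is an assumption made here so that adjacent ranges never return the same row group twice.

```python
def row_groups_in_range(starts, range_start, range_end):
    """Return the indices of row groups whose first byte lies in
    the half-open byte range [range_start, range_end)."""
    return [i for i, start in enumerate(starts)
            if range_start <= start < range_end]


# Hypothetical layout from the example: 10 row groups of 900,000 bytes each,
# ignoring the file header, so the starts are 0, 900000, 1800000, ...
ROW_GROUP_SIZE = 900_000
starts = [i * ROW_GROUP_SIZE for i in range(10)]

# The requested range from the example.
selected = row_groups_in_range(starts, 2_000_000, 3_000_000)
print(selected)  # [3] -> zero-based index 3, i.e. the 4th row group

# Splitting the whole file into contiguous 1,000,000-byte ranges assigns
# every row group to exactly one range.
file_size = ROW_GROUP_SIZE * len(starts)
splits = [row_groups_in_range(starts, lo, lo + 1_000_000)
          for lo in range(0, file_size, 1_000_000)]
assigned = [i for split in splits for i in split]
```

   Note that a range can come back empty, or with more than one row group (the first range above contains the row groups starting at 0 and 900,000), so the work per range is only approximately balanced.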

