Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/15 01:17:23 UTC

[GitHub] [arrow-rs] Jimexist commented on pull request #3102: parquet bloom filter part II: read sbbf bitset from row group reader, update API, and add cli demo

Jimexist commented on PR #3102:
URL: https://github.com/apache/arrow-rs/pull/3102#issuecomment-1314625409

   > Once we have read the file metadata we know the byte ranges of the column chunks, and page indexes, as well as the offsets of the bloom filter data for each column chunk. It should therefore be possible to get a fairly accurate overestimate of the length of each bloom filter, simply by process of elimination.
   
   Thanks for the suggestion. I wonder whether that is future-proof: if more data structures are added later besides the SBBF, the page index, etc., would that be a problem? Thinking out loud, this would either balloon the overestimate or increase the likelihood of having to look at both locations before the reader can correctly determine which one was used when the Parquet file was written.
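To make the quoted suggestion concrete, here is a minimal Rust sketch of the process-of-elimination estimate. It assumes the reader has already collected the start offsets of the column chunks, page indexes and other bloom filters from the footer, and that the bloom filter lies before the footer metadata. `estimate_bloom_filter_len` and its parameters are hypothetical and not part of the `parquet` crate API, and the byte offsets in `main` are made up for illustration.

```rust
// Hypothetical sketch, not a parquet crate API: upper-bound the byte length
// of a bloom filter from other offsets already known after reading the footer.

/// Returns an overestimate of the bloom filter length: the distance from
/// `bloom_offset` to the nearest known structure that starts after it.
/// `known_starts` holds start offsets of column chunks, page indexes and
/// other bloom filters; `metadata_start` is where the footer metadata begins,
/// which bounds anything stored before it (an assumption of this sketch).
fn estimate_bloom_filter_len(
    bloom_offset: u64,
    known_starts: &[u64],
    metadata_start: u64,
) -> Option<u64> {
    // A bloom filter at or past the footer metadata contradicts our assumption.
    if bloom_offset >= metadata_start {
        return None;
    }
    // Nearest known start strictly after the bloom filter, falling back to
    // the start of the footer metadata if nothing else follows it.
    let next = known_starts
        .iter()
        .copied()
        .filter(|&start| start > bloom_offset)
        .fold(metadata_start, u64::min);
    Some(next - bloom_offset)
}

fn main() {
    // Made-up layout: column chunk A at 4, bloom filter for A at 1000,
    // column chunk B at 1200, page index at 2500, footer metadata at 3000.
    let known_starts = [4, 1200, 2500];
    let len = estimate_bloom_filter_len(1000, &known_starts, 3000);
    println!("{:?}", len); // prints "Some(200)"
}
```

If a structure the reader does not know about sits between the bloom filter and the next known offset, it is silently absorbed into the estimate: the bound stays safe to read, but it grows, which is the ballooning concern raised in the reply.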


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.