Posted to github@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/06/09 17:41:03 UTC

[GitHub] [arrow] westonpace commented on issue #33683: [C++][Parquet] Would Arrow::FileReader support filter evaluating and optimize by BloomFilter

westonpace commented on issue #33683:
URL: https://github.com/apache/arrow/issues/33683#issuecomment-1584934046

   > Sorry for the late reply; I have been a bit busy these days. I found a problem: the bloom filter is not trivial. It might improve performance, or it might not. Should I add a use_bloom_filter option to ParquetFragmentScanOptions?
   
   Yes, that would be a good place for it. We would want a comment that gives users enough information to make the correct choice. For example: "This feature allows parquet bloom filters to be used to reduce the amount of data that needs to be read from disk. However, applying these filters can be expensive and, if a filter is not very selective, may cost more CPU time than it saves." (I don't know if that is the actual reason; feel free to modify as appropriate based on your testing.)
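   
   As a rough illustration only, the option and its comment might look something like the sketch below. Everything here is an assumption made for the example: `ScanOptionsSketch` is a hypothetical stand-in rather than the real `ParquetFragmentScanOptions`, the default value of `false` is a guess, and `BloomProbe` and `ShouldReadRowGroup` are invented helpers that only show how the flag would gate a bloom filter check.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for ParquetFragmentScanOptions. Only the proposed
// flag is modeled; this is NOT the actual Arrow class.
struct ScanOptionsSketch {
  /// Allow parquet bloom filters to be used to reduce the amount of data
  /// that needs to be read from disk. Applying these filters can be
  /// expensive and, if a filter is not very selective, may cost more CPU
  /// time than it saves.
  ///
  /// The default of false is an assumption made for this sketch.
  bool use_bloom_filter = false;
};

// Hypothetical probe: returns false only when the value is definitely absent
// from the row group, and true when it might be present.
using BloomProbe = std::function<bool(const std::string&)>;

// Decide whether a row group has to be read for an equality filter on `value`.
// With the option disabled the probe is never called, so no extra CPU is spent.
inline bool ShouldReadRowGroup(const ScanOptionsSketch& options,
                               const BloomProbe& might_contain,
                               const std::string& value) {
  if (!options.use_bloom_filter) {
    return true;  // Option off: always read, skip the bloom filter entirely.
  }
  return might_contain(value);  // Option on: skip only on a definite miss.
}
```

   With the flag off, every row group is read regardless of what the probe would say; with it on, a row group is skipped only when the probe reports a definite miss.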

