Posted to github@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/05/11 13:23:00 UTC

[GitHub] [arrow] westonpace commented on issue #33683: [C++][Parquet] Would Arrow::FileReader support filter evaluating and optimize by BloomFilter

westonpace commented on issue #33683:
URL: https://github.com/apache/arrow/issues/33683#issuecomment-1544003325

   Support for reading bloom filters from Parquet files into memory was added in Arrow 12.0.0. There is an open issue for using this feature for pushdown filtering: https://github.com/apache/arrow/issues/27277
   
   The datasets feature already does some pushdown using the Parquet file statistics. That issue asks for datasets to also use the bloom filter for pushdown filtering.
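
To illustrate the idea of combining statistics-based pushdown with a bloom filter check, here is a minimal toy sketch in Python. It is not Arrow's or Parquet's implementation: `ToyBloomFilter`, `ToyRowGroup`, and `row_groups_to_read` are hypothetical names made up for this example, and the bloom filter is a simple hash-based one, not Parquet's split-block format. It only shows how an equality predicate could skip row groups first by min/max statistics and then by the bloom filter.

```python
import hashlib
from dataclasses import dataclass, field


class ToyBloomFilter:
    """Tiny illustrative bloom filter (NOT Parquet's split-block format)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, value):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


@dataclass
class ToyRowGroup:
    """Hypothetical stand-in for a row group of a single integer column."""

    values: list
    min_value: int = field(init=False)
    max_value: int = field(init=False)
    bloom: ToyBloomFilter = field(init=False)

    def __post_init__(self):
        self.min_value = min(self.values)
        self.max_value = max(self.values)
        self.bloom = ToyBloomFilter()
        for v in self.values:
            self.bloom.add(v)


def row_groups_to_read(row_groups, target):
    """Return indices of row groups that may contain `target`."""
    selected = []
    for i, rg in enumerate(row_groups):
        if not (rg.min_value <= target <= rg.max_value):
            continue  # min/max statistics rule this row group out
        if not rg.bloom.might_contain(target):
            continue  # bloom filter rules this row group out
        selected.append(i)
    return selected


groups = [ToyRowGroup([1, 5, 9]), ToyRowGroup([10, 20, 30]), ToyRowGroup([100, 200])]
print(row_groups_to_read(groups, 20))
```

The statistics check is cheap but only helps when the value falls outside a row group's range. The bloom filter can additionally rule out values that are inside the range but not actually present, at the cost of occasional false positives (never false negatives).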
   
   The Parquet reader itself hasn't done pushdown in the past, but I'd generally be in favor of moving the pushdown filtering out of the datasets layer and into the file reader layer if someone were motivated to do the work. That would be more complex than just adding bloom filter support to the datasets layer, though, because you'd have to figure out how to formulate filter expressions. You could add a dependency on Arrow expressions, but I'm not sure that makes sense in the Parquet layer.
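
As a rough illustration of what "formulating filter expressions" without depending on Arrow expressions might involve, here is a small Python sketch of a standalone predicate tree that is evaluated conservatively against per-column min/max statistics. Everything here (`Compare`, `And`, `Or`, `may_match`, and the `stats` mapping) is hypothetical and invented for the example; it is not an existing Arrow or Parquet API, just one possible shape such a layer could take.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Compare:
    """Hypothetical leaf predicate: <column> <op> <value>."""

    column: str
    op: str  # one of "==", "<", ">"
    value: int


@dataclass(frozen=True)
class And:
    left: "Predicate"
    right: "Predicate"


@dataclass(frozen=True)
class Or:
    left: "Predicate"
    right: "Predicate"


Predicate = Union[Compare, And, Or]


def may_match(pred, stats):
    """Return False only if no row in the row group can satisfy `pred`.

    `stats` maps a column name to its (min, max) pair for one row group.
    The answer is conservative: True means "cannot rule it out".
    """
    if isinstance(pred, And):
        return may_match(pred.left, stats) and may_match(pred.right, stats)
    if isinstance(pred, Or):
        return may_match(pred.left, stats) or may_match(pred.right, stats)
    if pred.column not in stats:
        return True  # no statistics available, so the row group must be read
    lo, hi = stats[pred.column]
    if pred.op == "==":
        return lo <= pred.value <= hi
    if pred.op == "<":
        return lo < pred.value
    if pred.op == ">":
        return hi > pred.value
    raise ValueError(f"unsupported operator: {pred.op}")


stats = {"x": (10, 20), "y": (0, 5)}
pred = And(Compare("x", ">", 5), Compare("y", ">", 5))
print(may_match(pred, stats))
```

A representation like this keeps the file reader free of an expression-engine dependency, but it also duplicates logic that Arrow expressions already provide, which is the trade-off mentioned above.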

