Posted to commits@hudi.apache.org by "Vinoth Chandar (Jira)" <ji...@apache.org> on 2019/11/19 23:44:00 UTC
[jira] [Created] (HUDI-351) Implement Range + Bloom Filter checking in one go to improve speed of index
Vinoth Chandar created HUDI-351:
-----------------------------------
Summary: Implement Range + Bloom Filter checking in one go to improve speed of index
Key: HUDI-351
URL: https://issues.apache.org/jira/browse/HUDI-351
Project: Apache Hudi (incubating)
Issue Type: New Feature
Components: Index, Performance
Reporter: Vinoth Chandar
Currently, we read the min/max ranges once for range pruning, and then read the footer metadata a second time to check the bloom filter.
Once Spark 2.4 support lands and the 2GB limitations are gone, it is worth revisiting whether we could do this in a single pass for cases where the bloom filters fit into memory, or implement this check as an RDD operation.
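To illustrate the single-pass idea, here is a minimal sketch in Python. It is not Hudi's implementation: TinyBloom, Footer, build_footer and candidate_files are hypothetical names made up for this example, and it assumes every footer (key range plus bloom filter) fits in memory. Each footer is visited once, and the range check and the bloom filter check both run against it during that visit.

```python
import hashlib
from dataclasses import dataclass


class TinyBloom:
    """Toy bloom filter; a stand-in for the one stored in a file footer."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(key))


@dataclass
class Footer:
    """Hypothetical footer metadata: key range plus bloom filter."""
    file_id: str
    min_key: str
    max_key: str
    bloom: TinyBloom


def build_footer(file_id, keys):
    bloom = TinyBloom()
    for key in keys:
        bloom.add(key)
    return Footer(file_id, min(keys), max(keys), bloom)


def candidate_files(footers, record_keys):
    """Single pass: visit each footer once and run both checks on it."""
    candidates = {}
    for footer in footers:
        for key in record_keys:
            in_range = footer.min_key <= key <= footer.max_key
            # The bloom filter is only consulted for keys that pass the range check.
            if in_range and footer.bloom.might_contain(key):
                candidates.setdefault(key, []).append(footer.file_id)
    return candidates


footers = [
    build_footer("f1", ["a01", "a05", "a09"]),
    build_footer("f2", ["b01", "b07"]),
]
result = candidate_files(footers, ["a05", "b07", "c99"])
```

Here "c99" falls outside both key ranges, so it is pruned without touching any bloom filter. The same per-footer function could be applied inside an RDD transformation, which is the other option raised above.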
--
This message was sent by Atlassian Jira
(v8.3.4#803005)