Posted to commits@hudi.apache.org by "Vinoth Chandar (Jira)" <ji...@apache.org> on 2019/11/19 23:44:00 UTC

[jira] [Created] (HUDI-351) Implement Range + Bloom Filter checking in one go to improve speed of index

Vinoth Chandar created HUDI-351:
-----------------------------------

             Summary: Implement Range + Bloom Filter checking in one go to improve speed of index
                 Key: HUDI-351
                 URL: https://issues.apache.org/jira/browse/HUDI-351
             Project: Apache Hudi (incubating)
          Issue Type: New Feature
          Components: Index, Performance
            Reporter: Vinoth Chandar


Currently, we read the min/max ranges once for range pruning, then read the footer metadata again to check the bloom filter.

Once Spark 2.4 support lands and the 2GB limitations are gone, it is worth revisiting whether we could do this in a single pass for cases where the bloom filters fit into memory, or implement this check as an RDD operation.
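
As a rough illustration of the single-pass idea, here is a minimal Python sketch, assuming each file's min/max range and bloom filter have already been loaded into memory together. None of these names come from Hudi: SimpleBloomFilter, FileIndexEntry, build_entry and candidate_files are hypothetical, and the toy bloom filter only stands in for whatever filter is stored in the file footer.

```python
import hashlib
from dataclasses import dataclass


class SimpleBloomFilter:
    """Toy bloom filter (hypothetical stand-in for the footer's filter)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


@dataclass
class FileIndexEntry:
    """Range and bloom filter for one file, held together in memory."""
    file_id: str
    min_key: str
    max_key: str
    bloom: SimpleBloomFilter


def build_entry(file_id, keys):
    # Assumption: in a real system this would come from one footer read.
    bloom = SimpleBloomFilter()
    for key in keys:
        bloom.add(key)
    return FileIndexEntry(file_id, min(keys), max(keys), bloom)


def candidate_files(record_key, entries):
    """Range pruning and bloom check in the same pass over the entries."""
    matches = []
    for entry in entries:
        # Step 1: cheap range check prunes files that cannot hold the key.
        if not (entry.min_key <= record_key <= entry.max_key):
            continue
        # Step 2: bloom check on the same entry, with no second metadata read.
        if entry.bloom.might_contain(record_key):
            matches.append(entry.file_id)
    return matches


entries = [
    build_entry("f1", ["key-001", "key-005", "key-009"]),
    build_entry("f2", ["key-100", "key-150"]),
]
print(candidate_files("key-005", entries))
```

The files returned are only candidates: a bloom filter can report false positives, so an exact key lookup in each candidate file is still needed afterwards. The same per-entry function could in principle be applied inside a distributed map step, which is one way to read the "RDD operation" alternative, though that is not shown here.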



--
This message was sent by Atlassian Jira
(v8.3.4#803005)