Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2019/10/25 16:32:00 UTC

[jira] [Commented] (HUDI-106) Dynamically tune bloom filter entries

    [ https://issues.apache.org/jira/browse/HUDI-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959907#comment-16959907 ] 

sivabalan narayanan commented on HUDI-106:
------------------------------------------

A few questions:
 * Do we plan to rely on compaction to convert the older bloom filter types to the new one? Or do we plan to add a new worker thread for this specific job?
 * I do not know the specifics of compaction, but is there a major compaction in Hudi that happens from time to time?
 * This might be very ambitious, or rather unnecessarily complex, but is the compaction framework designed to be generic, with a transformation function? If yes, we could spin up another job, such as a BloomFilterConverter, that takes existing bloom filters and converts them to the new (dynamic) format. The one difference is that compaction can involve multiple files at a time, whereas here one file is taken as input and one file is the result.
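
To make the one-file-in, one-file-out idea concrete, here is a minimal sketch in Python. It is not Hudi code, and it is not the dynamic bloom filter itself: it rebuilds a fixed-size filter sized to the number of keys actually found in the file. All names (SimpleBloomFilter, convert_file_filter) are hypothetical. It also assumes the converter can re-read the record keys from the input file, since a standard bloom filter cannot list the keys that were added to it.

```python
import hashlib
import math


def optimal_bits(num_entries: int, fpp: float) -> int:
    """Standard bloom filter sizing: m = -n * ln(p) / (ln 2)^2."""
    return max(1, math.ceil(-num_entries * math.log(fpp) / (math.log(2) ** 2)))


def optimal_hashes(num_bits: int, num_entries: int) -> int:
    """Standard hash-count choice: k = (m / n) * ln 2."""
    return max(1, round(num_bits / num_entries * math.log(2)))


class SimpleBloomFilter:
    """Hypothetical stand-in for a per-file bloom filter (not Hudi's class)."""

    def __init__(self, num_entries: int, fpp: float):
        self.num_bits = optimal_bits(num_entries, fpp)
        self.num_hashes = optimal_hashes(self.num_bits, num_entries)
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def convert_file_filter(record_keys, target_fpp: float) -> SimpleBloomFilter:
    """One file in, one filter out: size the new filter from the real key count.

    Assumption: record_keys is an iterable of the keys stored in a single
    data file, re-read by the (hypothetical) converter job.
    """
    keys = list(record_keys)
    new_filter = SimpleBloomFilter(max(1, len(keys)), target_fpp)
    for key in keys:
        new_filter.add(key)
    return new_filter
```

A converter job along these lines would call convert_file_filter once per data file and write the result back with that file, so no cross-file merging is needed, unlike compaction.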

> Dynamically tune bloom filter entries
> -------------------------------------
>
>                 Key: HUDI-106
>                 URL: https://issues.apache.org/jira/browse/HUDI-106
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Vinoth Chandar
>            Assignee: Vinoth Chandar
>            Priority: Major
>              Labels: realtime-data-lakes
>             Fix For: 0.5.1
>
>
> Tuning bloom filters is currently based on a configuration, which can be cumbersome to tune per dataset to obtain good indexing performance. Let's add support for dynamic bloom filters that can automatically achieve a configured false positive ratio depending on the number of entries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)