Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2019/11/01 17:43:21 UTC

[GitHub] [incubator-hudi] nsivabalan commented on issue #976: [HUDI-106] Adding support for DynamicBloomFilter

URL: https://github.com/apache/incubator-hudi/pull/976#issuecomment-548883578
 
 
   > Left some comments... Can we also add a test for the "dynamic" nature of the filter, e.g. having more entries should result in a larger filter with the same fp ratio? Also, how are you enforcing a maximum dynamic bloom filter size? Can you share data on how big the bloom filter would be if you, say, wrote 1M keys at an fpp ratio of 10^-9?
   
   A few questions/clarifications:
   - I guess you can't bound the size of a dynamic bloom filter. The size grows with the number of entries added; the initial number of entries passed in only sets the minimum size.
   - I am trying to find a way to test the FP ratio, but I'm not sure yet how to test that.
   - I was able to verify that adding more entries to the filter than the initial number of entries increases the size of the bloom filter.
   - Here are the sizes of the dynamic bloom filter with an error rate of 10^-9 and an initial number of entries of 10k:
    Size of bloom with 100 entries     = 71940 bytes   ~= 72 KB
    Size of bloom with 1000 entries    = 71940 bytes   ~= 72 KB
    Size of bloom with 10000 entries   = 71940 bytes   ~= 72 KB
    Size of bloom with 100000 entries  = 719088 bytes  ~= 719 KB
    Size of bloom with 1000000 entries = 7190568 bytes ~= 7.2 MB
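
To make the growth behaviour and the FP-ratio question above concrete, here is a minimal, self-contained sketch of one way a dynamic bloom filter can work and be measured. It is not the implementation in this PR or in Hadoop: the class `DynamicBloomSketch`, its methods, and the hashing scheme (FNV-1a plus a splitmix64-style mixer with double hashing) are all hypothetical and chosen only for illustration. It assumes the filter grows by appending one more fixed-size bit array every time the initial number of entries is exceeded, and it sizes each array with the standard formula m = -n ln(p) / (ln 2)^2.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical, simplified dynamic Bloom filter (not the Hudi or Hadoop class). */
final class DynamicBloomSketch {
    private final int keysPerFilter;  // n: keys accepted before a new bit array is appended
    private final int bitsPerFilter;  // m: bits in each fixed-size bit array
    private final int numHashes;      // k: bit positions set per key
    private final List<BitSet> filters = new ArrayList<>();
    private int keysInCurrent = 0;

    DynamicBloomSketch(int initialEntries, double errorRate) {
        double ln2 = Math.log(2);
        // Standard sizing: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
        this.keysPerFilter = initialEntries;
        this.bitsPerFilter = (int) Math.ceil(-initialEntries * Math.log(errorRate) / (ln2 * ln2));
        this.numHashes = Math.max(1, (int) Math.round((double) bitsPerFilter / initialEntries * ln2));
        filters.add(new BitSet(bitsPerFilter));
    }

    void add(String key) {
        if (keysInCurrent >= keysPerFilter) {
            // Assumed growth rule: the current array is "full", so append another of the same size.
            filters.add(new BitSet(bitsPerFilter));
            keysInCurrent = 0;
        }
        BitSet current = filters.get(filters.size() - 1);
        for (int index : indexes(key)) {
            current.set(index);
        }
        keysInCurrent++;
    }

    boolean mightContain(String key) {
        int[] idx = indexes(key);
        // A key is reported present if any one of the bit arrays has all k bits set.
        for (BitSet filter : filters) {
            boolean allSet = true;
            for (int index : idx) {
                if (!filter.get(index)) {
                    allSet = false;
                    break;
                }
            }
            if (allSet) {
                return true;
            }
        }
        return false;
    }

    /** Raw bit-array size only; ignores any serialization or encoding overhead. */
    long sizeInBytes() {
        return (long) filters.size() * ((bitsPerFilter + 7) / 8);
    }

    /** Derives k bit positions from one 64-bit hash via double hashing. */
    private int[] indexes(String key) {
        long h = hash64(key);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32) | 1;  // force odd so the step is never zero
        int[] result = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            result[i] = Math.floorMod(h1 + i * h2, bitsPerFilter);
        }
        return result;
    }

    /** Illustrative hash only: FNV-1a over the chars, then a splitmix64-style mixer. */
    private static long hash64(String key) {
        long h = 0xcbf29ce484222325L;
        for (int i = 0; i < key.length(); i++) {
            h ^= key.charAt(i);
            h *= 0x100000001b3L;
        }
        h = (h ^ (h >>> 30)) * 0xbf58476d1ce4e5b9L;
        h = (h ^ (h >>> 27)) * 0x94d049bb133111ebL;
        return h ^ (h >>> 31);
    }
}
```

A test along these lines could add exactly the initial number of keys, check that every added key is still reported present, and then probe a large set of keys that were never added, counting how many are wrongly reported present. At 10^-9 a false positive would essentially never show up in a test of practical size, so a looser rate such as 0.01 is easier to measure. In this sketch the size stays constant up to the initial number of entries and then grows in whole multiples, which matches the 1x / 10x / 100x pattern in the sizes above. The sketch reports raw bit-array bytes only, so its absolute numbers are not expected to match those figures, since the comment does not say how the sizes were measured. Note also that, in this sketch, a lookup checks every bit array, so the combined FP rate rises as more arrays are appended.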
