You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Claude Warren (Jira)" <ji...@apache.org> on 2022/06/15 08:30:00 UTC

[jira] [Updated] (COLLECTIONS-824) BloomFilter: change name of SimpleHasher

     [ https://issues.apache.org/jira/browse/COLLECTIONS-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Claude Warren updated COLLECTIONS-824:
--------------------------------------
    Description: 
The wikipedia article on [Double Hashing (variants)|https://en.wikipedia.org/wiki/Double_hashing#Variants] notes that this type of hasher when used in Bloom filters will result in a collision rate that is twice as likely as intended.

An alternative is to provide enhanced double hashing which changes the increment each loop iteration. This would require some additional work on wrapping the modulus of the increment.

It may be better to change the name to KMHasher and more explicitly state that the hash functions are:


 

g_i(x) = (h1(x) + i h2(x)) mod m; i in [1, k]
 

  was:
The wikipedia article on [Double Hashing (variants)|https://en.wikipedia.org/wiki/Double_hashing#Variants] notes that this type of hasher when used in Bloom filters will result in a collision rate that is twice as likely as intended.

An alternative is to provide enhanced double hashing which changes the increment each loop iteration. This would require some additional work on wrapping the modulus of the increment.

It may be better to change the name to KMHasher and more explicitly state that the hash functions are:
 {{g_i(x) = (h1(x) + i h2(x)) mod m; i in [1, k]}}
 


> BloomFilter: change name of SimpleHasher
> ----------------------------------------
>
>                 Key: COLLECTIONS-824
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-824
>             Project: Commons Collections
>          Issue Type: Improvement
>          Components: Collection
>    Affects Versions: 4.5
>            Reporter: Claude Warren
>            Priority: Major
>              Labels: bloom-filter
>
> The wikipedia article on [Double Hashing (variants)|https://en.wikipedia.org/wiki/Double_hashing#Variants] notes that this type of hasher when used in Bloom filters will result in a collision rate that is twice as likely as intended.
> An alternative is to provide enhanced double hashing which changes the increment each loop iteration. This would require some additional work on wrapping the modulus of the increment.
> It may be better to change the name to KMHasher and more explicitly state that the hash functions are:
>  
> g_i(x) = (h1(x) + i h2(x)) mod m; i in [1, k]
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)