You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Claude Warren (Jira)" <ji...@apache.org> on 2022/06/24 11:30:00 UTC

[jira] [Resolved] (COLLECTIONS-801) Simplify Bloom filter implementation

     [ https://issues.apache.org/jira/browse/COLLECTIONS-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Claude Warren resolved COLLECTIONS-801.
---------------------------------------
    Fix Version/s: 4.5
       Resolution: Fixed

Closed with pull request 
 
h1. Simplify bloom filters #258

> Simplify Bloom filter implementation
> ------------------------------------
>
>                 Key: COLLECTIONS-801
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-801
>             Project: Commons Collections
>          Issue Type: Improvement
>    Affects Versions: 4.5
>            Reporter: Claude Warren
>            Priority: Minor
>             Fix For: 4.5
>
>
> The initial Bloom filter implementation has a number of issues arising from attempting to verify that filters were built the same way (ie. same number of bits, number of functions, hashing algorithm, etc.)  The net result is that there is significant overhead in the processing of the filters.  For a structure that is intended to speed up decisions this is a fault.  There are also issues with implementation hiding in the current code.
> The issue calls for:
>  * The removal of the hashing tracking, all the registration and verification that the hashes are the same.  This becomes an operational issue for the developer using the library.  In most cases this is a trivial problem.
>  * Removal of all methods that assume an implementation detail of the Bloom filter.  Rather than getting a list of indices or a list of bitmap longs (or bytes) the classes will implement a "Producer" style that uses IntConsumer, LongConsumer, etc to receive the indices or bit maps.  This pattern to carry forward to specialized filters like counting Bloom filters that will accept a BitCountConsumer to receive the counts for each bit.
>  * There are cases where Bloom filters need to be updated and cases where they do not.  The change should include merge() methods to produce new Bloom filters and mergeInPlace() methods to update the Bloom filters.
>  * There are issues with some assumptions in the Hasher implementations wherein the static hasher does not function as described.  Part of the goal of this change is to simplify the mechanism to pass the internal state of one bloom filter to another without either knowing the implementation of that state in the other.  This will be done via the "Producer" interface described above.  Hashers will be able to create IndexProducers that will produce the indices for a Bloom filter shape.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)