You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Claude Warren (Jira)" <ji...@apache.org> on 2022/06/15 10:19:00 UTC

[jira] [Updated] (COLLECTIONS-820) BloomFilter: support merge without checking

     [ https://issues.apache.org/jira/browse/COLLECTIONS-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Claude Warren updated COLLECTIONS-820:
--------------------------------------
    Summary: BloomFilter: support merge without checking  (was: BloomFilter: renaming merge/mergeInPlace)

> BloomFilter: support merge without checking
> -------------------------------------------
>
>                 Key: COLLECTIONS-820
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-820
>             Project: Commons Collections
>          Issue Type: Improvement
>          Components: Collection
>    Affects Versions: 4.5
>            Reporter: Claude Warren
>            Priority: Minor
>              Labels: bloom-filter
>
>     Support a merge without checking and then provide a method add with checking for a change in the filter. This will duplicate a lot of (simple) code. But it provides a nice method to check if a filter contains an object and do something if it did not, rather than calling contains before then doing a merge, or doing a merge and checking cardinality change afterwards.
> For a bitmap bloom filter this can be done branchless by accumulating all the zero bits that are set to 1 using:
> long[] flag = {0};
> int[] idx = new int[1];
> bitMapProducer.forEachBitMap(value -> {
>     int i = idx[0]++;
>     flag[0] |= (~bitMap[i] & value);
>     bitMap[i] |= value;
>     return true;
> });
> return flag[0] != 0;
> For a filter with an explicit cardinality you just check the cardinality before and after the merge.
> h3. !https://avatars.githubusercontent.com/u/886334?s=48&v=4|width=24,height=24!  *[aherbert|https://github.com/aherbert]*  [on 27 Feb|https://github.com/apache/commons-collections/pull/258#discussion_r813419143]
>  
> With the provision of the {{copy}} method the {{merge}} method seems redundant with {{{}mergeInPlace{}}}. It is equivalent to two lines of code:
>  
> {{BloomFilter filter2 = filter.copy(); }}
> {{filter2.mergeInPlace(other);}}
>  
> I think {{merge}} should be the collections equivalent of {{{}add{}}}. This acts in place. Very few collections operations return new collections unless they are views of the current collection, e.g. List.subList. Note that the CountingBloomFilter has {{add}} which acts in place. There is no {{add}} returning a new CountingBloomFilter and then an {{{}addInPlace{}}}. So there is some inconsistency here.
> IMO the more common operation is to merge a filter in place. So this should be the action of merge. Then the mergeInPlace can be dropped or replaced with {{copyAndMerge}} or {{{}mergeToCopy{}}}. You then also have the boolean result of merge to consider. Should the copy not be returned if the merge returns false? Since it is 2 lines of code I would not bother and drop this.
> The BloomFilter then has {{merge}} and CountingBloomFilter has {{{}add{}}}, both act in place (the common operation). The user can create a copy if desired for a merge if that is useful to them.
> Another comment on {{merge}} suggests changing it to support {{add}} and {{merge}} as both have their own use cases.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)