You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Claude Warren (Jira)" <ji...@apache.org> on 2022/07/03 10:01:00 UTC

[jira] [Comment Edited] (COLLECTIONS-820) BloomFilter: support merge without checking

    [ https://issues.apache.org/jira/browse/COLLECTIONS-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17561833#comment-17561833 ] 

Claude Warren edited comment on COLLECTIONS-820 at 7/3/22 10:00 AM:
--------------------------------------------------------------------

Ahhh... I think I see now where this request comes from.

The reason the merge has a boolean result is to account for the counting Bloom filter that can overflow and therefore the merge effectively fails.  When this happens the counting Bloom filter is then invalid.  There may be other filter implementations that have the same types of limits but are as yet unimplemented or discussed.

All other current implementations work because they are simply flipping bits on so there really is no check.

The only other failure is to attempt to turn on a bit that is out of range and that throws and exception.

I did a check of my code base and did not find any case where there was a {{contains}} check followed by a {{merge}} if not contains.  I only found {{contains}} followed by searching.   In most cases my code has a distance calculation before a {{merge}} or no check at all.


was (Author: claudenw):
Ahhh... I think I see now where this request comes from.

The reason the merge has a boolean result is to account for the counting Bloom filter that can overflow and therefore the merge effectively fails.  When this happens the counting Bloom filter is then invalid.  There may be other filter implementations that have the same types of limits but are as yet unimplemented or discussed.

All other current implementations work because they are simply flipping bits on so there really is no check.

The only other failure is to attempt to turn on a bit that is out of range and that throws and exception.

> BloomFilter: support merge without checking
> -------------------------------------------
>
>                 Key: COLLECTIONS-820
>                 URL: https://issues.apache.org/jira/browse/COLLECTIONS-820
>             Project: Commons Collections
>          Issue Type: Improvement
>          Components: Collection
>    Affects Versions: 4.5
>            Reporter: Claude Warren
>            Priority: Minor
>              Labels: bloom-filter
>
> Support a {{merge}} without checking and then provide a method  {{add}} with checking for a change in the filter. This will duplicate a lot of (simple) code. But it provides a nice method to check if a filter contains an object and do something if it did not, rather than calling contains before then doing a merge, or doing a merge and checking cardinality change afterwards.
> For a bitmap bloom filter this can be done branchless by accumulating all the zero bits that are set to 1 using:
> {noformat}
> long[] flag = {0};
> int[] idx = new int[1];
> bitMapProducer.forEachBitMap(value -> {
>     int i = idx[0]++;
>     flag[0] |= (~bitMap[i] & value);
>     bitMap[i] |= value;
>     return true;
> });
> return flag[0] != 0;
> {noformat}
> For a filter with an explicit cardinality you just check the cardinality before and after the merge.
> h3. !https://avatars.githubusercontent.com/u/886334?s=48&v=4|width=24,height=24! *[aherbert|https://github.com/aherbert]* [on 27 Feb|https://github.com/apache/commons-collections/pull/258#discussion_r813419143]
>  
> With the provision of the {{copy}} method the {{merge}} method seems redundant with {{{}mergeInPlace{}}}. It is equivalent to two lines of code:
>  
> {noformat}
> BloomFilter filter2 = filter.copy(); 
> filter2.mergeInPlace(other);
> {noformat}
> I think {{merge}} should be the collections equivalent of {{{}add{}}}. This acts in place. Very few collections operations return new collections unless they are views of the current collection, e.g. List.subList. Note that the CountingBloomFilter has {{add}} which acts in place. There is no {{add}} returning a new CountingBloomFilter and then an {{{}addInPlace{}}}. So there is some inconsistency here.
> IMO the more common operation is to merge a filter in place. So this should be the action of merge. Then the mergeInPlace can be dropped or replaced with {{copyAndMerge}} or {{{}mergeToCopy{}}}. You then also have the boolean result of merge to consider. Should the copy not be returned if the merge returns false? Since it is 2 lines of code I would not bother and drop this.
> The BloomFilter then has {{merge}} and CountingBloomFilter has {{{}add{}}}, both act in place (the common operation). The user can create a copy if desired for a merge if that is useful to them.
> Another comment on {{merge}} suggests changing it to support {{add}} and {{merge}} as both have their own use cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)