You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@commons.apache.org by GitBox <gi...@apache.org> on 2022/08/23 10:21:33 UTC

[GitHub] [commons-collections] Claudenw opened a new pull request, #331: Collections 763: Remove BloomFilter constructors that create initial entry

Claudenw opened a new pull request, #331:
URL: https://github.com/apache/commons-collections/pull/331

   Removed constructors that took initial entry.
   
   Added `merge()` methods for IndexProducer and BitMapProducer.
   
   Added `CountingBloomFilter.remove()` methods for IndexProducer and BitMapProducer.
   Moved implementations of `ArrayCountingBloomFilter.merge()` to default implementations in `CountingBloomFilter`
   Moved implementations of `ArrayCOuntingBloomFilter.remove()` to default implementations in `CountingBloomFilter`
   
   Updated tests accordingly
   Added merge() tests.
   Added remove() tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] aherbert commented on pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

aherbert commented on PR #331:
URL: https://github.com/apache/commons-collections/pull/331#issuecomment-1242655670

   I have found the index sorting and duplicate elimination routines. I know how to fix the test. The question is whether the test is indicating a problem. I think it is.
   
   The default implementation of:
   - IndexProducer.forEach _will_ return duplicates
   - IndexProducer.asIndexArray _will not_ return duplicates
   
   Is this the functionality we wish to support? If so then it should be documented. Currently there is no indication in the javadoc that asIndexArray is either sorted or distinct.
   
   I can fix the test; I can add the documentation; but should this be done?
   
   Looking at the overrides for asIndexArray:
   
   - ArrayCountingBloomFilter: _will not_ return duplicates
   - EnhancedDoubleHasher: _will_ return duplicates
   - HasherCollection: _will_ return duplicates
   
   Note that the two hashers have a code comment that they **need** to return duplicates.
   
   So we have a contradictory functionality. BloomFilters and a default IndexProducer will produce sorted distinct indices for asIndexArray. Hashers will be unsorted and may contain duplicates. The HasherCollection can output a complete jumble of indices which make no practical sense. It could be modified to make the indices from each hasher unique, then duplicates will indicate multiple hashers produced the same index. However this is not supported in the forEach method and would be a performance overhead. Currently the HasherCollection thus acts as if one item but it is built from multiple hashers. This should be in the documentation to prevent the mistake that the collection will act as if it represents multiple items.
   
   So is the suggested fix:
   
   - document that the **default** asIndexArray method will output distinct indices (current behaviour, but no contract for overrides)
   - update the test in DefaultIndexProducerTest.testAsIndexArray to remove duplicates from the expected array. This will then enforce the **default** behaviour that the indices are distinct
   
   Note: this does not mandate that asIndexArray must **always** return no duplicates, only that the default implementation does.
   
   Or:
   
   - document the asIndexArray method to state that indices may be duplicates (current behaviour)
   - update the test in DefaultIndexProducerTest.testAsIndexArray to be robust to duplicates
   
   I would favour the second approach as it avoids a commitment to certain functionality and highlights that duplicates are possible. This has minimum impact on the rest of the code. I am happy to make the second change now and finish this merge.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] aherbert closed pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

aherbert closed pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry
URL: https://github.com/apache/commons-collections/pull/331


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] Claudenw commented on a diff in pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

Claudenw commented on code in PR #331:
URL: https://github.com/apache/commons-collections/pull/331#discussion_r958112564


##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -77,6 +79,92 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
 
     // Modification Operations
 
+    /**
+     * Merges the specified Bloom filter into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code other} filter will be incremented by 1,</p>
+     *
+     * <p>Note: If the other filter is a counting Bloom filter the index counts are ignored and it is treated as an
+     * IndexProducer.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param other the other Bloom filter
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final BloomFilter other) {
+        Objects.requireNonNull(other, "other");
+        try {
+            return add(BitCountProducer.from(other));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(e);
+        }
+    }
+
+    /**
+     * Merges the specified Hasher into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the unique indexes identified by the {@code hasher} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param hasher the hasher
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final Hasher hasher) {
+        Objects.requireNonNull(hasher, "hasher");
+        try {
+            return add(BitCountProducer.from(hasher.uniqueIndices(getShape())));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()));

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] Claudenw commented on a diff in pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

Claudenw commented on code in PR #331:
URL: https://github.com/apache/commons-collections/pull/331#discussion_r958113098


##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -77,6 +79,92 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
 
     // Modification Operations
 
+    /**
+     * Merges the specified Bloom filter into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code other} filter will be incremented by 1,</p>
+     *
+     * <p>Note: If the other filter is a counting Bloom filter the index counts are ignored and it is treated as an
+     * IndexProducer.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param other the other Bloom filter
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final BloomFilter other) {
+        Objects.requireNonNull(other, "other");
+        try {
+            return add(BitCountProducer.from(other));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(e);
+        }
+    }
+
+    /**
+     * Merges the specified Hasher into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the unique indexes identified by the {@code hasher} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param hasher the hasher
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final Hasher hasher) {
+        Objects.requireNonNull(hasher, "hasher");
+        try {
+            return add(BitCountProducer.from(hasher.uniqueIndices(getShape())));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()));
+        }
+    }
+
+    /**
+     * Merges the specified index producer into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code indexProducer} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * <p>Note: This method expects and index producer that does not return duplicates.</p>
+     *
+     * @param indexProducer the IndexProducer
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final IndexProducer indexProducer) {
+        Objects.requireNonNull(indexProducer, "producer");
+        try {
+            return add(BitCountProducer.from(indexProducer));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()));
+        }
+    }
+
+    /**
+     * Merges the specified BitMap producer into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code bitMapProducer} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param bitMapProducer the BitMapProducer
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final BitMapProducer bitMapProducer) {
+        return merge(IndexProducer.fromBitMapProducer(bitMapProducer));

Review Comment:
   fixed



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBloomFilterTest.java:
##########
@@ -186,8 +179,10 @@ public final void testContains() {
     @Test
     public final void testEstimateIntersection() {
 
-        final BloomFilter bf = createFilter(getTestShape(), from1);
-        final BloomFilter bf2 = createFilter(getTestShape(), bigHasher);
+        final BloomFilter bf = createEmptyFilter(getTestShape());
+        bf.merge(from1);
+        final BloomFilter bf2 = createEmptyFilter(getTestShape());
+        bf2.merge(from1);

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] Claudenw commented on a diff in pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

Claudenw commented on code in PR #331:
URL: https://github.com/apache/commons-collections/pull/331#discussion_r958112874


##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -77,6 +79,92 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
 
     // Modification Operations
 
+    /**
+     * Merges the specified Bloom filter into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code other} filter will be incremented by 1,</p>
+     *
+     * <p>Note: If the other filter is a counting Bloom filter the index counts are ignored and it is treated as an
+     * IndexProducer.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param other the other Bloom filter
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final BloomFilter other) {
+        Objects.requireNonNull(other, "other");
+        try {
+            return add(BitCountProducer.from(other));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(e);
+        }
+    }
+
+    /**
+     * Merges the specified Hasher into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the unique indexes identified by the {@code hasher} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param hasher the hasher
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final Hasher hasher) {
+        Objects.requireNonNull(hasher, "hasher");
+        try {
+            return add(BitCountProducer.from(hasher.uniqueIndices(getShape())));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()));
+        }
+    }
+
+    /**
+     * Merges the specified index producer into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code indexProducer} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * <p>Note: This method expects and index producer that does not return duplicates.</p>
+     *
+     * @param indexProducer the IndexProducer
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final IndexProducer indexProducer) {
+        Objects.requireNonNull(indexProducer, "producer");
+        try {
+            return add(BitCountProducer.from(indexProducer));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()));

Review Comment:
   fixed



##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -77,6 +79,92 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
 
     // Modification Operations
 
+    /**
+     * Merges the specified Bloom filter into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code other} filter will be incremented by 1,</p>
+     *
+     * <p>Note: If the other filter is a counting Bloom filter the index counts are ignored and it is treated as an
+     * IndexProducer.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param other the other Bloom filter
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final BloomFilter other) {
+        Objects.requireNonNull(other, "other");

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] Claudenw commented on a diff in pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

Claudenw commented on code in PR #331:
URL: https://github.com/apache/commons-collections/pull/331#discussion_r958605992


##########
src/main/java/org/apache/commons/collections4/bloomfilter/SparseBloomFilter.java:
##########
@@ -140,12 +78,9 @@ private boolean add(int idx) {
         return true;
     }
 
-    /**
-     * Performs a merge using an IndexProducer.
-     * @param indexProducer the IndexProducer to merge from.
-     * @throws IllegalArgumentException if producer sends illegal value.
-     */
-    private void merge(IndexProducer indexProducer) {
+    @Override
+    public boolean merge(IndexProducer indexProducer) {
+        Objects.requireNonNull(indexProducer, "indexProducer");

Review Comment:
   added paragraph in package-info.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] Claudenw commented on pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

Claudenw commented on PR #331:
URL: https://github.com/apache/commons-collections/pull/331#issuecomment-1242528177

   there is a method in defaultindexprovidertest that will remove dups from
   the original array and sort it so that assert array equals can work.
   
   In other tests I ran the main test in a loop 5k times to try to hit such
   problems.  I must have missed this one
   
   
   On Fri, Sep 9, 2022, 23:12 Alex Herbert ***@***.***> wrote:
   
   > @Claudenw <https://github.com/Claudenw> I merged this locally and was
   > doing some minor code clean-up. It seems that there is a flaky test:
   >
   > DefaultIndexProducerTest.testAsIndexArray
   >
   > In AbstractIndexProducerTest.testAsIndexArray the producer is written to a
   > list using asIndexArray. Then the producer forEach is called to remove
   > entries from the list. However the default implementation of
   > IndexProducer.asIndexArray removes duplicates (as it uses a BitSet to
   > collate the indices). So the List can have less entries than the number
   > produced by forEach (which contains the duplicates).
   >
   > You are generating 10 values in the range [0, 512). The likelihood of no
   > duplicates is (512)(511)(510)...(504)(503) / 512^10 = 0.915. So the
   > probability of a duplicate is around 8.5%. If you checkout this current PR
   > branch and run the DefaultIndexProducerTest repeatedly you will need to do
   > it about 10 times to see a failure. This is low enough that the 4 CI builds
   > may not hit a failure.
   >
   > Q. Should asIndexArray eliminate duplicates? If this is the correct
   > functionality then I can fix this test to handle the duplicates in forEach
   > and complete the merge.
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/commons-collections/pull/331#issuecomment-1242519168>,
   > or unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AASTVHWX2B24XKAENBCDRC3V5OY55ANCNFSM57K4LUZQ>
   > .
   > You are receiving this because you were mentioned.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] aherbert commented on a diff in pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

aherbert commented on code in PR #331:
URL: https://github.com/apache/commons-collections/pull/331#discussion_r955784220


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBloomFilterTest.java:
##########
@@ -62,49 +62,34 @@ protected final Shape getTestShape() {
      */
     protected abstract T createEmptyFilter(Shape shape);
 
-    /**
-     * Create the BloomFilter implementation we are testing.
-     *
-     * @param shape the shape of the filter.
-     * @param hasher the hasher to use to create the filter.
-     * @return a BloomFilter implementation.
-     */
-    protected abstract T createFilter(Shape shape, Hasher hasher);

Review Comment:
   This is a common pattern for _testing_.
   
   Out of curiosity I would ask you to search your use cases in your own production code to see how often you create a filter and then add only item immediately.
   
   I would think this is equivalent to having an ArrayList constructor that adds 1 item. This is not in the JDK. There are better alternatives for such a case such as a singleton list or using Arrays.asList.
   
   On top of this the item you are adding will require some source of indices (e.g. a Hasher, another filter, etc). So there will already be some lines of code to create that from the actual Object, or collection of Objects, that are being used to initialise the filter. 
   
   If you find use cases for this then we can add a more suitable functionality in later releases, or perhaps this exact functionality. For now the purpose is to reduce the initial public API to a manageable release and then receive feedback from the community on what features to enhance.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] aherbert commented on pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

aherbert commented on PR #331:
URL: https://github.com/apache/commons-collections/pull/331#issuecomment-1242519168

@Claudenw I merged this locally and was doing some minor code clean-up. It seems that there is a flaky test:

DefaultIndexProducerTest.testAsIndexArray

In AbstractIndexProducerTest.testAsIndexArray the producer is written to a list using asIndexArray. Then the producer forEach is called to remove entries from the list. However the default implementation of IndexProducer.asIndexArray removes duplicates (as it uses a BitSet to collate the indices). So the List<Integer> can have less entries than the number produced by forEach (which contains the duplicates).

You are generating 10 values in the range [0, 512). The likelihood of no duplicates is (512)(511)(510)...(504)(503) / 512^10 = 0.915. So the probability of a duplicate is around 8.5%. If you checkout this current PR branch and run the DefaultIndexProducerTest repeatedly you will need to do it about 10 times to see a failure. This is low enough that the 4 CI builds may not hit a failure.

Q. Should asIndexArray eliminate duplicates? If this is the correct functionality then I can fix this test to handle the duplicates in forEach and complete the merge.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] Claudenw commented on a diff in pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

Claudenw commented on code in PR #331:
URL: https://github.com/apache/commons-collections/pull/331#discussion_r955676036


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBloomFilterTest.java:
##########
@@ -62,49 +62,34 @@ protected final Shape getTestShape() {
      */
     protected abstract T createEmptyFilter(Shape shape);
 
-    /**
-     * Create the BloomFilter implementation we are testing.
-     *
-     * @param shape the shape of the filter.
-     * @param hasher the hasher to use to create the filter.
-     * @return a BloomFilter implementation.
-     */
-    protected abstract T createFilter(Shape shape, Hasher hasher);

Review Comment:
   If this is a common pattern then why are we removing it from the interface?  I think the addition of the 2 merge functions is a good one, but I am still not convinced that the move from a simple constructor to 2 methods is a good one.  I think the convenience constructors should remain.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] Claudenw commented on a diff in pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

Claudenw commented on code in PR #331:
URL: https://github.com/apache/commons-collections/pull/331#discussion_r958113419


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBloomFilterTest.java:
##########
@@ -62,49 +62,34 @@ protected final Shape getTestShape() {
      */
     protected abstract T createEmptyFilter(Shape shape);
 
-    /**
-     * Create the BloomFilter implementation we are testing.
-     *
-     * @param shape the shape of the filter.
-     * @param hasher the hasher to use to create the filter.
-     * @return a BloomFilter implementation.
-     */
-    protected abstract T createFilter(Shape shape, Hasher hasher);

Review Comment:
   changed



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractHasherTest.java:
##########
@@ -97,8 +97,8 @@ public void testUniqueIndex() {
         List<Integer> full = Arrays.stream(producer.asIndexArray()).boxed().collect(Collectors.toList());
         producer = hasher.uniqueIndices(shape);
         List<Integer> unique = Arrays.stream(producer.asIndexArray()).boxed().collect(Collectors.toList());
-        assertTrue( full.size() > unique.size() );
-        Set<Integer> set = new HashSet<Integer>( unique );
-        assertEquals( set.size(), unique.size() );
+        assertTrue(full.size() > unique.size());
+        Set<Integer> set = new HashSet<Integer>(unique);

Review Comment:
   changed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] Claudenw commented on a diff in pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

Claudenw commented on code in PR #331:
URL: https://github.com/apache/commons-collections/pull/331#discussion_r958608258


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBloomFilterTest.java:
##########
@@ -93,16 +94,30 @@ protected final Shape getTestShape() {
      *
      */
     @Test
-    public void testConstructWithBadHasher() {
+    public void testMergeWithBadHasher() {
         // value too large
+        final BloomFilter f = createEmptyFilter(getTestShape());
         assertThrows(IllegalArgumentException.class,
-                () -> createFilter(getTestShape(), new BadHasher(getTestShape().getNumberOfBits())));
+                () -> f.merge(new BadHasher(getTestShape().getNumberOfBits())));
         // negative value
-        assertThrows(IllegalArgumentException.class, () -> createFilter(getTestShape(), new BadHasher(-1)));
+        BloomFilter f2 = createEmptyFilter(getTestShape());
+        assertThrows(IllegalArgumentException.class, () -> f2.merge(new BadHasher(-1)));
     }
 
     @Test
-    public void testConstructWitBitMapProducer() {
+    public void testMergeWithHasher() {
+        // value too large
+        final BloomFilter f = createEmptyFilter(getTestShape());
+        f.merge(from1);
+        int[] idx = f.asIndexArray();
+        assertEquals(getTestShape().getNumberOfHashFunctions(), idx.length);
+        for (int i=0; i<idx.length; i++) {

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] aherbert commented on a diff in pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

aherbert commented on code in PR #331:
URL: https://github.com/apache/commons-collections/pull/331#discussion_r958353209


##########
src/test/java/org/apache/commons/collections4/properties/EmptyPropertiesTest.java:
##########
@@ -31,16 +31,39 @@
 import java.io.IOException;
 import java.io.PrintStream;
 import java.io.PrintWriter;
+import java.io.UnsupportedEncodingException;
 import java.nio.charset.StandardCharsets;
 import java.util.HashMap;
 import java.util.Properties;
-
 import org.apache.commons.io.input.NullReader;
 import org.apache.commons.lang3.ArrayUtils;
 import org.junit.jupiter.api.Test;
 
 public class EmptyPropertiesTest {
 
+    /**

Review Comment:
   You should not be including any changes to this package.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBloomFilterTest.java:
##########
@@ -93,16 +94,30 @@ protected final Shape getTestShape() {
      *
      */
     @Test
-    public void testConstructWithBadHasher() {
+    public void testMergeWithBadHasher() {
         // value too large
+        final BloomFilter f = createEmptyFilter(getTestShape());
         assertThrows(IllegalArgumentException.class,
-                () -> createFilter(getTestShape(), new BadHasher(getTestShape().getNumberOfBits())));
+                () -> f.merge(new BadHasher(getTestShape().getNumberOfBits())));
         // negative value
-        assertThrows(IllegalArgumentException.class, () -> createFilter(getTestShape(), new BadHasher(-1)));
+        BloomFilter f2 = createEmptyFilter(getTestShape());
+        assertThrows(IllegalArgumentException.class, () -> f2.merge(new BadHasher(-1)));
     }
 
     @Test
-    public void testConstructWitBitMapProducer() {
+    public void testMergeWithHasher() {
+        // value too large
+        final BloomFilter f = createEmptyFilter(getTestShape());
+        f.merge(from1);
+        int[] idx = f.asIndexArray();
+        assertEquals(getTestShape().getNumberOfHashFunctions(), idx.length);
+        for (int i=0; i<idx.length; i++) {

Review Comment:
   Whitespace:
   ```Java
   for (int i = 0; i < idx.length; i++) {
       assertEquals(i + 1, idx[i]);
   ```



##########
src/main/java/org/apache/commons/collections4/bloomfilter/BloomFilter.java:
##########
@@ -60,6 +60,11 @@ public interface BloomFilter extends IndexProducer, BitMapProducer {
      */
     Shape getShape();
 
+    /**

Review Comment:
   Why is the clear method being added here? Your PR branch should be rebased on the current master as this change should not be in this PR.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/SparseBloomFilterTest.java:
##########
@@ -33,66 +33,31 @@ protected SparseBloomFilter createEmptyFilter(final Shape shape) {
 
     @Override
     protected SparseBloomFilter createFilter(final Shape shape, final Hasher hasher) {
-        return new SparseBloomFilter(shape, hasher);

Review Comment:
   Since these methods just do the same as `createEmptyFilter(Shape)` then `merge(source)` I do not understand why they have to be overridden. The implementation can be in the Abstract test:
   ```Java
   protected BloomFilter createFilter(final Shape shape, final Hasher hasher) {
       BloomFilter bf = createEmptyFilter(shape);
       bf.merge(hasher);
       return bf;
   }
   ```
   Is it because the return argument type is incorrect? Then this can be fixed with:
   ```Java
   protected SparseBloomFilter createFilter(final Shape shape, final Hasher hasher) {
       return (SparseBloomFilter) super.createFilter(shape, hasher);
   }
   ```
   Doing it this way ensures all filters are using the same create and merge method to populate the new filter.
   



##########
src/test/java/org/apache/commons/collections4/bloomfilter/SparseBloomFilterTest.java:
##########
@@ -33,66 +33,31 @@ protected SparseBloomFilter createEmptyFilter(final Shape shape) {
 
     @Override
     protected SparseBloomFilter createFilter(final Shape shape, final Hasher hasher) {
-        return new SparseBloomFilter(shape, hasher);
+        SparseBloomFilter bf = new SparseBloomFilter(shape);
+        bf.merge(hasher);
+        return bf;
     }
 
     @Override
     protected SparseBloomFilter createFilter(final Shape shape, final BitMapProducer producer) {
-        return new SparseBloomFilter(shape, producer);
+        SparseBloomFilter bf = new SparseBloomFilter(shape);
+        bf.merge(producer);
+        return bf;
     }
 
     @Override
     protected SparseBloomFilter createFilter(final Shape shape, final IndexProducer producer) {
-        return new SparseBloomFilter(shape, producer);
+        SparseBloomFilter bf = new SparseBloomFilter(shape);
+        bf.merge(producer);
+        return bf;
     }
 
-    private void executeNestedTest(SparseBloomFilterTest nestedTest) {
-        nestedTest.testContains();
-        nestedTest.testEstimateIntersection();
-        nestedTest.testEstimateN();
-        nestedTest.testEstimateUnion();
-        nestedTest.testIsFull();
-        nestedTest.testMerge();
-    }
-
-    @Test
-    public void testConstructors() {
-
-        // copy of Sparse
-        SparseBloomFilterTest nestedTest = new SparseBloomFilterTest() {
-
-            @Override
-            protected SparseBloomFilter createEmptyFilter(Shape shape) {
-                return new SparseBloomFilter(new SparseBloomFilter(shape));
-            }
-
-            @Override
-            protected SparseBloomFilter createFilter(Shape shape, Hasher hasher) {
-                return new SparseBloomFilter(new SparseBloomFilter(shape, hasher));
-            }
-        };
-        executeNestedTest(nestedTest);
-
-        // copy of Simple
-        nestedTest = new SparseBloomFilterTest() {
-
-            @Override
-            protected SparseBloomFilter createEmptyFilter(Shape shape) {
-                return new SparseBloomFilter(new SimpleBloomFilter(shape));
-            }
-
-            @Override
-            protected SparseBloomFilter createFilter(Shape shape, Hasher hasher) {
-                return new SparseBloomFilter(new SimpleBloomFilter(shape, hasher));
-            }
-        };
-        executeNestedTest(nestedTest);
-    }
 
     @Test
     public void testBitMapProducerEdgeCases() {
         int[] values = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 65, 66, 67, 68, 69, 70, 71 };
-        BloomFilter bf = createFilter(getTestShape(), IndexProducer.fromIndexArray(values));

Review Comment:
   Why not use `createFilter` here. You have already reinstated the method. You should roll back the changes throughout the test suite where you previously dropped the method.
   
   Your current diff seems to have too many changes. I think you should:
   
   - branch from master
   - copy all the modified classes in `main/java/...` over the existing classes
   - modify only the 'createFilter' methods in the test classes and any other location that previously used the removed constructors.
   - set your upstream to this PR branch in your github and force push the changes. That should clear all your current git history and just introduce the desired changes.
   
   



##########
pom.xml:
##########
@@ -588,9 +588,9 @@
     <commons.jira.pid>12310465</commons.jira.pid>
     <!-- The RC version used in the staging repository URL. -->
     <commons.rc.version>RC1</commons.rc.version>
-    <checkstyle.version>3.1.2</checkstyle.version>
+    <checkstyle.version>3.2.0</checkstyle.version>

Review Comment:
   The changes to pom.xml and changes.xml should not be here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] Claudenw commented on a diff in pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

Claudenw commented on code in PR #331:
URL: https://github.com/apache/commons-collections/pull/331#discussion_r958625714


##########
src/main/java/org/apache/commons/collections4/bloomfilter/SimpleBloomFilter.java:
##########
@@ -133,12 +81,9 @@ public SimpleBloomFilter copy() {
         return new SimpleBloomFilter(this);
     }
 
-    /**
-     * Performs a merge using an IndexProducer.
-     * @param indexProducer the IndexProducer to merge from.
-     * @throws IllegalArgumentException if producer sends illegal value.
-     */
-    private void merge(IndexProducer indexProducer) {
+    @Override
+    public boolean merge(IndexProducer indexProducer) {
+        Objects.requireNonNull(indexProducer, "indexProducer");
         indexProducer.forEachIndex(idx -> {
             if (idx < 0 || idx >= shape.getNumberOfBits()) {
                 throw new IllegalArgumentException(String.format(

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] Claudenw commented on a diff in pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

Claudenw commented on code in PR #331:
URL: https://github.com/apache/commons-collections/pull/331#discussion_r958113644


##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultBloomFilterTest.java:
##########
@@ -196,6 +196,7 @@ public AbstractDefaultBloomFilter copy() {
             result.indices.addAll(indices);
             return result;
         }
+

Review Comment:
   removed



##########
src/test/java/org/apache/commons/collections4/bloomfilter/SetOperationsTest.java:
##########
@@ -49,52 +49,63 @@ private static void assertSymmetricOperation(double expected, ToDoubleBiFunction
         assertEquals(expected, operation.applyAsDouble(filter2, filter1), "op(filter2, filter1)");
     }
 
+    private BloomFilter forTest(Shape shape, Hasher hasher) {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] Claudenw commented on a diff in pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

Claudenw commented on code in PR #331:
URL: https://github.com/apache/commons-collections/pull/331#discussion_r958634755


##########
src/test/java/org/apache/commons/collections4/bloomfilter/SparseBloomFilterTest.java:
##########
@@ -33,66 +33,31 @@ protected SparseBloomFilter createEmptyFilter(final Shape shape) {
 
     @Override
     protected SparseBloomFilter createFilter(final Shape shape, final Hasher hasher) {
-        return new SparseBloomFilter(shape, hasher);

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] aherbert commented on a diff in pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

aherbert commented on code in PR #331:
URL: https://github.com/apache/commons-collections/pull/331#discussion_r966184418


##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultIndexProducerTest.java:
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+import java.util.function.IntPredicate;
+
+import org.junit.jupiter.api.Test;
+
+public class DefaultIndexProducerTest extends AbstractIndexProducerTest {
+
+    @Override
+    protected IndexProducer createProducer() {
+        return IndexProducer.fromIndexArray(1, 2);
+    }
+
+    @Override
+    protected IndexProducer createEmptyProducer() {
+        return new IndexProducer() {
+
+            @Override
+            public boolean forEachIndex(IntPredicate predicate)
+            {
+                return true;
+            }
+        };
+    }
+    
+    @Test
+    public void testFromBitMapProducer() {
+        IndexProducer ip = IndexProducer.fromBitMapProducer(BitMapProducer.fromBitMapArray( 0x07ffL));
+        int[] ary = ip.asIndexArray();
+        assertEquals(11, ary.length);
+        for (int i=0;i<11;i++)
+        {
+            assertEquals(i, ary[i]);
+        }
+    }
+    
+    @Test
+    public void testFromIndexArray() {
+        IndexProducer ip = IndexProducer.fromIndexArray(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

Review Comment:
   Again this is a linear sequence.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultBitMapProducerTest.java:
##########
@@ -52,4 +56,30 @@ public boolean forEachBitMap(LongPredicate predicate) {
             return true;
         }
     }
+    
+    @Test
+    public void testDefaultExpansion() {
+        BitMapProducer bmp = BitMapProducer.fromBitMapArray(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L);

Review Comment:
   Testing with a linear sequence is too simple. I would prefer some random indices.



##########
src/main/java/org/apache/commons/collections4/bloomfilter/BloomFilter.java:
##########
@@ -135,28 +135,57 @@ default boolean contains(BitMapProducer bitMapProducer) {
      * @param other The bloom filter to merge into this one.
      * @return true if the merge was successful
      */
-    boolean merge(BloomFilter other);
+    default boolean merge(BloomFilter other) {
+        return (characteristics() & SPARSE) != 0 ? merge( (IndexProducer) other ) : merge( (BitMapProducer) other);

Review Comment:
   remove single whitespace after `merge(`



##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -77,6 +79,89 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
 
     // Modification Operations
 
+    /**
+     * Merges the specified Bloom filter into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code other} filter will be incremented by 1,</p>
+     *
+     * <p>Note: If the other filter is a counting Bloom filter the index counts are ignored and it is treated as an
+     * IndexProducer.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param other the other Bloom filter
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final BloomFilter other) {
+        Objects.requireNonNull(other, "other");
+        return merge((IndexProducer) other);
+    }
+
+    /**
+     * Merges the specified Hasher into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the unique indexes identified by the {@code hasher} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param hasher the hasher
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final Hasher hasher) {
+        Objects.requireNonNull(hasher, "hasher");
+        try {
+            return add(BitCountProducer.from(hasher.uniqueIndices(getShape())));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()), e);
+        }
+    }
+
+    /**
+     * Merges the specified index producer into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code indexProducer} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * <p>Note: Indices that are returned multiple times will be incremented multiple times.</p>
+     *
+     * @param indexProducer the IndexProducer
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final IndexProducer indexProducer) {
+        Objects.requireNonNull(indexProducer, "producer");

Review Comment:
   `"indexProducer"`



##########
src/main/java/org/apache/commons/collections4/bloomfilter/package-info.java:
##########
@@ -20,58 +20,59 @@
  *
  * <h2>Background:</h2>
  *
- * <p>The Bloom filter is a probabilistic data structure that indicates where things are not.
- * Conceptually it is a bit vector. You create a Bloom filter by creating hashes
- * and converting those to enabled bits in the vector. Multiple Bloom filters may be merged
- * together into one Bloom filter.  It is possible to test if a filter {@code B} has merged into
- * another filter {@code A} by verifying that {@code (A & B) == B}.</p>
- *
- * <p>Bloom filters are generally used where hash
- * tables would be too large, or as a filter front end for longer processes. For example
- * most browsers have a Bloom filter that is built from all known bad URLs (ones that
- * serve up malware). When you enter a URL the browser builds a Bloom filter and checks to
- * see if it is "in" the bad URL filter. If not the URL is good, if it matches, then the
- * expensive lookup on a remote system is made to see if it actually is in the list. There
- * are lots of other uses, and in most cases the reason is to perform a fast check as a
- * gateway for a longer operation. </p>
+ * <p>The Bloom filter is a probabilistic data structure that indicates where things are not. Conceptually it is a bit
+ * vector. You create a Bloom filter by creating hashes and converting those to enabled bits in the vector. Multiple
+ * Bloom filters may be merged together into one Bloom filter. It is possible to test if a filter {@code B} has merged
+ * into another filter {@code A} by verifying that {@code (A & B) == B}.</p>
+ *
+ * <p>Bloom filters are generally used where hash tables would be too large, or as a filter front end for longer processes.
+ * For example most browsers have a Bloom filter that is built from all known bad URLs (ones that serve up malware).
+ * When you enter a URL the browser builds a Bloom filter and checks to see if it is "in" the bad URL filter. If not the
+ * URL is good, if it matches, then the expensive lookup on a remote system is made to see if it actually is in the
+ * list. There are lots of other uses, and in most cases the reason is to perform a fast check as a gateway for a longer
+ * operation.</p>
  *
  * <h3>BloomFilter</h3>
  *
- * <p>The Bloom filter architecture here is designed so that the implementation of the storage of bits is abstracted.
+ * <p>The Bloom filter architecture here is designed for speed of execution, so some methods like {@code merge}, {@code remove},
+ * {@code add}, and {@code subtract} may throw exceptions.  One an exception is thrown the state of the Bloom filter is unknown.
+ * The choice to use not use atomic transactions was made to achive maximum performance under correct usage.</p>
+ *
+ * <p>In addition the architecture is designed so that the implementation of the storage of bits is abstracted.
  * Programs that utilize the Bloom filters may use the {@code BitMapProducer} or {@code IndexProducer} to retrieve a
- * representation of the internal structure.  Additional methods are available in the {@code BitMap} to assist in
+ * representation of the internal structure. Additional methods are available in the {@code BitMap} to assist in
  * manipulation of the representations.</p>
  *
- * <p>The bloom filter code is an interface that requires implementation of 6 methods:</p>
+ * <p>The bloom filter code is an interface that requires implementation of 9 methods:</p>
  * <ul>
- * <li>{@code cardinality()}
- * returns the number of bits enabled in the Bloom filter.</li>
+ * <li>{@code cardinality()} returns the number of bits enabled in the Bloom filter.</li>
  *
- * <li>{@code contains(BitMapProducer)} which
- * returns true if the bits specified by the bit maps generated by the BitMapProducer are enabled in the Bloom filter.</li>
+ * <li>{@code characteristics()} which Returns a bitmap int of characteristics values.</li>
  *
- *  <li>{@code contains(IndexProducer)} which
- * returns true if the bits specified by the indices generated by IndexProducer are enabled in the Bloom filter.</li>
+ * <li>{@code clear()} which resets the Bloomfilter to its initial empty state.</li>
  *
- * <li>{@code getShape()} which
- * returns the shape the Bloom filter was created with.</li>
-
- * <li>{@code isSparse()} which
- * returns true if an the implementation tracks indices natively, false if bit maps are used.  In cases where
- * neither are used the {@code isSparse} return value should reflect which is faster to produce.</li>
+ * <li>{@code contains(IndexProducer)} which returns true if the bits specified by the indices generated by
+ * IndexProducer are enabled in the Bloom filter.</li>
+ *
+ * <li>{@code copy()} which returns a fresh copy of the bitmap.</li>
+ *
+ * <li>{@code getShape()} which returns the shape the Bloom filter was created with.</li>
+ *
+ * <li>{@code characteristics()} which an integer of characteristics flags.</li>
+ *
+ * <li>{@code merge(BitMapProducer)} which Merges the BitMaps from the BitMapProducer into the internal
+ * representation of the Bloom filter.</li>
  *
- * <li>{@code mergeInPlace(BloomFilter)} which
- * utilizes either the {@code BitMapProducer} or {@code IndexProducer} from the argument to enable extra bits
- * in the internal representation of the Bloom filter.</li>
+ * <li>{@code merge(IndexProducer)} which Merges the indices from the IndexProducer into the internal

Review Comment:
   `merges`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultBloomFilterTest.java:
##########
@@ -163,28 +123,30 @@ public boolean merge(BloomFilter other) {
                             String.format("Value in list %s is less than 0", indices.first()));
                 }
             }
-            return true;
         }
 
         @Override
-        public int cardinality() {
-            return indices.size();
+        public boolean merge(IndexProducer indexProducer) {
+            boolean result = indexProducer.forEachIndex((x) -> {

Review Comment:
   You do not need parentheses around the `(x)` for single argument lambda functions.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultBitMapProducerTest.java:
##########
@@ -56,26 +56,25 @@ public boolean forEachBitMap(LongPredicate predicate) {
             return true;
         }
     }
-    
+
     @Test
     public void testDefaultExpansion() {
         BitMapProducer bmp = BitMapProducer.fromBitMapArray(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L);
         long[] ary = bmp.asBitMapArray();
         assertEquals( 17,  ary.length);
-        for (int i=0;i<17;i++)
-        {
-            assertEquals((long)i, ary[i]);
+        for (int i=0; i<17; i++) {

Review Comment:
   Typically we use whitespace around this: `int i = 0; i < 17; i++)`
   
   In this case I think the test should be modified to create an expected array and use assertArrayEquals



##########
src/main/java/org/apache/commons/collections4/bloomfilter/package-info.java:
##########
@@ -20,58 +20,59 @@
  *
  * <h2>Background:</h2>
  *
- * <p>The Bloom filter is a probabilistic data structure that indicates where things are not.
- * Conceptually it is a bit vector. You create a Bloom filter by creating hashes
- * and converting those to enabled bits in the vector. Multiple Bloom filters may be merged
- * together into one Bloom filter.  It is possible to test if a filter {@code B} has merged into
- * another filter {@code A} by verifying that {@code (A & B) == B}.</p>
- *
- * <p>Bloom filters are generally used where hash
- * tables would be too large, or as a filter front end for longer processes. For example
- * most browsers have a Bloom filter that is built from all known bad URLs (ones that
- * serve up malware). When you enter a URL the browser builds a Bloom filter and checks to
- * see if it is "in" the bad URL filter. If not the URL is good, if it matches, then the
- * expensive lookup on a remote system is made to see if it actually is in the list. There
- * are lots of other uses, and in most cases the reason is to perform a fast check as a
- * gateway for a longer operation. </p>
+ * <p>The Bloom filter is a probabilistic data structure that indicates where things are not. Conceptually it is a bit
+ * vector. You create a Bloom filter by creating hashes and converting those to enabled bits in the vector. Multiple
+ * Bloom filters may be merged together into one Bloom filter. It is possible to test if a filter {@code B} has merged
+ * into another filter {@code A} by verifying that {@code (A & B) == B}.</p>
+ *
+ * <p>Bloom filters are generally used where hash tables would be too large, or as a filter front end for longer processes.
+ * For example most browsers have a Bloom filter that is built from all known bad URLs (ones that serve up malware).
+ * When you enter a URL the browser builds a Bloom filter and checks to see if it is "in" the bad URL filter. If not the
+ * URL is good, if it matches, then the expensive lookup on a remote system is made to see if it actually is in the
+ * list. There are lots of other uses, and in most cases the reason is to perform a fast check as a gateway for a longer
+ * operation.</p>
  *
  * <h3>BloomFilter</h3>
  *
- * <p>The Bloom filter architecture here is designed so that the implementation of the storage of bits is abstracted.
+ * <p>The Bloom filter architecture here is designed for speed of execution, so some methods like {@code merge}, {@code remove},
+ * {@code add}, and {@code subtract} may throw exceptions.  One an exception is thrown the state of the Bloom filter is unknown.

Review Comment:
   `Once` an exception is thrown.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultIndexProducerTest.java:
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+import java.util.function.IntPredicate;
+
+import org.junit.jupiter.api.Test;
+
+public class DefaultIndexProducerTest extends AbstractIndexProducerTest {
+
+    @Override
+    protected IndexProducer createProducer() {
+        return IndexProducer.fromIndexArray(1, 2);
+    }
+
+    @Override
+    protected IndexProducer createEmptyProducer() {
+        return new IndexProducer() {
+
+            @Override
+            public boolean forEachIndex(IntPredicate predicate)
+            {
+                return true;
+            }
+        };
+    }
+    
+    @Test
+    public void testFromBitMapProducer() {
+        IndexProducer ip = IndexProducer.fromBitMapProducer(BitMapProducer.fromBitMapArray( 0x07ffL));

Review Comment:
   These tests use a continuous range. A more generic test could use the BitMap class to create the input:
   ```Java
   // Randomly create
   int[] expected = {13, 3, 18, 1, 31, 57, 7};
   long[] bits = new long[BitMap.numberOfBitMaps(Arrays.stream(expected).max().getAsInt())];
   for (int bitIndex : expected) {
       BitMap.set(bits, bitIndex);
   }
   IndexProducer ip = IndexProducer.fromBitMapProducer(BitMapProducer.fromBitMapArray(bits));
   int[] ary = ip.asIndexArray();
   Arrays.sort(expected);
   assertArrayEquals(expected, ary);
   ```



##########
src/main/java/org/apache/commons/collections4/bloomfilter/SparseBloomFilter.java:
##########
@@ -140,23 +78,27 @@ private boolean add(int idx) {
         return true;
     }
 
-    /**
-     * Performs a merge using an IndexProducer.
-     * @param indexProducer the IndexProducer to merge from.
-     * @throws IllegalArgumentException if producer sends illegal value.
-     */
-    private void merge(IndexProducer indexProducer) {
+    @Override
+    public boolean merge(IndexProducer indexProducer) {
+        Objects.requireNonNull(indexProducer, "indexProducer");
         indexProducer.forEachIndex(this::add);
         if (!this.indices.isEmpty()) {
             if (this.indices.last() >= shape.getNumberOfBits()) {
                 throw new IllegalArgumentException(String.format("Value in list %s is greater than maximum value (%s)",
-                        this.indices.last(), shape.getNumberOfBits()));
+                        this.indices.last(), shape.getNumberOfBits()-1));

Review Comment:
   Whitepsace formatting: `shape.getNumberOfBits() - 1`



##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -77,6 +79,89 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
 
     // Modification Operations
 
+    /**
+     * Merges the specified Bloom filter into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code other} filter will be incremented by 1,</p>
+     *
+     * <p>Note: If the other filter is a counting Bloom filter the index counts are ignored and it is treated as an
+     * IndexProducer.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param other the other Bloom filter
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final BloomFilter other) {
+        Objects.requireNonNull(other, "other");
+        return merge((IndexProducer) other);
+    }
+
+    /**
+     * Merges the specified Hasher into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the unique indexes identified by the {@code hasher} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param hasher the hasher
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final Hasher hasher) {
+        Objects.requireNonNull(hasher, "hasher");
+        try {

Review Comment:
   Why not:
   ```Java
   Objects.requireNonNull(hasher, "hasher");
   merge(hasher.uniqueIndices(getShape()));
   ```
   
   This has parity with the default `remove` method.



##########
src/main/java/org/apache/commons/collections4/bloomfilter/package-info.java:
##########
@@ -20,58 +20,59 @@
  *
  * <h2>Background:</h2>
  *
- * <p>The Bloom filter is a probabilistic data structure that indicates where things are not.
- * Conceptually it is a bit vector. You create a Bloom filter by creating hashes
- * and converting those to enabled bits in the vector. Multiple Bloom filters may be merged
- * together into one Bloom filter.  It is possible to test if a filter {@code B} has merged into
- * another filter {@code A} by verifying that {@code (A & B) == B}.</p>
- *
- * <p>Bloom filters are generally used where hash
- * tables would be too large, or as a filter front end for longer processes. For example
- * most browsers have a Bloom filter that is built from all known bad URLs (ones that
- * serve up malware). When you enter a URL the browser builds a Bloom filter and checks to
- * see if it is "in" the bad URL filter. If not the URL is good, if it matches, then the
- * expensive lookup on a remote system is made to see if it actually is in the list. There
- * are lots of other uses, and in most cases the reason is to perform a fast check as a
- * gateway for a longer operation. </p>
+ * <p>The Bloom filter is a probabilistic data structure that indicates where things are not. Conceptually it is a bit
+ * vector. You create a Bloom filter by creating hashes and converting those to enabled bits in the vector. Multiple
+ * Bloom filters may be merged together into one Bloom filter. It is possible to test if a filter {@code B} has merged
+ * into another filter {@code A} by verifying that {@code (A & B) == B}.</p>
+ *
+ * <p>Bloom filters are generally used where hash tables would be too large, or as a filter front end for longer processes.
+ * For example most browsers have a Bloom filter that is built from all known bad URLs (ones that serve up malware).
+ * When you enter a URL the browser builds a Bloom filter and checks to see if it is "in" the bad URL filter. If not the
+ * URL is good, if it matches, then the expensive lookup on a remote system is made to see if it actually is in the
+ * list. There are lots of other uses, and in most cases the reason is to perform a fast check as a gateway for a longer
+ * operation.</p>
  *
  * <h3>BloomFilter</h3>
  *
- * <p>The Bloom filter architecture here is designed so that the implementation of the storage of bits is abstracted.
+ * <p>The Bloom filter architecture here is designed for speed of execution, so some methods like {@code merge}, {@code remove},
+ * {@code add}, and {@code subtract} may throw exceptions.  One an exception is thrown the state of the Bloom filter is unknown.
+ * The choice to use not use atomic transactions was made to achive maximum performance under correct usage.</p>

Review Comment:
   `achieve`



##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -110,7 +198,52 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
      * @see #isValid()
      * @see #subtract(BitCountProducer)
      */
-    boolean remove(Hasher hasher);
+    default boolean remove(final Hasher hasher) {
+        Objects.requireNonNull(hasher, "hasher");
+        return remove(hasher.uniqueIndices(getShape()));
+    }
+
+    /**
+     * Removes the values from the specified IndexProducer from the Bloom filter from this Bloom filter.
+     *
+     * <p>Specifically all counts for the unique indices produced by the {@code hasher} will be
+     * decremented by 1.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * <p>Node: This method expects index producers that produce unique values.</p>
+     *
+     * @param indexProducer the IndexProducer to provide the indexes
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #subtract(BitCountProducer)
+     */
+    default boolean remove(final IndexProducer indexProducer) {
+        Objects.requireNonNull(indexProducer, "indexProducer");
+        try {
+            return subtract(BitCountProducer.from(indexProducer));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()));
+        }
+    }
+
+    /**
+     * Removes the specified BitMapProducer from this Bloom filter.
+     *
+     * <p>Specifically all counts for the indices produced by the {@code bitMapProducer} will be
+     * decremented by 1.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param bitMapProducer the BitMapProducer to provide the indexes
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #subtract(BitCountProducer)
+     */
+    default boolean remove(final BitMapProducer bitMapProducer) {
+        return remove(IndexProducer.fromBitMapProducer(bitMapProducer));

Review Comment:
   Missing `Objects.requireNonNull(bitMapProducer, "bitMapProducer");`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBloomFilterTest.java:
##########
@@ -93,16 +94,30 @@ protected final Shape getTestShape() {
      *
      */
     @Test
-    public void testConstructWithBadHasher() {
+    public void testMergeWithBadHasher() {
         // value too large
+        final BloomFilter f = createEmptyFilter(getTestShape());
         assertThrows(IllegalArgumentException.class,
-                () -> createFilter(getTestShape(), new BadHasher(getTestShape().getNumberOfBits())));
+                () -> f.merge(new BadHasher(getTestShape().getNumberOfBits())));
         // negative value
-        assertThrows(IllegalArgumentException.class, () -> createFilter(getTestShape(), new BadHasher(-1)));
+        BloomFilter f2 = createEmptyFilter(getTestShape());
+        assertThrows(IllegalArgumentException.class, () -> f2.merge(new BadHasher(-1)));
     }
 
     @Test
-    public void testConstructWitBitMapProducer() {
+    public void testMergeWithHasher() {
+        // value too large
+        final BloomFilter f = createEmptyFilter(getTestShape());
+        f.merge(from1);
+        int[] idx = f.asIndexArray();
+        assertEquals(getTestShape().getNumberOfHashFunctions(), idx.length);
+        for (int i=0; i<idx.length; i++) {

Review Comment:
   Not seeing the fix in the `for` line
   



##########
src/main/java/org/apache/commons/collections4/bloomfilter/package-info.java:
##########
@@ -20,58 +20,59 @@
  *
  * <h2>Background:</h2>
  *
- * <p>The Bloom filter is a probabilistic data structure that indicates where things are not.
- * Conceptually it is a bit vector. You create a Bloom filter by creating hashes
- * and converting those to enabled bits in the vector. Multiple Bloom filters may be merged
- * together into one Bloom filter.  It is possible to test if a filter {@code B} has merged into
- * another filter {@code A} by verifying that {@code (A & B) == B}.</p>
- *
- * <p>Bloom filters are generally used where hash
- * tables would be too large, or as a filter front end for longer processes. For example
- * most browsers have a Bloom filter that is built from all known bad URLs (ones that
- * serve up malware). When you enter a URL the browser builds a Bloom filter and checks to
- * see if it is "in" the bad URL filter. If not the URL is good, if it matches, then the
- * expensive lookup on a remote system is made to see if it actually is in the list. There
- * are lots of other uses, and in most cases the reason is to perform a fast check as a
- * gateway for a longer operation. </p>
+ * <p>The Bloom filter is a probabilistic data structure that indicates where things are not. Conceptually it is a bit
+ * vector. You create a Bloom filter by creating hashes and converting those to enabled bits in the vector. Multiple
+ * Bloom filters may be merged together into one Bloom filter. It is possible to test if a filter {@code B} has merged
+ * into another filter {@code A} by verifying that {@code (A & B) == B}.</p>
+ *
+ * <p>Bloom filters are generally used where hash tables would be too large, or as a filter front end for longer processes.
+ * For example most browsers have a Bloom filter that is built from all known bad URLs (ones that serve up malware).
+ * When you enter a URL the browser builds a Bloom filter and checks to see if it is "in" the bad URL filter. If not the
+ * URL is good, if it matches, then the expensive lookup on a remote system is made to see if it actually is in the
+ * list. There are lots of other uses, and in most cases the reason is to perform a fast check as a gateway for a longer
+ * operation.</p>
  *
  * <h3>BloomFilter</h3>
  *
- * <p>The Bloom filter architecture here is designed so that the implementation of the storage of bits is abstracted.
+ * <p>The Bloom filter architecture here is designed for speed of execution, so some methods like {@code merge}, {@code remove},
+ * {@code add}, and {@code subtract} may throw exceptions.  One an exception is thrown the state of the Bloom filter is unknown.
+ * The choice to use not use atomic transactions was made to achive maximum performance under correct usage.</p>
+ *
+ * <p>In addition the architecture is designed so that the implementation of the storage of bits is abstracted.
  * Programs that utilize the Bloom filters may use the {@code BitMapProducer} or {@code IndexProducer} to retrieve a
- * representation of the internal structure.  Additional methods are available in the {@code BitMap} to assist in
+ * representation of the internal structure. Additional methods are available in the {@code BitMap} to assist in
  * manipulation of the representations.</p>
  *
- * <p>The bloom filter code is an interface that requires implementation of 6 methods:</p>
+ * <p>The bloom filter code is an interface that requires implementation of 9 methods:</p>
  * <ul>
- * <li>{@code cardinality()}
- * returns the number of bits enabled in the Bloom filter.</li>
+ * <li>{@code cardinality()} returns the number of bits enabled in the Bloom filter.</li>
  *
- * <li>{@code contains(BitMapProducer)} which
- * returns true if the bits specified by the bit maps generated by the BitMapProducer are enabled in the Bloom filter.</li>
+ * <li>{@code characteristics()} which Returns a bitmap int of characteristics values.</li>

Review Comment:
   `returns`



##########
src/main/java/org/apache/commons/collections4/bloomfilter/package-info.java:
##########
@@ -20,58 +20,59 @@
  *
  * <h2>Background:</h2>
  *
- * <p>The Bloom filter is a probabilistic data structure that indicates where things are not.
- * Conceptually it is a bit vector. You create a Bloom filter by creating hashes
- * and converting those to enabled bits in the vector. Multiple Bloom filters may be merged
- * together into one Bloom filter.  It is possible to test if a filter {@code B} has merged into
- * another filter {@code A} by verifying that {@code (A & B) == B}.</p>
- *
- * <p>Bloom filters are generally used where hash
- * tables would be too large, or as a filter front end for longer processes. For example
- * most browsers have a Bloom filter that is built from all known bad URLs (ones that
- * serve up malware). When you enter a URL the browser builds a Bloom filter and checks to
- * see if it is "in" the bad URL filter. If not the URL is good, if it matches, then the
- * expensive lookup on a remote system is made to see if it actually is in the list. There
- * are lots of other uses, and in most cases the reason is to perform a fast check as a
- * gateway for a longer operation. </p>
+ * <p>The Bloom filter is a probabilistic data structure that indicates where things are not. Conceptually it is a bit
+ * vector. You create a Bloom filter by creating hashes and converting those to enabled bits in the vector. Multiple
+ * Bloom filters may be merged together into one Bloom filter. It is possible to test if a filter {@code B} has merged
+ * into another filter {@code A} by verifying that {@code (A & B) == B}.</p>
+ *
+ * <p>Bloom filters are generally used where hash tables would be too large, or as a filter front end for longer processes.
+ * For example most browsers have a Bloom filter that is built from all known bad URLs (ones that serve up malware).
+ * When you enter a URL the browser builds a Bloom filter and checks to see if it is "in" the bad URL filter. If not the
+ * URL is good, if it matches, then the expensive lookup on a remote system is made to see if it actually is in the
+ * list. There are lots of other uses, and in most cases the reason is to perform a fast check as a gateway for a longer
+ * operation.</p>
  *
  * <h3>BloomFilter</h3>
  *
- * <p>The Bloom filter architecture here is designed so that the implementation of the storage of bits is abstracted.
+ * <p>The Bloom filter architecture here is designed for speed of execution, so some methods like {@code merge}, {@code remove},
+ * {@code add}, and {@code subtract} may throw exceptions.  One an exception is thrown the state of the Bloom filter is unknown.
+ * The choice to use not use atomic transactions was made to achive maximum performance under correct usage.</p>
+ *
+ * <p>In addition the architecture is designed so that the implementation of the storage of bits is abstracted.
  * Programs that utilize the Bloom filters may use the {@code BitMapProducer} or {@code IndexProducer} to retrieve a
- * representation of the internal structure.  Additional methods are available in the {@code BitMap} to assist in
+ * representation of the internal structure. Additional methods are available in the {@code BitMap} to assist in
  * manipulation of the representations.</p>
  *
- * <p>The bloom filter code is an interface that requires implementation of 6 methods:</p>
+ * <p>The bloom filter code is an interface that requires implementation of 9 methods:</p>

Review Comment:
   If you drop the explicit 9 then you will not be caught out in the future, e.g. implementation of the following methods:
   
   Would it help to have each method inserted not with `@code` but a `@link` to the method. This will be caught by javadoc parsing if the method names are incorrect.
   
   ```
   // Will render as 'cardinality()'
   {@link BloomFilter#cardinality()}
   // Will render as 'cardinality'
   {@link BloomFilter#cardinality() cardinality}
   ```
   



##########
src/main/java/org/apache/commons/collections4/bloomfilter/BloomFilter.java:
##########
@@ -135,28 +135,57 @@ default boolean contains(BitMapProducer bitMapProducer) {
      * @param other The bloom filter to merge into this one.
      * @return true if the merge was successful
      */
-    boolean merge(BloomFilter other);
+    default boolean merge(BloomFilter other) {
+        return (characteristics() & SPARSE) != 0 ? merge( (IndexProducer) other ) : merge( (BitMapProducer) other);
+    }
 
     /**
      * Merges the specified hasher into this Bloom filter. Specifically all
      * bit indexes that are identified by the {@code hasher} will be enabled in this filter.
      *
      * <p><em>Note: This method should return {@code true} even if no additional bit indexes were
      * enabled. A {@code false} result indicates that this filter may or may not contain
-     * the {@code other} Bloom filter.</em>  This state may occur in complex Bloom filter implementations like
+     * the {@code hasher} values.</em>  This state may occur in complex Bloom filter implementations like
      * counting Bloom filters.</p>
      *
      * @param hasher The hasher to merge.
      * @return true if the merge was successful
      */
     default boolean merge(Hasher hasher) {
         Objects.requireNonNull(hasher, "hasher");
-        Shape shape = getShape();
-        // create the Bloom filter that is most likely to merge quickly with this one
-        BloomFilter result = (characteristics() & SPARSE) != 0 ? new SparseBloomFilter(shape, hasher) : new SimpleBloomFilter(shape, hasher);
-        return merge(result);
+        return merge( hasher.indices(getShape()));

Review Comment:
   remove single whitespace after `merge(`



##########
src/main/java/org/apache/commons/collections4/bloomfilter/package-info.java:
##########
@@ -20,58 +20,59 @@
  *
  * <h2>Background:</h2>
  *
- * <p>The Bloom filter is a probabilistic data structure that indicates where things are not.
- * Conceptually it is a bit vector. You create a Bloom filter by creating hashes
- * and converting those to enabled bits in the vector. Multiple Bloom filters may be merged
- * together into one Bloom filter.  It is possible to test if a filter {@code B} has merged into
- * another filter {@code A} by verifying that {@code (A & B) == B}.</p>
- *
- * <p>Bloom filters are generally used where hash
- * tables would be too large, or as a filter front end for longer processes. For example
- * most browsers have a Bloom filter that is built from all known bad URLs (ones that
- * serve up malware). When you enter a URL the browser builds a Bloom filter and checks to
- * see if it is "in" the bad URL filter. If not the URL is good, if it matches, then the
- * expensive lookup on a remote system is made to see if it actually is in the list. There
- * are lots of other uses, and in most cases the reason is to perform a fast check as a
- * gateway for a longer operation. </p>
+ * <p>The Bloom filter is a probabilistic data structure that indicates where things are not. Conceptually it is a bit
+ * vector. You create a Bloom filter by creating hashes and converting those to enabled bits in the vector. Multiple
+ * Bloom filters may be merged together into one Bloom filter. It is possible to test if a filter {@code B} has merged
+ * into another filter {@code A} by verifying that {@code (A & B) == B}.</p>
+ *
+ * <p>Bloom filters are generally used where hash tables would be too large, or as a filter front end for longer processes.
+ * For example most browsers have a Bloom filter that is built from all known bad URLs (ones that serve up malware).
+ * When you enter a URL the browser builds a Bloom filter and checks to see if it is "in" the bad URL filter. If not the
+ * URL is good, if it matches, then the expensive lookup on a remote system is made to see if it actually is in the
+ * list. There are lots of other uses, and in most cases the reason is to perform a fast check as a gateway for a longer
+ * operation.</p>
  *
  * <h3>BloomFilter</h3>
  *
- * <p>The Bloom filter architecture here is designed so that the implementation of the storage of bits is abstracted.
+ * <p>The Bloom filter architecture here is designed for speed of execution, so some methods like {@code merge}, {@code remove},
+ * {@code add}, and {@code subtract} may throw exceptions.  One an exception is thrown the state of the Bloom filter is unknown.
+ * The choice to use not use atomic transactions was made to achive maximum performance under correct usage.</p>
+ *
+ * <p>In addition the architecture is designed so that the implementation of the storage of bits is abstracted.
  * Programs that utilize the Bloom filters may use the {@code BitMapProducer} or {@code IndexProducer} to retrieve a
- * representation of the internal structure.  Additional methods are available in the {@code BitMap} to assist in
+ * representation of the internal structure. Additional methods are available in the {@code BitMap} to assist in
  * manipulation of the representations.</p>
  *
- * <p>The bloom filter code is an interface that requires implementation of 6 methods:</p>
+ * <p>The bloom filter code is an interface that requires implementation of 9 methods:</p>
  * <ul>
- * <li>{@code cardinality()}
- * returns the number of bits enabled in the Bloom filter.</li>
+ * <li>{@code cardinality()} returns the number of bits enabled in the Bloom filter.</li>
  *
- * <li>{@code contains(BitMapProducer)} which
- * returns true if the bits specified by the bit maps generated by the BitMapProducer are enabled in the Bloom filter.</li>
+ * <li>{@code characteristics()} which Returns a bitmap int of characteristics values.</li>
  *
- *  <li>{@code contains(IndexProducer)} which
- * returns true if the bits specified by the indices generated by IndexProducer are enabled in the Bloom filter.</li>
+ * <li>{@code clear()} which resets the Bloomfilter to its initial empty state.</li>
  *
- * <li>{@code getShape()} which
- * returns the shape the Bloom filter was created with.</li>
-
- * <li>{@code isSparse()} which
- * returns true if an the implementation tracks indices natively, false if bit maps are used.  In cases where
- * neither are used the {@code isSparse} return value should reflect which is faster to produce.</li>
+ * <li>{@code contains(IndexProducer)} which returns true if the bits specified by the indices generated by
+ * IndexProducer are enabled in the Bloom filter.</li>
+ *
+ * <li>{@code copy()} which returns a fresh copy of the bitmap.</li>
+ *
+ * <li>{@code getShape()} which returns the shape the Bloom filter was created with.</li>
+ *
+ * <li>{@code characteristics()} which an integer of characteristics flags.</li>
+ *
+ * <li>{@code merge(BitMapProducer)} which Merges the BitMaps from the BitMapProducer into the internal

Review Comment:
   `merges`



##########
src/main/java/org/apache/commons/collections4/bloomfilter/SimpleBloomFilter.java:
##########
@@ -133,12 +81,9 @@ public SimpleBloomFilter copy() {
         return new SimpleBloomFilter(this);
     }
 
-    /**
-     * Performs a merge using an IndexProducer.
-     * @param indexProducer the IndexProducer to merge from.
-     * @throws IllegalArgumentException if producer sends illegal value.
-     */
-    private void merge(IndexProducer indexProducer) {
+    @Override
+    public boolean merge(IndexProducer indexProducer) {
+        Objects.requireNonNull(indexProducer, "indexProducer");
         indexProducer.forEachIndex(idx -> {
             if (idx < 0 || idx >= shape.getNumberOfBits()) {
                 throw new IllegalArgumentException(String.format(

Review Comment:
   Not yet fixed. It should be:
   `"IndexProducer should only send values in the range[0,%s)", shape.getNumberOfBits()));`



##########
src/main/java/org/apache/commons/collections4/bloomfilter/package-info.java:
##########
@@ -80,22 +81,23 @@
  *
  * <h3>Hasher</h3>
  *
- * <p>A Hasher converts bytes into a series of integers based on a Shape.  With the exception of the HasherCollecton,
- *  each hasher represents one item being added to the Bloom filter.  The HasherCollection represents the
- *  number of items as the sum of the number of items represented by the Hashers in the collection.</p>
+ * <p>A Hasher converts bytes into a series of integers based on a Shape. With the exception of the HasherCollecton,
+ * each hasher represents one item being added to the Bloom filter. The HasherCollection represents the number of
+ * items as the sum of the number of items represented by the Hashers in the collection.</p>
  *
- *  <p>The SimpleHasher uses a combinatorial generation technique to create the integers. It is easily
- *  initialized by using a standard {@code MessageDigest} or other Hash function to hash the item to insert and
- *  then splitting the hash bytes in half and considering each as a long value.</p>
+ * <p>The EnhancedDoubleHasher uses a combinatorial generation technique to create the integers. It is easily
+ * initialized by using a byte array returned by the standard {@code MessageDigest} or other Hash function to

Review Comment:
   `hash function`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] aherbert commented on pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

aherbert commented on PR #331:
URL: https://github.com/apache/commons-collections/pull/331#issuecomment-1231755539

   @Claudenw The CI build failed on coverage. Some of this is not related to your changes. However these methods have limited coverage, see the [Codecov report](https://app.codecov.io/gh/apache/commons-collections/compare/331/tree/src/main/java/org/apache/commons/collections4/bloomfilter):
   
   ```
   CountingBloomFilter
   
   // Not covering catching the IndexOutOfBoundException
   default boolean remove(final IndexProducer indexProducer) {
   
   // Never called
   default boolean remove(final BitMapProducer bitMapProducer)
   
   
   SimpleBloomFilter
   
   // Does not trigger the condition 'if (idxLimit < idx[0])'
   public boolean merge(BitMapProducer bitMapProducer) {
   ```
   
   I am not sure if these are new coverage changes due to the reorganisation of the tests. Perhaps some methods are no longer being called.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] Claudenw commented on a diff in pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

Claudenw commented on code in PR #331:
URL: https://github.com/apache/commons-collections/pull/331#discussion_r958610813


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBloomFilterTest.java:
##########
@@ -316,20 +345,20 @@ public final void testMerge() {
         assertThrows(IllegalArgumentException.class, () -> bf1.merge(new BadHasher(-1)));
 
         // test error when bloom filter returns values out of range
-        final BloomFilter bf5 = new SimpleBloomFilter(
-                Shape.fromKM(getTestShape().getNumberOfHashFunctions(), 3 * Long.SIZE),
-                new IncrementingHasher(Long.SIZE * 2, 1));
+        BloomFilter bf5 = new SimpleBloomFilter(
+                Shape.fromKM(getTestShape().getNumberOfHashFunctions(), 3 * Long.SIZE));
+        bf5.merge(new IncrementingHasher(Long.SIZE * 2, 1));
         assertThrows(IllegalArgumentException.class, () -> bf1.merge(bf5));
 
-        final BloomFilter bf6 = new SparseBloomFilter(
-                Shape.fromKM(getTestShape().getNumberOfHashFunctions(), 3 * Long.SIZE),
-                new IncrementingHasher(Long.SIZE * 2, 1));
+        BloomFilter bf6 = new SparseBloomFilter(
+                Shape.fromKM(getTestShape().getNumberOfHashFunctions(), 3 * Long.SIZE));
+        bf6.merge(new IncrementingHasher(Long.SIZE * 2, 1));
         assertThrows(IllegalArgumentException.class, () -> bf1.merge(bf6));
     }
 
-    private static void assertIndexProducerConstructor(Shape shape, int[] values, int[] expected) {
+    private void assertIndexProducerConstructor(Shape shape, int[] values, int[] expected) {

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] aherbert commented on a diff in pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

aherbert commented on code in PR #331:
URL: https://github.com/apache/commons-collections/pull/331#discussion_r958540943


##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultBloomFilterTest.java:
##########
@@ -163,28 +123,36 @@ public boolean merge(BloomFilter other) {
                             String.format("Value in list %s is less than 0", indices.first()));
                 }
             }
+        }
+        @Override

Review Comment:
   Requires an extra blank line



##########
src/test/java/org/apache/commons/collections4/bloomfilter/SparseBloomFilterTest.java:
##########
@@ -33,66 +33,31 @@ protected SparseBloomFilter createEmptyFilter(final Shape shape) {
 
     @Override
     protected SparseBloomFilter createFilter(final Shape shape, final Hasher hasher) {
-        return new SparseBloomFilter(shape, hasher);
+        SparseBloomFilter bf = new SparseBloomFilter(shape);
+        bf.merge(hasher);
+        return bf;
     }
 
     @Override
     protected SparseBloomFilter createFilter(final Shape shape, final BitMapProducer producer) {
-        return new SparseBloomFilter(shape, producer);
+        SparseBloomFilter bf = new SparseBloomFilter(shape);
+        bf.merge(producer);
+        return bf;
     }
 
     @Override
     protected SparseBloomFilter createFilter(final Shape shape, final IndexProducer producer) {
-        return new SparseBloomFilter(shape, producer);
+        SparseBloomFilter bf = new SparseBloomFilter(shape);
+        bf.merge(producer);
+        return bf;
     }
 
-    private void executeNestedTest(SparseBloomFilterTest nestedTest) {
-        nestedTest.testContains();
-        nestedTest.testEstimateIntersection();
-        nestedTest.testEstimateN();
-        nestedTest.testEstimateUnion();
-        nestedTest.testIsFull();
-        nestedTest.testMerge();
-    }
-
-    @Test
-    public void testConstructors() {
-
-        // copy of Sparse
-        SparseBloomFilterTest nestedTest = new SparseBloomFilterTest() {
-
-            @Override
-            protected SparseBloomFilter createEmptyFilter(Shape shape) {
-                return new SparseBloomFilter(new SparseBloomFilter(shape));
-            }
-
-            @Override
-            protected SparseBloomFilter createFilter(Shape shape, Hasher hasher) {
-                return new SparseBloomFilter(new SparseBloomFilter(shape, hasher));
-            }
-        };
-        executeNestedTest(nestedTest);
-
-        // copy of Simple
-        nestedTest = new SparseBloomFilterTest() {
-
-            @Override
-            protected SparseBloomFilter createEmptyFilter(Shape shape) {
-                return new SparseBloomFilter(new SimpleBloomFilter(shape));
-            }
-
-            @Override
-            protected SparseBloomFilter createFilter(Shape shape, Hasher hasher) {
-                return new SparseBloomFilter(new SimpleBloomFilter(shape, hasher));
-            }
-        };
-        executeNestedTest(nestedTest);
-    }
 
     @Test
     public void testBitMapProducerEdgeCases() {
         int[] values = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 65, 66, 67, 68, 69, 70, 71 };
-        BloomFilter bf = createFilter(getTestShape(), IndexProducer.fromIndexArray(values));

Review Comment:
   This comment has not been addressed as all the `createEmptyFilter` followed by `merge` does not seem required.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/SetOperationsTest.java:
##########
@@ -49,52 +49,63 @@ private static void assertSymmetricOperation(double expected, ToDoubleBiFunction
         assertEquals(expected, operation.applyAsDouble(filter2, filter1), "op(filter2, filter1)");
     }
 
+    private BloomFilter createTest(Shape shape, Hasher hasher) {

Review Comment:
   These should be more appropriately named, e.g. `createFilter`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/EnhancedDoubleHasherTest.java:
##########
@@ -43,39 +43,39 @@ protected int getHasherSize(Hasher hasher) {
     @Test
     public void testByteConstructor() {
         // single value become increment.
-        EnhancedDoubleHasher hasher = new EnhancedDoubleHasher( new byte[] { 1 } );
-        assertEquals( 0, hasher.getInitial() );
-        assertEquals( 0x01_00_00_00_00_00_00_00L, hasher.getIncrement() );
+        EnhancedDoubleHasher hasher = new EnhancedDoubleHasher(new byte[] { 1 });

Review Comment:
   These formatting changes should not be part of this PR. I have applied formatting to master so you can rebase and should pick up the changes.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBloomFilterTest.java:
##########
@@ -316,20 +345,20 @@ public final void testMerge() {
         assertThrows(IllegalArgumentException.class, () -> bf1.merge(new BadHasher(-1)));
 
         // test error when bloom filter returns values out of range
-        final BloomFilter bf5 = new SimpleBloomFilter(
-                Shape.fromKM(getTestShape().getNumberOfHashFunctions(), 3 * Long.SIZE),
-                new IncrementingHasher(Long.SIZE * 2, 1));
+        BloomFilter bf5 = new SimpleBloomFilter(
+                Shape.fromKM(getTestShape().getNumberOfHashFunctions(), 3 * Long.SIZE));
+        bf5.merge(new IncrementingHasher(Long.SIZE * 2, 1));
         assertThrows(IllegalArgumentException.class, () -> bf1.merge(bf5));
 
-        final BloomFilter bf6 = new SparseBloomFilter(
-                Shape.fromKM(getTestShape().getNumberOfHashFunctions(), 3 * Long.SIZE),
-                new IncrementingHasher(Long.SIZE * 2, 1));
+        BloomFilter bf6 = new SparseBloomFilter(
+                Shape.fromKM(getTestShape().getNumberOfHashFunctions(), 3 * Long.SIZE));
+        bf6.merge(new IncrementingHasher(Long.SIZE * 2, 1));
         assertThrows(IllegalArgumentException.class, () -> bf1.merge(bf6));
     }
 
-    private static void assertIndexProducerConstructor(Shape shape, int[] values, int[] expected) {
+    private void assertIndexProducerConstructor(Shape shape, int[] values, int[] expected) {

Review Comment:
   This method name is now incorrect. Perhaps `assertIndexProducerMerge`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] aherbert commented on pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

aherbert commented on PR #331:
URL: https://github.com/apache/commons-collections/pull/331#issuecomment-1242674514

   I have merged this in commit: 9a58c1bbdf7c65e71b33ab1ba406f1de5a8191df
   
   I have made some further modifications. I updated the IndexProducer to document that the forEach and asIndexArray that `Indices ordering and uniqueness is not guaranteed`.
   
   I have refactored the IndexProducer test so that the distinct and ordering behaviour can be asserted. Please check the current master. I found one bug which is currently disabled in BitCountProducerFromIndexProducerTest. If you create a BitCountProducer from an IndexProducer that has duplicates then the counts are not correct. Can you have a look and consider how to fix this, either by documentation that duplicate indices are ignored, or by actually correcting the counts.
   
   Since an IndexProducer makes no guarantee on duplicates it may be fair to just state that the BitCountProducer will create a count of 1 even in the event of duplicates.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] Claudenw commented on a diff in pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

Claudenw commented on code in PR #331:
URL: https://github.com/apache/commons-collections/pull/331#discussion_r958612623


##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultBloomFilterTest.java:
##########
@@ -163,28 +123,36 @@ public boolean merge(BloomFilter other) {
                             String.format("Value in list %s is less than 0", indices.first()));
                 }
             }
+        }
+        @Override

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] codecov-commenter commented on pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

codecov-commenter commented on PR #331:
URL: https://github.com/apache/commons-collections/pull/331#issuecomment-1223869588

   # [Codecov](https://codecov.io/gh/apache/commons-collections/pull/331?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#331](https://codecov.io/gh/apache/commons-collections/pull/331?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (2264487) into [master](https://codecov.io/gh/apache/commons-collections/commit/fe783da49fefeacbe6910190ab4f27575f77d7aa?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (fe783da) will **decrease** coverage by `0.05%`.
   > The diff coverage is `82.85%`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master     #331      +/-   ##
   ============================================
   - Coverage     86.01%   85.96%   -0.06%     
   + Complexity     4675     4666       -9     
   ============================================
     Files           288      289       +1     
     Lines         13473    13444      -29     
     Branches       1980     1978       -2     
   ============================================
   - Hits          11589    11557      -32     
   - Misses         1323     1327       +4     
   + Partials        561      560       -1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/commons-collections/pull/331?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ections4/bloomfilter/ArrayCountingBloomFilter.java](https://codecov.io/gh/apache/commons-collections/pull/331/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2NvbW1vbnMvY29sbGVjdGlvbnM0L2Jsb29tZmlsdGVyL0FycmF5Q291bnRpbmdCbG9vbUZpbHRlci5qYXZh) | `100.00% <ø> (ø)` | |
   | [.../collections4/bloomfilter/CountingBloomFilter.java](https://codecov.io/gh/apache/commons-collections/pull/331/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2NvbW1vbnMvY29sbGVjdGlvbnM0L2Jsb29tZmlsdGVyL0NvdW50aW5nQmxvb21GaWx0ZXIuamF2YQ==) | `76.00% <76.00%> (ø)` | |
   | [.../commons/collections4/bloomfilter/BloomFilter.java](https://codecov.io/gh/apache/commons-collections/pull/331/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2NvbW1vbnMvY29sbGVjdGlvbnM0L2Jsb29tZmlsdGVyL0Jsb29tRmlsdGVyLmphdmE=) | `100.00% <100.00%> (ø)` | |
   | [...ns/collections4/bloomfilter/SimpleBloomFilter.java](https://codecov.io/gh/apache/commons-collections/pull/331/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2NvbW1vbnMvY29sbGVjdGlvbnM0L2Jsb29tZmlsdGVyL1NpbXBsZUJsb29tRmlsdGVyLmphdmE=) | `92.64% <100.00%> (-1.61%)` | :arrow_down: |
   | [...ns/collections4/bloomfilter/SparseBloomFilter.java](https://codecov.io/gh/apache/commons-collections/pull/331/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2NvbW1vbnMvY29sbGVjdGlvbnM0L2Jsb29tZmlsdGVyL1NwYXJzZUJsb29tRmlsdGVyLmphdmE=) | `100.00% <100.00%> (+3.33%)` | :arrow_up: |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] aherbert commented on a diff in pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

aherbert commented on code in PR #331:
URL: https://github.com/apache/commons-collections/pull/331#discussion_r953714695


##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -110,7 +201,52 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
      * @see #isValid()
      * @see #subtract(BitCountProducer)
      */
-    boolean remove(Hasher hasher);
+    default boolean remove(final Hasher hasher) {
+        Objects.requireNonNull(hasher, "hasher");
+        return subtract(BitCountProducer.from(hasher.uniqueIndices(getShape())));
+    }
+
+    /**
+     * Removes the values from the specified IndexProducer from the Bloom filter from this Bloom filter.
+     *
+     * <p>Specifically all counts for the unique indices produced by the {@code hasher} will be
+     * decremented by 1.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * <p>Node: This method expects index producers that produce unique values.</p>
+     *
+     * @param indexProducer the IndexProducer to provide the indexes
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #subtract(BitCountProducer)
+     */
+    default boolean remove(final IndexProducer indexProducer) {
+        Objects.requireNonNull(indexProducer, "indexProducer");
+        try {
+            return subtract(BitCountProducer.from(indexProducer));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()));
+        }
+    }
+
+    /**
+     * Removes the specified BitMapProducer from this Bloom filter.
+     *
+     * <p>Specifically all counts for the indices produced by the {@code bitMapProducer} will be
+     * decremented by 1.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param bitMapProducer the BitMapProducer to provide the indexes
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #subtract(BitCountProducer)
+     */
+    default boolean remove(final BitMapProducer bitMapProducer) {
+        return remove(IndexProducer.fromBitMapProducer(bitMapProducer));

Review Comment:
   This double wraps the input: `BitMapProducer -> IndexProducer -> BitCountProducer`. This could be a future optimisation to go direct: `BitMapProducer -> BitCountProducer`. The method could be done in an internal helper class.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractHasherTest.java:
##########
@@ -97,8 +97,8 @@ public void testUniqueIndex() {
         List<Integer> full = Arrays.stream(producer.asIndexArray()).boxed().collect(Collectors.toList());
         producer = hasher.uniqueIndices(shape);
         List<Integer> unique = Arrays.stream(producer.asIndexArray()).boxed().collect(Collectors.toList());
-        assertTrue( full.size() > unique.size() );
-        Set<Integer> set = new HashSet<Integer>( unique );
-        assertEquals( set.size(), unique.size() );
+        assertTrue(full.size() > unique.size());
+        Set<Integer> set = new HashSet<Integer>(unique);

Review Comment:
   Note: You can use the diamond operator `<>` in generics so you do not require the `<Integer>`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultBloomFilterTest.java:
##########
@@ -196,6 +196,7 @@ public AbstractDefaultBloomFilter copy() {
             result.indices.addAll(indices);
             return result;
         }
+

Review Comment:
   Remove the extra line



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBloomFilterTest.java:
##########
@@ -62,49 +62,34 @@ protected final Shape getTestShape() {
      */
     protected abstract T createEmptyFilter(Shape shape);
 
-    /**
-     * Create the BloomFilter implementation we are testing.
-     *
-     * @param shape the shape of the filter.
-     * @param hasher the hasher to use to create the filter.
-     * @return a BloomFilter implementation.
-     */
-    protected abstract T createFilter(Shape shape, Hasher hasher);

Review Comment:
   Why not maintain these methods as non-abstract, e.g.
   ```Java
   protected T createFilter(Shape shape, Hasher hasher) {
       T f = createEmptyFilter(Shape shape);
       f.merge(hasher);
       return f;
   }
   ```
   
   Then you do not have to change all the existing tests to create and then merge the argument.
   



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBloomFilterTest.java:
##########
@@ -186,8 +179,10 @@ public final void testContains() {
     @Test
     public final void testEstimateIntersection() {
 
-        final BloomFilter bf = createFilter(getTestShape(), from1);
-        final BloomFilter bf2 = createFilter(getTestShape(), bigHasher);
+        final BloomFilter bf = createEmptyFilter(getTestShape());
+        bf.merge(from1);
+        final BloomFilter bf2 = createEmptyFilter(getTestShape());
+        bf2.merge(from1);

Review Comment:
   `bf2` should merge `bigHasher`



##########
src/main/java/org/apache/commons/collections4/bloomfilter/SparseBloomFilter.java:
##########
@@ -140,12 +78,9 @@ private boolean add(int idx) {
         return true;
     }
 
-    /**
-     * Performs a merge using an IndexProducer.
-     * @param indexProducer the IndexProducer to merge from.
-     * @throws IllegalArgumentException if producer sends illegal value.
-     */
-    private void merge(IndexProducer indexProducer) {
+    @Override
+    public boolean merge(IndexProducer indexProducer) {
+        Objects.requireNonNull(indexProducer, "indexProducer");

Review Comment:
   Note:
   
   This merge method (and others such as remove/add/subtract) throw exceptions. However the filter will be left in a corrupted state if some of the indices have been processed before the exception. Implementing these methods as an atomic transaction would be slower and more involved. So I suggest that the non-atomic nature of the bulk updates to a filter are commented, perhaps at the package level, as a design decision to achieve maximum performance under correct usage.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/SetOperationsTest.java:
##########
@@ -49,52 +49,63 @@ private static void assertSymmetricOperation(double expected, ToDoubleBiFunction
         assertEquals(expected, operation.applyAsDouble(filter2, filter1), "op(filter2, filter1)");
     }
 
+    private BloomFilter forTest(Shape shape, Hasher hasher) {

Review Comment:
   I would name these `createFilter` to be self documenting.



##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -92,12 +180,15 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
      * @see #isValid()
      * @see #subtract(BitCountProducer)
      */
-    boolean remove(BloomFilter other);
+    default boolean remove(final BloomFilter other) {
+        Objects.requireNonNull(other, "other");
+        return subtract(BitCountProducer.from(other));

Review Comment:
   Note: This calls subtract with a BitCountProducer. But does not catch and wrap IOOBE. Should it be:
   ```Java
   return remove((IndexProducer) other);
   ```



##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -77,6 +79,92 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
 
     // Modification Operations
 
+    /**
+     * Merges the specified Bloom filter into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code other} filter will be incremented by 1,</p>
+     *
+     * <p>Note: If the other filter is a counting Bloom filter the index counts are ignored and it is treated as an
+     * IndexProducer.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param other the other Bloom filter
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final BloomFilter other) {
+        Objects.requireNonNull(other, "other");

Review Comment:
   Could this be changed to:
   ```Java
   return merge((IndexProducer) other);
   ```



##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -110,7 +201,52 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
      * @see #isValid()
      * @see #subtract(BitCountProducer)
      */
-    boolean remove(Hasher hasher);
+    default boolean remove(final Hasher hasher) {
+        Objects.requireNonNull(hasher, "hasher");
+        return subtract(BitCountProducer.from(hasher.uniqueIndices(getShape())));

Review Comment:
   No catch of IOOBE and wrapping to IAE



##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -77,6 +79,92 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
 
     // Modification Operations
 
+    /**
+     * Merges the specified Bloom filter into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code other} filter will be incremented by 1,</p>
+     *
+     * <p>Note: If the other filter is a counting Bloom filter the index counts are ignored and it is treated as an
+     * IndexProducer.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param other the other Bloom filter
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final BloomFilter other) {
+        Objects.requireNonNull(other, "other");
+        try {
+            return add(BitCountProducer.from(other));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(e);
+        }
+    }
+
+    /**
+     * Merges the specified Hasher into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the unique indexes identified by the {@code hasher} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param hasher the hasher
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final Hasher hasher) {
+        Objects.requireNonNull(hasher, "hasher");
+        try {
+            return add(BitCountProducer.from(hasher.uniqueIndices(getShape())));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()));
+        }
+    }
+
+    /**
+     * Merges the specified index producer into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code indexProducer} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * <p>Note: This method expects and index producer that does not return duplicates.</p>
+     *
+     * @param indexProducer the IndexProducer
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final IndexProducer indexProducer) {
+        Objects.requireNonNull(indexProducer, "producer");
+        try {
+            return add(BitCountProducer.from(indexProducer));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()));

Review Comment:
   The caught exception should be added as the cause.
   
   Note: the other default methods that catch and wrap the IOOBE do not add a message but do add the cause. There should be consistency here. Perhaps change all to use the same message and add the IOOBE as the cause.



##########
src/main/java/org/apache/commons/collections4/bloomfilter/SimpleBloomFilter.java:
##########
@@ -133,12 +81,9 @@ public SimpleBloomFilter copy() {
         return new SimpleBloomFilter(this);
     }
 
-    /**
-     * Performs a merge using an IndexProducer.
-     * @param indexProducer the IndexProducer to merge from.
-     * @throws IllegalArgumentException if producer sends illegal value.
-     */
-    private void merge(IndexProducer indexProducer) {
+    @Override
+    public boolean merge(IndexProducer indexProducer) {
+        Objects.requireNonNull(indexProducer, "indexProducer");
         indexProducer.forEachIndex(idx -> {
             if (idx < 0 || idx >= shape.getNumberOfBits()) {
                 throw new IllegalArgumentException(String.format(

Review Comment:
   I note that here the index range uses a closed range [0, x-1]. Other exception message use an open range on the upper bound [0, x).



##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -77,6 +79,92 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
 
     // Modification Operations
 
+    /**
+     * Merges the specified Bloom filter into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code other} filter will be incremented by 1,</p>
+     *
+     * <p>Note: If the other filter is a counting Bloom filter the index counts are ignored and it is treated as an
+     * IndexProducer.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param other the other Bloom filter
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final BloomFilter other) {
+        Objects.requireNonNull(other, "other");
+        try {
+            return add(BitCountProducer.from(other));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(e);
+        }
+    }
+
+    /**
+     * Merges the specified Hasher into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the unique indexes identified by the {@code hasher} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param hasher the hasher
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final Hasher hasher) {
+        Objects.requireNonNull(hasher, "hasher");
+        try {
+            return add(BitCountProducer.from(hasher.uniqueIndices(getShape())));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()));
+        }
+    }
+
+    /**
+     * Merges the specified index producer into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code indexProducer} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * <p>Note: This method expects and index producer that does not return duplicates.</p>
+     *
+     * @param indexProducer the IndexProducer
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final IndexProducer indexProducer) {
+        Objects.requireNonNull(indexProducer, "producer");
+        try {
+            return add(BitCountProducer.from(indexProducer));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()));
+        }
+    }
+
+    /**
+     * Merges the specified BitMap producer into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code bitMapProducer} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param bitMapProducer the BitMapProducer
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final BitMapProducer bitMapProducer) {
+        return merge(IndexProducer.fromBitMapProducer(bitMapProducer));

Review Comment:
   `Objects.requireNonNull(indexProducer, "bitMapProducer");` ?
   
   Just to get the correct exception message rather than the message from the IndexProducer.



##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -77,6 +79,92 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
 
     // Modification Operations
 
+    /**
+     * Merges the specified Bloom filter into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code other} filter will be incremented by 1,</p>
+     *
+     * <p>Note: If the other filter is a counting Bloom filter the index counts are ignored and it is treated as an
+     * IndexProducer.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param other the other Bloom filter
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final BloomFilter other) {
+        Objects.requireNonNull(other, "other");
+        try {
+            return add(BitCountProducer.from(other));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(e);
+        }
+    }
+
+    /**
+     * Merges the specified Hasher into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the unique indexes identified by the {@code hasher} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param hasher the hasher
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final Hasher hasher) {
+        Objects.requireNonNull(hasher, "hasher");
+        try {
+            return add(BitCountProducer.from(hasher.uniqueIndices(getShape())));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()));

Review Comment:
   The caught exception should be added as the cause



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [commons-collections] Claudenw commented on pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

Posted by GitBox <gi...@apache.org>.

Claudenw commented on PR #331:
URL: https://github.com/apache/commons-collections/pull/331#issuecomment-1242671100

   IndexProducers may return duplicates and make no order guarantees.  (there used to be an order guarantee but we removed that).
   
   Hasher based IndexProducers, by their nature, generally return unordered and possible duplicate values.  There is a hasher method to produce an IndexProducer that guaranteed uniqueness.
   
   BloomFilter based IndexProducers, by their nature, generally return ordered and unique values, though I can think of implementations where the order may not be true, we don't have one.
   
   The default implementation of IndexProducer uses BitSet in its implementation to simplify the code to produce the index list.  So the uniqueness is an artefact of the implementation.  If you have a fast implementation that can take the forEachIndex() and convert it to an array without imposing the uniqueness constraint then please implement that.
   
   In short I concur with your assessment.
   
   The HasherCollection is intended to simplify creation of some filters.  In practice it is that same as calling functions with hasher arguments once for each hasher in the collection.  Due to the difference between classes of Bloom filters (e.g. standard, counting, stable) there are times when the duplicates are required.
   
   HasherCollections work well in distributed systems where an object is represented in a Bloom filter by hashes of multiple properties.  A for query across the systems a HasherCollection is constructed and passed to the endpoints.  The endpoints can then build filters based on the shape at the endpoint.  So the specific back end systems do not have to agree on shape, but do agree on Hash algorithm.  
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org