You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by GitBox <gi...@apache.org> on 2022/08/24 12:54:39 UTC

[GitHub] [commons-collections] aherbert commented on a diff in pull request #331: Collections 763: Remove BloomFilter constructors that create initial entry

aherbert commented on code in PR #331:
URL: https://github.com/apache/commons-collections/pull/331#discussion_r953714695


##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -110,7 +201,52 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
      * @see #isValid()
      * @see #subtract(BitCountProducer)
      */
-    boolean remove(Hasher hasher);
+    default boolean remove(final Hasher hasher) {
+        Objects.requireNonNull(hasher, "hasher");
+        return subtract(BitCountProducer.from(hasher.uniqueIndices(getShape())));
+    }
+
+    /**
+     * Removes the values from the specified IndexProducer from the Bloom filter from this Bloom filter.
+     *
+     * <p>Specifically all counts for the unique indices produced by the {@code hasher} will be
+     * decremented by 1.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * <p>Node: This method expects index producers that produce unique values.</p>
+     *
+     * @param indexProducer the IndexProducer to provide the indexes
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #subtract(BitCountProducer)
+     */
+    default boolean remove(final IndexProducer indexProducer) {
+        Objects.requireNonNull(indexProducer, "indexProducer");
+        try {
+            return subtract(BitCountProducer.from(indexProducer));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()));
+        }
+    }
+
+    /**
+     * Removes the specified BitMapProducer from this Bloom filter.
+     *
+     * <p>Specifically all counts for the indices produced by the {@code bitMapProducer} will be
+     * decremented by 1.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param bitMapProducer the BitMapProducer to provide the indexes
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #subtract(BitCountProducer)
+     */
+    default boolean remove(final BitMapProducer bitMapProducer) {
+        return remove(IndexProducer.fromBitMapProducer(bitMapProducer));

Review Comment:
   This double wraps the input: `BitMapProducer -> IndexProducer -> BitCountProducer`. This could be a future optimisation to go direct: `BitMapProducer -> BitCountProducer`. The method could be done in an internal helper class.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractHasherTest.java:
##########
@@ -97,8 +97,8 @@ public void testUniqueIndex() {
         List<Integer> full = Arrays.stream(producer.asIndexArray()).boxed().collect(Collectors.toList());
         producer = hasher.uniqueIndices(shape);
         List<Integer> unique = Arrays.stream(producer.asIndexArray()).boxed().collect(Collectors.toList());
-        assertTrue( full.size() > unique.size() );
-        Set<Integer> set = new HashSet<Integer>( unique );
-        assertEquals( set.size(), unique.size() );
+        assertTrue(full.size() > unique.size());
+        Set<Integer> set = new HashSet<Integer>(unique);

Review Comment:
   Note: You can use the diamond operator `<>` in generics so you do not require the `<Integer>`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultBloomFilterTest.java:
##########
@@ -196,6 +196,7 @@ public AbstractDefaultBloomFilter copy() {
             result.indices.addAll(indices);
             return result;
         }
+

Review Comment:
   Remove the extra line



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBloomFilterTest.java:
##########
@@ -62,49 +62,34 @@ protected final Shape getTestShape() {
      */
     protected abstract T createEmptyFilter(Shape shape);
 
-    /**
-     * Create the BloomFilter implementation we are testing.
-     *
-     * @param shape the shape of the filter.
-     * @param hasher the hasher to use to create the filter.
-     * @return a BloomFilter implementation.
-     */
-    protected abstract T createFilter(Shape shape, Hasher hasher);

Review Comment:
   Why not maintain these methods as non-abstract, e.g.
   ```Java
   protected T createFilter(Shape shape, Hasher hasher) {
       T f = createEmptyFilter(Shape shape);
       f.merge(hasher);
       return f;
   }
   ```
   
   Then you do not have to change all the existing tests to create and then merge the argument.
   



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBloomFilterTest.java:
##########
@@ -186,8 +179,10 @@ public final void testContains() {
     @Test
     public final void testEstimateIntersection() {
 
-        final BloomFilter bf = createFilter(getTestShape(), from1);
-        final BloomFilter bf2 = createFilter(getTestShape(), bigHasher);
+        final BloomFilter bf = createEmptyFilter(getTestShape());
+        bf.merge(from1);
+        final BloomFilter bf2 = createEmptyFilter(getTestShape());
+        bf2.merge(from1);

Review Comment:
   `bf2` should merge `bigHasher`



##########
src/main/java/org/apache/commons/collections4/bloomfilter/SparseBloomFilter.java:
##########
@@ -140,12 +78,9 @@ private boolean add(int idx) {
         return true;
     }
 
-    /**
-     * Performs a merge using an IndexProducer.
-     * @param indexProducer the IndexProducer to merge from.
-     * @throws IllegalArgumentException if producer sends illegal value.
-     */
-    private void merge(IndexProducer indexProducer) {
+    @Override
+    public boolean merge(IndexProducer indexProducer) {
+        Objects.requireNonNull(indexProducer, "indexProducer");

Review Comment:
   Note:
   
   This merge method (and others such as remove/add/subtract) throw exceptions. However the filter will be left in a corrupted state if some of the indices have been processed before the exception. Implementing these methods as an atomic transaction would be slower and more involved. So I suggest that the non-atomic nature of the bulk updates to a filter are commented, perhaps at the package level, as a design decision to achieve maximum performance under correct usage.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/SetOperationsTest.java:
##########
@@ -49,52 +49,63 @@ private static void assertSymmetricOperation(double expected, ToDoubleBiFunction
         assertEquals(expected, operation.applyAsDouble(filter2, filter1), "op(filter2, filter1)");
     }
 
+    private BloomFilter forTest(Shape shape, Hasher hasher) {

Review Comment:
   I would name these `createFilter` to be self documenting.



##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -92,12 +180,15 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
      * @see #isValid()
      * @see #subtract(BitCountProducer)
      */
-    boolean remove(BloomFilter other);
+    default boolean remove(final BloomFilter other) {
+        Objects.requireNonNull(other, "other");
+        return subtract(BitCountProducer.from(other));

Review Comment:
   Note: This calls subtract with a BitCountProducer. But does not catch and wrap IOOBE. Should it be:
   ```Java
   return remove((IndexProducer) other);
   ```



##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -77,6 +79,92 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
 
     // Modification Operations
 
+    /**
+     * Merges the specified Bloom filter into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code other} filter will be incremented by 1,</p>
+     *
+     * <p>Note: If the other filter is a counting Bloom filter the index counts are ignored and it is treated as an
+     * IndexProducer.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param other the other Bloom filter
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final BloomFilter other) {
+        Objects.requireNonNull(other, "other");

Review Comment:
   Could this be changed to:
   ```Java
   return merge((IndexProducer) other);
   ```



##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -110,7 +201,52 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
      * @see #isValid()
      * @see #subtract(BitCountProducer)
      */
-    boolean remove(Hasher hasher);
+    default boolean remove(final Hasher hasher) {
+        Objects.requireNonNull(hasher, "hasher");
+        return subtract(BitCountProducer.from(hasher.uniqueIndices(getShape())));

Review Comment:
   No catch of IOOBE and wrapping to IAE



##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -77,6 +79,92 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
 
     // Modification Operations
 
+    /**
+     * Merges the specified Bloom filter into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code other} filter will be incremented by 1,</p>
+     *
+     * <p>Note: If the other filter is a counting Bloom filter the index counts are ignored and it is treated as an
+     * IndexProducer.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param other the other Bloom filter
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final BloomFilter other) {
+        Objects.requireNonNull(other, "other");
+        try {
+            return add(BitCountProducer.from(other));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(e);
+        }
+    }
+
+    /**
+     * Merges the specified Hasher into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the unique indexes identified by the {@code hasher} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param hasher the hasher
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final Hasher hasher) {
+        Objects.requireNonNull(hasher, "hasher");
+        try {
+            return add(BitCountProducer.from(hasher.uniqueIndices(getShape())));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()));
+        }
+    }
+
+    /**
+     * Merges the specified index producer into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code indexProducer} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * <p>Note: This method expects and index producer that does not return duplicates.</p>
+     *
+     * @param indexProducer the IndexProducer
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final IndexProducer indexProducer) {
+        Objects.requireNonNull(indexProducer, "producer");
+        try {
+            return add(BitCountProducer.from(indexProducer));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()));

Review Comment:
   The caught exception should be added as the cause.
   
   Note: the other default methods that catch and wrap the IOOBE do not add a message but do add the cause. There should be consistency here. Perhaps change all to use the same message and add the IOOBE as the cause.



##########
src/main/java/org/apache/commons/collections4/bloomfilter/SimpleBloomFilter.java:
##########
@@ -133,12 +81,9 @@ public SimpleBloomFilter copy() {
         return new SimpleBloomFilter(this);
     }
 
-    /**
-     * Performs a merge using an IndexProducer.
-     * @param indexProducer the IndexProducer to merge from.
-     * @throws IllegalArgumentException if producer sends illegal value.
-     */
-    private void merge(IndexProducer indexProducer) {
+    @Override
+    public boolean merge(IndexProducer indexProducer) {
+        Objects.requireNonNull(indexProducer, "indexProducer");
         indexProducer.forEachIndex(idx -> {
             if (idx < 0 || idx >= shape.getNumberOfBits()) {
                 throw new IllegalArgumentException(String.format(

Review Comment:
   I note that here the index range uses a closed range [0, x-1]. Other exception message use an open range on the upper bound [0, x).



##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -77,6 +79,92 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
 
     // Modification Operations
 
+    /**
+     * Merges the specified Bloom filter into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code other} filter will be incremented by 1,</p>
+     *
+     * <p>Note: If the other filter is a counting Bloom filter the index counts are ignored and it is treated as an
+     * IndexProducer.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param other the other Bloom filter
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final BloomFilter other) {
+        Objects.requireNonNull(other, "other");
+        try {
+            return add(BitCountProducer.from(other));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(e);
+        }
+    }
+
+    /**
+     * Merges the specified Hasher into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the unique indexes identified by the {@code hasher} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param hasher the hasher
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final Hasher hasher) {
+        Objects.requireNonNull(hasher, "hasher");
+        try {
+            return add(BitCountProducer.from(hasher.uniqueIndices(getShape())));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()));
+        }
+    }
+
+    /**
+     * Merges the specified index producer into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code indexProducer} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * <p>Note: This method expects and index producer that does not return duplicates.</p>
+     *
+     * @param indexProducer the IndexProducer
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final IndexProducer indexProducer) {
+        Objects.requireNonNull(indexProducer, "producer");
+        try {
+            return add(BitCountProducer.from(indexProducer));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()));
+        }
+    }
+
+    /**
+     * Merges the specified BitMap producer into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code bitMapProducer} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param bitMapProducer the BitMapProducer
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final BitMapProducer bitMapProducer) {
+        return merge(IndexProducer.fromBitMapProducer(bitMapProducer));

Review Comment:
   `Objects.requireNonNull(indexProducer, "bitMapProducer");` ?
   
   Just to get the correct exception message rather than the message from the IndexProducer.



##########
src/main/java/org/apache/commons/collections4/bloomfilter/CountingBloomFilter.java:
##########
@@ -77,6 +79,92 @@ public interface CountingBloomFilter extends BloomFilter, BitCountProducer {
 
     // Modification Operations
 
+    /**
+     * Merges the specified Bloom filter into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the indexes identified by the {@code other} filter will be incremented by 1,</p>
+     *
+     * <p>Note: If the other filter is a counting Bloom filter the index counts are ignored and it is treated as an
+     * IndexProducer.</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param other the other Bloom filter
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final BloomFilter other) {
+        Objects.requireNonNull(other, "other");
+        try {
+            return add(BitCountProducer.from(other));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(e);
+        }
+    }
+
+    /**
+     * Merges the specified Hasher into this Bloom filter.
+     *
+     * <p>Specifically: all counts for the unique indexes identified by the {@code hasher} will be incremented by 1,</p>
+     *
+     * <p>This method will return {@code true} if the filter is valid after the operation.</p>
+     *
+     * @param hasher the hasher
+     * @return {@code true} if the removal was successful and the state is valid
+     * @see #isValid()
+     * @see #add(BitCountProducer)
+     */
+    default boolean merge(final Hasher hasher) {
+        Objects.requireNonNull(hasher, "hasher");
+        try {
+            return add(BitCountProducer.from(hasher.uniqueIndices(getShape())));
+        } catch (IndexOutOfBoundsException e) {
+            throw new IllegalArgumentException(
+                    String.format("Filter only accepts values in the [0,%d) range", getShape().getNumberOfBits()));

Review Comment:
   The caught exception should be added as the cause



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org