You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by GitBox <gi...@apache.org> on 2022/11/04 16:22:22 UTC

[GitHub] [commons-collections] aherbert commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

aherbert commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1014124634


##########
src/main/java/org/apache/commons/collections4/bloomfilter/IndexProducer.java:
##########
@@ -64,6 +65,11 @@ public boolean forEachIndex(IntPredicate predicate) {
                 }
                 return true;
             }
+
+            @Override
+            public int[] asIndexArray() {
+                return Arrays.copyOf( values, values.length );

Review Comment:
   Better served by `return values.clone()`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +66,94 @@ public boolean test(int index, int count) {
     @Override

Review Comment:
   On line 63 there is the text `creates an producer`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +66,94 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
+    @Test
+    public final void testForEachCountPredicates() {
+        BitCountProducer populated = createProducer();
+        BitCountProducer empty = createEmptyProducer();
+
+        assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should be false");
+        assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
+
+        assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be true");
+        assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
+    }
+
+    @Test
+    public final void testEmptyBitCountProducer() {
+        BitCountProducer empty = createEmptyProducer();
+        int ary[] = empty.asIndexArray();
+        assertEquals(0, ary.length);
+        assertTrue(empty.forEachCount((i, j) -> {
+            throw new AssertionError("forEachCount consumer should not be called");
+        }));
+    }
+
+    @Test
+    public final void testIndexConsistency() {
+        BitCountProducer producer = createProducer();
+        BitSet bs1 = new BitSet();
+        BitSet bs2 = new BitSet();
+        producer.forEachIndex(i -> {
+            bs1.set(i);
+            return true;
+        });
+        producer.forEachCount((i, j) -> {
+            bs2.set(i);
+            return true;
+        });
+        Assertions.assertEquals(bs1, bs2);
+    }
+
+    @Test
+    public void testForEachCountValues() {
+        // Assumes the collections bag works. Could be replaced with Map<Integer,Integer> with more work.
+        final TreeBag<Integer> expected = new TreeBag<>();
+        Arrays.stream(getExpectedBitCount()).forEach(c -> expected.add(c[0], c[1]));
+        final TreeBag<Integer> actual = new TreeBag<>();
+        // can not return actual.add as it returns false on duplicate 'i'
+        createProducer().forEachCount((i, j) -> {
+            actual.add(i, j);
+            return true;
+            }

Review Comment:
   Formatting: move this to the next line to be `});`



##########
src/main/java/org/apache/commons/collections4/bloomfilter/HasherCollection.java:
##########
@@ -106,6 +120,27 @@ public boolean forEachIndex(IntPredicate consumer) {
         };
     }
 
+    /**
+     * Creates an IndexProducer of comprising the unique indices across all the contained

Review Comment:
   `of comprising`: drop the `of`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +66,94 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
+    @Test
+    public final void testForEachCountPredicates() {
+        BitCountProducer populated = createProducer();
+        BitCountProducer empty = createEmptyProducer();
+
+        assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should be false");
+        assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
+
+        assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be true");
+        assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
+    }
+
+    @Test
+    public final void testEmptyBitCountProducer() {
+        BitCountProducer empty = createEmptyProducer();
+        int ary[] = empty.asIndexArray();
+        assertEquals(0, ary.length);
+        assertTrue(empty.forEachCount((i, j) -> {
+            throw new AssertionError("forEachCount consumer should not be called");

Review Comment:
   `Assertions.fail("forEachCount ...");`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -131,35 +165,63 @@ public final void testConsistency() {
         Assertions.assertEquals(bs1, bs2);
     }
 
+    /**
+     * Tests the behaviour of {@code IndexProducer.asIndexArray()}.
+     * The expected behaviour is defined by the {@code getBehaviour()} method.
+     * the index array may be Ordered, Distinct or both.

Review Comment:
   `The index`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromAbsoluteUniqueHasherCollectionTest.java:
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+
+public class BitCountProducerFromAbsoluteUniqueHasherCollectionTest extends AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        // hasher has collisions and wraps
+        return BitCountProducer.from(new HasherCollection(
+                new IncrementingHasher(1, 1),
+                new IncrementingHasher(7, 2)).absoluteUniqueIndices(Shape.fromKM(5, 10)));
+    }
+
+    @Override
+    protected BitCountProducer createEmptyProducer() {
+        return BitCountProducer.from(new HasherCollection().absoluteUniqueIndices(Shape.fromKM(11, 10)));
+    }
+
+    @Override
+    protected int getBehaviour() {
+        return INDICES_DISTINCT |  INDICES_DISTINCT;

Review Comment:
   Duplicate flags.
   
   Note this should not contain INDICES_ORDERED if it is the only test for the absoluteUniqueIndices(). This is because the HasherCollection does not ensure ordering. If I change the test to this it fails (as expected) if the ordering flag is set:
   ```Java
   return BitCountProducer.from(new HasherCollection(
           new IncrementingHasher(7, 2),
           new IncrementingHasher(1, 1)
       ).absoluteUniqueIndices(Shape.fromKM(5, 10)));
   ```
   



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromHasherCollectionTest.java:
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class BitCountProducerFromHasherCollectionTest extends AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        // hasher has collisions and wraps

Review Comment:
   This does not wrap



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromArrayCountingBloomFilterTest.java:
##########
@@ -36,6 +36,18 @@ protected BitCountProducer createEmptyProducer() {
     @Override
     protected int getBehaviour() {
         // CountingBloomFilter based on an array will be distinct and ordered
-        return FOR_EACH_DISTINCT | FOR_EACH_ORDERED | AS_ARRAY_DISTINCT | AS_ARRAY_ORDERED;
+        return INDICES_DISTINCT | INDICES_ORDERED | INDICES_DISTINCT | INDICES_ORDERED;

Review Comment:
   I think I can see why you have duplicate flags. A simple find and replace for the previous two different flags consolidated to one flag has not worked.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromHasherTest.java:
##########
@@ -16,21 +16,32 @@
  */
 package org.apache.commons.collections4.bloomfilter;
 
-public class IndexProducerFromHasherTest extends AbstractIndexProducerTest {
+public class BitCountProducerFromHasherTest extends AbstractBitCountProducerTest {
 
     @Override
-    protected IndexProducer createProducer() {
-        return new IncrementingHasher(0, 1).indices(Shape.fromKM(17, 72));
+    protected BitCountProducer createProducer() {
+        // hasher has collisions and wraps

Review Comment:
   False. No collisions or wrapping



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromIntArrayTest.java:
##########
@@ -16,21 +16,28 @@
  */
 package org.apache.commons.collections4.bloomfilter;
 
-public class IndexProducerFromIntArrayTest extends AbstractIndexProducerTest {
+public class BitCountProducerFromIntArrayTest extends AbstractBitCountProducerTest {
+
+    int[] data = {6, 8, 1, 2, 4, 4, 5};
 
     @Override
-    protected IndexProducer createEmptyProducer() {
-        return IndexProducer.fromIndexArray(new int[0]);
+    protected BitCountProducer createEmptyProducer() {
+        return BitCountProducer.from(IndexProducer.fromIndexArray(new int[0]));
     }
 
     @Override
-    protected IndexProducer createProducer() {
-        return IndexProducer.fromIndexArray(new int[] { 6, 8, 1, 2, 4, 4, 5 });
+    protected BitCountProducer createProducer() {
+        return BitCountProducer.from(IndexProducer.fromIndexArray(data));
     }
 
     @Override
     protected int getBehaviour() {
         // Delegates to the default asIndexArray which is distinct and ordered

Review Comment:
   Comment does not match the flags. Delete the comment as the flags are correct.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromSimpleBloomFilterTest.java:
##########
@@ -16,26 +16,31 @@
  */
 package org.apache.commons.collections4.bloomfilter;
 
-public class IndexProducerFromSimpleBloomFilterTest extends AbstractIndexProducerTest {
+public class BitCountProducerFromSimpleBloomFilterTest extends AbstractBitCountProducerTest {
 
     protected Shape shape = Shape.fromKM(17, 72);
 
     @Override
-    protected IndexProducer createProducer() {
-        Hasher hasher = new IncrementingHasher(0, 1);
+    protected BitCountProducer createProducer() {
+        Hasher hasher = new IncrementingHasher(3, 2);
         BloomFilter bf = new SimpleBloomFilter(shape);
         bf.merge(hasher);
-        return bf;
+        return BitCountProducer.from(bf);
     }
 
     @Override
-    protected IndexProducer createEmptyProducer() {
-        return new SimpleBloomFilter(shape);
+    protected BitCountProducer createEmptyProducer() {
+        return BitCountProducer.from(new SimpleBloomFilter(shape));
     }
 
     @Override
     protected int getBehaviour() {
         // BloomFilter based on a bit map array will be distinct and ordered
-        return FOR_EACH_DISTINCT | FOR_EACH_ORDERED | AS_ARRAY_DISTINCT | AS_ARRAY_ORDERED;
+        return INDICES_DISTINCT | INDICES_ORDERED | INDICES_DISTINCT | INDICES_ORDERED;

Review Comment:
   More duplicate flags



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromSparseBloomFilterTest.java:
##########
@@ -16,29 +16,33 @@
  */
 package org.apache.commons.collections4.bloomfilter;
 
-public class IndexProducerFromSparseBloomFilterTest extends AbstractIndexProducerTest {
+public class BitCountProducerFromSparseBloomFilterTest extends AbstractBitCountProducerTest {
 
     protected Shape shape = Shape.fromKM(17, 72);
 
     @Override
-    protected IndexProducer createProducer() {
-        Hasher hasher = new IncrementingHasher(0, 1);
+    protected BitCountProducer createProducer() {
+        Hasher hasher = new IncrementingHasher(4, 7);
         BloomFilter bf = new SparseBloomFilter(shape);
         bf.merge(hasher);
-        return bf;
-
+        return BitCountProducer.from(bf);
     }
 
     @Override
-    protected IndexProducer createEmptyProducer() {
-        return new SparseBloomFilter(shape);
+    protected BitCountProducer createEmptyProducer() {
+        return BitCountProducer.from(new SparseBloomFilter(shape));
     }
 
     @Override
     protected int getBehaviour() {
         // A sparse BloomFilter will be distinct but it may not be ordered.
-        // Currently the ordered behaviour is asserted as the implementation uses
+        // Currently the ordered behavior is asserted as the implementation uses
         // an ordered TreeSet. This may change in the future.
-        return FOR_EACH_DISTINCT | FOR_EACH_ORDERED | AS_ARRAY_DISTINCT | AS_ARRAY_ORDERED;
+        return INDICES_DISTINCT | INDICES_ORDERED | INDICES_DISTINCT | INDICES_ORDERED;

Review Comment:
   Duplicate flags



##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultIndexProducerTest.java:
##########
@@ -47,8 +52,7 @@ public boolean forEachIndex(IntPredicate predicate) {
 
     @Override
     protected int getBehaviour() {
-        // The default method streams a BitSet so is distinct and ordered.

Review Comment:
   Note: This test has removed some testing of the default methods in IndexProducer.
   
   It should be named `IndexProducerFromIndexArrayTest`. It is specifically targeting the default implementation of the IndexProducer when created from an index array. In that case the `asIndexArray()` method returns a clone of the original indices used to create it.
   
   A `DefaultIndexProducerTest` would test the default implementation of `asIndexArray()` and look like this.
   
   ```Java
   public class DefaultIndexProducerTest extends AbstractIndexProducerTest {
   
       /** Make forEachIndex unordered and contain duplicates. */
       private int[] values = {10, 1, 10, 1};
   
       @Override
       protected int[] getExpectedIndices() {
           return values;
       }
   
       @Override
       protected IndexProducer createProducer() {
           return new IndexProducer() {
               @Override
               public boolean forEachIndex(IntPredicate predicate) {
                   Objects.requireNonNull(predicate);
                   for (int i : values) {
                       if (!predicate.test(i)) {
                           return false;
                       }
                   }
                   return true;
               }
           };
       }
   
       @Override
       protected IndexProducer createEmptyProducer() {
           return new IndexProducer() {
               @Override
               public boolean forEachIndex(IntPredicate predicate) {
                   Objects.requireNonNull(predicate);
                   return true;
               }
           };
       }
   
       @Override
       protected int getBehaviour() {
           // The default method streams a BitSet so is distinct and ordered.
           // However the forEachIndex may not be distinct and ordered and
           // the test cannot distinguish the two cases.
           return 0;
       }
   }
   ```
   
   Note that this highlights a problem with consolidating all the behaviour flags. Currently the test suite cannot be used to test the asIndexArray behaviour is different from the forEachIndex behaviour.
   
   I think we should revert the consolidation of the flags. However to avoid the duplication of flag setting through the various test cases we can change the AbstractIndexProducerTest test to do this:
   ```Java
   public abstract class AbstractIndexProducerTest {
   
       protected static final int ORDERED = 0x1;
       protected static final int DISTINCT = 0x2;
   
       protected abstract int getAsIndexArrayBehaviour();
   
       protected int getForEachIndexBehaviour() {
           return getAsIndexArrayBehaviour();
       }
   
   }
   ```
   And AbtractBitCountProducerTest:
   ```Java
   public abstract class AbstractBitCountProducerTest extends AbstractIndexProducerTest {
   
       protected int getForEachCountBehaviour() {
           return getAsIndexArrayBehaviour();
       }
   
   }
   ```
   
   Then get the correct flag in the relevant test. It provides the ability to override each test behaviour individually but will default to the same behaviour if not explicitly changed.
   
   Doing this change (DefaultIndexProducerTest and IndexProducerFromIndexArrayTest) should remove the requirement for the additional test `testFromIndexArray`. The method `testFromBitMapProducer` is already covered in IndexProducerFromBitMapProducerTest.
   



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -131,35 +165,63 @@ public final void testConsistency() {
         Assertions.assertEquals(bs1, bs2);
     }
 
+    /**
+     * Tests the behaviour of {@code IndexProducer.asIndexArray()}.
+     * The expected behaviour is defined by the {@code getBehaviour()} method.
+     * the index array may be Ordered, Distinct or both.
+     * If the index array is not distinct then all elements returned by the {@code getExpectedIndices()}
+     * method, including duplicates, are expected to be returned by the {@code asIndexArray()} method.
+     */
     @Test
     public final void testBehaviourAsIndexArray() {
         int flags = getBehaviour();
-        Assumptions.assumeTrue((flags & (AS_ARRAY_ORDERED | AS_ARRAY_DISTINCT)) != 0);
         int[] actual = createProducer().asIndexArray();
-        if ((flags & AS_ARRAY_ORDERED) != 0) {
+        if ((flags & INDICES_ORDERED) != 0) {
             int[] expected = Arrays.stream(actual).sorted().toArray();
             Assertions.assertArrayEquals(expected, actual);
         }
-        if ((flags & AS_ARRAY_DISTINCT) != 0) {
+        if ((flags & INDICES_DISTINCT) != 0) {
             long count = Arrays.stream(actual).distinct().count();
             Assertions.assertEquals(count, actual.length);
+        } else {
+            // if the array is not distinct all expected elements must be generated
+            List<Integer> lst = new ArrayList<>();

Review Comment:
   Since the order does not matter (and we have already checked the ordered behaviour):
   
   ```Java
   // This is modified so use a copy
   int[] expected = getExpectedIndices().clone();
   Arrays.sort(expected);
   Arrays.sort(actual);
   Assertions.assertArrayEquals(expected, actual);
   ```



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -131,35 +165,63 @@ public final void testConsistency() {
         Assertions.assertEquals(bs1, bs2);
     }
 
+    /**
+     * Tests the behaviour of {@code IndexProducer.asIndexArray()}.
+     * The expected behaviour is defined by the {@code getBehaviour()} method.
+     * the index array may be Ordered, Distinct or both.
+     * If the index array is not distinct then all elements returned by the {@code getExpectedIndices()}
+     * method, including duplicates, are expected to be returned by the {@code asIndexArray()} method.
+     */
     @Test
     public final void testBehaviourAsIndexArray() {
         int flags = getBehaviour();
-        Assumptions.assumeTrue((flags & (AS_ARRAY_ORDERED | AS_ARRAY_DISTINCT)) != 0);
         int[] actual = createProducer().asIndexArray();
-        if ((flags & AS_ARRAY_ORDERED) != 0) {
+        if ((flags & INDICES_ORDERED) != 0) {
             int[] expected = Arrays.stream(actual).sorted().toArray();
             Assertions.assertArrayEquals(expected, actual);
         }
-        if ((flags & AS_ARRAY_DISTINCT) != 0) {
+        if ((flags & INDICES_DISTINCT) != 0) {
             long count = Arrays.stream(actual).distinct().count();
             Assertions.assertEquals(count, actual.length);
+        } else {
+            // if the array is not distinct all expected elements must be generated
+            List<Integer> lst = new ArrayList<>();
+            Arrays.stream(createProducer().asIndexArray()).boxed().forEach( lst::add );
+            for (int i : getExpectedIndices()) {
+                assertTrue( lst.contains(i), "Missing "+i );
+                lst.remove( Integer.valueOf(i));
+            }
+            assertTrue(lst.isEmpty());
         }
     }
 
+    /**
+     * Tests the behaviour of {@code IndexProducer.forEachIndex()}.
+     * The expected behaviour is defined by the {@code getBehaviour()} method.
+     * The order is assumed to follow the order produced by {@code IndexProducer.asIndexArray()}.
+     */
     @Test
-    public final void testBehaviourForEach() {
+    public final void testBehaviourForEachIndex() {
         int flags = getBehaviour();
-        Assumptions.assumeTrue((flags & (FOR_EACH_ORDERED | FOR_EACH_DISTINCT)) != 0);
         IntList list = new IntList();
         createProducer().forEachIndex(list::add);
         int[] actual = list.toArray();
-        if ((flags & FOR_EACH_ORDERED) != 0) {
+        if ((flags & INDICES_ORDERED) != 0) {
             int[] expected = Arrays.stream(actual).sorted().toArray();
             Assertions.assertArrayEquals(expected, actual);
         }
-        if ((flags & FOR_EACH_DISTINCT) != 0) {
+        if ((flags & INDICES_DISTINCT) != 0) {
             long count = Arrays.stream(actual).distinct().count();
             Assertions.assertEquals(count, actual.length);
+        } else {
+         // if forEach is not distinct all expected elements must be generated
+            List<Integer> lst = new ArrayList<>();

Review Comment:
   ```Java
   int[] expected = getExpectedIndices().clone();
   Arrays.sort(expected);
   Arrays.sort(actual);
   Assertions.assertArrayEquals(expected, actual);
   ```



##########
src/test/java/org/apache/commons/collections4/bloomfilter/IndexProducerFromBitmapProducerTest.java:
##########
@@ -49,10 +49,15 @@ protected IndexProducer createProducer() {
         return IndexProducer.fromBitMapProducer(producer);
     }
 
+    @Override
+    protected int[] getExpectedIndices() {
+        return new int[]{0, 65, 128, 129};
+    }
+
     @Override
     protected int getBehaviour() {
         // Bit maps will be distinct. Conversion to indices should be ordered.
-        return FOR_EACH_DISTINCT | FOR_EACH_ORDERED | AS_ARRAY_DISTINCT | AS_ARRAY_ORDERED;
+        return INDICES_DISTINCT | INDICES_ORDERED | INDICES_DISTINCT | INDICES_ORDERED;

Review Comment:
   Duplicates



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org