You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by GitBox <gi...@apache.org> on 2022/09/16 13:26:47 UTC

[GitHub] [commons-collections] aherbert commented on a diff in pull request #335: Collections-384: bit count producer operation is not clearly defined

aherbert commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r972933323


##########
src/main/java/org/apache/commons/collections4/bloomfilter/BitCountProducer.java:
##########
@@ -19,7 +19,22 @@
 import java.util.function.IntPredicate;
 
 /**
- * Produces bit counts for counting type Bloom filters.
+ * Defines a mapping of index to counts.
+ *
+ * A BitCountProducer may return duplicate indices and may be unordered.
+ *
+ * The guarantees are:

Review Comment:
   <p>



##########
src/main/java/org/apache/commons/collections4/bloomfilter/BitCountProducer.java:
##########
@@ -47,7 +60,7 @@ default boolean forEachIndex(IntPredicate predicate) {
 
     /**
      * Creates a BitCountProducer from an IndexProducer.  The resulting
-     * producer will count each enabled bit once.
+     * producer will return every index from the IndexProducer with a count of 1.

Review Comment:
   I would add: Duplicate indices are not expected to be aggregated. Duplicates will be output by the producer and each will be associated with a count of 1.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +48,37 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
-    /**
-     * Determines if empty tests should be run.  Some producers do not implement an empty
-     * version.  Tests for those classes should return false.
-     * @return true if the empty tests are supported
-     */
-    protected boolean supportsEmpty() {
-        return true;
-    }
 
     @Test
-    public final void testForEachCount() {
-
+    public final void testForEachCountResults() {
         assertFalse(createProducer().forEachCount(FALSE_CONSUMER), "non-empty should be false");
         assertTrue(createProducer().forEachCount(TRUE_CONSUMER), "non-empty should be true");
-        if (supportsEmpty()) {
-            assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), "empty should be true");
-            assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), "empty should be true");
-        }
+        assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), "empty should be true");
+        assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), "empty should be true");
+    }
+
+    protected abstract int[][] getExpectedBitCount();

Review Comment:
   This should be commented as the expected count for the method `forEachCount`. It should also be moved above the test methods to put all abstract methods in the same location.
   
   Note that this test for BitCountProducer extends the test for IndexProducer. There is some mismatch here between the tests. The AbstractIndexProducerTest verifies that forEachIndex and asIndexArray are consistent (they output the same indices, but the order and uniqueness does not matter). Each is then tested for its behaviour for ordering and uniqueness; these may be different for the two methods. However I note that there is no test for what the expected indices should actually be, i.e. there is not a `protected abstract int[] getExpectedIndices()` and you have added this method. However note that this method for the expected indices should be used to check the distinct indices and without ordering requirements since the behaviour is separately tested.
   
   The AbstractBitCountProducerTest extends the AbstractIndexProducerTest. So we know that the IndexProducer behaviour is tested. But here we now test the expected order and distinctness of the bit counts in one method. I note that there is only one method for the BitCountProducer and so consistency between different output of the counts in not an issue. However a test should be added for consistency between the bit counts and the indices.
   
   For parity with AbstractIndexProducerTest I created this implementation:
   ```Java
   public abstract class AbstractBitCountProducerTest extends AbstractIndexProducerTest {
       /** Flag to indicate the {@link BitCountProducer#forEachCount(BitCountConsumer)} is ordered. */
       protected static final int FOR_EACH_COUNT_ORDERED = 0x10;
       /** Flag to indicate the {@link BitCountProducer#forEachCount(BitCountConsumer)} is distinct. */
       protected static final int FOR_EACH_COUNT_DISTINCT = 0x20;
   
       /**
        * A testing BitCountConsumer that always returns true.
        */
       private static final BitCountConsumer TRUE_CONSUMER = (i, j) -> true;
       /**
        * A testing BitCountConsumer that always returns false.
        */
       private static final BitCountConsumer FALSE_CONSUMER = (i, j) -> false;
   
       /**
        * Creates a producer with some data.
        * @return a producer with some data
        */
       @Override
       protected abstract BitCountProducer createProducer();
   
       /**
        * Creates an producer without data.
        * @return a producer that has no data.
        */
       @Override
       protected abstract BitCountProducer createEmptyProducer();
   
       // Document me ...
       protected abstract int[][] getExpectedBitCount();
   
       @Test
       public final void testForEachCountPredicates() {
           BitCountProducer populated = createProducer();
           BitCountProducer empty = createEmptyProducer();
   
           assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should be false");
           assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
   
           assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be true");
           assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
       }
   
       @Test
       public final void testEmptyBitCountProducer() {
           BitCountProducer empty = createEmptyProducer();
           int ary[] = empty.asIndexArray();
           Assertions.assertEquals(0, ary.length);
           assertTrue(empty.forEachCount((i, j) -> {
               throw new AssertionError("forEachCount consumer should not be called");
           }));
       }
   
       @Test
       public final void testIndexConsistency() {
           BitCountProducer producer = createProducer();
           BitSet bs1 = new BitSet();
           BitSet bs2 = new BitSet();
           producer.forEachIndex(i -> {
               bs1.set(i);
               return true;
           });
           producer.forEachCount((i, j) -> {
               bs2.set(i);
               return true;
           });
           Assertions.assertEquals(bs1, bs2);
       }
   
       @Test
       public void testForEachCount() {
           // Assumes the collections bag works. Could be replaced with Map<Integer,Integer> with more work.
           final TreeBag<Integer> expected = new TreeBag<>();
           Arrays.stream(getExpectedBitCount()).forEach(c -> expected.add(c[0], c[1]));
           final TreeBag<Integer> actual = new TreeBag<>();
           createProducer().forEachCount((i, j) -> actual.add(i, j));
           assertEquals(expected, actual);
       }
   
       @Test
       public final void testBehaviourForEachCount() {
           int flags = getBehaviour();
           IntList list = new IntList();
           createProducer().forEachCount((i, j) -> list.add(i));
           int[] actual = list.toArray();
           if ((flags & FOR_EACH_COUNT_ORDERED) != 0) {
               int[] expected = Arrays.stream(actual).sorted().toArray();
               Assertions.assertArrayEquals(expected, actual);
           }
           if ((flags & FOR_EACH_COUNT_DISTINCT) != 0) {
               long count = Arrays.stream(actual).distinct().count();
               Assertions.assertEquals(count, actual.length);
           }
           int[] expected = getExpectedForEach();
           Assertions.assertArrayEquals( expected, actual);
       }
   
       @Test
       public void testForEachCountEarlyExit() {
           int[] passes = new int[1];
           assertTrue(createEmptyProducer().forEachCount((i, j) -> {
               passes[0]++;
               return false;
           }));
           Assertions.assertEquals(0, passes[0]);
   
           assertFalse(createProducer().forEachCount((i, j) -> {
               passes[0]++;
               return false;
           }));
           Assertions.assertEquals(1, passes[0]);
       }
   }
   ```
   This test passes for all implementations. Note that I did not update all the tests to use the new flags marking the FOR_EACH_COUNT behaviour. They work with the current FOR_EACH behaviour flags, i.e. the behaviour of forEachCount is the same as forEachIndex for all implementations. So perhaps this can be dropped and we document the test to assume the forEach methods have the same behaviour.
   



##########
src/main/java/org/apache/commons/collections4/bloomfilter/BitCountProducer.java:
##########
@@ -19,7 +19,22 @@
 import java.util.function.IntPredicate;
 
 /**
- * Produces bit counts for counting type Bloom filters.
+ * Defines a mapping of index to counts.
+ *
+ * A BitCountProducer may return duplicate indices and may be unordered.
+ *
+ * The guarantees are:
+ * <ul>
+ * <li>that for every unique value produced by the IndexProducer there will be at least one
+ * index in the BitCountProducer.</li>
+ * <li>that the total count of a specific value produced by the IndexProducer will equal the
+ * total of the counts in the BitCountProducer for that index.</li>
+ * </ul>
+ * Example:
+ *
+ * An IndexProducer that generates the values [1,2,3,1,5,6] can be represented by a BitCountProducer

Review Comment:
   <p>



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromDefaultIndexProducerTest.java:
##########
@@ -41,24 +37,19 @@ protected int getBehaviour() {
         return AS_ARRAY_DISTINCT | AS_ARRAY_ORDERED;
     }
 
-    @Test
-    @Disabled("Current behaviour will return the same index twice, each with a count of 1")
-    public final void testFromIndexProducer() {
-
-        BitCountProducer producer = createProducer();
-        Map<Integer, Integer> m = new HashMap<>();
+    @Override
+    protected int[][] getExpectedBitCount() {
+        return new int[][]{{0, 1}, {63, 1}, {1, 1}, {1, 1}, {64, 1}, {127, 1}, {128, 1}};
+    }
 
-        producer.forEachCount((i, v) -> {
-            m.put(i, v);
-            return true;
-        });
+    @Override
+    protected int[] getExpectedIndex() {
+        return expected;

Review Comment:
   If the Abstract test is modified as suggested to separately test behaviour and the expected indices then this can `return data`.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/DefaultIndexProducerTest.java:
##########
@@ -29,6 +29,26 @@ public class DefaultIndexProducerTest extends AbstractIndexProducerTest {
 
     private int[] values = generateIntArray(10, 512);
 
+    @Override
+    protected int[] getExpectedIndex() {
+        int last = -1;
+        IntList lst = new IntList();
+        for (int idx : values) {
+            if (idx != last) {

Review Comment:
   This is not eliminating duplicates (which I presume is the intention) since `values` is not sorted. This manifested as a flaky test when the expected indices array did not match the actual.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -93,6 +92,12 @@ int[] toArray() {
      */
     protected abstract int getBehaviour();
 
+    protected abstract int[] getExpectedIndex();

Review Comment:
   I think this should be used in a new test that only checks the same indices are returned. Other tests are used to determine ordering and uniqueness.
   ```Java
       @Test
       public final void testForEachIndex() {
           IndexProducer producer = createProducer();
           BitSet bs1 = new BitSet();
           BitSet bs2 = new BitSet();
           Arrays.stream(getExpectedIndex()).forEach(bs1::set);
           createProducer().forEachIndex(i -> {
               bs2.set(i);
               return true;
           });
           Assertions.assertEquals(bs1, bs2);
       }
   ```
   



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -93,6 +92,12 @@ int[] toArray() {
      */
     protected abstract int getBehaviour();
 
+    protected abstract int[] getExpectedIndex();
+
+    protected int[] getExpectedForEach() {
+        return getExpectedIndex();
+    }
+
     @Test
     public final void testForEachIndex() {

Review Comment:
   testForEachIndexPredicates



##########
src/main/java/org/apache/commons/collections4/bloomfilter/BitCountProducer.java:
##########
@@ -19,7 +19,22 @@
 import java.util.function.IntPredicate;
 
 /**
- * Produces bit counts for counting type Bloom filters.
+ * Defines a mapping of index to counts.
+ *
+ * A BitCountProducer may return duplicate indices and may be unordered.
+ *
+ * The guarantees are:
+ * <ul>
+ * <li>that for every unique value produced by the IndexProducer there will be at least one
+ * index in the BitCountProducer.</li>
+ * <li>that the total count of a specific value produced by the IndexProducer will equal the
+ * total of the counts in the BitCountProducer for that index.</li>
+ * </ul>
+ * Example:

Review Comment:
   <p>



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -93,6 +92,12 @@ int[] toArray() {
      */
     protected abstract int getBehaviour();
 
+    protected abstract int[] getExpectedIndex();
+
+    protected int[] getExpectedForEach() {

Review Comment:
   This method is not required. The indices are checked in different tests: the indices are correct; the indices may be sorted; the indices may be distinct.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromIntArrayTest.java:
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class BitCountProducerFromIntArrayTest extends AbstractBitCountProducerTest {
+
+    int[] data = {6, 8, 1, 2, 4, 4, 5};
+    int[] expected = {1, 2, 4, 5, 6, 8};

Review Comment:
   Updating the abstract test removes the requirement for `expected`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -144,12 +148,13 @@ public final void testBehaviourAsIndexArray() {
             long count = Arrays.stream(actual).distinct().count();
             Assertions.assertEquals(count, actual.length);
         }
+        int[] expected = getExpectedIndex();

Review Comment:
   This should not be in the behaviour test.
   
   Note that I did observe failures in the DefaultIndexProducerTest on this method. When duplicates are randomly generated this method will fail.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromSparseBloomFilterTest.java:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class BitCountProducerFromSparseBloomFilterTest extends AbstractBitCountProducerTest {
+
+    protected Shape shape = Shape.fromKM(17, 72);
+
+    @Override
+    protected BitCountProducer createProducer() {
+        Hasher hasher = new IncrementingHasher(0, 1);
+        BloomFilter bf = new SparseBloomFilter(shape);
+        bf.merge(hasher);
+        return BitCountProducer.from(bf);
+

Review Comment:
   Remove empty line



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -161,6 +166,8 @@ public final void testBehaviourForEach() {
             long count = Arrays.stream(actual).distinct().count();
             Assertions.assertEquals(count, actual.length);
         }
+        int[] expected = getExpectedForEach();

Review Comment:
   This should not be in the behaviour test



##########
src/test/java/org/apache/commons/collections4/bloomfilter/BitCountProducerFromHasherTest.java:
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.collections4.bloomfilter;
+
+public class BitCountProducerFromHasherTest extends AbstractBitCountProducerTest {
+
+    @Override
+    protected BitCountProducer createProducer() {
+        return BitCountProducer.from(new IncrementingHasher(0, 1).indices(Shape.fromKM(17, 72)));

Review Comment:
   I would update the test to use something other than IncrementingHasher(0, 1), perhaps IncrementingHasher(3, 1), or IncrementingHasher(3, 2). This would check a non-sequential set of indices.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org