You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by GitBox <gi...@apache.org> on 2022/10/20 18:50:43 UTC

[GitHub] [commons-collections] aherbert commented on a diff in pull request #335: [Collections-834][bloom filters] bit count producer operation is not clearly defined

aherbert commented on code in PR #335:
URL: https://github.com/apache/commons-collections/pull/335#discussion_r1000573446


##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -134,8 +176,8 @@ public final void testConsistency() {
     @Test
     public final void testBehaviourAsIndexArray() {
         int flags = getBehaviour();
-        Assumptions.assumeTrue((flags & (AS_ARRAY_ORDERED | AS_ARRAY_DISTINCT)) != 0);
         int[] actual = createProducer().asIndexArray();
+        assumeTrue((flags & (AS_ARRAY_ORDERED | AS_ARRAY_DISTINCT)) != 0);

Review Comment:
   assumeTrue should be the line after you get the flags. Otherwise you create the Producer for no reason.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +80,90 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
-    /**
-     * Determines if empty tests should be run.  Some producers do not implement an empty
-     * version.  Tests for those classes should return false.
-     * @return true if the empty tests are supported
-     */
-    protected boolean supportsEmpty() {
-        return true;
+    @Test
+    public final void testForEachCountPredicates() {
+        BitCountProducer populated = createProducer();
+        BitCountProducer empty = createEmptyProducer();
+
+        assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should be false");
+        assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
+
+        assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be true");
+        assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
     }
 
     @Test
-    public final void testForEachCount() {
+    public final void testEmptyBitCountProducer() {
+        BitCountProducer empty = createEmptyProducer();
+        int ary[] = empty.asIndexArray();
+        assertEquals(0, ary.length);
+        assertTrue(empty.forEachCount((i, j) -> {
+            throw new AssertionError("forEachCount consumer should not be called");
+        }));
+    }
+
+    @Test
+    public final void testIndexConsistency() {
+        BitCountProducer producer = createProducer();
+        BitSet bs1 = new BitSet();
+        BitSet bs2 = new BitSet();
+        producer.forEachIndex(i -> {
+            bs1.set(i);
+            return true;
+        });
+        producer.forEachCount((i, j) -> {
+            bs2.set(i);
+            return true;
+        });
+        Assertions.assertEquals(bs1, bs2);
+    }
 
-        assertFalse(createProducer().forEachCount(FALSE_CONSUMER), "non-empty should be false");
-        assertTrue(createProducer().forEachCount(TRUE_CONSUMER), "non-empty should be true");
-        if (supportsEmpty()) {
-            assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), "empty should be true");
-            assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), "empty should be true");
+    @Test
+    public void testForEachCount() {
+        // Assumes the collections bag works. Could be replaced with Map<Integer,Integer> with more work.
+        final TreeBag<Integer> expected = new TreeBag<>();
+        Arrays.stream(getExpectedBitCount()).forEach(c -> expected.add(c[0], c[1]));
+        final TreeBag<Integer> actual = new TreeBag<>();
+        // can not return actual.add as it returns false on duplicate 'i'
+        createProducer().forEachCount((i, j) -> {
+            actual.add(i, j);
+            return true;
+            }
+        );
+        assertEquals(expected, actual);
+    }
+
+    @Test
+    public final void testBehaviourForEachCount() {
+        int flags = getBehaviour();
+        IntList list = new IntList();

Review Comment:
   This could do with:
   `assumeTrue((flags & (FOR_EACH_COUNT_ORDERED | FOR_EACH_COUNT_DISTINCT)) != 0);`



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractBitCountProducerTest.java:
##########
@@ -60,23 +80,90 @@ public boolean test(int index, int count) {
     @Override
     protected abstract BitCountProducer createEmptyProducer();
 
-    /**
-     * Determines if empty tests should be run.  Some producers do not implement an empty
-     * version.  Tests for those classes should return false.
-     * @return true if the empty tests are supported
-     */
-    protected boolean supportsEmpty() {
-        return true;
+    @Test
+    public final void testForEachCountPredicates() {
+        BitCountProducer populated = createProducer();
+        BitCountProducer empty = createEmptyProducer();
+
+        assertFalse(populated.forEachCount(FALSE_CONSUMER), "non-empty should be false");
+        assertTrue(empty.forEachCount(FALSE_CONSUMER), "empty should be true");
+
+        assertTrue(populated.forEachCount(TRUE_CONSUMER), "non-empty should be true");
+        assertTrue(empty.forEachCount(TRUE_CONSUMER), "empty should be true");
     }
 
     @Test
-    public final void testForEachCount() {
+    public final void testEmptyBitCountProducer() {
+        BitCountProducer empty = createEmptyProducer();
+        int ary[] = empty.asIndexArray();
+        assertEquals(0, ary.length);
+        assertTrue(empty.forEachCount((i, j) -> {
+            throw new AssertionError("forEachCount consumer should not be called");
+        }));
+    }
+
+    @Test
+    public final void testIndexConsistency() {
+        BitCountProducer producer = createProducer();
+        BitSet bs1 = new BitSet();
+        BitSet bs2 = new BitSet();
+        producer.forEachIndex(i -> {
+            bs1.set(i);
+            return true;
+        });
+        producer.forEachCount((i, j) -> {
+            bs2.set(i);
+            return true;
+        });
+        Assertions.assertEquals(bs1, bs2);
+    }
 
-        assertFalse(createProducer().forEachCount(FALSE_CONSUMER), "non-empty should be false");
-        assertTrue(createProducer().forEachCount(TRUE_CONSUMER), "non-empty should be true");
-        if (supportsEmpty()) {
-            assertTrue(createEmptyProducer().forEachCount(FALSE_CONSUMER), "empty should be true");
-            assertTrue(createEmptyProducer().forEachCount(TRUE_CONSUMER), "empty should be true");
+    @Test
+    public void testForEachCount() {
+        // Assumes the collections bag works. Could be replaced with Map<Integer,Integer> with more work.
+        final TreeBag<Integer> expected = new TreeBag<>();
+        Arrays.stream(getExpectedBitCount()).forEach(c -> expected.add(c[0], c[1]));
+        final TreeBag<Integer> actual = new TreeBag<>();
+        // can not return actual.add as it returns false on duplicate 'i'
+        createProducer().forEachCount((i, j) -> {
+            actual.add(i, j);
+            return true;
+            }
+        );
+        assertEquals(expected, actual);
+    }
+
+    @Test
+    public final void testBehaviourForEachCount() {
+        int flags = getBehaviour();
+        IntList list = new IntList();
+        createProducer().forEachCount((i, j) -> list.add(i));
+        int[] actual = list.toArray();
+        if ((flags & FOR_EACH_COUNT_ORDERED) != 0) {
+            int[] expected = Arrays.stream(actual).sorted().toArray();
+            Assertions.assertArrayEquals(expected, actual);
+        }
+        if ((flags & FOR_EACH_COUNT_DISTINCT) != 0) {
+            long count = Arrays.stream(actual).distinct().count();
+            Assertions.assertEquals(count, actual.length);
         }
+        int[] expected = getExpectedForEach();

Review Comment:
   I do not understand why we need this method: `getExpectedForEach()`.
   
   The forEachCount should satisfy the documented contract in the interface:
   
   - that for every unique value produced by the IndexProducer there will be at least one index in the BitCountProducer.
   - that the total count of a specific value produced by the IndexProducer will equal the total of the counts in the BitCountProducer for that index
   
   - Duplicate indices are not required to be aggregated. Duplicates may be output by the producer as noted in the class javadoc (i.e. above).
   
   
   These can be tested without a separate `getExpectedForEach()` array.
   
   Add this method:
   ```Java
       @Test
       public final void testIndexCountConsistency() {
           BitCountProducer producer = createProducer();
           final TreeBag<Integer> expected = new TreeBag<>();
           final TreeBag<Integer> actual = new TreeBag<>();
           producer.forEachIndex(i -> {
               expected.add(i);
               return true;
           });
           producer.forEachCount((i, j) -> {
               actual.add(i, j);
               return true;
           });
           assertEquals(expected, actual);
       }
   ```
   
   Finds the following errors:
   
   - BitCountProducerFromArrayCountingBloomFilterTest
   
   So this is not satisfying the BitCountProducer contract. However I do not think we wish to change the implementation. I think this test is highlighting that we should change the interface definition, specifically this statement which we should not uphold:
   ```
   "that the total count of a specific value produced by the IndexProducer will equal the total of the counts in the BitCountProducer for that index"
   ```
   
   I think we can drop this statement, perhaps to be replaced with: there are no guarantees that the count of indices output by the IndexProducer and total counts of the BitCountProducer are the same; only that the unique indices are the identical.
   
   The rest of the tests just verify whether indices are ordered and distinct. You do not need to provide an expected array of the indices output from `forEachIndex` since the order does not matter. The interface contracts are stating that IndexProducer and BitCountProducer are consistent, but they are not mandated to output in a specific order, or even a matched order between them.



##########
src/test/java/org/apache/commons/collections4/bloomfilter/AbstractIndexProducerTest.java:
##########
@@ -149,7 +191,6 @@ public final void testBehaviourAsIndexArray() {
     @Test
     public final void testBehaviourForEach() {
         int flags = getBehaviour();
-        Assumptions.assumeTrue((flags & (FOR_EACH_ORDERED | FOR_EACH_DISTINCT)) != 0);

Review Comment:
   This Assumptions.assumeTrue has not been reinstated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@commons.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org